++++++++++++++++++ FME2001: - Review reports for paper 12 +++++++++++++++++++++ - Dear author(s) : this file was extracted automatically from a very large - mailbox. In case you find any problem concerning mail encodings (or any - other kind of anomaly disturbing your understanding of the reviews) please - email any of the PC cochairs (pamela@research.att.com,jno@di.uminho.pt). ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ PAPER NUMBER: 12 CATEGORY: 2 TITLE: Reasoning about Failure Probabilities - formally and feasibly AUTHOR(S): Jan J=FCrjens -------------------------------------------------------------------------------- 1. Briefly SUMMARIZE the paper (2-3 lines): The integration of probabilistic analysis with formal analysis via a single model that address both issues. 2. RELEVANCE: Please provide a rating of the paper's relevance to the FME Symposium, using the scale: 0 =3D Out of scope 1 =3D Marginal interest 2 =3D Minority interest 3 =3D Majority interest 4 =3D Outstanding interest Numeric Rating: 2 Please comment your rating: 3. OVERALL RECOMMENDATION: Please provide a rating of the paper's acceptability, using the scale: 1 =3D Strong reject 2 =3D Weak reject 3 =3D Could go either way 4 =3D Weak accept 5 =3D Strong accept Numeric Rating: 2 NB: There should be a correlation between the two rates above. 4. CONFIDENCE LEVEL: Please provide a rating oof your expertise in the area addressed by the paper, using the scale: 1 =3D Know virtually nothing about this area 2 =3D Not too knowledgeable, but I know a bit 3 =3D Know a moderate amount, about average I'd say 4 =3D Not my main area of work, but I know a lot about it 5 =3D My area of work, I know it extremely well Numeric Rating: 3 NB: PC members are responsible for ensuring that 1 is not used here. I know alot about probablistic analysis of fault-tolerant computer systems, but found the mathematical formalisms in this paper beyond my expertise. 5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here? Comment: incorporating failure distributions into formalism 6. How WORTHWILE is the goal of the authors? Comment: I am not convinced that this is a good thing to do. Divide and conquer is an age-old means for attacking difficult problems. In= fault-tolerant computing, the following strategy has worked well Markov analysis: Calculate [Probability that during the mission we have enough working hardware] Formal analysis: Given that we have enough working= hardware establish the validity of specified= safety properties. Therefore, the burden of proof falls on the author to justify their new approach by (1) showing where the current approach is inadequate or at least inefficient (2) demonstrate how their new approach overcomes these inadequacies They fail to do this on both accounts. They offer a rationale but it is deeply flawed. I will go through it point-by-point. "Failure probabilities of underlying components (such as hardware) are considered (using Markov models) in isolation from the whole system, which is verified formally under the assumption that all failure probabilities are small enough. By doing this, one usually needs to make simplifying assumptions, e. g. that the failure distributions of different components are not correlated," It is true that the Markov analysis is simplified by assuming= that different component failures are not correlated, but it is not for lack of an integrated model. This is done because the experimental procedure to measure such correlations would be enormously expensive. The computation problem would be harder but not by much. The real problem is obtaining credible measurements of the parameters. "do not change over time or with use of the component," Probabilistic analysis can handle non-homogeneous processes. There are even automated tools that deal with Weibull distributions etc. Simpler models are used because they are usually adequate (i.e. digital circuitry failure distributions tend to be exponentially distributed) not because of the lack of an intergrated model. "that absence of failure is composable (in the sense that if two components in isolation exhibit acceptable failure behaviour, then they do so when composed)," This is standard design practice in fault-tolerant computing. It is called "fault-containment regions" so I don't know what they are talking about here. "that failures do not remain latent for a period of time to become effective only later etc." Once again the problem with including fault-latency in a model has nothing to do with the lack of integration with the formal model. Fault-latency can be included in semi-Markov models of a fault-tolerant system, but the difficulty is a design issue concering diagnosibility. One must remove permanently faulty processors as soon as possible, but one must not remove transiently faulty processors that will operate correctly in the future once the error is flushed. Latent faults make design decisions associated with reconfiguration more difficult. IT IS NOTEWORTHY THAT ALTHOUGH THIS LIST APPEARS IN THE RATIONALE, the author never shows how his approach solves any of these so-called problems. 7. How well is this goal EXPLAINED and JUSTIFIED? Comment: does not make the case---see previous section. 8. TECHNICAL QUALITY. Are the technical parts (definitions, statements, specifications, proofs, algorithms, etc.) SOUND? Comment: probably sound. 9. APPLICABILITY. If the work is primarily theoretical or conceptual, are the implications in terms of applicability adequately justified? If the paper is about a new formal technique, are satisfactory arguments presented in favor of the new technique? If a methodology is proposed, is it sound? If experimental results are presented, is their relevance justified? Comment: In my opinion this was an academic exercise that no one in industry will ever use. 10. PRESENTATION: Describe the QUALITY of the writing, including suggestions for changes where appropriate. Comment: writing quality is good. 11. Were there any formatting or mechanical problems with this paper?: no Are the figures and length acceptable?: yes Are the references correct?: ok 12. OTHER COMMENTS you believe would be useful to the author(s), including pointers to missing relevant work: +++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++ PAPER NUMBER: 12 CATEGORY: 2 TITLE: Reasoning about Failure Probabilities - formally and feasibly AUTHOR(S): Jan Jürjens -------------------------------------------------------------------------------- 1. Briefly SUMMARIZE the paper (2-3 lines): The paper introduces a specification language for nondeterministic processes, and describes two semantics based on (composable) stream-processing functions, one taking faults and their probabilities into account, one not. The probabilistic approach is intended to allow estimations of system reliability. 2. RELEVANCE: Please provide a rating of the paper's relevance to the FME Symposium, using the scale: 0 = Out of scope 1 = Marginal interest 2 = Minority interest 3 = Majority interest 4 = Outstanding interest Numeric Rating: 3 Please comment your rating: The FME´01 focuses on Software Productivity, and the paper offers no contribution to that. Nevertheless, papers concerning "all other areas of the use of formal methods" should be considered according to the call-for-papers. The integration of probabilistic failure estimations in the framework of formal methods appears to be an interesting field for research; so I would rate the relevance quite high. 3. OVERALL RECOMMENDATION: Please provide a rating of the paper's acceptability, using the scale: 1 = Strong reject 2 = Weak reject 3 = Could go either way 4 = Weak accept 5 = Strong accept Numeric Rating: 1 -2 (Reject) NB: There should be a correlation between the two rates above. The low rating (and therefore just a weak correlation to 2) derives from the deficiencies in the technical quality, described in 8. The paper was hard to read, but not because the contents are very difficult by themself, but because a lot of explanations is missing, clarifying the intention behind formulas. I had to spent a lot of time figuring out what _could_ be meant. 4. CONFIDENCE LEVEL: Please provide a rating oof your expertise in the area addressed by the paper, using the scale: 1 = Know virtually nothing about this area 2 = Not too knowledgeable, but I know a bit 3 = Know a moderate amount, about average I'd say 4 = Not my main area of work, but I know a lot about it 5 = My area of work, I know it extremely well Numeric Rating: 4 NB: PC members are responsible for ensuring that 1 is not used here. 5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here? Comment: The work appears quite new to me: To my knowledge, there are no detailed extensions of the framework of FOCUS and its stream-processing functions by notions of probabilities of failures. Together with the high relevance of the subject, I also rate the significance quite high. 6. How WORTHWILE is the goal of the authors? Comment: Due to the significance and relevance, the _goal_ of the author is sufficiently worthwhile. The paper gives me the idea that the author has some good ideas and worked a lot in the field of combining the theory of fault-tolerance, probabilities and stream-processing functions. Unfortunately, the results cannot be estimated in detail due to the sad presentation. 7. How well is this goal EXPLAINED and JUSTIFIED? Comment: The author explains and justifies his _goal_ in a satisfactory way. Section 1 of the paper containing the introduction, background and related work motivate the paper, clarify the intentions and the approach the author chose. Nevertheless, I missed a profound statement what was gained in the work presented. The conclusion is nearly just a cut-n-paste rephrasement of the introduction (in another order). 8. TECHNICAL QUALITY. Are the technical parts (definitions, statements, specifications, proofs, algorithms, etc.) SOUND? Comment: The technical quality is too low for a paper to be published. The system model is not explained. It is briefly sketched in the beginning of Section 2, but in other parts of the text "Servers" and "Clients" are mentioned that never appeared before. I cannot find a discussion what a system is, how it can be decomposed into components, and how the de-/composition is described. In the introduction (and the conclusion again), it is promised that the usual simplifying assumptions commonly made are not necessary in this approach, nevertheless there are the usual assumption for independence in the presented examples, both for crash/performance and value failure semantics. Very often I got unhappy with the overloaded and changing names of variables leading to extensive and unnecessary confusion (as e.g. "c" for "channels", "closed" expression, and "crash" failure semantics). They are pointed out in detail in the following remarks. Several further notes and question are given here: P.2, L. 8: do _this_ without some ... P.2, last paragraph: The explanation of "design decisions" are announced, that I never found. P.3, L. -2: 10^9 instead of 10^-9 Section 2: A "specification framework" is announced, but just a language is presented there. What makes it a framework? P.4, definition of E: It would be much easier to read the paper, if the reader would be informed what the intention of the definition is. It is confusing in the beginning, that variables, channel names and data are all summarized in expressions E. P.4, first two lines after definition of E: c represent an "input", while "\epsilon" represents an (empty) output: Can´t it also be the other way, e.g. c can also represent output? P.4, Definition of EXP_c: c is already used for channels, and confusing at this place. P.4, bottom: A formal BNF-notation of E is given, but then brackets illegally appear in an expression. P.5, top: It would be nicer if the terminals and non-terminals in this BNF would be better separated. The last case is not understandable without explanation. I just found out its meaning at the semantics. Why isn`t there an informal explanation? Is also infinite nondeterminism expressible? P.5, definition of process: L is undefined (but used for c), and should be part of the definition. Is I and O disjoint? Or I and L? Or O and L? These questions stayed open. From the example 6 pages later I guessed that L = I \cut O ? I suggest to place an example here. P.5, definition of pre-congruence: Are the ´ mixed up? I guess it should read: ... op(p,q) \refines op(p',q') P.6, definition of stream: The set of finite of infinite streams is denoted by Exp^\omega (not *). Again, the use of $c$ for channels and "closed" is unnecessarily confusing. The notion of "component" for a part of a tuple of streams can be mixed up with "components" of a system. Why not simply call it a stream, and \vector{s} a tuple of streams? P.6: Definition of stream processing functions: "d" is used as variable out of C(P), again easy to be confused with "d" as data as used before. P.6: Def. Composition: The index $D_i$ instead of $C(P)$ is used here. P.6. The meaning of the "restrict"-Operator has to be guessed. P.7. Example: What means the index 2? This should be a subset of CHANNELS! What means the dot (I though concatenation is denoted by :: )? P.7, line 3 of sect. 3.2: What means the index I for EXP? This was never explained. P.7. definition of t: Should t_0 denote the first element of t? This is never defined (again for t_{n+1}). P.7, above example: The \Bigotimes-operator is used, but never mentioned, even not informally. P.8, Section 4: The "correctors" of [AK99] are not used for error detection! For me, it stays unclear what \Pi means? Is it a function that is applied to a system and changes it from a fault-intolerant version to a fault-tolerant version? What is meant by "same failure distribution ... applied to the whole system" - same to what? What else can a safety parameter? It is always used as the number of replication copies, even though this use is just an example. The set FAILURE, is this meant as a infinite stream of {0,1}? A data access to a "server" is mentioned: Is the subject specialized to client/server-architectures? P.8: r_n = 1 for a unmasked failure, r_n = 0 for an absent failure. What happens if the third case, a masked failure appears? P.9: What means the R in the expression for Pr[...]? P.9: Def.1: What is the informal meaning of p(\ny,t)? The p(..)-safety is a property of a replication mechanism \Pi, so I expect \Pi (or parts of it) to appear in the formal definition. Why does this not happen? I cannot find my way to a convincing understanding of the definition. P.9: \Pi_{c/p} is an unlucky choice, since c is already often used for other purposes. P.10, L.4: Is [p] not only defined for closed expressions (Exp_c)? P.10, last item in definition of [p]: "or h = c for c\in I" can be substituted by "\cup I" P. 10, Explanation of substitution p[E/x] already defined on page 7. P.10: "r^n" is "defined" via "r^n_m", but how? Is this a special case? But for what m? P.11: Example: This formula desperately needs some explanations! P.12: Definition 2: What is the idea behind that definition? Is l intended to restrict the length of the streams? Why does it not occur in the formula? P.12: Theorem 1: What is the idea, the intention? I am unable to figure out what the three dots in " 1, ..., l \cdot n(P) " represent. P.12: Example: It should be $p_d$ instead $p_c$! P.14: Proof Proposition 9: The last "else" is superfluous. P.15. second item in itemize: "and to" at the end is a typo, or something is missing. I did not check the appendix. The latex counter are not reset and therefore do not match the numbers of propositions and theorems in the main part. 9. APPLICABILITY. If the work is primarily theoretical or conceptual, are the implications in terms of applicability adequately justified? If the paper is about a new formal technique, are satisfactory arguments presented in favor of the new technique? If a methodology is proposed, is it sound? If experimental results are presented, is their relevance justified? Comment: The applicability of the approach is not explored or mentioned, just two simpler examples are presented to illustrate the approach. The introduction mentions real-life examples from Air Traffic Control, but it stays open if the author expects such complex systems to be handled by his approach. 10. PRESENTATION: Describe the QUALITY of the writing, including suggestions for changes where appropriate. Comment: The paper gives me the impression to be "finished" in a hurry, with not too much care. Existing explanations are satisfying, but there are not enough. The author does not decide if he prefers to write "fault-tolerant" or "fault tolerant". 11. Were there any formatting or mechanical problems with this paper?: Are the figures and length acceptable?: Are the references correct?: I discovered the following deficiencies: The figure on page 2 is never referenced. The meaning or problem with the "?" stays unexplained. P.4: first three paragraphs: Several references to literature are mentioned, but not more: The relation of the paper to the cited work is not discussed. P.6: the figure is again not referenced and not labeled. The formula for I and O directly beneath it would be easier to understand with some labels in the picture. 12. OTHER COMMENTS you believe would be useful to the author(s), including pointers to missing relevant work: The bunch of open questions and problems above made it very uneasy to read the paper. I would suggest two options: 1) Re-formulate the paper, and get much more detailed. 2) Omit a lot of formulas, and just phrase the ideas behind them. This version (too much condensed by hard-to-understand formulas) is not yet ready to be published. I did not understand (it is not explained) why the author introduces his specific specification language, which seems to be restricted to finite nondeterminism. How can loops be expressed (if they can?)? FOCUS offers "state machines" that would be suitable to specify an operational view of a system, also including nondeterministic branching and conditionals. Their semantics is already defined in the literature, including refinement, and I could imagine to extend these by probabilities, too. +++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++ PAPER NUMBER: 12 CATEGORY: 2 TITLE: Reasoning about Failure Probabilities - formally and feasibly AUTHOR(S): Jan J\"urjens -------------------------------------------------------------------------------- 1. Briefly SUMMARIZE the paper (2-3 lines): The paper introduces a measure for reliability of systems in the setting of Broy's stream processeses. It shows this measure on a number of example and makes some excursions into a further theoretical workout of the idea. 2. RELEVANCE: Please provide a rating of the paper's relevance to the FME Symposium, using the scale: 0 = Out of scope 1 = Marginal interest 2 = Minority interest 3 = Majority interest 4 = Outstanding interest Numeric Rating: 2 Please comment your rating: An classification and better understanding, and also description technique for reliability of systems is useful. However, as there should be a relevance between this judgement and the next one I cannot give a high mark. 3. OVERALL RECOMMENDATION: Please provide a rating of the paper's acceptability, using the scale: 1 = Strong reject 2 = Weak reject 3 = Could go either way 4 = Weak accept 5 = Strong accept Numeric Rating: 2 (I suggest to reject the paper) NB: There should be a correlation between the two rates above. 4. CONFIDENCE LEVEL: Please provide a rating oof your expertise in the area addressed by the paper, using the scale: 1 = Know virtually nothing about this area 2 = Not too knowledgeable, but I know a bit 3 = Know a moderate amount, about average I'd say 4 = Not my main area of work, but I know a lot about it 5 = My area of work, I know it extremely well Numeric Rating: 4 NB: PC members are responsible for ensuring that 1 is not used here. 5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here? Comment: The attempt is significant. I find the way this paper has been worked out almost iritating. I can in no way verify the correctness of for instance most of the examples, but find those that I do rather trivial and flawed at times. 6. How WORTHWILE is the goal of the authors? Comment: Very worthwhile. 7. How well is this goal EXPLAINED and JUSTIFIED? Comment: I find this question irrelevant. 8. TECHNICAL QUALITY. Are the technical parts (definitions, statements, specifications, proofs, algorithms, etc.) SOUND? Comment: The problem of this paper is that I find it not sufficiently accessible to clearly understand what the author means. This is the hardest judgement I can give on a paper. For instance in proposition 5 a "4" drops out of the sky. Also the parameters eta, p and n(P) do not relate back to the unbounded buffer (shouldn't process p_c be called p_d?). There is actually nothing that relates to reliability or probabilistic processes here. 9. APPLICABILITY. If the work is primarily theoretical or conceptual, are the implications in terms of applicability adequately justified? If the paper is about a new formal technique, are satisfactory arguments presented in favor of the new technique? If a methodology is proposed, is it sound? If experimental results are presented, is their relevance justified? Comment: I think that a full rewriting of the paper is needed before I can answer this question. It appears to me that it is basically conceptual with a very thin theoretical workout. 10. PRESENTATION: Describe the QUALITY of the writing, including suggestions for changes where appropriate. Comment: Rewrite the paper such that the core of the presentation becomes clear and checkable. 11. Were there any formatting or mechanical problems with this paper?: Are the figures and length acceptable?: Are the references correct?: 12. OTHER COMMENTS you believe would be useful to the author(s), including pointers to missing relevant work: In proposition 3 and 4 it seems you are missing a factor (eta+1 out of 2*eta+1). +++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++