++++++++++++++++++ FME2001: - Review reports for paper 12 +++++++++++++++++++++
- Dear author(s) : this file was extracted automatically from a very large
- mailbox. In case you find any problem concerning mail encodings (or any
- other kind of anomaly disturbing your understanding of the reviews) please
- email any of the PC cochairs (pamela@research.att.com,jno@di.uminho.pt).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PAPER NUMBER:	12
CATEGORY:	2
TITLE:		Reasoning about Failure Probabilities - formally and feasibly
AUTHOR(S):	Jan J=FCrjens

--------------------------------------------------------------------------------

1. Briefly SUMMARIZE the paper (2-3 lines): 

      The integration of probabilistic analysis with formal analysis
      via a single model that address both issues.


2. RELEVANCE: Please provide a rating of the paper's relevance to the
   FME Symposium, using the scale:
      0 =3D Out of scope
      1 =3D Marginal interest
      2 =3D Minority interest
      3 =3D Majority interest
      4 =3D Outstanding interest

   Numeric Rating:	2 

   Please comment your rating: 


3. OVERALL RECOMMENDATION: Please provide a rating of the paper's
   acceptability, using the scale:
        1 =3D Strong reject
        2 =3D Weak reject
        3 =3D Could go either way
        4 =3D Weak accept
        5 =3D Strong accept

   Numeric Rating:	2
   NB: There should be a correlation between the two rates above.


4. CONFIDENCE LEVEL: Please provide a rating oof your expertise in the
   area addressed by the paper, using the scale:
        1 =3D Know virtually nothing about this area
        2 =3D Not too knowledgeable, but I know a bit
        3 =3D Know a moderate amount, about average I'd say
        4 =3D Not my main area of work, but I know a lot about it
        5 =3D My area of work, I know it extremely well

   Numeric Rating:	3

   NB:  PC members are responsible for ensuring that 1 is not used
   here.      

                I know alot about probablistic analysis of fault-tolerant
                computer systems, but found the mathematical formalisms
                in this paper beyond my expertise.


5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here?

   Comment:   incorporating failure distributions into formalism


6. How WORTHWILE is the goal of the authors?

   Comment: I am not convinced that this is a good thing to do.  Divide and
            conquer is an age-old means for attacking difficult problems. In=

            fault-tolerant computing, the following strategy has worked well

                  Markov analysis:  Calculate [Probability that during the
                                    mission we have enough working hardware]

                  Formal analysis:  Given that we have enough working=
 hardware
                                    establish the validity of specified=
 safety
                                    properties.

             Therefore, the burden of proof falls on the author to justify
             their new approach by
                 (1) showing where the current approach is inadequate or
                     at least inefficient
                 (2) demonstrate how their new approach overcomes these
                     inadequacies
             They fail to do this on both accounts.  They offer a rationale
             but it is deeply flawed.  I will go through it point-by-point.

		"Failure probabilities of underlying components (such as
		hardware) are considered (using Markov models) in
		isolation from the whole system, which is verified
		formally under the assumption that all failure
		probabilities are small enough. By doing this, one
		usually needs to make simplifying assumptions,
		e. g. that the failure distributions of different
		components are not correlated,"

             It is true that the Markov analysis is simplified by assuming=

             that different component failures are not correlated, but it
             is not for lack of an integrated model. This is done because
             the experimental procedure to measure such correlations
             would be enormously expensive.  The computation problem
             would be harder but not by much.  The real problem is
             obtaining credible measurements of the parameters.
            
                "do not change over time or with use of the component,"

             Probabilistic analysis can handle non-homogeneous processes.
             There are even automated tools that deal with Weibull
             distributions etc.  Simpler models are used because they are
             usually adequate  (i.e. digital circuitry failure distributions
             tend to be exponentially distributed) not because of the
             lack of an intergrated model.            

                "that absence of failure is composable (in the sense that
                if two components in isolation exhibit acceptable failure
                behaviour, then they do so when composed),"
      
             This is standard design practice in fault-tolerant computing.
             It is called "fault-containment regions" so I don't know what
             they are talking about here.

                "that failures do not remain latent for a period of time
                to become effective only later etc."            
           
             Once again the problem with including fault-latency in a model
             has nothing to do with the lack of integration with the
             formal model.  Fault-latency can be included in semi-Markov
             models of a fault-tolerant system, but the difficulty is
             a design issue concering diagnosibility.  One must remove
             permanently faulty processors as soon as possible, but one must
             not remove transiently faulty processors that will operate
             correctly in the future once the error is flushed.  Latent
             faults make design decisions associated with reconfiguration
             more difficult. 

             IT IS NOTEWORTHY THAT ALTHOUGH THIS LIST APPEARS IN THE
             RATIONALE, the author never shows how his approach solves
             any of these so-called problems.            
                           

7. How well is this goal EXPLAINED and JUSTIFIED?

   Comment: does not make the case---see previous section.


8. TECHNICAL QUALITY. Are the technical parts (definitions, statements,
   specifications, proofs, algorithms, etc.) SOUND?

   Comment: probably sound.


9. APPLICABILITY. If the work is primarily theoretical or conceptual,
   are the implications in terms of applicability adequately
   justified? If the paper is about a new formal technique, are
   satisfactory arguments presented in favor of the new technique?  If a
   methodology is proposed, is it sound? If experimental results are
   presented, is their relevance justified?

   Comment:   In my opinion this was an academic exercise that no
              one in industry will ever use.


10. PRESENTATION: Describe the QUALITY of the writing, including
   suggestions for changes where appropriate.

   Comment:  writing quality is good.


11. Were there any formatting or mechanical problems with this
    paper?: no
    Are the figures and length acceptable?: yes
    Are the references correct?: ok


12. OTHER COMMENTS you believe would be useful to the author(s),
    including pointers to missing relevant work:


+++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++


PAPER NUMBER:	12
CATEGORY:	2
TITLE:		Reasoning about Failure Probabilities - formally and feasibly
AUTHOR(S):	Jan Jürjens

--------------------------------------------------------------------------------

1. Briefly SUMMARIZE the paper (2-3 lines): 

The paper introduces a specification language for nondeterministic
processes, and describes two semantics based on (composable) stream-processing
functions, one taking faults and their probabilities into account, one
not. The probabilistic approach is intended to allow estimations of system
reliability. 
 
2. RELEVANCE: Please provide a rating of the paper's relevance to the
   FME Symposium, using the scale:
      0 = Out of scope 
      1 = Marginal interest
      2 = Minority interest
      3 = Majority interest
      4 = Outstanding interest

   Numeric Rating:	3    

   Please comment your rating: 
 
The FME´01 focuses on Software Productivity, and the paper offers no
contribution to that. Nevertheless, papers concerning "all other areas of
the use of formal methods" should be considered according to the
call-for-papers. The integration of probabilistic failure estimations in
the framework of formal methods appears to be an interesting field for
research; so I would rate the relevance quite high.


3. OVERALL RECOMMENDATION: Please provide a rating of the paper's
   acceptability, using the scale:
        1 = Strong reject
        2 = Weak reject
        3 = Could go either way
        4 = Weak accept
        5 = Strong accept

   Numeric Rating:	1
   -2 (Reject)

   NB: There should be a correlation between the two rates above.
 
The low rating (and therefore just a weak correlation to 2) derives from
the deficiencies in the technical quality, described in 8.

The paper was hard to read, but not because the contents are very difficult by
themself, but because a lot of explanations is missing, clarifying the
intention behind formulas. I had to spent a lot of time figuring
out what _could_ be meant.

4. CONFIDENCE LEVEL: Please provide a rating oof your expertise in the
   area addressed by the paper, using the scale: 
        1 = Know virtually nothing about this area
        2 = Not too knowledgeable, but I know a bit
        3 = Know a moderate amount, about average I'd say
        4 = Not my main area of work, but I know a lot about it
        5 = My area of work, I know it extremely well

   Numeric Rating:	4

   NB:  PC members are responsible for ensuring that 1 is not used
   here. 


5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here?

   Comment:

The work appears quite new to me: To my knowledge, there are no detailed
extensions of the framework of FOCUS and its stream-processing functions by 
notions of probabilities of failures. Together with the high relevance of
the subject, I also rate the significance quite high.
 

6. How WORTHWILE is the goal of the authors? 

   Comment: 
 
Due to the significance and relevance, the _goal_ of the author is
sufficiently worthwhile. The paper gives me the idea that the author has
some good ideas and worked a lot in the field of combining the theory of
fault-tolerance, probabilities and stream-processing
functions. Unfortunately, the results cannot be estimated in detail due to
the sad presentation.

7. How well is this goal EXPLAINED and JUSTIFIED?

   Comment: 

The author explains and justifies his _goal_ in a satisfactory way. Section 1 
of the paper containing the introduction, background and related work
motivate the paper, clarify the intentions and the approach the author
chose. Nevertheless, I missed a profound statement what was gained in the
work presented. The conclusion is nearly just a cut-n-paste rephrasement of the
introduction (in another order). 
 

8. TECHNICAL QUALITY. Are the technical parts (definitions, statements,
   specifications, proofs, algorithms, etc.) SOUND?

   Comment: 
 
The technical quality is too low for a paper to be published.

The system model is not explained. It is briefly sketched in the beginning
of Section 2, but in other parts of the text "Servers" and "Clients" are
mentioned that never appeared before. I cannot find a discussion what a
system is, how it can be decomposed into components, and how the
de-/composition is described.

In the introduction (and the conclusion again), it is promised that the
usual simplifying assumptions commonly made are not necessary in this
approach, nevertheless there are the usual assumption for independence in
the presented examples, both for crash/performance and value failure
semantics.

Very often I got unhappy with the overloaded and changing names of
variables leading to extensive and unnecessary confusion (as e.g. "c" for
"channels", "closed" expression, and "crash" failure semantics). They are
pointed out in detail in the following remarks.

Several further notes and question are given here:

P.2, L. 8: do _this_ without some ... 

P.2, last paragraph: The explanation of "design decisions" are announced,
that I never found.

P.3, L. -2: 10^9 instead of 10^-9

Section 2: A "specification framework" is announced, but just a language is
presented there. What makes it a framework?

P.4, definition of E: It would be much easier to read the paper, if the
reader would be informed what the intention of the definition is. It is
confusing in the beginning, that variables, channel names and data are all
summarized in expressions E.

P.4, first two lines after definition of E: c represent an "input", while
"\epsilon" represents an (empty) output: Can´t it also be the other way,
e.g. c can also represent output?

P.4, Definition of EXP_c: c is already used for channels, and confusing at
this place.

P.4, bottom: A formal BNF-notation of E is given, but then brackets
illegally appear in an expression.

P.5, top: It would be nicer if the terminals and non-terminals in this BNF
would be better separated. The last case is not understandable without
explanation. I just found out its meaning at the semantics. Why isn`t
there an informal explanation? Is also infinite nondeterminism expressible?

P.5, definition of process: L is undefined (but used for c), and should be
part of the definition. Is I and O disjoint? Or I and L? Or O and L? These
questions stayed open. From the example 6 pages later I guessed that L = I
\cut O ? I suggest to place an example here.

P.5, definition of pre-congruence: Are the ´ mixed up? I guess it should
read:

 ... op(p,q) \refines op(p',q')

P.6, definition of stream: The set of finite of infinite streams is denoted 
by Exp^\omega (not *). Again, the use of $c$ for channels and "closed" is
unnecessarily confusing. The notion of "component" for a part of a tuple of 
streams can be mixed up with "components" of a system. Why not simply call
it a stream, and \vector{s} a tuple of streams?

P.6: Definition of stream processing functions: "d" is used as variable out 
of C(P), again easy to be confused with "d" as data as used before.

P.6: Def. Composition: The index $D_i$ instead of $C(P)$ is used here.

P.6. The meaning of the "restrict"-Operator has to be guessed.

P.7. Example: What means the index 2? This should be a subset of CHANNELS!
What means the dot (I though concatenation is denoted by :: )?

P.7, line 3 of sect. 3.2: What means the index I for EXP? This was never
explained. 

P.7. definition of t: Should t_0 denote the first element of t? This is
never defined (again for t_{n+1}).

P.7, above example: The \Bigotimes-operator is used, but never mentioned,
even not informally.

P.8, Section 4: The "correctors" of [AK99] are not used for error detection!
For me, it stays unclear what \Pi means? Is it a function that is applied
to a system and changes it from a fault-intolerant version to a
fault-tolerant version? What is meant by "same failure distribution
... applied to the whole system" - same to what?

What else can a safety parameter? It is always used as the number of
replication copies, even though this use is just an example. The set
FAILURE, is this meant as a infinite stream of {0,1}? A data access to a
"server" is mentioned: Is the subject specialized to
client/server-architectures? 

P.8: r_n = 1 for a unmasked failure, r_n = 0 for an absent failure. What
happens if the third case, a masked failure appears?

P.9: What means the R in the expression for Pr[...]?

P.9: Def.1: What is the informal meaning of p(\ny,t)? The p(..)-safety is a 
property of a replication mechanism \Pi, so I expect \Pi (or parts of it)
to appear in the formal definition. Why does this not happen? I cannot find 
my way to a convincing understanding of the definition.

P.9: \Pi_{c/p} is an unlucky choice, since c is already often used for
other purposes.

P.10, L.4: Is [p] not only defined for closed expressions (Exp_c)?

P.10, last item in definition of [p]: "or h = c for c\in I" can be
substituted by "\cup I"

P. 10, Explanation of substitution p[E/x] already defined on page 7.

P.10: "r^n" is "defined" via "r^n_m", but how? Is this a special case? But
for what m?

P.11: Example: This formula desperately needs some explanations!

P.12: Definition 2: What is the idea behind that definition? Is l intended
to restrict the length of the streams? Why does it not occur in the formula?

P.12: Theorem 1: What is the idea, the intention? I am unable to figure
out what the three dots in " 1, ..., l \cdot n(P) " represent.

P.12: Example: It should be $p_d$ instead $p_c$!

P.14: Proof Proposition 9: The last "else" is superfluous.

P.15. second item in itemize: "and to" at the end is a typo, or something is 
missing.

I did not check the appendix. The latex counter are not reset and therefore 
do not match the numbers of propositions and theorems in the main part.

9. APPLICABILITY. If the work is primarily theoretical or conceptual,
   are the implications in terms of applicability adequately
   justified? If the paper is about a new formal technique, are
   satisfactory arguments presented in favor of the new technique?  If a
   methodology is proposed, is it sound? If experimental results are
   presented, is their relevance justified?

   Comment:

The applicability of the approach is not explored or mentioned, just two
simpler examples are presented to illustrate the approach. The introduction 
mentions real-life examples from Air Traffic Control, but it stays open if
the author expects such complex systems to be handled by his approach.

 
10. PRESENTATION: Describe the QUALITY of the writing, including
   suggestions for changes where appropriate.

   Comment: 

The paper gives me the impression to be "finished" in a hurry, with not too 
much care. Existing explanations are satisfying, but there are not enough.
The author does not decide if he prefers to write "fault-tolerant" or
"fault tolerant".

11. Were there any formatting or mechanical problems with this
    paper?: 
    Are the figures and length acceptable?: 
    Are the references correct?: 

I discovered the following deficiencies:

The figure on page 2 is never referenced. The meaning or problem with the 
"?" stays unexplained.

P.4: first three paragraphs: Several references to literature are
mentioned, but not more: The relation of the paper to the cited work is not
discussed. 

P.6: the figure is again not referenced and not labeled. The formula for I 
and O directly beneath it would be easier to understand with some labels in 
the picture.

12. OTHER COMMENTS you believe would be useful to the author(s),
    including pointers to missing relevant work: 
 
The bunch of open questions and problems above made it very uneasy to read the
paper. I would suggest two options:

1) Re-formulate the paper, and get much more detailed.

2) Omit a lot of formulas, and just phrase the ideas behind them.

This version (too much condensed by hard-to-understand formulas) is not yet
ready to be published.

I did not understand (it is not explained) why the author introduces his
specific specification language, which seems to be restricted to finite
nondeterminism. How can loops be expressed (if they can?)? FOCUS offers
"state machines" that would be suitable to specify an operational view of a 
system, also including nondeterministic branching and conditionals. Their
semantics is already defined in the literature, including refinement, and I 
could imagine to extend these by probabilities, too.


+++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++


PAPER NUMBER:	12
CATEGORY:	2
TITLE:		Reasoning about Failure Probabilities - formally and feasibly
AUTHOR(S):	Jan J\"urjens

--------------------------------------------------------------------------------

1. Briefly SUMMARIZE the paper (2-3 lines): 
The paper introduces a measure for reliability of systems in the
setting of Broy's stream processeses. It shows this measure on
a number of example and makes some excursions into a further
theoretical workout of the idea.

 
2. RELEVANCE: Please provide a rating of the paper's relevance to the
   FME Symposium, using the scale: 
      0 = Out of scope 
      1 = Marginal interest
      2 = Minority interest
      3 = Majority interest
      4 = Outstanding interest

   Numeric Rating:	2  

   Please comment your rating: 
   An classification and better understanding, and also description technique
   for reliability of systems is useful. However, as there should be a relevance
   between this judgement and the next one I cannot give a high mark. 
 

3. OVERALL RECOMMENDATION: Please provide a rating of the paper's
   acceptability, using the scale:
        1 = Strong reject
        2 = Weak reject
        3 = Could go either way
        4 = Weak accept
        5 = Strong accept

   Numeric Rating:	2
   (I suggest to reject the paper)

   NB: There should be a correlation between the two rates above.
 

4. CONFIDENCE LEVEL: Please provide a rating oof your expertise in the
   area addressed by the paper, using the scale: 
        1 = Know virtually nothing about this area
        2 = Not too knowledgeable, but I know a bit
        3 = Know a moderate amount, about average I'd say
        4 = Not my main area of work, but I know a lot about it
        5 = My area of work, I know it extremely well

   Numeric Rating:	4

   NB:  PC members are responsible for ensuring that 1 is not used
   here. 


5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here?

   Comment: The attempt is significant. I find the way this paper has
   been worked out almost iritating. I can in no way verify the correctness
   of for instance most of the examples, but find those that I do rather
   trivial and flawed at times.

 
6. How WORTHWILE is the goal of the authors? 

   Comment: Very worthwhile.
 

7. How well is this goal EXPLAINED and JUSTIFIED?

   Comment: I find this question irrelevant.
 

8. TECHNICAL QUALITY. Are the technical parts (definitions, statements,
   specifications, proofs, algorithms, etc.) SOUND?

   Comment: The problem of this paper is that I find it not sufficiently
   accessible to clearly understand what the author means. This is the
   hardest judgement I can give on a paper. For instance in proposition 5
   a "4" drops out of the sky. Also the parameters eta, p and n(P) do
   not relate back to the unbounded buffer (shouldn't process p_c be
   called p_d?). There is actually nothing that relates to reliability
   or probabilistic processes here.
 

9. APPLICABILITY. If the work is primarily theoretical or conceptual,
   are the implications in terms of applicability adequately
   justified? If the paper is about a new formal technique, are
   satisfactory arguments presented in favor of the new technique?  If a
   methodology is proposed, is it sound? If experimental results are
   presented, is their relevance justified?

   Comment: I think that a full rewriting of the paper is needed before
   I can answer this question. It appears to me that it is basically
   conceptual with a very thin theoretical workout.

 
10. PRESENTATION: Describe the QUALITY of the writing, including
   suggestions for changes where appropriate.

   Comment: Rewrite the paper such that the core of the presentation
   becomes clear and checkable.
 

11. Were there any formatting or mechanical problems with this
    paper?: 
    Are the figures and length acceptable?: 
    Are the references correct?: 
 

12. OTHER COMMENTS you believe would be useful to the author(s),
    including pointers to missing relevant work: 

    In proposition 3 and 4 it seems you are missing a factor
    (eta+1 out of 2*eta+1).
 

+++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++