++++++++++++++++++ FME2001: Review reports for paper 41 +++++++++++++++++++++
Dear author(s): this file was extracted automatically from a very large
mailbox. In case you find any problem concerning mail encodings (or any
other kind of anomaly disturbing your understanding of the reviews), please
email any of the PC cochairs (pamela@research.att.com, jno@di.uminho.pt).
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PAPER NUMBER: 41
CATEGORY: 2
TITLE: Translation Validation of Optimizing Compilers by Computational
       Induction
AUTHOR(S): Amir Pnueli, Lenore Zuck, Paritosh Pandya
--------------------------------------------------------------------------------

1. Briefly SUMMARIZE the paper (2-3 lines):

   The paper describes an instantiation of translation validation, which
   attempts to verify the correctness of each run of a compiler by comparing
   the semantics of the program before and after each optimization. The paper
   focuses only on simple optimizations.

2. RELEVANCE: Please provide a rating of the paper's relevance to the FME
   Symposium, using the scale:
     0 = Out of scope
     1 = Marginal interest
     2 = Minority interest
     3 = Majority interest
     4 = Outstanding interest

   Numeric Rating: 2.5

   Please comment on your rating:

   Verifying realistic optimizing compilers automatically has always been
   beyond the power of program verification. Translation validation promises
   to achieve the goal of eliminating the possibility of translation errors
   and is thus of key interest to anyone concerned with the correctness of
   compilers. And if it works, it might prove to be one of the major success
   stories of formal methods.

3. OVERALL RECOMMENDATION: Please provide a rating of the paper's
   acceptability, using the scale:
     1 = Strong reject
     2 = Weak reject
     3 = Could go either way
     4 = Weak accept
     5 = Strong accept

   Numeric Rating: 3.5

   NB: There should be a correlation between the two rates above.
4. CONFIDENCE LEVEL: Please provide a rating of your expertise in the area
   addressed by the paper, using the scale:
     1 = Know virtually nothing about this area
     2 = Not too knowledgeable, but I know a bit
     3 = Know a moderate amount, about average I'd say
     4 = Not my main area of work, but I know a lot about it
     5 = My area of work, I know it extremely well

   Numeric Rating: 5

   NB: PC members are responsible for ensuring that 1 is not used here.

5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here?

   Comment: It is somewhat hard to identify exactly what is new in this
   paper, since the authors do not do a good enough job of comparing it with
   related work. It appears that the use of the technique of computational
   induction for translation validation is new. On the other hand, this is
   hardly a surprising idea, since it has been used on various occasions in
   the past to verify that one system is a refinement of another.

6. How WORTHWHILE is the goal of the authors?

   Comment: The goal is extremely worthwhile: to build a tool that can
   automatically validate each run of a state-of-the-art optimizing compiler.
   Such a tool would make it possible to enable otherwise-risky optimizations
   even in safety-critical applications. It would also help compiler
   debugging, with the ultimate effect that compilers will become better more
   quickly.

7. How well is this goal EXPLAINED and JUSTIFIED?

   Comment: This goal is explained well.

8. TECHNICAL QUALITY. Are the technical parts (definitions, statements,
   specifications, proofs, algorithms, etc.) SOUND?

   Comment: I have a few reservations about the technical content of the
   paper. To start with, while the goal of building a translation validator
   for a state-of-the-art compiler is extremely worthwhile, the material
   presented in the paper is a long way from reaching that goal. But this is
   acceptable.
   What is not quite acceptable is that the paper makes no clear attempt to
   discuss the limitations of the approach and the ways in which it will have
   to be extended to get closer to the goal. For example, the refinement
   mapping approach does not seem to address global and inter-procedural
   optimizations, nor any optimizations where the order of events in the
   source and target programs is changed significantly. In most optimizations
   it is not the case that the refinement relation holds at each intermediate
   execution step. I remember that one of the earlier papers by Amir Pnueli
   on this subject did allow for such optimizations; I am curious why that
   feature was abandoned.

   I was very dissatisfied with Section 6, in which the paper discusses the
   proving of verification conditions. First, with the method described there
   it is impossible to do the "Constant propagation between block" example of
   Section 5. The reason is that the method of Section 6 is not able to do
   any arithmetic, while the mentioned example needs to be able to prove, for
   example, that "n = 4 and j = 2 * n => j = 8". This raises the question
   whether the examples presented in the paper are just manually constructed
   examples or are actually processed using the system described in the
   paper. The paper should motivate the choice of such a weak proving
   procedure. Why don't you use one of the well-understood decision
   procedures for equality and arithmetic (e.g., Shostak or Nelson-Oppen)?

9. APPLICABILITY. If the work is primarily theoretical or conceptual, are the
   implications in terms of applicability adequately justified? If the paper
   is about a new formal technique, are satisfactory arguments presented in
   favor of the new technique? If a methodology is proposed, is it sound? If
   experimental results are presented, is their relevance justified?

   Comment: This is a paper about a new formal technique. The technique
   appears to be sound, and the soundness argument given in Section 4 is
   acceptable.
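   To make the arithmetic point from item 8 concrete, here is a toy sketch
   (in Python; the term representation and function names are hypothetical
   illustrations, not the paper's procedure). Discharging the obligation
   "n = 4 and j = 2 * n => j = 8" requires evaluating the product, which no
   amount of chasing syntactic equalities can do:

   ```python
   # Toy illustration: why the verification condition
   #   n = 4  /\  j = 2 * n   =>   j = 8
   # needs arithmetic, not just syntactic equality matching.
   # Terms are ints, variable names (str), or ('*', a, b) products.

   def equality_only_entails(hyps, var, const):
       """Chase chains of syntactic equalities var = term; no arithmetic."""
       env = dict(hyps)
       seen, t = set(), var
       while t in env and t not in seen:
           seen.add(t)
           t = env[t]
       return t == const

   def with_arithmetic_entails(hyps, var, const):
       """Same chasing, but products are evaluated once both arguments
       reduce to integers."""
       env = dict(hyps)
       def value(t):
           while isinstance(t, str) and t in env:
               t = env[t]
           if isinstance(t, tuple) and t[0] == '*':
               a, b = value(t[1]), value(t[2])
               if isinstance(a, int) and isinstance(b, int):
                   return a * b
           return t
       return value(var) == const

   hyps = {'n': 4, 'j': ('*', 2, 'n')}
   print(equality_only_entails(hyps, 'j', 8))    # False: j = 2*n stays opaque
   print(with_arithmetic_entails(hyps, 'j', 8))  # True: 2 * 4 evaluates to 8
   ```

   A real decision procedure for equality plus linear arithmetic (Shostak- or
   Nelson-Oppen-style combination) handles this uniformly; the sketch only
   shows where the purely equational method of Section 6 gives up.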
10. PRESENTATION: Describe the QUALITY of the writing, including suggestions
    for changes where appropriate.

    Comment: The paper needs a careful proofreading because it contains
    several typos and grammatical mistakes. I will point out here a few of
    them:
      - "the main tool we develop is the proof rule". Why is a proof rule a
        tool?
      - bottom of page 3, "where which"
      - page 4, "one of the basic blocks is identifies"
      - page 5, "the other statement that locations"
      - page 9, "presxnted"

11. Were there any formatting or mechanical problems with this paper?:

    Are the figures and length acceptable?: The figure on page 7 should be
    formatted to avoid rows longer than the width of the column.

    Are the references correct?: There is an error in the reference ZPG00.
    No details are given except the title, authors, the month and year.

12. OTHER COMMENTS you believe would be useful to the author(s), including
    pointers to missing relevant work:

    I found the discussion of related work lacking. Some references to
    related work are missing, for example the work of Rinard and Marinov from
    MIT (published in the Workshop for Run-Time Result Checking in 1999) or
    the work of Alessandro Cimatti and his colleagues at Trento. The only
    reference to related work that is not by the authors themselves is to
    Necula's paper in PLDI00, and I believe that the authors dismiss this
    reference too quickly. They just say that Necula's system attempts to do
    translation validation without requiring the compiler to provide
    information. This raises a few questions. Is the system described in this
    paper more powerful? Is it applicable to more optimizations?

+++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++

PAPER NUMBER: 41
CATEGORY: 2
TITLE: Translation Validation of Optimizing Compilers by Computational
       Induction
AUTHOR(S): Amir Pnueli, Lenore Zuck, Paritosh Pandya
--------------------------------------------------------------------------------
1. Briefly SUMMARIZE the paper (2-3 lines):

   This paper proposes a way for optimizing compilers to provide just enough
   information to a verifier to check that compiled code refines its source.

2. RELEVANCE: Please provide a rating of the paper's relevance to the FME
   Symposium, using the scale:
     0 = Out of scope
     1 = Marginal interest
     2 = Minority interest
     3 = Majority interest
     4 = Outstanding interest

   Numeric Rating: 3

   Please comment on your rating:

   Although this reviewer thinks that translation validation is the wrong way
   to deal with compiler correctness, it has other potentially useful
   applications (e.g., third-party compilation) and is a generally hot topic.
   (Note: this rating grades the relevance of the subject, not the relevance
   of the results achieved in this paper.)

3. OVERALL RECOMMENDATION: Please provide a rating of the paper's
   acceptability, using the scale:
     1 = Strong reject
     2 = Weak reject
     3 = Could go either way
     4 = Weak accept
     5 = Strong accept

   Numeric Rating: 1

   NB: There should be a correlation between the two rates above. Too bad.

4. CONFIDENCE LEVEL: Please provide a rating of your expertise in the area
   addressed by the paper, using the scale:
     1 = Know virtually nothing about this area
     2 = Not too knowledgeable, but I know a bit
     3 = Know a moderate amount, about average I'd say
     4 = Not my main area of work, but I know a lot about it
     5 = My area of work, I know it extremely well

   Numeric Rating: 2

   NB: PC members are responsible for ensuring that 1 is not used here.

5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here?

   Comment: Good question. The title suggests that the idea of using
   computational induction for translation validation is new, but in fact
   previous work (including [Nec00]) uses computational induction in
   essentially the same way.

6. How WORTHWHILE is the goal of the authors?

   Comment: The goal of the authors is laudable.
   If they could in fact show that their proof rule is sound and produce
   evidence that its particular structure is of practical advantage, it could
   be a useful contribution.

7. How well is this goal EXPLAINED and JUSTIFIED?

   Comment: Reasonably well.

8. TECHNICAL QUALITY. Are the technical parts (definitions, statements,
   specifications, proofs, algorithms, etc.) SOUND?

   Comment: Two parts of the paper are fundamentally incorrect. First, the
   proof rule (as I interpret it -- there are lots of little ambiguities in
   its description) is wrong. For example, if the delta functions for the
   source program are all false, the obligations are trivially satisfied,
   which says that any program refines the miracle. (I would have expected
   the last conjunct in the hypothesis of $C_{ij}$ (i.e.,
   $\delta^S_{\kappa(i)\kappa(j)}$) on the right-hand side of the
   implication, but it is clearly not a typo: the faulty rule is used in the
   examples.)

   The second problem has to do with the use of $\phi$-functions in static
   single assignment form for branching programs. These are not mathematical
   functions, because they depend implicitly on the history, and treating
   them as mathematical functions (as the authors do, e.g., in the example on
   pages 12-13) is unsound. I don't know whether the validation approach of
   Section 6 would work with a correct version of SSA form.

9. APPLICABILITY. If the work is primarily theoretical or conceptual, are the
   implications in terms of applicability adequately justified? If the paper
   is about a new formal technique, are satisfactory arguments presented in
   favor of the new technique? If a methodology is proposed, is it sound? If
   experimental results are presented, is their relevance justified?

   Comment: I do not see why the proposed proof rule, even if corrected,
   improves on the usual proof rule for forward simulation (even though it is
   more restrictive, e.g. in forcing the concrete state variables to be
   deterministic functions of the source variables).
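   The $\phi$-function point in item 8 can be illustrated with a minimal
   sketch (Python; a hypothetical toy model, not the paper's formalism). A
   $\phi$-node's result is selected by the control-flow edge actually taken
   at run time, so two executions that agree on the argument values can still
   produce different results:

   ```python
   # Toy model of an SSA phi-node: the result is chosen by the
   # predecessor edge taken dynamically (execution history), not
   # by the argument values alone.

   def phi(args, came_from):
       """args: predecessor block label -> incoming SSA value;
       came_from: the edge actually taken at run time."""
       return args[came_from]

   # x3 = phi(x1 from B1, x2 from B2), with x1 = 0 and x2 = 1:
   args = {'B1': 0, 'B2': 1}
   print(phi(args, 'B1'))  # 0
   print(phi(args, 'B2'))  # 1: identical arguments, different result
   ```

   Since the same argument tuple yields different results under different
   histories, treating $\phi$ as an ordinary mathematical function of its
   arguments (and substituting it into verification conditions) is unsound,
   exactly as the comment above argues.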
10. PRESENTATION: Describe the QUALITY of the writing, including suggestions
    for changes where appropriate.

    Comment: Lots of awkward writing, and a number of typos. Needs
    proofreading.

11. Were there any formatting or mechanical problems with this paper?:

    Are the figures and length acceptable?:

    Are the references correct?: Various typos in the references.

12. OTHER COMMENTS you believe would be useful to the author(s), including
    pointers to missing relevant work:
      - p. 5, par 4 and elsewhere: The references to C are irrelevant and
        misleading, since you provide nothing like a semantics of C. The fact
        that variables are typed is also irrelevant.
      - p. 6, par 2: 0 <= i+1 -> 0 <= i
      - The attempt to justify the proof rule is pitiful. Prove it with logic
        or not at all.

+++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++

PAPER NUMBER: 41
CATEGORY: 2
TITLE: Translation Validation of Optimizing Compilers by Computational
       Induction
AUTHOR(S): Amir Pnueli, Lenore Zuck, Paritosh Pandya
--------------------------------------------------------------------------------

1. Briefly SUMMARIZE the paper (2-3 lines):

   The authors describe a technique to automatically verify the correctness
   of compiler optimisations. The proof obligations are generated on the fly
   by the optimising compiler during the compilation process.

2. RELEVANCE: Please provide a rating of the paper's relevance to the FME
   Symposium, using the scale:
     0 = Out of scope
     1 = Marginal interest
     2 = Minority interest
     3 = Majority interest
     4 = Outstanding interest

   Numeric Rating: 3

   Please comment on your rating:

   The paper addresses a problem which is a weak spot for the industrial
   applicability of formal methods, namely the correctness of support
   software -- compilers in particular. The only reason I did not give a "4"
   rating is that the strongly technical content will likely lessen the
   interest of many practitioners in the paper itself. (I realise that this
   might be inevitable, but still...)
3. OVERALL RECOMMENDATION: Please provide a rating of the paper's
   acceptability, using the scale:
     1 = Strong reject
     2 = Weak reject
     3 = Could go either way
     4 = Weak accept
     5 = Strong accept

   Numeric Rating: 3

   NB: There should be a correlation between the two rates above.

4. CONFIDENCE LEVEL: Please provide a rating of your expertise in the area
   addressed by the paper, using the scale:
     1 = Know virtually nothing about this area
     2 = Not too knowledgeable, but I know a bit
     3 = Know a moderate amount, about average I'd say
     4 = Not my main area of work, but I know a lot about it
     5 = My area of work, I know it extremely well

   Numeric Rating: 3

   NB: PC members are responsible for ensuring that 1 is not used here.

5. ORIGINALITY. What is NEW and SIGNIFICANT in the work reported here?

   Comment: The authors present a methodology and proof method for
   generating proof obligations on the fly and proving these obligations
   efficiently.

6. How WORTHWHILE is the goal of the authors?

   Comment: Very.

7. How well is this goal EXPLAINED and JUSTIFIED?

   Comment: Very well.

8. TECHNICAL QUALITY. Are the technical parts (definitions, statements,
   specifications, proofs, algorithms, etc.) SOUND?

   Comment: This is unclear. While I do believe that the method is
   fundamentally sound, there are some obvious problems. Part of the reason
   for these problems may be the informal nature of the description, so I
   might not have interpreted things in the way the authors intended. I have
   found two main problems: the use of the $\phi$-functions and the
   verification condition itself.

   The $\phi$-functions are not functions in the mathematical sense, since
   they depend not only on their arguments but also on implicit control-flow
   information. This means that the reasoning, e.g., on page 12 is unsound in
   principle, even if it happens to work in this case. A way out of this
   problem could be to define the $\delta_{ij}$ conditions so that the
   $\phi$-functions of the target state are taken into account. I.e.,
   the definition of $\delta_{ij}$ using the SSA form of the problem could
   include the $\phi$-functions of $j$, but not the $\phi$-functions of $i$.
   In that case the part of the state not mentioned in the arguments to
   $\phi$ would simply be $j$, which would be known. E.g., in the example on
   page 7, $\delta_{12}$ would be
     $(s_2' = s_3 \times i_3) \land (i_2' = i_3 + 1) \land (n' = n) \land
      (i_2' > n') \land (s_3' = s_2') \land (i_3' = i_2')$
   The authors state on page 6 that the non-SSA version of the program is
   used for "simplicity of exposition". I don't agree that this simplifies --
   rather the opposite!

   As the verification condition is written, if the program does not contain
   any variables (except for the control variable), then $\alpha_j'$ is
   trivially true, so the translation of the source program could be
   arbitrary! This clearly will not do: if the source program is an infinite
   loop and the translation a do-nothing, then the translation is not a
   refinement of the source by any reasonable definition of refinement. (In
   any case, the authors should be clearer about what their refinement
   relation is.) This points to some deeper problem with the verification
   condition.

9. APPLICABILITY. If the work is primarily theoretical or conceptual, are the
   implications in terms of applicability adequately justified? If the paper
   is about a new formal technique, are satisfactory arguments presented in
   favor of the new technique? If a methodology is proposed, is it sound? If
   experimental results are presented, is their relevance justified?

   Comment: Yes.

10. PRESENTATION: Describe the QUALITY of the writing, including suggestions
    for changes where appropriate.

    Comment: Good, although following my comment under (8) above, I would
    like the presentation to be more specific in many places.

11. Were there any formatting or mechanical problems with this paper?: No.

    Are the figures and length acceptable?: Yes.

    Are the references correct?: Yes, as far as I could tell.
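    Returning to the vacuity concern under item 8: it can be made concrete
    with a toy sketch (Python; a hypothetical model, not the paper's
    verifier). If the verification condition degenerates to a conjunction of
    equalities over data variables, then a program with no data variables
    imposes no constraints at all, and the empty conjunction holds
    vacuously:

    ```python
    # Toy refinement check: the verification condition is modelled as a
    # conjunction of equalities between corresponding data variables.
    # With no data variables, the conjunction is empty and thus
    # vacuously true -- any target program "passes".

    def vc_holds(source_state, target_state):
        """Conjunction of v_source = v_target over the source's data
        variables; Python's all() on an empty iterable is True."""
        return all(source_state[v] == target_state.get(v)
                   for v in source_state)

    # Source 'while True: pass' has no data variables, so a target that
    # simply terminates is accepted, although it is no refinement:
    print(vc_holds({}, {}))               # True -- vacuous acceptance
    print(vc_holds({'x': 1}, {'x': 2}))   # False -- real mismatch caught
    ```

    This is exactly the infinite-loop-versus-do-nothing scenario above: the
    check says nothing about control progress, which is why the refinement
    relation needs to be stated explicitly.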
12. OTHER COMMENTS you believe would be useful to the author(s), including
    pointers to missing relevant work:

+++++++++++++++++++++ End of FME 2001 Paper Review Report ++++++++++++++++++++++