Publications

Paulo J, Pereira JO.  2014.  A Survey and Classification of Storage Deduplication Systems. ACM Computing Surveys. 47(1):1–30.

The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, solid-state disks, and even random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development.

The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope.
This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.
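
The survey's starting observation, that a common core is shared across storage types, can be made concrete with a toy example. The sketch below is ours, not code from the paper: it deduplicates a byte stream at a fixed chunk granularity using an exact, in-memory hash index, which is only one point in the design space the classification covers (content-defined chunking, sparse indexing, and post-processing timing are common alternatives).

```python
import hashlib

def fixed_size_chunks(data, size=4096):
    """Fixed-size chunking; many systems instead use content-defined chunking."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def deduplicate(chunks):
    """Store each distinct chunk once, keyed by its SHA-256 digest.

    Returns the chunk store and the ordered list of digests (the "recipe")
    needed to reconstruct the original stream.
    """
    store, recipe = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:      # new content: keep exactly one copy
            store[digest] = chunk
        recipe.append(digest)        # duplicates only add a reference
    return store, recipe

if __name__ == "__main__":
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 8192   # repeated content
    store, recipe = deduplicate(fixed_size_chunks(data))
    print(len(recipe), "chunks referenced,", len(store), "stored")   # 5 referenced, 2 stored
```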

Ferreira JF, Gherghina C, He G, Qin S, Chin W-N.  2014.  Automated verification of the FreeRTOS scheduler in HIP/SLEEK. International Journal on Software Tools for Technology Transfer. 16(4):381-397.

Automated verification of operating system kernels is a challenging problem, partly due to the use of shared mutable data structures. In this paper, we show how we can automatically verify memory safety and functional correctness properties of the task scheduler component of the FreeRTOS kernel using the verification system Hip/Sleek. We show how some of Hip/Sleek's features, such as user-defined predicates and lemmas, make the specifications highly expressive and the verification process viable. To the best of our knowledge, this is the first code-level verification of memory safety and functional correctness properties of the FreeRTOS scheduler. The outcome of our experiment confirms that Hip/Sleek can indeed be used to verify code that is used in production. Moreover, since the properties that we verify are quite general, we envisage that the same approach can be adopted to verify components of other operating systems.

Perez A, Abreu R, Riboira A.  2014.  A dynamic code coverage approach to maximize fault localization efficiency. Journal of Systems and Software. 90:18–28.

Spectrum-based fault localization is amongst the most effective techniques for automatic fault localization. However, abstractions of program execution traces, one of the required inputs for this technique, require instrumentation of the software under test at statement-level granularity in order to compute a list of potentially faulty statements. This introduces a considerable overhead in the fault localization process, which can even become prohibitive in, e.g., resource-constrained environments. To counter this problem, we propose a new approach, coined dynamic code coverage (DCC), aimed at reducing this instrumentation overhead. By using coarser instrumentation, this technique starts by analyzing coverage traces for large components of the system under test. It then progressively increases the instrumentation detail for faulty components, until the statement level of detail is reached. To assess the validity of our proposed approach, an empirical evaluation was performed, injecting faults in six real-world software projects. The empirical evaluation demonstrates that the dynamic code coverage approach reduces the execution overhead of spectrum-based fault localization and also presents a more concise potential fault ranking to the user. We have observed execution time reductions of 27% on average and diagnostic report size reductions of 77% on average.
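
The coarse-to-fine strategy described above can be sketched in a few lines. This is an illustration under our own simplifying assumptions, not the paper's implementation: a two-level component/statement hierarchy, statement spectra assumed to be already available, and a deliberately naive suspiciousness score (real SFL tools use similarity coefficients such as Ochiai, discussed in the next entry).

```python
def suspiciousness(hits, errors):
    """Toy score: fraction of failing tests that execute the unit."""
    failed = sum(errors)
    return sum(h for h, e in zip(hits, errors) if e) / failed if failed else 0.0

def rank(spectra, errors):
    """Sort units (components or statements) by decreasing suspiciousness."""
    return sorted(spectra, key=lambda u: suspiciousness(spectra[u], errors), reverse=True)

def dynamic_code_coverage(component_spectra, statement_spectra, errors, top=1):
    """Coarse-to-fine localization: rank components first, then refine only the
    most suspicious ones down to statement granularity."""
    ranking = []
    for component in rank(component_spectra, errors)[:top]:
        # A real tool would re-instrument the component and re-run the tests here.
        ranking.extend(rank(statement_spectra[component], errors))
    return ranking

# Two components, two tests (pass, fail); only C2's statement s2 is hit by the failing test.
errors = [0, 1]
components = {"C1": [1, 0], "C2": [1, 1]}
statements = {"C2": {"C2:s1": [1, 0], "C2:s2": [0, 1]}}
print(dynamic_code_coverage(components, statements, errors))   # ['C2:s2', 'C2:s1']
```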

Hofer B, Perez A, Abreu R, Wotawa F.  2014.  On the empirical evaluation of similarity coefficients for spreadsheets fault localization. Automated Software Engineering. :1–28.

Spreadsheets are by far the most prominent example of end-user programs of ample size and substantial structural complexity. They are usually not thoroughly tested, so they often contain faults. Debugging spreadsheets is a hard task due to their size and structure, which is usually not directly visible to the user, i.e., the formulas are hidden and only the computed values are presented. A way to locate faulty cells in spreadsheets is by adapting software debugging approaches for traditional procedural or object-oriented programming languages. One such approach is spectrum-based fault localization (SFL). In this paper, we study the impact of different similarity coefficients on the accuracy of SFL applied to the spreadsheet domain. Our empirical evaluation shows that three of the 42 studied coefficients (Ochiai, Jaccard and Sorensen-Dice) require less effort from the user while inspecting the diagnostic report, and can also be used interchangeably without a loss of accuracy. In addition, we illustrate the influence of the number of correct and incorrect output cells on the diagnostic report.
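
The three coefficients singled out by the study have compact definitions over the hit spectra. The sketch below is ours, for illustration only, using the usual counts n11 (covered and failed), n10 (covered and passed) and n01 (not covered and failed).

```python
import math

def counts(hits, errors):
    """Count n11 (covered & failed), n10 (covered & passed), n01 (uncovered & failed)."""
    n11 = sum(1 for h, e in zip(hits, errors) if h and e)
    n10 = sum(1 for h, e in zip(hits, errors) if h and not e)
    n01 = sum(1 for h, e in zip(hits, errors) if not h and e)
    return n11, n10, n01

def ochiai(hits, errors):
    n11, n10, n01 = counts(hits, errors)
    denom = math.sqrt((n11 + n01) * (n11 + n10))
    return n11 / denom if denom else 0.0

def jaccard(hits, errors):
    n11, n10, n01 = counts(hits, errors)
    return n11 / (n11 + n01 + n10) if n11 + n01 + n10 else 0.0

def sorensen_dice(hits, errors):
    n11, n10, n01 = counts(hits, errors)
    return 2 * n11 / (2 * n11 + n01 + n10) if n11 + n01 + n10 else 0.0

# A cell executed by the single failing test and by one of the two passing tests.
hits, errors = [1, 1, 0], [1, 0, 0]
print(ochiai(hits, errors), jaccard(hits, errors), sorensen_dice(hits, errors))
```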

Abreu R, Hofer B, Perez A, Wotawa F.  2014.  Using constraints to diagnose faulty spreadsheets. Software Quality Journal.

Spreadsheets can be viewed as a highly flexible programming environment for end users. They are widely adopted for decision making and can have a serious economic impact on the business. However, spreadsheets are staggeringly prone to errors. Hence, approaches that aid in pinpointing the faulty cells of a spreadsheet are of great value. We present a constraint-based approach, ConBug, for debugging spreadsheets. The approach takes as input a (faulty) spreadsheet and a test case that reveals the fault, and computes a set of diagnosis candidates for the debugging problem. To this end, we convert the spreadsheet and the test case into a constraint satisfaction problem. We perform an empirical evaluation with 78 spreadsheets from different sources, where we demonstrate that our approach is light-weight and efficient. From our experimental results, we conclude that ConBug helps end users to pinpoint faulty cells.
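
As a loose illustration of the constraint-based idea, and not a reproduction of ConBug's actual encoding, the sketch below uses the Z3 SMT solver's Python bindings: each formula cell gets an "abnormal" flag, the test case fixes the inputs and the expected output, and the cells the solver is forced to mark abnormal form a diagnosis candidate. The spreadsheet and cell names are invented for the example.

```python
# pip install z3-solver
from z3 import Bool, Int, Optimize, Implies, Not, is_true, sat

# Tiny faulty spreadsheet: A1=2, A2=3, A3 = A1*A2 (should have been A1+A2), A4 = A3+1.
# Test case: with these inputs the user expects A4 == 6, but the sheet shows 7.
a1, a2, a3, a4 = Int("A1"), Int("A2"), Int("A3"), Int("A4")
ab3, ab4 = Bool("ab_A3"), Bool("ab_A4")        # "abnormal" flags for the formula cells

opt = Optimize()
opt.add(a1 == 2, a2 == 3)                      # test-case inputs
opt.add(Implies(Not(ab3), a3 == a1 * a2))      # A3 behaves as written unless abnormal
opt.add(Implies(Not(ab4), a4 == a3 + 1))       # A4 behaves as written unless abnormal
opt.add(a4 == 6)                               # expected (correct) output
opt.add_soft(Not(ab3))                         # prefer as few abnormal cells as possible
opt.add_soft(Not(ab4))

if opt.check() == sat:
    m = opt.model()
    diagnosis = [c for c, ab in [("A3", ab3), ("A4", ab4)] if is_true(m[ab])]
    print(diagnosis)                           # a minimal candidate, e.g. ['A3'] or ['A4']
```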

Cunha J, Erwig M, Mendes J, Saraiva JA.  2014.  Model inference for spreadsheets. Automated Software Engineering. :1-32.

Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet.

We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities from the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: First, we have demonstrated its applicability by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to the ones developed by experts starting from the same (legacy) spreadsheet data.

Although relational schemas are very useful to model data, they do not fit spreadsheets well, as they do not allow expressing layout. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.
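
The inference algorithm itself is not reproduced here, but a toy example of the kind of data-driven reasoning such schema inference builds on (discovering functional dependencies between columns, which in turn suggest candidate keys and relations) can look roughly as follows; the data and function names are ours.

```python
from itertools import combinations

def functional_dependencies(header, rows):
    """Report single-column functional dependencies A -> B observed in the data:
    whenever two rows agree on column A they also agree on column B."""
    fds = []
    for a, b in combinations(range(len(header)), 2):
        for x, y in ((a, b), (b, a)):
            mapping = {}
            if all(mapping.setdefault(row[x], row[y]) == row[y] for row in rows):
                fds.append((header[x], header[y]))
    return fds

header = ["country", "city", "population"]
rows = [("PT", "Braga", 181000), ("PT", "Porto", 237000), ("DE", "Berlin", 3645000)]
print(functional_dependencies(header, rows))
# [('city', 'country'), ('population', 'country'), ('city', 'population'), ('population', 'city')]
```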

He D, Jin D, Liu D, Moreno CB.  2014.  Link Community Detection Using Generative Model and Nonnegative Matrix Factorization. PLoS ONE. 9(1):e86899.

Discovery of communities in complex networks is a fundamental data analysis problem with applications in various domains. While most of the existing approaches have focused on discovering communities of nodes, recent studies have shown the advantages and uses of link community discovery in networks. Generative models provide a promising class of techniques for the identification of modular structures in networks, but most generative models mainly focus on the detection of node communities rather than link communities. In this work, we propose a generative model, which is based on the importance of each node when forming links in each community, to describe the structure of link communities. We proceed to fit the model parameters by taking it as an optimization problem, and solve it using nonnegative matrix factorization. Thereafter, in order to automatically determine the number of communities, we extend the above method by introducing a strategy of iterative bipartition. This extended method not only finds the number of communities all by itself, but also obtains high efficiency, and thus it is more suitable to deal with large and unexplored real networks. We test this approach on both synthetic benchmarks and real-world networks including an application on a large biological network, and compare it with two highly related methods. Results demonstrate the superior performance of our approach over competing methods for the detection of link communities.
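
Purely to illustrate the NMF connection, and not the paper's own model (which is based on node importance within each community and adds an iterative bipartition strategy to choose the number of communities), one can factor the adjacency matrix symmetrically and assign each link to the community in which its two endpoints have the largest joint weight:

```python
import numpy as np

def link_communities(adj, k, iters=500, seed=0):
    """Symmetric NMF sketch: A ~ W W^T, then edge (i, j) is assigned to the
    community c maximizing W[i, c] * W[j, c]."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    W = rng.random((n, k))
    for _ in range(iters):
        W *= (adj @ W) / (W @ (W.T @ W) + 1e-9)   # multiplicative update for A ~ W W^T
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j]]
    return {e: int(np.argmax(W[e[0]] * W[e[1]])) for e in edges}

# Two triangles (0-1-2 and 3-4-5) joined by a single bridge edge (2-3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(link_communities(A, k=2))
```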

Azevedo PJ, Magalhães A.  2014.  Contrast Set Mining in Temporal Databases. Expert Systems with Applications.

Understanding the underlying differences between groups or classes in certain contexts can be of the utmost importance. Contrast set mining relies on discovering significant patterns by contrasting two or more groups. A contrast set is a conjunction of attribute-value pairs whose distribution differs meaningfully across groups. A previously proposed technique, rules for contrast sets, seeks to express each contrast set found in terms of rules. This work extends rules for contrast sets to a temporal data mining task. We define a set of temporal patterns in order to capture the significant changes in the contrasts discovered along the considered time line. To evaluate the proposal's accuracy and its ability to discover relevant information, two different real-life data sets were studied using this approach.
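
A minimal sketch of the basic test behind (non-temporal) contrast set mining, using SciPy: a candidate conjunction is kept if a chi-square test rejects independence between matching the conjunction and group membership, and if its support differs across groups by more than a user-set threshold. The threshold values and names below are ours; the paper's temporal patterns are not covered.

```python
from scipy.stats import chi2_contingency

def is_contrast_set(group_counts, group_sizes, alpha=0.05, min_diff=0.1):
    """group_counts[g]: rows of group g matching the attribute-value conjunction."""
    supports = [c / n for c, n in zip(group_counts, group_sizes)]
    table = [group_counts, [n - c for c, n in zip(group_counts, group_sizes)]]
    _, p_value, _, _ = chi2_contingency(table)           # significance across groups
    return p_value < alpha and max(supports) - min(supports) >= min_diff

# A conjunction matching 80 of 200 rows in group A but only 20 of 200 in group B.
print(is_contrast_set([80, 20], [200, 200]))              # True
```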

Barbosa LS, Martins M, Carreteiro M.  2014.  A Hilbert-Style Axiomatisation for Equational Hybrid Logic. Journal of Logic, Language and Information. 23(1):31-52.

This paper introduces an axiomatisation for equational hybrid logic based on previous axiomatisations and natural deduction systems for propositional and first-order hybrid logic. Its soundness and completeness are discussed. This work is part of a broader research project on the development of a general proof calculus for hybrid logics.

Barbosa LS, Shaikh S.  2014.  Selected Contributions from the Open Source Software Certification (OpenCert) Workshops. Science of Computer Programming. 91:139-140.

We present to you this special issue dedicated to the 2nd, 3rd and 4th editions of the International Workshop on Foundations and Techniques for Open Source Software Certification (OpenCert), held in 2008 (Milan, Italy), 2009 (York, UK) and 2010 (Pisa, Italy), respectively. It is a compilation of a selected set of extended papers presented at these workshops. OpenCert provides a unique venue for advancing the state of the art in the analysis and assurance of open source software, with the ultimate aim of achieving certification and standards. The dramatic growth in open source software over recent years has provided fertile ground for fundamental research and demonstrative case studies. Over the years, OpenCert has enabled a thriving community, small but focused, examining issues ranging from certification to security and safety analysis, for application areas as diverse as railways, aviation, knowledge management, sustainable development, and the open source developer community.

Barbosa LS, Lumpe M.  2014.  Formal Aspects of Component Software. Science of Computer Programming. 94:253-254.

This issue includes extended versions of selected best papers from the 7th International Workshop on Formal Aspects of Component Software (FACS 2010), held in Guimarães, Portugal on October 14–16, 2010. The component-based software development approach has emerged as a promising paradigm to cope with the ever-increasing complexity of present-day software solutions by bringing sound production and engineering principles into software engineering. However, many conceptual and technological issues remain that challenge component-based software development theory and practice. To address these issues, FACS seeks to provide a forum for researchers and practitioners in the areas of component software and formal methods to foster a better understanding of the component-based paradigm and its applications, as well as of how formal methods can or should be used to make component-based software development succeed.

Barbosa LS, de Junior FHC.  2014.  Selected and Extended Papers of the Brazilian Symposium on Programming Languages 2012. Science of Computer Programming. 96:175-176.

The Brazilian Symposium on Programming Languages (SBLP) is an annual conference promoted by the Brazilian Computer Society (SBC), which has become an important forum for researchers and practitioners in programming language design and implementation from around the world to interact and present their results and experience. This issue of Science of Computer Programming includes extended versions of selected papers from the XVI SBLP, held in Natal, Brazil on September 23–28, 2012.

Cunha A, Macedo N.  2014.  Least-Change Bidirectional Model Transformation With QVT-R and ATL. Software and Systems Modeling. 28:1-29.

QVT Relations (QVT-R) is the standard language proposed by the OMG to specify bidirectional model transformations. Unfortunately, in part due to ambiguities and omissions in the original semantics, acceptance and development of effective tool support has been slow. Recently, the checking semantics of QVT-R has been clarified and formalized. In this article we propose a QVT-R tool that complies with this semantics. Unlike any other existing tool, it also supports metamodels enriched with OCL constraints (thus avoiding returning ill-formed models), and proposes an alternative enforcement semantics that works according to the simple and predictable “principle of least change”. The implementation is based on an embedding of both QVT-R transformations and UML class diagrams (annotated with OCL) in Alloy, a lightweight formal specification language with support for automatic model finding via SAT solving. We also show how this technique can be applied to bidirectionalize ATL, a popular (but unidirectional) model transformation language.

Pinto JS, Almeida JB, Barbosa MB, Vieira B, Filliâtre JC.  2014.  CAOVerif: An Open-Source Deductive Verification Platform for Cryptographic Software Implementations. Science of Computer Programming. 91:216–233.

CAO is a domain-specific imperative language for cryptography, offering a rich mathematical type system and crypto-oriented language constructions. We describe the design and implementation of a deductive verification platform for CAO and demonstrate that the development time of such a complex verification tool could be greatly reduced by building on the Jessie plug-in included in the Frama-C framework. We discuss the interesting challenges raised by the domain-specific characteristics of CAO, and describe how we tackle these problems in our design. We base our presentation on real-world examples of CAO code, extracted from the open-source code of the NaCl cryptographic library, and illustrate how various cryptography-relevant security properties can be verified.

Silva A, Jacobs B, Staton S.  2014.  Preface. Electronic Notes in Theoretical Computer Science. 308:1-2.

This volume collects papers presented at the 30th Annual Conference on Mathematical Foundations of Programming Semantics (MFPS XXX), held on the campus of Cornell University, Ithaca, New York, USA, from Thursday, June 12 through Sunday, June 15, 2014. The MFPS conferences are devoted to those areas of mathematics, logic, and computer science that are related to models of computation in general and to the semantics of programming languages in particular. The series particularly stresses providing a forum where researchers in mathematics and computer science can meet and exchange ideas about problems of common interest. As the series also strives to maintain breadth in its scope, the conference strongly encourages participation by researchers in neighboring areas.