Recent Publications

Lopes N, Moreno CB.  2010.  Building Inverted Indexes Using Balanced Trees Over DHT Systems. EuroSys 2006 Poster.

Distributed Hash Table (DHT) systems are scalable and efficient data structures for object storage and location using a simple put/get interface. These systems place objects over a very large set of hosts using a multitude of algorithms in order to distribute objects uniformly among hosts, with logarithmic (or lower) costs for routing table sizes and message hops [1, 2]. However, these systems assume that object size (storage load) and popularity (communication load) follow a uniform distribution. When unbalanced data is used on a DHT, hotspots are created at some specific (random) hosts. Although one might argue that storage is not a critical resource, given the current growth in secondary storage capacity, storing such large objects creates network bottlenecks, which in turn may limit data availability.
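
A minimal sketch (not from the paper) of the put/get interface and hash-based placement the abstract refers to; the TinyDHT class and host names are hypothetical, and real DHTs add routing, replication and the load-balancing concerns discussed above.

```python
import hashlib
from bisect import bisect_right

class TinyDHT:
    """Toy illustration of hash-based object placement on a ring.

    Each key is hashed onto a circular identifier space and stored on the
    first host whose identifier follows the key's hash. Uniform hashing
    spreads *keys* evenly, but a single very large or very popular object
    still lands on one host, which is the hotspot problem described above.
    """

    def __init__(self, hosts):
        # Place every host on the ring by hashing its name.
        self.ring = sorted((self._hash(h), h) for h in hosts)
        self.store = {h: {} for h in hosts}

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def _host_for(self, key):
        ids = [hid for hid, _ in self.ring]
        idx = bisect_right(ids, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key, value):
        self.store[self._host_for(key)][key] = value

    def get(self, key):
        return self.store[self._host_for(key)].get(key)

dht = TinyDHT(["host-a", "host-b", "host-c"])
dht.put("term:coalgebra", ["doc1", "doc7"])   # e.g. a posting list of an inverted index
print(dht.get("term:coalgebra"))
```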

Silva JC, Campos JC, Saraiva JA.  2010.  GUI Inspection from Source Code Analysis. Electronic Communications of the EASST. 33

Graphical user interfaces (GUIs) are critical components of today's software. Given their increased relevance, the correctness and usability of GUIs are becoming essential. This paper describes the latest results in the development of our tool to reverse engineer the GUI layer of interactive computing systems. We use static analysis techniques to generate models of the user interface behaviour from source code. Models help in graphical user interface inspection by allowing designers to concentrate on its more important aspects. One particular type of model that the tool is able to generate is state machines. The paper shows how graph theory can be useful when applied to these models. A number of metrics and algorithms are used in the analysis of aspects of the user interface's quality. The ultimate goal of the tool is to enable the analysis of interactive systems through inspection of the GUI's source code.
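
As a rough illustration of the graph-theoretic analysis mentioned above, the following sketch computes two simple metrics over a hypothetical GUI state machine; the model, event names and chosen metrics are made up, not the tool's actual output.

```python
from collections import deque

# A hypothetical GUI behaviour model as a state machine:
# states are windows/dialogs, transitions are user events.
transitions = {
    "Login":    {"ok": "Main", "cancel": "Exit"},
    "Main":     {"open_settings": "Settings", "quit": "Exit"},
    "Settings": {"close": "Main"},
    "Orphan":   {"back": "Main"},   # never reached from the initial state
    "Exit":     {},
}

def reachable(start):
    """States reachable from `start` by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        for target in transitions[queue.popleft()].values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

states = set(transitions)
edges = sum(len(v) for v in transitions.values())
unreachable = states - reachable("Login")

print(f"states: {len(states)}, transitions: {edges}")
print(f"unreachable from the initial window: {sorted(unreachable)}")
```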

Azevedo PJ, Jorge AM.  2010.  Ensembles of jittered association rule classifiers. Data Mining and Knowledge Discovery. 21(1):91-129.

The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where a single model is perturbed in order to obtain variants that can be used as an ensemble. We use sets of classification association rules as base classifiers. The two jittering ensembling methods we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set in a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single-model association rule classifier. A bias–variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up with respect to multiple-model learning ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency.
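
The toy sketch below conveys only the general jittering idea, i.e. perturbing one learned rule set into several variants and voting, without relearning; it is not the paper's IRE or PB procedure, and the rules and example instance are invented.

```python
import random

# A single learned rule set: (antecedent items, predicted class, confidence).
rules = [
    ({"outlook=sunny", "humidity=high"}, "no",  0.85),
    ({"outlook=overcast"},               "yes", 0.95),
    ({"outlook=rainy", "windy=false"},   "yes", 0.80),
    ({"humidity=normal"},                "yes", 0.75),
    ({"windy=true"},                     "no",  0.70),
]
DEFAULT_CLASS = "yes"

def classify(rule_list, instance):
    """First-match rule classifier over an ordered rule list."""
    for antecedent, label, _ in rule_list:
        if antecedent <= instance:
            return label
    return DEFAULT_CLASS

def jittered_ensemble(rule_list, instance, n_variants=10, seed=0):
    """Perturb the *order* of a single rule set to obtain ensemble members,
    then majority-vote their predictions. This only sketches the general
    jittering idea; IRE and PB perturb the model in more principled ways."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_variants):
        variant = rule_list[:]          # no relearning, just a perturbation
        rng.shuffle(variant)
        votes.append(classify(variant, instance))
    return max(set(votes), key=votes.count)

example = {"outlook=sunny", "windy=true", "humidity=normal"}
print(classify(rules, example), jittered_ensemble(rules, example))
```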

Almeida PS, Barbosa MB, Pinto JS, Vieira B.  2010.  Deductive verification of cryptographic software. Innovations in Systems and Software Engineering. 6:203–218.

We apply state-of-the-art deductive verification tools to check security-relevant properties of cryptographic software, including safety, absence of error propagation, and correctness with respect to reference implementations. We also develop techniques to help us in our task, focusing on methods oriented towards increased levels of automation, in scenarios where there are clear limits to such automation. These techniques allow us to integrate automatic proof tools with an interactive proof assistant, where the latter is used off-line to prove, once and for all, fundamental lemmas about properties of programs. The techniques developed are of independent interest for practical deductive verification in general.

Barbosa LS, Cerone A, Petrenko A, Shaikh S.  2010.  Certification of open-source software: A role for formal methods? International Journal of Computer Systems Science and Engineering. :273-281.

Despite its huge success and increasing incorporation in complex, industrial-strength applications, open source software, by the very nature of its open, unconventional, distributed development model, is hard to assess and certify in an effective, sound and independent way. This makes its use and integration within safety- or security-critical systems a risk, and, at the same time, an opportunity and a challenge for rigorous, mathematically based methods, which aim at pushing software analysis and development to the level of a mature engineering discipline. This paper discusses this challenge and proposes a number of ways in which open source development may benefit from the whole patrimony of formal methods.

Barbosa LS, Sun M.  2010.  Bringing Class Diagrams to Life. Innovations in Systems and Software Engineering. 6(1-2):91–98.

Research in formal methods emphasizes a fundamental interconnection between modeling, calculation and prototyping, made possible by a common unambiguous, mathematical semantics. This paper, building on a broader research agenda on coalgebraic semantics for Unified Modeling Language diagrams, concentrates on class diagrams and discusses how such a coalgebraic perspective can be of use not only for formalizing their specification, but also as a basis for prototyping.

Rodrigues N, Barbosa LS.  2010.  Slicing for Architectural Analysis. Science of Computer Programming. 75(10):828–847.

Current software development often relies on non-trivial coordination logic for combining autonomous services, possibly running on different platforms. As a rule, however, such a coordination layer is strongly interwoven with the application at the source code level. Therefore, its precise identification becomes a major methodological (and technical) problem and a challenge to any program understanding or refactoring process.
The approach introduced in this paper resorts to slicing techniques to extract coordination data from source code. Such data is captured in a specific dependency graph structure from which a coordination model can be recovered either in the form of an Orc specification or as a collection of code fragments corresponding to the identification of typical coordination patterns in the system. Tool support is also discussed.

Pinto JS, Henriques PR, Cruz D, Areias S.  2010.  Safe integration of annotated components in source projects. ECEASST.

The decision to use existing software components versus building custom software from scratch is one of the most complex and important choices of the entire development/integration process. However, the reuse of software components raises a spectrum of issues, from requirements negotiation to product selection and integration. The right trade-off is reached only after analyzing the advantages and issues associated with reuse. Despite reuse failures in real cases, many efforts have been made to make this idea successful.
In this context of software reuse in open source projects, we address the problem of reusing annotated components by proposing a rigorous approach to assure the quality of the application under construction. We introduce the concept of caller-based slicing as a way of certifying that the integration of a component annotated with a contract into a system will preserve the correct behavior of the former, avoiding malfunctions after integration.
To complement these efforts and the benefits of slicing techniques, an efficient way to visualize the main program together with the annotated components and the slices is also needed. To take full advantage of visualization, it is crucial to combine the visualization of control/data flow with the textual representation of the source code. To attain this objective, we extend the notions of System Dependence Graph and Slicing Criterion to cope with annotations.

Azevedo PJ.  2010.  Rules for contrast sets. Intelligent Data Analysis. 14(6):623-640.

In this paper we present a technique to derive rules describing contrast sets. Contrast sets are a formalism to represent group differences. We propose a novel approach to describe directional contrasts using rules where the contrasting effect is partitioned into pairs of groups. Our approach makes use of a directional Fisher exact test to find significant differences across groups. We use a Bonferroni within-search adjustment to control type I errors and a pruning technique to prevent the derivation of non-significant contrast set specializations.
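
As an illustration of the statistical machinery mentioned above, the sketch below applies a one-sided (directional) Fisher exact test with a Bonferroni-style correction to made-up group counts, using SciPy; the paper's within-search adjustment and pruning strategy are more involved than this.

```python
from itertools import combinations
from scipy.stats import fisher_exact

# How often a candidate contrast set (itemset) occurs in each group,
# together with each group's size. Numbers are invented for the example.
support = {"group_A": 40, "group_B": 12, "group_C": 15}
size    = {"group_A": 100, "group_B": 100, "group_C": 100}

alpha = 0.05
pairs = list(combinations(support, 2))
# Bonferroni-style correction over the pairwise comparisons made here.
corrected_alpha = alpha / len(pairs)

for g1, g2 in pairs:
    table = [[support[g1], size[g1] - support[g1]],
             [support[g2], size[g2] - support[g2]]]
    # One-sided ("directional") test: is the itemset over-represented in g1?
    _, p = fisher_exact(table, alternative="greater")
    if p < corrected_alpha:
        print(f"{g1} vs {g2}: significant directional contrast (p={p:.4f})")
```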

Silva A, Bonsangue M, Rutten J.  2010.  Non-deterministic Kleene Coalgebras. Logical Methods in Computer Science. 6(3):1–39.

In this paper, we present a systematic way of deriving (1) languages of (generalized) regular expressions, and (2) sound and complete axiomatizations thereof, for a wide variety of systems. This generalizes both the results of Kleene (on regular languages and deterministic finite automata) and Milner (on regular behaviours and finite labelled transition systems), and includes many other systems such as Mealy and Moore machines.

Silva A, Rutten J.  2010.  A coinductive calculus of binary trees. Information and Computation. 208(5):578–593.

We study the set TA of infinite binary trees with nodes labelled in a semiring A from a coalgebraic perspective. We present coinductive definition and proof principles based on the fact that TA carries a final coalgebra structure. By viewing trees as formal power series, we develop a calculus where definitions are presented as behavioural differential equations. We present a general format for these equations that guarantees the existence and uniqueness of solutions. Although technically not very difficult, the resulting framework has surprisingly nice applications, which is illustrated by various concrete examples.
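
A small sketch, under the assumption that lazy thunks are an acceptable stand-in for coinduction, of how a definition in the style of behavioural differential equations might look in code; the `nat` tree and the `add` operation are illustrative choices, not examples taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tree:
    """An infinite binary tree given coinductively: a root label plus
    thunks producing the left and right subtrees on demand."""
    root: int
    left: Callable[[], "Tree"]
    right: Callable[[], "Tree"]

# A definition in behavioural-differential-equation style over the integers:
# nat(n) has root n, left subtree nat(2n+1) and right subtree nat(2n+2),
# i.e. the node at path w is labelled by its breadth-first position.
def nat(n: int = 0) -> Tree:
    return Tree(n, lambda: nat(2 * n + 1), lambda: nat(2 * n + 2))

# Pointwise sum of two trees, defined by the equations
#   root(s + t) = root(s) + root(t),  (s + t).left = s.left + t.left, ...
def add(s: Tree, t: Tree) -> Tree:
    return Tree(s.root + t.root,
                lambda: add(s.left(), t.left()),
                lambda: add(s.right(), t.right()))

t = add(nat(), nat())
print(t.root, t.left().root, t.right().root)   # 0 2 4
```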

Abreu R, Zoeteweij P, Van Gemund AJC.  2010.  Simultaneous debugging of software faults. Journal of Systems and Software. 84(4):573-586.

(Semi-)automated diagnosis of software faults can drastically increase debugging efficiency, improving reliability and time-to-market. Current automatic diagnosis techniques are predominantly of a statistical nature and, despite typical defect densities, do not explicitly consider multiple faults, as also demonstrated by the popularity of the single-fault benchmark set of programs. We present a reasoning approach, called Zoltar-M(ultiple fault), that yields multiple-fault diagnoses, ranked in order of their probability. Although application of Zoltar-M to programs with many faults requires heuristics (trading off completeness) to reduce the inherent computational complexity, theory as well as experiments on synthetic program models and multiple-fault program versions available from the software infrastructure repository (SIR) show that for multiple-fault programs this approach can outperform statistical techniques, notably spectrum-based fault localization (SFL). As a side-effect of this research, we present a new SFL variant, called Zoltar-S(ingle fault), that is optimal for single-fault programs, outperforming all other variants known to date.
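
For context, the sketch below shows the kind of statistical, spectrum-based fault localization the abstract contrasts Zoltar-M against: components are ranked by a similarity coefficient computed from program spectra. The Ochiai coefficient and the data are illustrative assumptions; Zoltar-M and Zoltar-S themselves are not reproduced here.

```python
from math import sqrt

# Program spectra: for each test, which components (e.g. statements) were
# executed, plus the test verdict. Data is made up for the example.
spectra = [
    ({"c1", "c2", "c3"}, "fail"),
    ({"c1", "c3"},       "pass"),
    ({"c2", "c3"},       "fail"),
    ({"c1"},             "pass"),
]
components = {"c1", "c2", "c3"}
total_failed = sum(1 for _, v in spectra if v == "fail")

def ochiai(c):
    """Ochiai similarity between a component's execution pattern and the
    error vector; a common spectrum-based fault localization coefficient."""
    exec_fail = sum(1 for hit, v in spectra if c in hit and v == "fail")
    exec_pass = sum(1 for hit, v in spectra if c in hit and v == "pass")
    denom = sqrt(total_failed * (exec_fail + exec_pass))
    return exec_fail / denom if denom else 0.0

ranking = sorted(components, key=ochiai, reverse=True)
print([(c, round(ochiai(c), 3)) for c in ranking])   # c2 ranked most suspicious
```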

Riboira A, Abreu R.  2010.  The GZoltar project: a graphical debugger interface. Testing–Practice and Research Techniques. :215–218.

Software debugging is one of the most time-consuming and expensive tasks in software development. Several tools help make this process faster and more efficient, but they are not integrated with each other, nor do they provide an intuitive interface. These tools can be integrated to create an IDE plug-in, which gathers the most important debugging information in one place. GZoltar is a new project to create that missing plug-in. The main goal of the GZoltar project is to reduce debugging time and costs.

Abreu R, Van Gemund AJC.  2010.  Diagnosing multiple intermittent failures using maximum likelihood estimation. Artificial Intelligence. 174(18):1481–1497.

In fault diagnosis, intermittent failure models are an important tool to adequately deal with realistic failure behavior. Current model-based diagnosis approaches account for the fact that a component c_j may fail intermittently by introducing a parameter g_j that expresses the probability that the component exhibits correct behavior. This component parameter g_j, in conjunction with the a priori fault probability, is used in a Bayesian framework to compute the posterior fault candidate probabilities. Usually, information on g_j is not known a priori. While proper estimation of g_j can be critical to diagnostic accuracy, at present only approximations have been proposed. We present a novel framework, coined Barinel, that computes estimates of the g_j as an integral part of the posterior candidate probability computation, using a maximum likelihood estimation approach. Barinel's diagnostic performance is evaluated on synthetic systems, the Siemens software diagnosis benchmark, and real-world programs. Our results show that our approach is superior to reasoning approaches based on classical persistent failure models, as well as to previously proposed intermittent failure models.
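
A minimal sketch of the maximum likelihood idea for the simplest case, a single-component diagnosis candidate, where the estimate of g_j has a closed form; the observation counts are invented, and Barinel's treatment of multi-component candidates requires numerical maximization rather than this closed form.

```python
# For a candidate consisting of a single component c_j, runs that execute
# c_j either pass (the fault is not exercised, probability g_j) or fail
# (probability 1 - g_j). The likelihood of observing n_p passes and n_f
# failures is  L(g) = g**n_p * (1 - g)**n_f,  maximized at
# g_hat = n_p / (n_p + n_f).

def mle_goodness(n_pass: int, n_fail: int) -> float:
    """Closed-form maximum likelihood estimate of the intermittency
    parameter g_j for a single-component candidate."""
    return n_pass / (n_pass + n_fail)

def likelihood(g: float, n_pass: int, n_fail: int) -> float:
    return g ** n_pass * (1 - g) ** n_fail

n_pass, n_fail = 7, 3          # made-up observation counts for runs covering c_j
g_hat = mle_goodness(n_pass, n_fail)
print(f"g_hat = {g_hat:.2f}, L(g_hat) = {likelihood(g_hat, n_pass, n_fail):.4f}")
# The posterior candidate probability would combine L(g_hat) with the prior
# fault probability via Bayes' rule, as described in the abstract.
```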

Oliveira N, Pereira MJV, Henriques PR, da Cruz D, Cramer B.  2010.  VisualLISA: A Visual Environment to Develop Attribute Grammars. Computer Science and Information Systems. 7(2):265–290.