Recent Publications

Matos M, Schiavoni V, Rivière E, Felber P, Oliveira R.  2014.  LayStream: composing standard gossip protocols for live video streaming. 14th IEEE International Conference on Peer-to-Peer Computing (P2P). Abstractp2p-laystream.pdf

Gossip-based live streaming is a popular topic, as attested by the vast literature on the subject. Despite the particular merits of each proposal, all need to implement and deal with common challenges such as membership management, topology construction and video packets dissemination. Well-principled gossip-based protocols have been proposed in the literature for each of these aspects. Our goal is to assess the feasibility of building a live streaming system, \sys, as a composition of these existing protocols, to deploy the resulting system on real testbeds, and report on lessons learned in the process. Unlike previous evaluations conducted by simulations and considering each protocol independently, we use real deployments. We evaluate protocols both independently and as a layered composition, and unearth specific problems and challenges associated with deployment and composition. We discuss and present solutions for these, such as a novel topology construction mechanism able to cope with the specificities of a large-scale and delay-sensitive environment, but also with requirements from the upper layer. Our implementation and data are openly available to support experimental reproducibility.

Campos F, Matos M, Pereira JO.  2014.  Coordenação de Serviços Web heterogéneos com tolerância a faltas. INForum - Simpósio de Informática. Abstractinforum-wsgossip.pdf

A norma Devices Profile for Web Services (DPWS) permite a descoberta, a configuração e a interoperabilidade de dispositivos heterogéneos ligados em rede com grande disparidade em termos de capacidade de processamento, desde pequenos eletrodomésticos inteligentes ou controladores em máquinas industriais, até servidores em centros de dados. Os mecanismos de notificação de eventos e de configuração automática previstos pelo DPWS são especialmente adequados a cenários Machine-to-Machine (M2M) devido à sua simplicidade e flexibilidade. No entanto, a escalabilidade em cenários com um elevado número de nós, nomeadamente, em grandes infraestruturas, é limitada. A coerência dos dados armazenados e manipulados pelos mecanismos de autodescoberta também não é adequada a aplicações críticas. Neste artigo, abordamos este problema com uma proposta assente na norma DPWS com os conceitos de difusão epidémica e de consenso, mostrando que é possível compor serviços adequados a diferentes aplicações assegurando garantias de tolerância a faltas e maior escalabilidade.

Coelho F, Cruz F, Vilaça R, Pereira JO, Oliveira R.  2014.  pH1: A Transactional Middleware for NoSQL. 33rd IEEE International Symposium on Reliable Distributed Systems - SRDS. Abstractph1.pdf

NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency.
In this work we propose pH1, a generic middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. This is achieved in a non-intrusive manner,
requiring no modifications to servers and no native support for multiple versions. Instead, the transactional context is achieved by means of a multiversion distributed cache and an external
transaction certifier, exposed by extending the client’s interface with transaction bracketing primitives.
We validate and evaluate pH1 with Apache Cassandra and Hyperdex. First, using the YCSB benchmark, we show that the cost of providing ACID guarantees to these NoSQL databases
amounts to 11% decrease in throughput.
Moreover, using the transaction intensive TPC-C workload, pH1 presented an impact of 22% decrease in throughput. This contrasts with OMID, a previous proposal that takes advantage of
HBase’s support for multiple versions, with a throughput penalty of 76% in the same conditions.

Casanova P, Garlan D, Schmerl BR, Abreu R.  2014.  Diagnosing unobserved components in self-adaptive systems. SEAMS - 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. :75–84. Abstractunobserved2013_casanova_unobserved.pdf

Availability is an increasingly important quality for today's software-based systems and it has been successfully addressed by the use of closed-loop control systems in self-adaptive systems. Probes are inserted into a running system to obtain information and the information is fed to a controller that, through provided interfaces, acts on the system to alter its behavior. When a failure is detected, pinpointing the source of the failure is a critical step for a repair action. However, information obtained from a running system is commonly incomplete due to probing costs or unavailability of probes. In this paper we address the problem of fault localization in the presence of incomplete system monitoring. We may not be able to directly observe a component but we may be able to infer its health state. We provide formal criteria to determine when health states of unobservable components can be inferred and establish formal theoretical bounds for accuracy when using any spectrum-based fault localization algorithm.

Perez A, Abreu R.  2014.  A diagnosis-based approach to software comprehension. ICPC - 22nd International Conference on Program Comprehension. :37–47. Abstracticpc2014preprint.pdf

Program comprehension is a time-consuming task performed during the process of reusing, reengineering, and enhancing existing systems. Currently, there are tools to assist in program comprehension by means of dynamic analysis, but, e.g., most cannot identify the topology and the interactions of a certain functionality in need of change, especially when used in large, real-world software applications. We propose an approach, coined Spectrum-based Feature Comprehension (SFC), that borrows techniques used for automatic software-fault-localization, which were proven to be ef ective even when debugging large applications in resource constrained environments. SFC analyses the program by exploiting run-time information from test case executions to compute the components that are important for a given feature (and whether a component is used to implement just one feature or more), helping software engineers to understand how a program is structured and what the functionality's dependencies are. We present a toolset, coined Pangolin, that implements SFC and displays its report to the user using an intuitive visualization. A user study with the open-source application Rhino is presented, demonstrating the eciency of Pangolin in locating the components that should be inspected when changing a certain functionality.

Gupta S, Abreu R, de Kleer J, Van Gemund AJC.  2014.  Automatic systems diagnosis without behavioral models. IEEE Aerospace Conference. :1–8. Abstractaeroconf.pdf

Recent feedback obtained while applying Model-based diagnosis (MBD) in industry suggests that the costs involved in behavioral modeling (both expertise and labor) can outweigh the benefits of MBD as a high-performance diagnosis approach. In this paper, we propose an automatic approach, called ANTARES, that completely avoids behavioral modeling. Decreasing modeling sacrifices diagnostic accuracy, as the size of the ambiguity group (i.e., components which cannot be discriminated because of the lack of information) increases, which in turn increases misdiagnosis penalty. ANTARES further breaks the ambiguity group size by considering the component's false negative rate (FNR), which is estimated using an analytical expression. Furthermore, we study the performance of ANTARES for a number of logic circuits taken from the 74XXX/ISCAS benchmark suite. Our results clearly indicate that sacrificing modeling information degrades the diagnosis quality. However, considering FNR information improves the quality, attaining the diagnostic performance of an MBD approach.

Abreu R, Cunha J, Fernandes JP, Martins P, Perez A, Saraiva JA.  2014.  Smelling faults in spreadsheets. Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution. 14 Abstracticsme14.pdf

Despite being staggeringly error prone, spreadsheets are a highly flexible programming environment that is widely used in industry. In fact, spreadsheets are widely adopted for decision making, and decisions taken upon wrong (spreadsheet-based) assumptions may have serious economical impacts on businesses, among other consequences. This paper proposes a technique to automatically pinpoint potential faults in spreadsheets. It combines a catalog of spreadsheet smells that provide a first indication of a potential fault, with a generic spectrum-based fault localization strategy in order to improve (in terms of accuracy and false positive rate) on these initial results. Our technique has been implemented in a tool which helps users detecting faults.To validate the proposed technique, we consider a wellknown and well-documented catalog of faulty spreadsheets. Our experiments yield two main results: we were able to distinguish between smells that can point to faulty cells from smells and those that are not capable of doing so; and we provide a technique capable of detecting a significant number of errors: two thirds of the cells labeled as faulty are in fact (documented) errors.

Campos JC, Arcuri A, Fraser G, Abreu R.  2014.  Continuous Test Generation: Enhancing Continuous Integration with Automated Test Generation. Proceedings Automated Software Engineering (ASE). Abstractase14_ctg.pdf

In object oriented software development, automated unit test generation tools typically target one class at a time. A class, however, is usually part of a software project consisting of more than one class, and these are subject to changes over time. This context of a class offers significant potential to improve test generation for individual classes. In this paper, we introduce Continuous Test Generation (CTG), which includes automated unit test generation during continuous integration (i.e., infrastructure that regularly builds and tests software projects). CTG offers several benefits: First, it answers the question of how much time to spend on each class in a project. Second, it helps to decide in which order to test them. Finally, it answers the question of which classes should be subjected to test generation in the first place. We have implemented CTG using the EVOUITE unit test generation tool, and performed experiments using eight of the most popular open source projects available on GitHub, ten randomly selected projects from the SF100 corpus, and five industrial projects. Our experiments demonstrate improvements of up to +58% for branch coverage and up to +69% for thrown undeclared exceptions, while reducing the time spent on test generation by up to +83%.

Cardoso N, Abreu R.  2014.  Enhancing Reasoning Approaches to Diagnose Functional and Non-Functional Errors. 25th International Workshop on Principles of Diagnosis (DX'14). Abstractmain.pdf

Most approaches to automatic software diagnosis abstract the system under analysis in terms of component activity and correct/incorrect behaviour (colectivelly known as spectra). While this binary error abstraction has been shown to be capable of diagnosing functional errors, when diagnosing non-functional errors it yields suboptimal accuracy. The main reason for this limitation is related to the lack of mechanisms for encoding error symptoms (such as performance degradation) in such a binary schema. In this paper, we propose a novel approach to diagnose both functional and non-functional errors by incorporating into the classic, bayesian reasoning approaches to error diagnosis concepts from the fuzzy logic domain. The empirical evaluation on 27000 synthetic scenarios demonstrates that the proposed fuzzy logic-based approach considerably improves the diagnostic accuracy (20% on average, with 99% statistical significance) when compared to the classic, state-of-the-art approach.

Mendes A, Backhouse R, Ferreira JF.  2014.  Structure Editing of Handwritten Mathematics -- Improving the Computer Support for the Calculational Method. ACM International Conference on Interactive Tabletops and Surfaces (ITS 2014). Abstract2014-structureeditinghandwrittenmathematics.pdf

We present a structure editor that aims to facilitate the presentation and manipulation of handwritten mathematical expressions. The editor is oriented to the calculational mathematics involved in algorithmic problem solving and it provides features that allow reliable structure manipulation of mathematical formulae, as well as flexible and interactive presentations. We describe some of its most important features, including the use of gestures to manipulate algebraic formulae, the structured selection of expressions, definition and redefi- nition of operators in runtime, gesture’s editor, and handwrit- ten templates. The editor is made available in the form of a C# class library which can be easily used to extend existing tools. For example, we have extended Classroom Presenter,a tool for ink-based teaching presentations and classroom interaction. We have tested and evaluated the editor with target users. The results obtained seem to indicate that the software is usable, suitable for its purpose and a valuable contribution to teaching and learning algorithmic problem solving.

Ferreira JF, Mendes A.  2014.  The Magic of Algorithm Design and Analysis: Teaching Algorithmic Skills using Magic Card Tricks. 19th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2014). :75-80. Abstract2014-magicalgorithmdesign.pdf

We describe our experience using magic card tricks to teach algorithmic skills to first-year Computer Science undergraduates. We illustrate our approach with a detailed discussion on a card trick that is typically presented as a test to the psychic abilities of an audience. We use the trick to discuss concepts like problem decomposition, pre- and post-conditions, and invariants. We discuss pedagogical issues and analyse feedback collected from students. The feedback has been very positive and encouraging.

Martens C, Ferreira JF, Bosser A-G, Cavazza M.  2014.  Generative Story Worlds as Linear Logic Programs. Intelligent Narrative Technologies 7 (INT7). Abstract2014-generativestoryworldsllp.pdf

Linear logic programming languages have been identified in prior work as viable for specifying stories and analyzing their causal structure. We investigate the use of such a language for specifying story worlds, or settings where generalized narrative actions have uniform effects (not specific to a particular set of characters or setting elements), which may create emergent behavior through feedback loops. We show a sizable example of a story world specified in the language Celf and discuss its interpretation as a story-generating program, a simulation, and an interactive narrative. Further, we show that the causal analysis tools available by virtue of using a proof-theoretic language for specification can assist the author in reasoning about the structure and consequences of emergent stories.

Moreno CB, Almeida PS, Shoker A.  2014.  Making Operation-Based CRDTs Operation-Based. The International Conference on Distributed Applications and Interoperable Systems - DAIS. 8460:126–140. AbstractBest paper (joint)

Conflict-free Replicated Datatypes (CRDT) can simplify the design of eventually consistent systems. They can be classified into state- based or operation-based. Operation-based designs have the potential for allowing very compact solutions in both the sent messages and the object state size. Unfortunately, the current approaches are still far from this objective. In this paper, we introduce a new ‘pure’ operation-based frame- work that makes the design and the implementation of these CRDTs more simple and efficient. We show how to leverage the meta-data of the messaging middleware to design very compact CRDTs, while only disseminating operation names and their optional arguments.

Cruz F, Maia F, Oliveira R, Vilaça R.  2014.  Workload-aware table splitting for NoSQL. SAC - Proceedings of the ACM Symposium on Applied Computing. Abstractdads14.pdf

Massive scale data stores, which exhibit highly desirable scalability and availability properties are becoming pivotal systems in nowadays infrastructures. Scalability achieved by these data stores is anchored on data independence; there is no clear relationship between data, and atomic inter-node operations are not a concern. Such assumption over data allows aggressive data partitioning. In particular, data tables are horizontally partitioned and spread across nodes for load balancing. However, in current versions of these data stores, partitioning is either a manual process or automated but simply based on table size. We argue that size based partitioning does not lead to acceptable load balancing as it ignores data access patterns, namely data hotspots. Moreover, manual data partitioning is cumbersome and typically infeasible in large scale scenarios. In this paper we propose an automated table splitting mechanism that takes into account the system workload. We evaluate such mechanism showing that it simple, non-intrusive and e fective.

Cunha J, Fernandes JP, Mendes J, Pereira R, Saraiva JA.  2014.  MDSheet - Model-Driven Spreadsheets. 1st Workshop on Software Engineering Methods in Spreadsheets (SEMS 2014). Abstractpaper_11.pdf

This paper showcases MDSheet, a framework aimed at improving the engineering of spreadsheets. This framework is model-driven, and has been fully integrated under a spreadsheet system. Also, its practical interest has been demonstrated by several empirical studies.