Recent Publications

Maia F, Matos M, Vilaça R, Pereira JO, Oliveira R, Rivière E. 2013. DataFlasks: an epidemic dependable key-value substrate. Proceedings of the 3rd International Workshop on Dependability of Clouds, Data Centers and Virtual Computing Environments (with DSN 2013). Abstract [dataflasks_dcdv13.pdf]

Recently, tuple-stores have become pivotal structures in many information systems. Their ability to handle large datasets makes them important in an era with unprecedented amounts of data being produced and exchanged. However, these tuple-stores typically rely on structured peer-to-peer protocols, which assume moderately stable environments. Such an assumption does not always hold for very large scale systems, on the order of thousands of machines. In this paper we present a novel approach to the design of a tuple-store. Our approach follows a stratified design based on an unstructured substrate. We focus on this substrate and on how the use of epidemic protocols allows it to reach high dependability and scalability.
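
The abstract does not spell out the protocol itself, but the epidemic (gossip) style of dissemination it refers to can be illustrated with a minimal sketch; everything below (the Node class, FANOUT, the ttl bound) is an illustrative assumption and not taken from the paper.

```python
# Minimal sketch of epidemic (gossip) dissemination of a key-value pair.
# All names and parameters are illustrative; the paper defines the actual protocol.
import random

FANOUT = 3  # number of random peers contacted per forwarding step

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}  # local key-value store

    def receive(self, key, value, peers, ttl):
        if key in self.store:
            return  # already seen: stop spreading
        self.store[key] = value
        if ttl == 0:
            return
        # forward the tuple to a few random peers (epidemic spread)
        for peer in random.sample(peers, min(FANOUT, len(peers))):
            peer.receive(key, value, peers, ttl - 1)

# Inject a tuple at one node and let it spread.
nodes = [Node(i) for i in range(50)]
nodes[0].receive("k1", "v1", nodes, ttl=4)
print(sum("k1" in n.store for n in nodes), "of", len(nodes), "nodes hold k1")
```

The appeal of this style, as the abstract notes, is that no node needs a global or structured view of the system, which is what makes the substrate robust in unstable, very large scale environments.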

Vilaça R, Cruz F, Pereira JO, Oliveira R. 2013. An Effective Scalable SQL Engine for NoSQL Databases. Proceedings of the 13th IFIP Distributed Applications and Interoperable Systems (DAIS). Abstract [paper_29.pdf]

NoSQL databases were initially devised to support a few concrete extreme-scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code, their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available, there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, for most of them, hand-crafted query code is not an enticing trade-off. In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop HBase, leading to an ANSI SQL compliant database. Under a standard TPC-C workload our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase.

Maia F, Matos M, Rivière E, Oliveira R. 2013. Slicing as a distributed systems primitive. Proceedings of the 6th Latin-American Symposium on Dependable Computing (LADC). Abstract [slicing_primitive.pdf]

Large-scale distributed systems appear as the major infrastructures for supporting planet-scale services. These systems call for appropriate management mechanisms and protocols. Slicing is an example of an autonomous, fully decentralized protocol suitable for large-scale environments. It aims at organizing the system into groups of nodes, called slices, according to application-specific criteria, where the size of each slice is relative to the size of the full system. This allows assigning a certain fraction of nodes to different tasks, according to their capabilities. Although useful, current slicing techniques lack some features of considerable practical importance. This paper proposes a slicing protocol that builds on existing solutions and addresses some of their frailties. We present novel solutions to deal with non-uniform slices and to perform online and dynamic slice schema reconfiguration. Moreover, we describe how to provision a slice-local Peer Sampling Service for upper protocol layers and how to enhance slicing protocols with the capability of slicing over more than one attribute. Slicing is presented as a complete, dependable and integrated distributed systems primitive for large-scale systems.
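
As a rough, centralized illustration of the result a slicing protocol computes (the paper's contribution is to compute it in a decentralized, dynamic and reconfigurable way), the sketch below ranks nodes by an attribute and maps each node's relative position onto slices of configurable, possibly non-uniform size; the function, names and example schema are assumptions made here for illustration only.

```python
# Centralized illustration of slice assignment: nodes are ranked by an
# attribute and their relative rank is mapped onto slices whose sizes are
# fractions of the whole system. Names and the schema are illustrative.

def assign_slices(attribute, slice_fractions):
    """attribute: {node_id: value}; slice_fractions: fractions summing to 1."""
    assert abs(sum(slice_fractions) - 1.0) < 1e-9
    n = len(attribute)
    ranked = sorted(attribute, key=attribute.get)   # node ids ordered by value
    bounds, acc = [], 0.0                           # cumulative slice boundaries
    for f in slice_fractions:
        acc += f
        bounds.append(acc)
    assignment = {}
    for rank, node in enumerate(ranked):
        position = (rank + 1) / n                   # relative position in (0, 1]
        assignment[node] = next(i for i, b in enumerate(bounds) if position <= b + 1e-9)
    return assignment

# Example: the 10% of nodes with the lowest capacity form slice 0,
# the next 30% slice 1, and the remaining 60% slice 2.
capacities = {f"n{i}": i for i in range(100)}
print(assign_slices(capacities, [0.1, 0.3, 0.6])["n5"])  # -> 0
```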

Beernaert L, Gomes P, Matos M, Vilaça R, Oliveira R. 2013. Evaluating Cassandra as a manager of large file sets. Proceedings of the 3rd International Workshop on Cloud Data and Platforms (with EuroSys 2013). Abstract [clouddp-cassfs.pdf]

All companies developing their business on the Web, not only giants like Google or Facebook but also small companies focused on niche markets, face scalability issues in data management. The case study of this paper is a content management system for classified or commercial advertisements on the Web. The data involved has a very significant growth rate and a read-intensive access pattern with a reduced update rate. Typically, data is stored in traditional file systems hosted on dedicated servers or Storage Area Network devices, due to the generalization and ease of use of file systems. However, this ease of implementation and usage has a disadvantage: the centralized nature of these systems leads to availability, elasticity and scalability problems. The scenario under study, undemanding in terms of the system's consistency and with a simple interaction model, is suitable for a distributed database such as Cassandra, conceived precisely to dynamically handle large volumes of data. In this paper, we analyze the suitability of Cassandra as a substitute for file systems in content management systems. The evaluation, conducted using real data from a production system, shows that with Cassandra one can easily get horizontal scalability of storage, redundancy across multiple independent nodes, and distribution of the load imposed by the periodic activities of safeguarding data, while ensuring performance comparable to that of a file system.
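
For concreteness, the kind of file-as-blob usage being evaluated might look like the sketch below, assuming the DataStax cassandra-driver Python package and a reachable Cassandra cluster; the keyspace, table and replication settings are illustrative and not taken from the paper.

```python
# Sketch of storing and retrieving file contents as blobs in Cassandra.
# Schema and settings are illustrative assumptions.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS filestore "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
)
session.set_keyspace("filestore")
session.execute(
    "CREATE TABLE IF NOT EXISTS files (name text PRIMARY KEY, content blob)"
)

insert = session.prepare("INSERT INTO files (name, content) VALUES (?, ?)")
select = session.prepare("SELECT content FROM files WHERE name = ?")

def put_file(name, data):
    session.execute(insert, (name, data))   # replicated across independent nodes

def get_file(name):
    row = session.execute(select, (name,)).one()
    return row.content if row else None

put_file("ad-1234.jpg", b"...image bytes...")  # content would come from an upload
print(len(get_file("ad-1234.jpg")))
```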

Gomes EF, Jorge AM, Azevedo PJ. 2013. Classifying heart sounds using multiresolution time series motifs: an exploratory study. International Conference on Computer Science and Software Engineering - C3S2E. Abstract [p23-gomes.pdf]

The aim of this work is to describe an exploratory study on the use of a SAX-based Multiresolution Motif Discovery method for Heart Sound Classification. The idea of our work is to discover relevant frequent motifs in the audio signals and use the discovered motifs and their frequency as characterizing attributes. We also describe different configurations of motif discovery for defining attributes and compare the use of a decision-tree-based algorithm with random forests on this kind of data. Experiments were performed with a dataset obtained from a clinical trial in hospitals using the digital stethoscope DigiScope. This exploratory study suggests that motifs contain valuable information that can be further exploited for Heart Sound Classification.
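
As background, the SAX symbolization that such motif discovery builds on can be sketched as follows; the alphabet size, segment count and all names are illustrative assumptions, not the configuration used in the study.

```python
# Sketch of SAX: z-normalize the signal, reduce it by piecewise aggregate
# approximation (PAA), and map segment means to symbols. Frequent symbolic
# subsequences (motifs) can then be counted and used as attributes.
import numpy as np

BREAKPOINTS = [-0.6745, 0.0, 0.6745]  # standard-normal quartiles: alphabet of 4
SYMBOLS = "abcd"

def sax(signal, n_segments):
    x = np.asarray(signal, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)      # z-normalize
    segments = np.array_split(x, n_segments)    # PAA segments
    paa = np.array([s.mean() for s in segments])
    idx = np.searchsorted(BREAKPOINTS, paa)     # map means to symbol indices
    return "".join(SYMBOLS[i] for i in idx)

# Toy example standing in for an audio segment of a heart sound recording.
signal = np.sin(np.linspace(0, 20, 2000)) + 0.1 * np.random.randn(2000)
print(sax(signal, n_segments=16))
```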

Sá CR, Soares C, Knobbe A, Azevedo PJ, Jorge AM. 2013. Multi-interval Discretization of Continuous Attributes for Label Ranking. 16th International Conference on Discovery Science - DS. :155-169. Abstract [ds2013.pdf]

Label Ranking (LR) problems, such as predicting rankings of financial analysts, are becoming increasingly important in data mining. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, preprocessing methods for LR are still very scarce. However, some methods, like Naive Bayes for LR and APRIORI-LR, cannot deal with real-valued data directly. As a makeshift solution, one could consider conventional discretization methods used in classification, by simply treating each unique ranking as a separate class. In this paper, we show that such an approach has several disadvantages. As an alternative, we propose an adaptation of an existing method, MDLP, specifically for LR problems. We illustrate the advantages of the new method using synthetic data. Additionally, we present results obtained on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and in most cases improves the results of the learning algorithms.

Nunes A, Pereira JO. 2013. Improving transaction abort rates without compromising throughput through judicious scheduling. Proceedings of the 28th Annual ACM Symposium on Applied Computing. :493–494. Abstract [improvtxnabort.pdf]

Although optimistic concurrency control protocols have increasingly been used in distributed database management systems, they imply a trade-off between the number of transactions that can be executed concurrently (hence, the peak throughput) and the number of transactions aborted due to conflicts.
We propose a novel optimistic concurrency control mechanism that controls the transaction abort rate by minimizing the time during which transactions are vulnerable to abort, without compromising throughput. Briefly, we throttle transaction execution with an adaptive mechanism based on the state of the transaction queues, while allowing out-of-order execution based on expected transaction latency. Preliminary evaluation shows that this provides a substantial improvement in committed transaction throughput.
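
A deliberately simplified sketch of that idea, admitting fewer transactions at a time so that fewer are simultaneously vulnerable to abort, while letting transactions with shorter expected latency jump ahead, is shown below; the admission rule and every name in it are assumptions for illustration, not the paper's actual adaptive mechanism.

```python
# Illustrative throttled admission for optimistic transactions: a bounded
# number of transactions execute at once, and shorter-expected-latency
# transactions may be admitted out of order. Purely a sketch.
import heapq

class ThrottledScheduler:
    def __init__(self, max_in_flight=8):
        self.max_in_flight = max_in_flight  # an adaptive policy would tune this
        self.in_flight = 0
        self.queue = []                     # (expected_latency, seq, txn)
        self.seq = 0

    def submit(self, txn, expected_latency):
        heapq.heappush(self.queue, (expected_latency, self.seq, txn))
        self.seq += 1
        self._admit()

    def _admit(self):
        while self.queue and self.in_flight < self.max_in_flight:
            _, _, txn = heapq.heappop(self.queue)
            self.in_flight += 1
            txn()                           # start optimistic execution

    def completed(self, aborted=False):
        self.in_flight -= 1
        if aborted:                         # back off when aborts are observed
            self.max_in_flight = max(1, self.max_in_flight - 1)
        self._admit()

sched = ThrottledScheduler(max_in_flight=2)
for i in range(5):
    sched.submit(lambda i=i: print("executing txn", i), expected_latency=i % 3)
sched.completed()  # finishing one transaction admits the next shortest one
```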

Nunes A, Oliveira R, Pereira JO. 2013. Conflict Classes for Replicated Databases: A Case-Study. Workshop on Planetary-Scale Distributed Systems - W-PSDS. Abstract [conflictclasses.pdf]

The major challenge in fault-tolerant replicated transactional databases is providing efficient distributed concurrency control that allows non-conflicting transactions to execute concurrently. A common approach is to partition the data according to the data access patterns of the workload, assuming that this will allow operations in each partition to be scheduled independently and run in parallel.
The effectiveness of this approach hinges on the characteristics of the workload: (i) the ability to identify such partitions and (ii) the actual number of such partitions that arises. Performance results that have been presented to support such proposals are thus tightly linked to the simplistic synthetic benchmarks that have been used. This is worrisome, since these benchmarks have not been conceived for this purpose and the resulting definition of partitions might not be representative of real applications. In this paper we contrast a more complex synthetic benchmark (TPC-E) with a real application in the same area (financial brokerage), concluding that the real setting makes it much harder to determine a correct partition of the data and that sub-optimal partitioning severely constrains the performance of replication.
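
To make the notion of data partitions (conflict classes) concrete, the sketch below groups transactions so that only those touching disjoint partitions are scheduled concurrently; the partition names and the naive greedy grouping are hypothetical and not taken from TPC-E, the real application, or any system in the paper.

```python
# Group transactions into batches of mutually non-conflicting ones, where two
# transactions conflict if they touch a common conflict class (partition).
# Partition names and the grouping rule are hypothetical.

def conflicts(txn_a, txn_b):
    return bool(txn_a["classes"] & txn_b["classes"])

def schedule(transactions):
    batches = []
    for txn in transactions:
        for batch in batches:
            if not any(conflicts(txn, other) for other in batch):
                batch.append(txn)           # fits an existing batch: run in parallel
                break
        else:
            batches.append([txn])           # conflicts with all batches: new batch
    return batches

workload = [
    {"id": "t1", "classes": {"customers_A"}},
    {"id": "t2", "classes": {"customers_B"}},            # disjoint from t1
    {"id": "t3", "classes": {"customers_A", "market"}},  # overlaps t1
    {"id": "t4", "classes": {"market"}},                 # overlaps t3
]
for i, batch in enumerate(schedule(workload)):
    print("batch", i, [t["id"] for t in batch])
```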

Almeida JB, Barbosa MB, Barthe G, Dupressoir F. 2013. Certified computer-aided cryptography: efficient provably secure machine code from high-level implementations. ACM Conference on Computer and Communications Security. :1217-1230. Abstract [316.pdf]

We present a computer-aided framework for proving concrete security bounds for cryptographic machine code implementations. The front-end of the framework is an interactive verification tool that extends the EasyCrypt framework to reason about relational properties of C-like programs extended with idealized probabilistic operations in the style of code-based security proofs. The framework also incorporates an extension of the CompCert certified compiler to support trusted libraries providing complex arithmetic calculations or instantiating idealized components such as sampling operations. This certified compiler allows us to carry to executable code the security guarantees established at the high level, and is also instrumented to detect when compilation may interfere with side-channel countermeasures deployed in source code.
We demonstrate the applicability of the framework by applying it to the RSA-OAEP encryption scheme, as standardized in PKCS#1 v2.1. The outcome is a rigorous analysis of the advantage of an adversary to break the security of assembly implementations of the algorithms specified by the standard. The example also provides two contributions of independent interest: it bridges the gap between computer-assisted security proofs and real-world cryptographic implementations as described by standards such as PKCS, and demonstrates the use of the CompCert certified compiler in the context of cryptographic software development.

Gomes T, Abade T, Silva JL, Campos JC. 2013. Desenvolvimento de Jogos Educativos na plataforma APEX: O Jogo da Asma. Atas da Conferência Interação. :90-97. Abstract [gomesasc2013-interacao.pdf]

The APEX platform was developed for the prototyping of ubiquitous computing environments. In this paper we explore its applicability to the development of serious games, that is, games that, beyond an entertainment component, have an instructive and educational component. Concretely, we describe the Asthma Game, a game that aims to draw children's attention to the factors that trigger asthma attacks and to convey knowledge about how to avoid them. Besides describing the game, we describe the results of a study that sought to assess both the feasibility of using the platform to create serious games and the usability of the game itself.

Abade T, Gomes T, Silva JL, Campos JC. 2013. Avaliação de Ambientes Ubíquos na Plataforma APEX. Atas da Conferência Interação 2013. :177-178. Abstract [abadegsc2013-interacao.pdf]

This paper describes the evaluation of a ubiquitous environment using APEX, a platform for the rapid prototyping of ubiquitous environments that lets users navigate a virtual world, experiencing many of the features of the proposed solution and design.

Cruz P, Campos JC. 2013. Ambiente de geração, mutação e execução de casos de teste para aplicações Web. Atas da Conferência Interação 2013. :45-52. Abstract [cruzc2013-interacao.pdf]

Graphical user interfaces are increasingly a key point in the communication between users and the system. To guarantee that they behave properly, an adequate testing phase is essential. However, running tests on an interface is an expensive and time-consuming process, and such tests are typically executed manually. This paper explores the automation of the interface testing process for Web applications. A model-based testing approach is adopted: test cases are generated from task models, and the behavior of the interface is compared with what is prescribed in the task models. A tool supporting the approach is under development.

Gomes T, Abade T, Harrison M, Silva JL, Campos JC. 2013. Developing Serious Games With The APEX Framework. Proceedings of the Workshop on Ubiquitous games and gamification for promoting behavior change and wellbeing. :37-40. Abstract [developing-games-apex-final.pdf]

APEX was developed as a framework for the prototyping of ubiquitous computing (ubicomp) environments. In this paper we explore its role as a platform for developing serious games. In particular, we describe the Asthma Game, a game aimed at raising awareness of asthma triggers among children, thus encouraging a healthier lifestyle with regard to asthma and respiratory problems.

Silva CE, Campos JC. 2013. Combining Static and Dynamic Analysis for the Reverse Engineering of Web Applications. Proceedings of the 5th Symposium on Engineering Interactive Computing Systems - EICS. :107-112. Abstract [eics13.pdf]

Software has become so complex that it is increasingly hard to have a complete understanding of how a particular system will behave. Web applications, and their user interfaces in particular, are built with a wide variety of technologies, making them particularly hard to debug and maintain. Reverse engineering techniques, either through static analysis of the code or dynamic analysis of the running application, can be used to help gain this understanding. Each type of technique has its limitations. With static analysis it is difficult to have good coverage of highly dynamic applications, while dynamic analysis faces problems with guaranteeing that generated models fully capture the behavior of the system. This paper proposes a new hybrid approach to the reverse engineering of web applications' user interfaces. The approach combines dynamic analysis of the application at runtime with static analysis of the source code of the event handlers found during interaction. Information derived from the source code is both directly added to the generated models and used to guide the dynamic analysis.

Silva JC, Silva JL, Campos JC, Saraiva JA. 2013. Uma Abordagem para a Geração de Casos de Teste Baseada em Modelos. Sistemas e Tecnologias de Informação. 2:142-146. Abstract [id-265-jcsilva.pdf]

Analytical methods based on evaluation models of interactive systems were proposed as an alternative to user testing in the last stages of software development, due to its cost. However, the use of isolated behavioral models of the system limits the results of the analytical methods. An example of these limitations is that they are unable to identify implementation issues that will impact usability. With the introduction of model-based testing we are able to test whether the implemented software meets the specified model. This paper presents a model-based approach for generating test cases from the static analysis of source code.