Publications

Matos M, Vilaça R, Pereira JO, Oliveira R.  2011.  An epidemic approach to dependable key-value substrates. Proceedings of 1st International Workshop on Dependability of Clouds, Data Centers and Virtual Computing Environments (with DSN).

The sheer volumes of data handled by today's Internet services demand uncompromising scalability from the persistence substrates. Such demands have been successfully addressed by highly decentralized key-value stores invariably governed by a distributed hash table. The availability of these structured overlays rests on the assumption of a moderately stable environment. However, as scale grows to unprecedented numbers of nodes, the occurrence of faults and churn becomes the norm rather than the exception, precluding the adoption of rigid control over the network's organization. In this position paper we outline the major ideas of a novel architecture designed to handle today's very large scale demand and its inherent dynamism. The approach rests on the well-known reliability and scalability properties of epidemic protocols to minimize the impact of churn. We identify several challenges that such an approach implies and speculate on possible solutions to ensure data availability and adequate access performance.

Vilaça R, Oliveira R, Pereira JO.  2011.  A correlation-aware data placement strategy for key-value stores. Proceedings of the 11th IFIP Distributed Applications and Interoperable Systems (DAIS).

Key-value stores hold the unprecedented bulk of the data produced by applications such as social networks. Their scalability and availability requirements often outweigh the benefits of richer data and processing models, and even of elementary data consistency. Moreover, existing key-value stores offer only random or order-based placement strategies. In this paper we exploit arbitrary data relations, easily expressed by the application, to foster data locality and improve the performance of complex queries common in social network read-intensive workloads. We present a novel data placement strategy, supporting dynamic tags, based on multidimensional locality-preserving mappings. We compare our data placement strategy with the ones used in existing key-value stores under the workload of a typical social network application and show that the proposed correlation-aware data placement strategy offers a major improvement in the system's overall response time and network requirements.
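
A rough Haskell sketch of one possible multidimensional locality-preserving mapping, a Morton (Z-order) code that interleaves the bits of a small vector of tag coordinates into a single key; the paper's actual mapping and parameters may well differ, and the 16-bit width below is an illustrative assumption:

    import Data.Bits (testBit, setBit)
    import Data.Word (Word64)

    -- Interleave the lowest 16 bits of each coordinate into one Z-order key.
    -- Items whose tag coordinates are close tend to obtain numerically close
    -- keys, which an order-partitioned key-value store can exploit.
    zOrderKey :: [Word64] -> Word64
    zOrderKey coords = foldl place 0 positions
      where
        d         = length coords
        positions = [ (b, dim) | b <- [0 .. 15], dim <- [0 .. d - 1] ]
        place acc (b, dim)
          | testBit (coords !! dim) b = setBit acc (b * d + dim)
          | otherwise                 = acc

    main :: IO ()
    main = do
      -- Two items with similar tag coordinates map to nearby keys ...
      print (zOrderKey [3, 5], zOrderKey [3, 6])
      -- ... while a distant item maps far away.
      print (zOrderKey [200, 900])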

Carvalho N, Bordalo J, Campos F, Pereira JO.  2011.  Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment. Proceedings of the 6th Workshop on Middleware for Service Oriented Computing - SOC. :1–7.

The correctness and performance of large scale service oriented systems depend on distributed middleware components performing various communication and coordination functions. It is, however, very difficult to experimentally assess such middleware components, as interesting behavior often arises exclusively in large scale settings, but such deployments are costly and time consuming. We address this challenge with MINHA, a system that virtualizes multiple JVM instances within a single JVM while simulating key environment components, thus reproducing the concurrency, distribution, and performance characteristics of the actual system. The usefulness of MINHA is demonstrated by applying it to the WS4D Java stack, a popular implementation of the Devices Profile for Web Services (DPWS) specification.

Sá CR, Soares C, Jorge AM, Azevedo PJ, Costa JP.  2011.  Mining Association Rules for Label Ranking. PAKDD - The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining. :432-443.

Recently, a number of learning algorithms have been adapted for label ranking, including instance-based and tree-based methods. In this paper, we continue this line of work by proposing an adaptation of association rules for label ranking based on the APRIORI algorithm. Given that the original APRIORI algorithm does not aim to obtain predictive models, two changes were needed to achieve this. The adaptation essentially consists of using variations of the support and confidence measures based on ranking similarity functions that are suitable for label ranking. Additionally, we propose a simple greedy method to select the parameters of the algorithm. We also adapt the method to make a prediction from the possibly conflicting consequents of the rules that apply to an example. Despite having made our adaptation from a very simple variant of association rules for classification, partial results clearly show that the method is making valid predictions. Additionally, they show that it competes well with state-of-the-art label ranking algorithms.
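
A hedged Haskell sketch of the kind of adaptation described above (the concrete measures and prediction rule in the paper may differ): rule support is weighted by a Kendall-style similarity between each covered example's observed ranking and the ranking in the rule's consequent, instead of counting covered examples with a flat 1. The rule and dataset below are hypothetical.

    -- A ranking assigns a rank (1 = best) to each label, by position.
    type Ranking = [Int]

    -- Kendall's tau between two rankings over the same label set:
    -- +1 for identical orderings, -1 for completely reversed ones.
    kendallTau :: Ranking -> Ranking -> Double
    kendallTau xs ys = fromIntegral (conc - disc) / fromIntegral total
      where
        n     = length xs
        pairs = [ (i, j) | i <- [0 .. n - 1], j <- [i + 1 .. n - 1] ]
        total = n * (n - 1) `div` 2
        sgn i j zs = signum (zs !! i - zs !! j)
        conc  = length [ () | (i, j) <- pairs, sgn i j xs == sgn i j ys ]
        disc  = length pairs - conc

    -- Similarity-based support of a rule A -> pi: covered examples contribute
    -- their (clipped) similarity to the predicted ranking instead of a flat 1.
    ruleSupport :: (a -> Bool) -> Ranking -> [(a, Ranking)] -> Double
    ruleSupport covers predicted examples =
      sum [ max 0 (kendallTau predicted r) | (x, r) <- examples, covers x ]
        / fromIntegral (length examples)

    main :: IO ()
    main = do
      -- Toy dataset: each example is (attribute value, observed label ranking).
      let dataset = [ (1 :: Int, [1, 2, 3]), (1, [1, 3, 2]), (2, [3, 2, 1]) ]
      -- Hypothetical rule: "attribute == 1  ->  ranking [1,2,3]".
      print (ruleSupport (== 1) [1, 2, 3] dataset)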

Castro N, Azevedo PJ.  2011.  Time Series Motifs Statistical Significance. Proceedings of the Eleventh SIAM International Conference on Data Mining - SDM. :687-698.

Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem within applications that range from finance to health. Many algorithms have been proposed for the task of efficiently finding motifs. Surprisingly, most of these proposals do not focus on how to evaluate the discovered motifs. They are typically evaluated by human experts. This is unfeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in the bioinformatics and association rule mining communities to evaluate the extracted patterns. In this work we present an approach to calculate the statistical significance of time series motifs. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif by using Markov Chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution gives means to the application of a powerful technique - statistical tests - to a time series setting. This provides researchers and practitioners with an important tool to automatically evaluate the degree of relevance of each extracted motif.
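
A hedged Haskell sketch of the general idea (the paper's procedure is more elaborate): estimate a first-order Markov chain from the symbolic series, compute the probability of the motif word under that model, and turn it into an expected number of occurrences; a hypothesis test would then compare this expectation with the observed count.

    -- Transition probability P(b | a) estimated from a symbolic sequence.
    transProb :: String -> Char -> Char -> Double
    transProb s a b = fromIntegral nAB / fromIntegral (max 1 nA)
      where
        bigrams = zip s (tail s)
        nA      = length [ () | (x, _) <- bigrams, x == a ]
        nAB     = length [ () | (x, y) <- bigrams, x == a, y == b ]

    -- Start probability: empirical frequency of the symbol in the sequence.
    startProb :: String -> Char -> Double
    startProb s a =
      fromIntegral (length (filter (== a) s)) / fromIntegral (length s)

    -- Expected number of occurrences of a motif word in a sequence of length n,
    -- under the first-order Markov model estimated from that sequence.
    expectedCount :: String -> String -> Double
    expectedCount s w =
      fromIntegral (length s - length w + 1) * wordProb
      where
        wordProb = startProb s (head w)
                 * product (zipWith (transProb s) w (tail w))

    main :: IO ()
    main = do
      let series = "abcabcabdabcabc"   -- toy symbolic (e.g. discretised) series
          motif  = "abc"
      putStrLn $ "expected occurrences of " ++ motif ++ ": "
               ++ show (expectedCount series motif)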

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. Principles of Distributed Systems. 7109:513–527.

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass.

In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.
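
To make the mass-loss problem concrete, here is a toy synchronous Haskell sketch of a mass-distribution (push-sum style) averaging round over a ring, not the MDFU protocol itself: every node keeps a (sum, weight) pair and its estimate is their ratio; when a message carrying mass is dropped, the global mass shrinks and the estimates are biased away from the true average, which is the failure mode Flow-Updating sidesteps by exchanging flows (state) rather than mass.

    -- Toy, synchronous push-sum round over a ring topology.  Each node holds a
    -- (sum, weight) pair; its current estimate of the global average is sum/weight.
    type Node = (Double, Double)

    -- One round: every node keeps half of its mass and pushes the other half to
    -- its right-hand neighbour.  `delivered i` says whether node i's message
    -- arrives; a dropped message simply destroys the mass it carried.
    pushSumRound :: (Int -> Bool) -> [Node] -> [Node]
    pushSumRound delivered nodes = zipWith combine kept received
      where
        n        = length nodes
        kept     = [ (s / 2, w / 2) | (s, w) <- nodes ]
        sent     = [ if delivered i then (s / 2, w / 2) else (0, 0)
                   | (i, (s, w)) <- zip [0 ..] nodes ]
        received = [ sent !! ((i - 1 + n) `mod` n) | i <- [0 .. n - 1] ]
        combine (s1, w1) (s2, w2) = (s1 + s2, w1 + w2)

    estimates :: [Node] -> [Double]
    estimates = map (\(s, w) -> s / w)

    main :: IO ()
    main = do
      let inputs   = [10, 20, 30, 40] :: [Double]
          initial  = [ (v, 1) | v <- inputs ]
          reliable = iterate (pushSumRound (const True)) initial !! 20
          lossy    = iterate (pushSumRound (/= 2)) initial !! 20  -- node 2 always drops
      print (estimates reliable)  -- converges to the true average 25
      print (estimates lossy)     -- lost mass biases the estimates away from 25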

Gonçalves R, Almeida PS, Moreno CB, Fonte V, Preguiça N.  2011.  Evaluating Dotted Version Vectors in Riak. INForum - Simpósio de Informática. :474–479.

The NoSQL movement is rapidly increasing in importance, acceptance and usage in major (web) applications that need the partition-tolerance and availability of the CAP theorem for scalability purposes, thus sacrificing the consistency side. With this approach, paradigms such as Eventual Consistency became more widespread. An eventually consistent system must handle data divergence and conflicts, which have to be carefully accounted for. Some systems have tried to use classic Version Vectors (VV) to track causality, but these reveal either scalability problems or loss of accuracy (when pruning is used to prevent vector growth).

Dotted Version Vectors (DVV) are a novel mechanism for dealing with data versioning in eventually consistent systems, allowing both accurate causality tracking and scalability in the number of clients and servers, while limiting vector size to the replication degree.

In this paper we briefly describe the challenges faced when incorporating DVV in Riak (a distributed key-value store), evaluate its behavior and performance, and discuss the advantages and disadvantages of this specific implementation.
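
A loose Haskell sketch of the background mechanism (the types below are assumptions; the actual DVV rules in the paper, and their Riak integration, are more refined): classic version vectors are compared entry-wise to decide whether one version descends another or the two are concurrent siblings, and a dotted version vector additionally carries a single "dot" identifying the write event itself.

    import qualified Data.Map.Strict as M

    -- Classic version vector: one counter per replica node.
    type NodeId = String
    type VV     = M.Map NodeId Int

    -- v1 `descends` v2 when v1 dominates v2 entry-wise, i.e. all of v2's events
    -- are known to v1; two versions are concurrent when neither descends the other.
    descends :: VV -> VV -> Bool
    descends v1 v2 = all covered (M.toList v2)
      where covered (node, c) = M.findWithDefault 0 node v1 >= c

    concurrent :: VV -> VV -> Bool
    concurrent v1 v2 = not (descends v1 v2) && not (descends v2 v1)

    -- A dotted version vector (loosely): a contiguous past summarised by a VV,
    -- plus a single "dot" identifying the write event itself.  The dot lets the
    -- server tag each write precisely without one vector entry per client.
    data DVV = DVV { dot :: (NodeId, Int), past :: VV } deriving Show

    main :: IO ()
    main = do
      let a = M.fromList [("n1", 2), ("n2", 1)]
          b = M.fromList [("n1", 1), ("n2", 3)]
      print (descends a b, concurrent a b)   -- (False, True): siblings to reconcile
      print (DVV ("n1", 3) a)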

Borges M, Moreno CB, Jesus P, Almeida PS.  2011.  Estimativa Contínua e Tolerante a Faltas de Funções Distribuição Cumulativa em Redes de Larga Escala. INForum - Simpósio de Informática.

In large-scale decentralized environments, such as wireless sensor networks, P2P networks and others, data collection is only practical if the data is aggregated. In the current state of the art, the strategies addressing this issue are not satisfactory, since they require protocols to be restarted whenever the initial values change or whenever the rate of node arrivals/departures is non-zero. Moreover, no existing strategy supports the estimation of CDFs (cumulative distribution functions). The approach presented in this work shows how cumulative distribution functions can be computed in distributed networks, allowing the sampled values to be tracked dynamically without restarting the protocol under adverse network conditions (message loss, nodes joining and leaving). The approach is based on a fault-tolerant averaging protocol. This work also contributes a strategy that reduces the communication cost between nodes, based on local decisions and sensitive to the rate of change of the estimates. Simulation results show the resilience of this protocol, which enables continuous CDF estimation in the presence of dynamism (message loss, changes in sampled values, churn). They also show fast convergence of the CDF estimates for different network configurations.
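
One way to read the approach, as a hedged illustration rather than the protocol itself: if every node turns its sampled value into indicator values over a set of interpolation points, the network-wide averages of those indicators are exactly the empirical CDF at those points, so a fault-tolerant averaging protocol can be reused to estimate the CDF. The Haskell sketch below computes those averages centrally:

    -- Empirical CDF obtained by averaging, over all nodes, the indicator
    -- "my sampled value is <= q" for each query point q.  In the distributed
    -- setting each of these averages would be produced by the fault-tolerant
    -- averaging protocol instead of being computed centrally as here.
    cdfAt :: [Double] -> [Double] -> [(Double, Double)]
    cdfAt samples queryPoints =
      [ (q, average [ indicator (v <= q) | v <- samples ]) | q <- queryPoints ]
      where
        indicator True  = 1
        indicator False = 0
        average xs      = sum xs / fromIntegral (length xs)

    main :: IO ()
    main = do
      let sensorReadings = [12.1, 15.3, 14.2, 19.8, 13.0, 17.5]  -- one value per node
          points         = [12, 14, 16, 18, 20]
      mapM_ print (cdfAt sensorReadings points)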

Fernandes JP, Saraiva JA, Seidel D, Voigtländer J.  2011.  Strictification of circular programs. Workshop on Partial Evaluation and Program Manipulation - PEPM. :131-140.

Circular functional programs (necessarily evaluated lazily) have been used as algorithmic tools, as attribute grammar implementations, and as a target for program transformation techniques. Classically, Richard Bird [1984] showed how to transform certain multitraversal programs (which could be evaluated strictly or lazily) into one-traversal ones using circular bindings. Can we go the other way, even for programs that are not in the image of his technique? That is the question we pursue in this paper. We develop an approach that on the one hand lets us deal with typical examples corresponding to attribute grammars, but on the other hand also helps to derive new algorithms for problems not previously in reach.
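
For readers unfamiliar with circular programs, the classic repmin example below (standard in the literature following Bird, not taken from the paper) replaces every leaf of a tree by the tree's minimum in a single traversal; the binding of m is circular and only works under lazy evaluation, which is exactly the kind of program strictification targets.

    data Tree = Leaf Int | Fork Tree Tree deriving Show

    -- One traversal computes the minimum and, at the same time, builds the
    -- result tree using that minimum: the circular binding `m` is only legal
    -- because Haskell is lazy.
    repmin :: Tree -> Tree
    repmin t = t'
      where
        (m, t') = go t
        go (Leaf n)   = (n, Leaf m)          -- uses m before it is fully known
        go (Fork l r) = let (ml, l') = go l
                            (mr, r') = go r
                        in (min ml mr, Fork l' r')

    main :: IO ()
    main = print (repmin (Fork (Leaf 3) (Fork (Leaf 1) (Leaf 7))))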

Martins M, Madeira A, Diaconescu R, Barbosa LS.  2011.  Hybridization of Institutions. Proceedings of 4th International Conference on Algebra and Coalgebra in Computer Science - CALCO. 6859:283-297.

Modal logics are successfully used as specification logics for reactive systems. However, they are not expressive enough to refer to individual states and reason about the local behaviour of such systems. This limitation is overcome in hybrid logics, which introduce special symbols for naming states in models. Actually, hybrid logics have recently regained interest, resulting in a number of new results and techniques as well as applications to software specification. In this context, the first contribution of this paper is an attempt to ‘universalize’ the hybridization idea. Following the lines of [16], where a method to modalize arbitrary institutions is presented, the paper introduces a method to hybridize logics at the same institution-independent level. The method extends arbitrary institutions with Kripke semantics (for multi-modalities with arbitrary arities) and hybrid features. This paves the ground for a general result: any encoding (expressed as a comorphism) from an arbitrary institution to first-order logic (FOL) determines a comorphism from its hybridization to FOL. This second contribution opens the possibility of effective tool support for specification languages based upon logics with hybrid features.
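
As a small generic illustration of the expressiveness that hybrid features add (independent of the institution-level construction in the paper), a nominal i names a single state and the satisfaction operator @_i relocates evaluation to that state, so properties can be pinned to individual states of a reactive system, e.g.

    % i and j are nominals (names of individual states); @_i shifts evaluation
    % to the state named i; <a> is the modal diamond for action a.
    @_i\,\big(\mathit{idle} \;\wedge\; \langle a\rangle\, j\big)

read as: at the state named i, the proposition idle holds and there is an a-transition leading to the state named j.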

Madeira A, Faria JM, Martins M, Barbosa LS.  2011.  Hybrid Specification of Reactive Systems: An Institutional Approach. Proceedings of 9th International Conference on Software Engineering and Formal Methods - SEFM. 7041:269-285.

This paper introduces a rigorous methodology for requirements specification of systems that react to external stimuli by evolving through different operational modes. In each mode different functionalities are provided. Starting from a classical state-machine specification, the envisaged methodology interprets each state as a different mode of operation endowed with an algebraic specification of the corresponding functionality. Specifications are given in an expressive variant of hybrid logic which is, at a later stage, translated into first-order logic to bring suitable tool support into the picture. The paper’s main contribution is to provide rigorous foundations for the method, framing specification logics as institutions and the translation process as a comorphism between them.

Rodrigues C, Martins M, Madeira A, Barbosa LS.  2011.  Refinement by interpretation in Pi-institutions. Proceedings of 15th International Refinement Workshop - Refine. 55:53-64.

The paper discusses the role of interpretations, understood as multifunctions that preserve and reflect logical consequence, as refinement witnesses in the general setting of pi-institutions. This leads to a smooth generalization of the refinement-by-interpretation approach, recently introduced by the authors in more specific contexts. As a second, yet related, contribution, a basis is provided to build up a refinement calculus of structured specifications in and across arbitrary pi-institutions.

Barbosa LS, Henriques PR, Sanchez A.  2011.  Towards rigorous analysis of Open Source Software. Proceedings of 5th International Workshop on Harnessing Theories for Tool Support in Software - TTSS. :77-89.

This paper discusses the (often hidden) potential of Open Source Software development to resort to, benefit from and cross-fertilize formal engineering methods, whose role is indisputable in the production of trustworthy software components. A strategy addressing the incorporation of formal verification methods in the Open Source Software lifecycle, in a somewhat less conventional way — that of assisting the re-engineering process of running code — is proposed.

Sanchez A, Barbosa LS, Riesco D.  2011.  Bigraphical Modelling of Architectural Patterns. 8th International Symposium on Formal Aspects of Component Software - FACS. 7253:313-330.

Archery is a language for behavioural modelling of architectural patterns, supporting hierarchical composition and a type discipline. This paper extends Archery to cope with the patterns' structural dimension through a set of (re-)configuration combinators and constraints that all instances of a pattern must obey. Both types and instances of architectural patterns are semantically represented as bigraphical reactive systems, and operations upon them as reaction rules. Such a bigraphical semantics provides a rigorous model for Archery patterns and reduces constraint verification in architectures to a type-checking problem.

Martins A, Barbosa LS, Rodrigues N.  2011.  Shacc: A Functional Prototyper for a Component Calculus. Proceedings of 4th International Conference on Algebra and Coalgebra in Computer Science - CALCO. 6859:413-419.

Over the last decade, component-based software development arose as a promising paradigm to deal with the ever-increasing complexity in software design, evolution and reuse. Shacc is a prototyping tool for component-based systems in which components are modelled coinductively as generalized Mealy machines. The prototype is built as a Haskell library endowed with a graphical user interface developed in Swing.
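
A minimal Haskell sketch of the coinductive view of components mentioned above (Shacc's actual types and combinators will differ): a component is a Mealy machine whose step function consumes an input and yields an output together with the next state of the machine, and components compose by feeding one machine's output into the other.

    -- A (generalised) Mealy machine, given coinductively: consuming an input
    -- yields an output and the machine's next state.
    newtype Mealy i o = Mealy { step :: i -> (o, Mealy i o) }

    -- Sequential composition: the output of the first machine feeds the second.
    compose :: Mealy i m -> Mealy m o -> Mealy i o
    compose f g = Mealy $ \i ->
      let (m, f') = step f i
          (o, g') = step g m
      in (o, compose f' g')

    -- A tiny stateful component: a running-sum accumulator.
    accumulator :: Int -> Mealy Int Int
    accumulator total = Mealy $ \n ->
      let total' = total + n
      in (total', accumulator total')

    -- Another component: scales its input by a constant factor.
    scaler :: Int -> Mealy Int Int
    scaler k = Mealy $ \n -> (k * n, scaler k)

    -- Run a machine over a list of inputs, collecting the outputs.
    runMealy :: Mealy i o -> [i] -> [o]
    runMealy _ []       = []
    runMealy m (x : xs) = let (o, m') = step m x in o : runMealy m' xs

    main :: IO ()
    main = print (runMealy (compose (accumulator 0) (scaler 10)) [1, 2, 3, 4])
    -- expected output: [10, 30, 60, 100]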