Publications

Soares L, Pereira JO.  2009.  A Simple Approach to Shared Storage Database Servers. WDDDM - Proceedings of the 3rd Workshop on Dependable Distributed Data Management (with EuroSys). :21–24.

This paper introduces a generic technique to obtain a shared-storage database cluster from an off-the-shelf database management system, without needing to heavily refactor server software to deal with distributed locking, buffer invalidation, and recovery from partial cluster failure. Instead, the core of the proposal is the combination of a replication protocol and a surprisingly simple modification to the common copy-on-write logical volume management technique: One of the servers is allowed to skip copy-on-write and directly update the original backing store. This makes it possible to use any shared-nothing database server software in a shared or partially shared storage configuration, thus allowing large cluster configurations with a small number of copies of data.
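
As an illustration of the central trick, consider a toy model in Haskell: every replica reads through a private copy-on-write overlay, but the designated primary is allowed to bypass the overlay and update the shared backing store directly. This is a minimal sketch under assumed names (Volume, readBlock, writeBlock), not the paper's actual interface.

    import           Control.Applicative ((<|>))
    import qualified Data.Map as M

    type Addr  = Int
    type Block = String

    data Volume = Volume
      { backing :: M.Map Addr Block  -- shared backing store, common to the cluster
      , overlay :: M.Map Addr Block  -- this server's private copy-on-write layer
      , primary :: Bool              -- may this server update the backing store?
      }

    -- Reads prefer the private overlay, falling back to the shared store.
    readBlock :: Volume -> Addr -> Maybe Block
    readBlock v a = M.lookup a (overlay v) <|> M.lookup a (backing v)

    -- The primary skips copy-on-write; every other server writes its overlay.
    writeBlock :: Volume -> Addr -> Block -> Volume
    writeBlock v a b
      | primary v = v { backing = M.insert a b (backing v) }
      | otherwise = v { overlay = M.insert a b (overlay v) }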

Silva P, Visser J, Oliveira JN.  2009.  Galois: A Language for Proofs Using Galois Connections and Fork Algebras. PLMMS'09 - ACM SIGSAM 2009 International Workshop on Programming Languages for Mechanized Mathematics Systems.

Galois is a domain-specific language supported by the Galculator interactive proof-assistant prototype. Galculator uses an equational approach based on Galois connections with indirect equality as an additional inference rule. Galois allows for the specification of different theories in a point-free style by using fork algebras, an extension of relation algebras with the expressive power of first-order logic. The language offers sub-languages to derive proof rules from Galois connections, to express proof tactics, and to organize axioms and theorems into modular definitions.

In this paper, we describe how the algebraic theory underlying the proof-method drives the design of the Galois language. We provide the syntax and semantics of important fragments of Galois and show how they are hierarchically combined into a complete language.
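
For reference, the two ingredients the abstract relies on can be written down directly; this is the standard textbook formulation, not a fragment of the Galois language itself:

    % A Galois connection between preorders (A, \leq) and (B, \sqsubseteq):
    f \dashv g \;\equiv\; \forall a, b.\;\; f\,a \sqsubseteq b \;\Leftrightarrow\; a \leq g\,b

    % Indirect equality, the additional inference rule (valid in any partial order):
    a = b \;\Leftrightarrow\; \forall x.\; (x \leq a \,\Leftrightarrow\, x \leq b)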

Cunha A, Pacheco H.  2009.  Mapping between Alloy specifications and database implementations. Proceedings of the 7th International Conference on Software Engineering and Formal Methods - SEFM. :285–294.

The emergence of lightweight formal methods tools such as Alloy improves the software design process by encouraging developers to model and verify their systems before engaging with tedious implementation details. However, an abstract Alloy specification is far from an actual implementation, and manually refining the former into the latter is unfortunately a non-trivial task. This paper identifies a subset of the Alloy language that is equivalent to a relational database schema with the most conventional integrity constraints, namely functional and inclusion dependencies. This semantic correspondence enables both the automatic translation of Alloy specifications into relational database schemas and the reengineering of legacy databases into Alloy. The paper also discusses how to derive an object-oriented application layer to serve as an interface to the underlying database.
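
A minimal Haskell sketch of the kind of correspondence involved, with hypothetical names: an Alloy field such as sig Employee { boss : one Employee } maps to a table whose boss column is constrained by an inclusion dependency (its values must be existing Employee identifiers), while the one multiplicity yields a functional dependency.

    type TableName  = String
    type ColumnName = String

    -- X -> Y within one table
    data FD  = FD TableName [ColumnName] [ColumnName]
    -- R.a must be contained in S.b
    data IND = IND (TableName, ColumnName) (TableName, ColumnName)

    data Schema = Schema
      { tables :: [(TableName, [ColumnName])]
      , fds    :: [FD]   -- functional dependencies
      , inds   :: [IND]  -- inclusion dependencies
      }

    -- The schema corresponding to the Alloy fragment above.
    employeeSchema :: Schema
    employeeSchema = Schema
      { tables = [("Employee", ["id", "boss"])]
      , fds    = [FD "Employee" ["id"] ["boss"]]
      , inds   = [IND ("Employee", "boss") ("Employee", "id")]
      }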

Vilaça R, Oliveira R.  2009.  Clouder: a flexible large scale decentralized object store - architecture overview. Proceedings of the 3rd Workshop on Dependable Distributed Data Management (with EuroSys).

The current exponential growth of data calls for massive-scale storage and processing capabilities. Such large volumes of data tend to preclude centralized storage and processing, making extensive and flexible data partitioning unavoidable. This is being acknowledged by several major Internet players embracing the Cloud computing model and offering first-generation remote storage services with simple processing capabilities. In this position paper we present preliminary ideas for the architecture of a flexible, efficient and dependable fully decentralized object store able to manage very large sets of variable-size objects and to coordinate in-place processing. Our target is large local-area computing facilities composed of tens of thousands of nodes under the same administrative domain. The system should be capable of leveraging massive replication of data to balance read scalability and fault tolerance.

Matos M, Sousa AL, Pereira JO, Oliveira R, Deliot E, Murray P.  2009.  CLON: Overlay networks and gossip protocols for cloud environments. Proceedings of On the Move to Meaningful Internet Systems - OTM. 5870.

Although epidemic or gossip-based multicast is a robust and scalable approach to reliable data dissemination, its inherent redundancy results in high resource consumption on both links and nodes. This problem is aggravated in settings that have costlier or resource-constrained links, as happens in Cloud Computing infrastructures composed of several interconnected data centers across the globe. The goal of this work is therefore to improve the efficiency of gossip-based reliable multicast by reducing the load imposed on those constrained links. In detail, the proposed CLON protocol combines an overlay that gives preference to local links and a dissemination strategy that takes locality into account. Extensive experimental evaluation using a very large number of simulated nodes shows that this results in a reduction of traffic in constrained links by an order of magnitude, while at the same time preserving the resilience properties that make gossip-based protocols so attractive.
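
The locality bias can be illustrated with a short Haskell sketch: when filling a node's view, pick a neighbour from the same data center with probability p, falling back to remote candidates otherwise. This is an illustration of the idea under assumed names, not the CLON protocol itself.

    import System.Random (randomRIO)

    data Node = Node { nodeId :: Int, dataCenter :: String } deriving (Eq, Show)

    -- Choose one neighbour, preferring nodes in the caller's data center
    -- with probability p (0 <= p <= 1).
    pickNeighbour :: Double -> Node -> [Node] -> IO (Maybe Node)
    pickNeighbour p self candidates = do
      r <- randomRIO (0, 1 :: Double)
      let locals  = [n | n <- candidates, dataCenter n == dataCenter self]
          remotes = [n | n <- candidates, dataCenter n /= dataCenter self]
          pool | r < p && not (null locals) = locals
               | not (null remotes)         = remotes
               | otherwise                  = locals
      case pool of
        [] -> pure Nothing
        _  -> do i <- randomRIO (0, length pool - 1)
                 pure (Just (pool !! i))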

Ferreira M, Oliveira JN.  2009.  An Integrated Formal Methods Tool-Chain and Its Application to Verifying a File System Model. SBMF - 12th Brazilian Symposium on Formal Methods. 5902:153–169.

Tool interoperability as a means to achieve integration is among the main goals of the international Grand Challenge initiative. In the context of the Verifiable file system mini-challenge put forward by Rajeev Joshi and Gerard Holzmann, this paper focuses on the integration of different formal methods and tools in modelling and verifying an abstract file system inspired by the Intel® Flash File System Core. We combine high-level manual specification and proofs with current state-of-the-art mechanical verification tools into a tool-chain which involves Alloy, VDM++ and HOL. The use of (pointfree) relation modelling provides the glue which binds these tools together.
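
To give a flavour of the pointfree relational glue mentioned above (standard relational-calculus notation, not necessarily the paper's exact model): an abstract file store can be viewed as a relation from paths to data, and the requirement that each path holds at most one datum is stated without mentioning points:

    % S : Path -> Data is "simple" (a partial function): each path maps
    % to at most one datum.
    S \cdot S^{\circ} \subseteq id

    % Entirety (every path is mapped) would additionally require:
    id \subseteq S^{\circ} \cdot S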

Vilaça R, Pereira JO, Oliveira R, Inigo A-J, de Mendívil JR.  2009.  On the Cost of Database Clusters Reconfiguration. SRDS '09: Proceedings of the 2009 28th IEEE International Symposium on Reliable Distributed Systems. :259–267.

Database clusters based on shared-nothing replication techniques are currently widely accepted as a practical solution to scalability and availability of the data tier. A key issue when planning such systems is the ability to meet service level agreements when load spikes occur or cluster nodes fail. This translates into the ability to provision and deploy additional nodes. Many current research efforts focus on designing autonomic controllers to perform such reconfiguration, tuned to quickly react to system changes and spawn new replicas based on resource usage and performance measurements. In contrast, we are concerned with the inherent impact of deploying an additional node to an online cluster, considering both the time required to finish such an action as well as the impact on the resource usage and performance of the cluster as a whole. If noticeable, such impact hinders the practicability of self-management techniques, since it adds an additional dimension that has to be accounted for. Our approach is to systematically benchmark a number of different reconfiguration scenarios to assess the cost of bringing a new replica online. We consider factors such as: workload characteristics, incremental and parallel recovery, flow control and outdatedness of the recovering replica. As a result, we show that research should be refocused from optimizing the capture and transmission of changes to applying them, which in a realistic setting dominates the cost of the recovery operation.

Frade MJ, Saabas A, Uustalu T.  2009.  Bidirectional data-flow analyses, type-systematically. PEPM. :141–150.

We show that a wide class of bidirectional data-flow analyses and program optimizations based on them admit declarative descriptions in the form of type systems. The salient feature is a clear separation between what constitutes a valid analysis and how the strongest one can be computed (via the type checking versus principal type inference distinction). The approach also facilitates elegant relational semantic soundness definitions and proofs for analyses and optimizations, with an application to mechanical transformation of program proofs, useful in proof-carrying code. Unidirectional forward and backward analyses are covered as special cases; the technicalities in the general bidirectional case arise from more subtle notions of valid and principal types. To demonstrate the viability of the approach we consider two examples that are inherently bidirectional: type inference (seen as a data-flow problem) for a structured language where the type of a variable may change over a program's run and the analysis underlying a stack usage optimization for a stack-based low-level language.
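
In the style of type system the abstract describes, a judgement Γ ⊢ s : Γ' relates the data-flow facts holding before a statement to those holding after it. Roughly (an illustrative shape, not the paper's exact rules), an assignment rule and the subsumption rule that separates valid analyses from the strongest one look like:

    % Assignment updates the fact recorded for x:
    \frac{\Gamma \vdash e : \tau}{\Gamma \vdash x := e \,:\, \Gamma[x \mapsto \tau]}

    % Subsumption: any weakening of a valid analysis is valid, while the
    % principal (strongest) type is what inference computes.
    \frac{\Gamma_1 \leq \Gamma_1' \qquad \Gamma_1' \vdash s : \Gamma_2' \qquad \Gamma_2' \leq \Gamma_2}
         {\Gamma_1 \vdash s : \Gamma_2}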

Costa B, Matos M, Sousa AL.  2009.  CAPI: Cloud Computing API. Simpósio de Informática - INForum. :499–502.

Cloud Computing is an emerging business model to provide access to IT resources in a pay-per-use fashion. Those resources range from low-level virtual machines, through application platforms, to ready-to-use software delivered over the Internet, creating a layered stack of differentiated services. Different cloud vendors tend to specialize in different levels, offering different services, each with its own proprietary API. This clearly leads to provider lock-in and hinders the portability of a given application between different providers. The goal of this paper is therefore to provide an abstraction of the different levels of services in order to reduce vendor lock-in and improve portability, two issues that we believe impair the adoption of the Cloud Computing model.
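
The abstraction idea can be sketched in Haskell: application code is written against a provider-neutral interface, and each vendor's API is wrapped in one instance. The interface below is hypothetical; the abstract does not spell out CAPI's actual operations.

    import           Data.IORef
    import qualified Data.Map as M

    -- A provider-neutral storage interface (names are illustrative).
    class BlobStore s where
      putObject :: s -> String -> String -> IO ()
      getObject :: s -> String -> IO (Maybe String)

    -- A trivial in-memory back-end; real instances would wrap a
    -- vendor-specific API behind the same two operations.
    newtype InMemory = InMemory (IORef (M.Map String String))

    instance BlobStore InMemory where
      putObject (InMemory r) k v = modifyIORef' r (M.insert k v)
      getObject (InMemory r) k   = M.lookup k <$> readIORef r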

Matos M, Sousa AL, Pereira JO, Oliveira R.  2009.  A Distributed Bootstrapping Protocol for Overlay Networks. Simpósio de Informática - INForum. :357–368.

Peer-to-peer overlay networks have become popular due to the inherent scalability and resilience properties that come naturally from their decentralized nature. Unfortunately, despite this decentralized approach, one important piece still remains centralized: a set of servers that provide identifiers to peers joining the system. This initial step is necessary, as new peers need to obtain some contact points in order to establish links to them and join the overlay. This leaves a crucial mechanism outside the protocol model, as it introduces an external centralized entity to manage the process, leading to problems of scale and fault-tolerance. In scenarios where the churn rate is considerable, the cost of maintaining the list of known peers at the well-known servers may be unbearable. Furthermore, if the peer identifiers provided are not evenly distributed across the universe, this will inevitably cluster the overlay, making it more prone to partitions. In this paper we propose a fully decentralized protocol to obtain that set of initial contact peers using the already deployed infrastructure. By conducting an extensive experimental evaluation we show the effectiveness of the protocol and reason about how the ideal number of contact peers may be provided in a fully decentralized fashion.
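
One way such a decentralized bootstrap can work (a sketch of the general idea, not necessarily the paper's exact mechanism): the join request is forwarded along a short random walk over the existing overlay, and the newcomer receives the view of the node where the walk ends, spreading the load and avoiding clustering around a fixed entry point.

    import           System.Random (randomRIO)
    import qualified Data.Map as M

    type PeerId  = Int
    type Overlay = M.Map PeerId [PeerId]  -- each peer's current neighbours

    -- Forward for ttl hops, choosing a random neighbour at each step.
    randomWalk :: Overlay -> PeerId -> Int -> IO PeerId
    randomWalk _  here 0   = pure here
    randomWalk ov here ttl =
      case M.findWithDefault [] here ov of
        []   -> pure here
        nbrs -> do i <- randomRIO (0, length nbrs - 1)
                   randomWalk ov (nbrs !! i) (ttl - 1)

    -- The newcomer's initial contact list: the terminal node and its view.
    bootstrap :: Overlay -> PeerId -> Int -> IO [PeerId]
    bootstrap ov entry ttl = do
      end <- randomWalk ov entry ttl
      pure (end : M.findWithDefault [] end ov)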

Matos M, Sousa AL, Pereira JO, Oliveira R.  2009.  CLON: overlay network for clouds. Proceedings of the 3rd Workshop on Dependable Distributed Data Management - WDDM. :14–17.

Gossip-based protocols have been gaining increasing interest from the research community due to their high resilience to node churn and high scalability, which make them suitable for modern large-scale dynamic systems. Unfortunately, these properties come at the cost of redundant message transmissions to ensure bimodal delivery to all interested peers. In systems with high message throughput, those additional messages can impose a significant bandwidth burden. Furthermore, the overlays upon which message dissemination takes place are oblivious to the underlying network, or rely on posterior optimizations that bias the overlay to mimic the network topology. This contributes even more to the required bandwidth, as 'undesirable' paths are chosen with the same probability as desirable ones. In a Cloud Computing scenario, nodes tend to be aggregated in sub-nets inside a data-center or in multiple data-centers, which are connected by costlier, long-distance links. The goal of this work is, therefore, to build an overlay that approximates the structure of the physical network, while ensuring the connectivity properties desirable for reliable dissemination. By having each node judiciously choose which nodes are in its dissemination list at construction time, i.e. by giving preference to local nodes, we are able to significantly reduce the number of messages traversing the long-distance links. In a later stage, this overlay shall be presented as a service upon which data dissemination and management protocols can be run.

Jesus P, Moreno CB, Almeida PS.  2009.  Dependability in Aggregation by Averaging. INForum - Simpósio de Informática. :457–470.

Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. In recent years, several approaches have been proposed to compute aggregation functions in distributed settings, exhibiting different characteristics in terms of accuracy, time and communication complexity. However, the majority of the existing aggregation algorithms exhibit relevant dependability issues when considering their use in real application environments. In this paper, we reveal some dependability issues of aggregation algorithms based on iterative averaging techniques, giving some directions to solve them. This class of algorithms is considered robust (when compared to common tree-based approaches), being independent of the routing topology used and providing an aggregation result at all nodes. However, their robustness is strongly challenged, and their correctness often compromised, when the assumptions about their working environment are changed to more realistic ones. The correctness of this class of algorithms relies on the maintenance of a fundamental invariant, commonly designated as mass conservation. We argue that this main invariant is often broken in practical settings, and that additional mechanisms and modifications are required to maintain it, incurring some degradation of the algorithms' performance. In particular, we discuss the behavior of three representative algorithms (the Push-Sum protocol [1], the Push-Pull Gossip protocol [2] and Distributed Random Grouping [3]) under asynchronous and faulty (with message loss and node crashes) environments. More specifically, we propose and evaluate two new versions of the Push-Pull Gossip protocol, which solve its message interleaving problem (evidenced even in a synchronous operation mode).
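
The mass-conservation invariant is easy to see in a push-sum style Haskell sketch (after the protocol cited as [1]): each node holds a (sum, weight) pair, its estimate is sum/weight, and a send moves half of both components to the receiver. The total over all pairs, including messages in transit, must stay constant; a dropped message visibly removes mass, which is exactly the failure mode discussed above. The code illustrates the invariant, not any of the three algorithms verbatim.

    type Mass = (Double, Double)   -- (sum, weight); estimate = sum / weight

    -- A send keeps one half locally and ships the other half.
    split :: Mass -> (Mass, Mass)
    split (s, w) = ((s / 2, w / 2), (s / 2, w / 2))

    -- Receiving folds an incoming half back into the local pair.
    receive :: Mass -> Mass -> Mass
    receive (s, w) (s', w') = (s + s', w + w')

    estimate :: Mass -> Double
    estimate (s, w) = s / w

    -- Mass conservation: this total (over nodes and in-transit messages)
    -- must never change; a lost message breaks the invariant.
    totalMass :: [Mass] -> Mass
    totalMass = foldr receive (0, 0)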

Pardo A, Fernandes JP, Saraiva JA.  2009.  Shortcut fusion rules for the derivation of circular and higher-order monadic programs. Workshop on Partial Evaluation and Program Manipulation. :81–90.

Functional programs often combine separate parts using intermediate data structures for communicating results. Programs so defined are modular and easier to understand and maintain, but suffer from inefficiencies due to the generation of those gluing data structures. To eliminate such redundant data structures, several program transformation techniques have been proposed. One such technique, shortcut fusion, has been studied in the context of both pure and monadic functional programs.
In this paper, we study several shortcut fusion extensions, so that, alternatively, circular or higher-order programs are derived. These extensions are provided for both effect-free and monadic programs. Our work results in a set of generic calculation rules that are widely applicable and whose correctness is formally established.
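
As a baseline for the rules the paper generalizes, the classic foldr/build law in Haskell: a producer abstracted over the list constructors fuses with a consuming fold, so the intermediate list is never built. The circular and higher-order extensions studied in the paper go beyond this law; the example below is only the standard starting point.

    {-# LANGUAGE RankNTypes #-}

    -- A producer parameterized on the list constructors.
    build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
    build g = g (:) []

    -- Shortcut fusion law:  foldr k z (build g) = g k z

    upTo :: Int -> [Int]
    upTo n = build (\cons nil ->
      let go i = if i > n then nil else cons i (go (i + 1)) in go 1)

    -- After fusion this is a plain loop; the list [1..n] is never allocated.
    total :: Int -> Int
    total n = foldr (+) 0 (upTo n)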

Martins M, Madeira A, Barbosa LS.  2009.  Refinement via interpretation. 7th IEEE International Conference on Software Engineering and Formal Methods - SEFM. :250–259.

Traditional notions of refinement of algebraic specifications, based on signature morphisms, are often too rigid to capture a number of relevant transformations in the context of software design, reuse and adaptation. This paper proposes an alternative notion of specification refinement, building on recent work on logic interpretation. The concept is discussed, its theory is partially developed, and its use is illustrated through a number of examples.

Martins M, Madeira A, Barbosa LS.  2009.  Refinement by interpretation in a general setting. International Workshop in Refinements 2009. 259:105–121.

Refinement by interpretation replaces signature morphisms by logic interpretations as a means to translate specifications and witness refinements. The approach was recently introduced by the authors [13] in the context of equational specifications, in order to capture a number of relevant transformations in software design, reuse and adaptation. This paper goes a step further and discusses the generalization of this idea to deductive systems of arbitrary dimension. This makes it possible, for example, to refine sentential specifications into equational ones, and the latter into modal ones. Moreover, the restriction to logics with finitary consequence relations is dropped, which results in increased flexibility along the software development process.
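
The classic instance of the sentential-to-equational step mentioned above (standard algebraic-logic material, not a result specific to this paper): classical propositional logic is interpreted into the equational theory of Boolean algebras by translating each formula into an equation,

    \tau(\varphi) \;=\; \{\, \varphi \approx \top \,\}

    % so that derivability is preserved and reflected:
    \Gamma \vdash_{\mathrm{CPC}} \varphi
      \quad\Longleftrightarrow\quad
      \tau(\Gamma) \models_{\mathrm{BA}} \tau(\varphi)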