HASLab

Macedo N. 2014. A Relational Approach to Biditectional Transformations. Abstractphd14nmm.pdf

Transforming data between different formats is an essential task in computer science and software engineering. Ordinary as this exercise may seem, creating a target artifact from a source artifact a is often just the first step in a dynamic evolution process: the initial transformation implicitly binds a and b, and as either artifact gets updated, modifications must be propagated to the other side in order to keep the overall system consistent. For decades, this kind of problem has been addressed via ad hoc or domainspecific techniques in virtually every area of computer science—the view-update problem from the database community being the classic example. However, in the last few years, research on bidirectional transformation (Czarnecki et al., 2009) has exploded, with developments in areas like heterogeneous data synchronization (Brabrand et al., 2005; Kawanaka and Hosoya, 2006; Foster et al., 2007), string manipulation (Bohannon et al., 2008; Barbosa et al., 2010), functional languages (Matsuda et al., 2007; Voigtländer, 2009; Pacheco and Cunha, 2010), model transformation (Ehrig et al., 2007; Cicchetti et al., 2010; Macedo and Cunha, 2013), user interfaces (Meertens, 1998; Huet al., 2008), relational databases (Bancilhon and Spyratos, 1981; Dayal and Bernstein,1982; Bohannon et al., 2006), graph transformation (Schürr, 1994; Hidaka et al., 2010), or spreadsheet systems (Cunha et al., 2012; Macedo et al., 2014c). The main idea behind bidirectional transformation frameworks is having a single transformation artifact denote the transformations in both directions (either by construction or through calculation), avoiding the cumbersome and error-prone task of manually writing and maintaining two coherent transformations.

Matos M. 2013. Epidemic Algorithms for Large Scale Data Dissemination. Abstractmiguel_angelo_marques_de_matos.pdf

Distributed systems lie at the core of modern IT infrastructures and services, such as the Internet, e-commerce, the stock exchange, Cloud Computing and the SmartGrid. These systems, built and developed throughout the last decades, have relied, due to their importance, on distributed algorithms with strong cor- rectness and safety guarantees. However, such algorithms have failed to accom- pany, for theoretical and practical reasons, the requirements of the distributed systems they support in terms of scale, scope and pervasiveness. Reality is unfor- giving and thus researchers had the need to design and develop new algorithms based on probabilistic principles that, despite their probabilistic yet quantifiable guarantees, are suitable to today’s modern distributed systems. In this dissertation, we study the challenges of and propose solutions for, ap- plying probabilistic dissemination algorithms, also known as epidemic- or gossip- based, in very large scale distributed systems. In particular, we focus on the issues of scalability of content types (topic-based publish-subscribe), content size (efficient data dissemination) and ordering requirements (total order). For each one of these issues, we present a novel distributed algorithm that solves the prob- lem while matching state-of-the art performance and trade-offs, and evaluate it on a realistic setting.

Coelho F. 2013. Implementation and test of transactional primitives over Cassandra. Abstractthesis.pdf

NoSQL databases opt not to oer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency. These limitations bring considerable complexity to the development of client applications and are therefore an obstacle to the broader adoption of the technology. In this work we propose a middleware layer over NoSQL databases that oers transactional guarantees with Snapshot Isolation. The proposed solution is achieved in a non-intrusive manner, providing to the clients the same interface as a NoSQL database, simply adding the transactional context. The transactional context is the focus of our contribution and is modularly based on a Non Persistent Version Store that holds several versions of elements and interacts with an external transaction certier. In this work, we present an implementation of our system over Apache Cassandra and by using two representative benchmarks, YCSB and TPC-C, we measure the cost of adding transactional support with ACID guarantees.

Portela B. 2013. Segurança Criptográfica no Armazenamento e Partilha de Dados em Ambientes Cloud. Abstractmsc_bernardoportela.pdf

Technological developments in the cloud model allow for an exponential growth in the online storage of information. File sharing services such as Dropbox or Google Drive already play an important role among the users, promoting the demand for information availability and data sharing. The implementation of secure systems is strictly related to the concept of trust. In practice, data sharing systems tend to assume a weak level of trust, usually considering an adversary model that excludes the provider, reducing its usefulness for users with sensible information. On the other hand, not trusting the provider in any way is a solution that contradicts the model itself, as it forces the bulk of processing to be performed locally, and the remote server’s high performance remains unused.Therefore, the challenges in developing cloud security consist in adapting the trust level, balancing performance, security and functionality. This thesis proposes an analysis of the current security and its alternative solutions, as well as an implementation of a secure file sharing system. These initial objectives consist on both a theoretical and practical approach to the subject, identifying the problems, presenting alternatives and testing their viability. Finally, the implemented system is to apply the solutions previously evaluated on a practical perspective, followed by a security validation that allows for comparison with modern systems.

Jesus P. 2012. Robust Distributed Data Aggregation. Abstractpaulo_cesar_de_oliveira_jesus.pdf

Distributed aggregation algorithms are an important building block of modern large scale systems, as it allows the determination of meaningful system-wide properties (e.g., network size, total storage capacity, average load, or majorities) which are required to direct the execution of distributed applications. In the last decade, several algorithms have been proposed to address the distributed computation of aggregation functions (e.g., COUNT, SUM, AVERAGE, and MAX/MIN), exhibiting different properties in terms of accuracy, speed and communication tradeoffs. However, existing approaches exhibit many issues when challenged in faulty and dynamic environments, lacking in terms of fault-tolerance and support to churn.
This study details a novel distributed aggregation approach, named Flow Updating, which is fault-tolerant and able to operate on dynamics networks. The algorithm is based on manipulating flows (inspired by the concept from graph theory), that are updated using idempotent messages, providing it with unique robustness capabilities. Experimental results showed that Flow Updating outperforms previous averaging algorithms in terms of time and message complexity, and unlike them it self adapts to churn and changes of the initial input values without requiring any periodic restart, supporting node crashes and high levels of message loss.
In addition to this main contribution, others can also be found in this research work, namely: a definition of the aggregation problem is proposed; existing distributed aggregation algorithm are surveyed and classified into a comprehensive taxonomy; a novel algorithm is introduced, based on Flow Updating, to estimate the Cumulative Distribution Function (CDF) of a global system attribute.
It is expected that this work will constitute a relevant contribution to the area of distributed computing, in particular to the robust distributed computation of aggregation functions in dynamic networks.

Vilaça R. 2012. Clouder: A Flexible Large Scale Decentralized Object Store. Abstractricardo_manuel_pereira_vilaca.pdf

Large scale data stores have been initially introduced to support a few concrete extreme scale applications such as social networks. Their scalability and availability requirements often outweigh sacrificing richer data and processing models, and even elementary data consistency. In strong contrast with traditional relational databases (RDBMS), large scale data stores present very simple data models and APIs, lacking most of the established relational data management operations; and relax consistency guarantees, providing eventual consistency. With a number of alternatives now available and mature, there is an increasing willingness to use them in a wider and more diverse spectrum of applications, by skewing the current trade-off towards the needs of common business users, and easing the migration from current RDBMS. This is particularly so when used in the context of a Cloud solution such as in a Platform as a Service (PaaS). This thesis aims at reducing the gap between traditional RDBMS and large scale data stores, by seeking mechanisms to provide additional consistency guarantees and higher level data processing primitives in large scale data stores. The devised mechanisms should not hinder the scalability and dependability of large scale data stores. Regarding, higher level data processing primitives this thesis explores two complementary approaches: by extending data stores with additional operations such as general multi-item operations; and by coupling data stores with other existent processing facilities without hindering scalability. We address this challenges with a new architecture for large scale data stores, efficient multi item access for large scale data stores, and SQL processing atop large scale data stores. The novel architecture allows to find the right trade-offs among flexible usage, efficiency, and fault-tolerance. To efficient support multi item access we extend first generation large scale data store’s data models with tags and a multi-tuple data placement strategy, that allow to efficiently store and retrieve large sets of related data at once. For efficient SQL support atop scalable data stores we devise design modifications to existing relational SQL query engines, allowing them to be distributed. We demonstrate our approaches with running prototypes and extensive experimental evaluation using proper workloads.

Mendes J. 2012. Evolution of Model-Driven Spreadsheets. Abstractmscthesis.pdf

Spreadsheets are likely to be one of the most widely used programming environments in the world. Spreadsheet popularity is due to characteristics such as their low entry barrier, their availability on almost any computer, their simple visual interface but mainly due to their flexibility. This flexibility, however, comes with a cost: spreadsheets are extremely error prone, as indicated by several studies. The work presented in this thesis aims at tackling the problem of spreadsheet errors. The strategy described is based on a model-driven approach, and is achieved by embedding spreadsheet models within spreadsheets themselves. This embedding enables users to create models in the same environment that they use for spreadsheet development which is precisely the environment that they are familiar with. Moreover, a set of evolution operations that can be performed on these models and respective instances is defined. In this setting, users interact and evolve both models and spreadsheets in the same coherent environment. This facilitates the establishment and maintenance of a consistency relationship between models and instances throughout the spreadsheet development life cycle, and by this it is expected the reduction of the number of errors that are made and the improvement of productivity in using spreadsheets. Resulting from this work, a prototype was created and is also discussed in this dissertation. This tool can be used to validate the approach followed in this thesis and to provide a foundational framework for future developments.

Shoker A. 2012. Byzantine Fault Tolerance: From Static Selection to Dynamic Switching. AbstractPaper

Byzantine Fault Tolerance (BFT) is becoming crucial with the revolution of online applications and due to the increasing number of innovations in computer technologies. Although dozens of BFT protocols have been introduced in the previous decade, their adoption by practitioners sounds disappointing. To some extant, this indicates that existing protocols are, perhaps, not yet too convincing or satisfactory. The problem is that researchers are still trying to establish `the best protocol' using traditional methods, e.g., through designing new protocols. However, theoretical and experimental analyses demonstrate that it is hard to achieve one-size-fits-all BFT protocols. Indeed, we believe that looking for smarter tactics like `fasten fragile sticks with a rope to achieve a solid stick' is necessary to circumvent the issue. In this thesis, we introduce the first BFT selection model and algorithm that automate and simplify the election process of the `preferred' BFT protocol among a set of candidate ones. The selection mechanism operates in three modes: Static, Dynamic, and Heuristic. For the two latter modes, we present a novel BFT system, called Adapt, that reacts to any potential changes in the system conditions and switches dynamically between existing BFT protocols, i.e., seeking adaptation. The Static mode allows BFT users to choose a single BFT protocol only once. This is quite useful in Web Services and Clouds where BFT can be sold as a service (and signed in the SLA contract). This mode is basically designed for systems that do not have too fluctuating states. In this mode, an evaluation process is in charge of matching the user preferences against the profiles of the nominated BFT protocols considering both: reliability, and performance. The elected protocol is the one that achieves the highest evaluation score. The mechanism is well automated via mathematical matrices, and produces selections that are reasonable and close to reality. Some systems, however, may experience fluttering conditions, like variable contention or message payloads. In this case, the static mode will not be efficient since a chosen protocol might not fit the new conditions. The Dynamic mode solves this issue. Adapt combines a collection of BFT protocols and switches between them, thus, adapting to the changes of the underlying system state. Consequently, the `preferred' protocol is always polled for each system state. This yields an optimal quality of service, i.e., reliability and performance. Adapt monitors the system state through its \emph{Event System}, and uses a Support Vector Regression method to conduct run time predictions for the performance of the protocols (e.g., throughput, latency, etc). Adapt also operates in a Heuristic mode. Using predefined heuristics, this mode optimizes user preferences to improve the selection process. The evaluation of our approach shows that selecting the `preferred' protocol is automated and close to reality in the static mode. In the Dynamic mode, Adapt always achieves the optimal performance among available protocols. The evaluation demonstrates that the overall system performance can be improved significantly too. Other cases explore that it is not always worthy to switch between protocols. This is made possible through conducting predictions with high accuracy, that can reach more than 98% in many cases. Finally, the thesis shows that Adapt can be smarter through using heuristics.

Cunha J. 2011. Model-based Spreadsheet Engineering. Abstractthesis.pdf

Spreadsheets can be viewed as programming languages for non-professional programmers. These so-called ``end-user'' programmers vastly outnumber professional programmers creating millions of new spreadsheets every year. As a programming language, spreadsheets lack support for abstraction, testing, encapsulation, or structured programming. As a result, and as numerous studies have shown, the high rate of production is accompanied by an alarming high rate of errors. Some studies report that up to 90% of real-world spreadsheets contain errors. After their initial creation, many spreadsheets turn out to be used for storing and processing increasing amounts of data and supporting increasing numbers of users over long periods of time, making them complicated systems.
An emerging solution to handle the complex and evolving software systems is Model-driven Engineering (MDE). To consider models as first class entities and any software artifact as a model or a model element is one of the basic principles of MDE.
We adopted some techniques from MDE to solve spreadsheet problems. Most spreadsheets (if not all) lack a proper specification or a model. Using reverse engineering techniques we are able to derive various models from legacy spreadsheets. We use functional dependencies (a formalism that allow us to define how some column values depend on other column values) as building blocks for these models. Models can be used for several spreadsheet improvements, namely refactoring, safe evolution, migration or even generation of edit assistance. The techniques presented in this work are available under the framework HAEXCEL that we developed. It is composed of online and batch tools, reusable HASKELL libraries and OpenOffice.org extensions.
A study with several end-users was organized to survey the impact of the techniques we designed. The results of this study indicate that the models can bring great benefits to spreadsheet engineering helping users to commit less errors and to work faster.

Proenca J. 2011. Synchronous Coordination of Distributed Components. Abstractthesis-colour.pdf

Coordination is a relatively recent field, considerably inspired by concurrency theory [83, 85, 91]. Coordination languages and models [90] are based on the philosophy that an application or a system should be divided into the parts that perform computations, typically components or services, and the parts that coordinate the results and resources required to perform the computations. The coordination aspect focuses on the latter, describing how the components or services are connected. In this thesis we study a specific class of coordination models, namely synchronous, exogenous, and composable models, and we exploit implementation techniques for such models in distributed environments. Our work concentrates on the Reo coordination model [8] as the main representative of this class of coordination models.

Macedo N. 2010. Translating Alloy specifications to the point-free style. Abstractnunomacedomsc.pdf

Every program starts from a model, an abstraction, which is iteratively re ned until we reach the
nal result, the implementation. However, at the end, one must ask: does the nal program resemble
in anyway the original model? Was the original idea correct to begin with? Formal methods
guarantee that those questions are answered positively, resorting to mathematical techniques. In
particular, in this thesis we are interested on the second factor: veri cation of formal models.
A trend of formal methods defends that they should be lightweight, resulting in a reduced
complexity of the speci cation, and automated analysis. Alloy was proposed as a solution for this
problem. In Alloy, the structures are described using a simple mathematical notation: relational
logic. A tool for model checking, automatic veri cation within a given scope, is also provided.
However, sometimes model checking is not enough and the need arises to perform unbounded
veri cations. The only way to do this is to mathematically prove that the speci cations are correct.
As such, there is the need to nd a mathematical logic expressive enough to be able to represent
the speci cations, while still being su ciently understandable.
We see the point-free style, a style where there are no variables or quanti cations, as a kind
of Laplace transform, where complex problems are made simple. Being Alloy completely relational,
we believe that a point-free relational logic is the natural framework to reason about Alloy
speci cations.
Our goal is to present a translation from Alloy speci cations to a point-free relational calculus,
which can then be mathematically proven, either resorting to proof assistants or to manual proving.
Since our motivation for the use of point-free is simplicity, we will focus on obtaining expressions
that are simple enough for manipulation and proofs about them.

Silva A. 2010. Kleene Coalgebra. Abstractthesis.pdf

Computer systems have widely spread since their appearance, and they now play a crucial role in many daily activities, with their deployment ranging from small home appliances to safety critical components, such as airplane or automobile control systems. Accidents caused by either hardware or software failure can have disastrous consequences, leading to the loss of human lives or causing enormous financial drawbacks. One of the greatest challenges of computer science is to cope with the fast evolution of computer systems and to develop formal techniques which facilitate the construction of dependable software and hardware systems. Since the early days of computer science, many scientists have searched for suitable models of computation and for specification languages that are appropriate for reasoning about such models. When devising a model for a particular system, there is a need to encode different features which capture the inherent behavior of the system. For instance, some systems have deterministic behavior (a calculator or an elevator), whereas others have inherently non-deterministic or probabilistic behavior (think of a casino slot machine). The rapidly increasing complexity of systems demands for compositional and unifying models of computation, as well as general methods and guidelines to derive specification languages.

Ferreira JF. 2010. Principles and Applications of Algorithmic Problem Solving. Abstractthesis-a4-colour.pdf

Algorithmic problem solving provides a radically new way of approaching and solving problems in general by using the advances that have been made in the basic principles of correct-by-construction algorithm design. The aim of this thesis is to provide educational material that shows how these advances can be used to support the teaching of mathematics and computing.
We rewrite material on elementary number theory and we show how the focus on the algorithmic content of the theory allows the systematisation of existing proofs and, more importantly, the construction of new knowledge in a practical and elegant way. For example, based on Euclid’s algorithm, we derive a new and efficient algorithm to enumerate the positive rational numbers in two different ways, and we develop a new and constructive proof of the two-squares theorem. Because the teaching of any subject can only be effective if the teacher has access to abundant and sufficiently varied educational material, we also include a catalogue of teaching scenarios. Teaching scenarios are fully worked out solutions to algorithmic problems together with detailed guidelines on the principles captured by the problem, how the problem is tackled, and how it is solved. Most of the scenarios have a recreational flavour and are designed to promote self-discovery by the students. Based on the material developed, we are convinced that goal-oriented, calculationa algorithmic skills can be used to enrich and reinvigorate the teaching of mathematics and computing.

Paulo J. 2009. Efficient storage of data in cloud computing. Abstractpp09.pdf

Keeping critical data safe and accessible from several locations has become a global preoccupation, either being this data personal, organizational or from applications. As a consequence of this issue, we verify the emergence of on-line storage services. In addition, there is the new paradigm of Cloud Computing, which brings new ideas to build services that allow users to store their data and run their applications in the Cloud. By doing a smart and efficient management of these services' storage, it is possible to improve the quality of service offered, as well as to optimize the usage of the infrastructure where the services run. This management is even more critical and complex when the infrastructure is composed by thousand of nodes running several virtual machines and sharing the same storage. The elimination of redundant data at these services' storage can be used to simplify and enhance this management.
This dissertation presents a solution to detect and eliminate duplicated data between virtual machines that run on the same physical host and write their virtual disks' data to a shared storage. A prototype that implements this solution is introduced and evaluated. Finally, a study that compares the efficiency of two different approaches used to eliminate redundant data in a personal data set is described.

Silva P. 2009. On the Design of a Galculator. Abstractthesis.pdf

The increased complexity of software systems together with the lack of tools and technique to support their development led to the so-called "software crisis". Different views about the problem originated diverse approaches to a possible solution, although it is now generally accepted that a "silver bullet" does not exist.
The formal methods view considers mathematical reasoning as fundamental to fulfill the most important property of software systems: correctness. However, since correctness proofs are generally difficult and expensive, only critical applications are regarded as potential targets for their use. Developments in tool support such as proof assistants, model checkers and abstract interpreters allows for reducing this cost and making proofs affordable to a wider range of applications. Nevertheless, the effectiveness of a tool is highly dependent of the underlying theory.
This dissertation follows a calculational proof style in which equality plays the fundamental role. Instead of the traditional logical approach, fork algebras, an extension of relation algebras, are used. In this setting, Galois connections are important because they reinforce the calculational nature of the algebraic approach, bringing additional structure to the calculus. Moreover, Galois connections enjoy several valuable proper- ties and allow for transformation in the domain of problems. In this dissertation, it is shown how fork algebras and Galois connections can be integrated together with the indirect equality principle. This combination offers a very powerful, generic device to tackle the complexity of proofs in program verification. This power is enhanced with the design of an innovative proof assistant prototype, the Galculator.

Paulo Jesus

Distributed Systems

Publications

Recent Publications