Publications

Kozen D, Silva A.  2012.  Practical Coinduction. :1-22. Abstractstructural.pdf

Induction is a well-established proof principle that is taught in most undergraduate programs in mathematics and computer science. In computer science, it is used primarily to reason about inductively-defined datatypes such as finite lists, finite trees, and the natural numbers. Coinduction is the dual principle that can be used to reason about coinductive datatypes such as infinite streams or trees, but it is not as widespread or as well understood. In this paper, we illustrate through several examples the use of coinduction in informal mathematical arguments. Our aim is to promote the principle as a useful tool for the working mathematician and to bring it to a level of familiarity on par with induction. We show that coinduction is not only about bisimilarity and equality of behaviors, but also applicable to a variety of functions and relations defined on coinductive datatypes.

Jeannin J-B, Kozen D, Silva A.  2012.  CoCaml: Programming with Coinductive Types. :1-24. Abstractcocaml.pdf

We present CoCaml, a functional programming language extending OCaml, which allows us to define functions on coinductive datatypes parameterized by an equation solver. We provide numerous examples that attest to the usefulness of the new programming constructs, including operations on infinite lists, infinitary lambda-terms and p-adic numbers.

Jeannin J-B, Kozen D, Silva A.  2012.  Language constructs for non-well-founded computation. :1-12. Abstractnwf.pdf

Recursive functions defined on a coalgebraic datatype may not converge if there are cycles in the input, that is, if the input object is not well-founded. Even so, there is often a useful solution; for example, the free variables of an infinitary lambda-term, or the expected running time of a finite-state probabilistic protocol. Theoretical models of recursion schemes have been well studied under the names well-founded coalgebras, recursive coalgebras, corecursive algebras, and Elgot algebras. Much of this work focuses on conditions ensuring unique or canonical solutions, e.g.\ when the domain is well-founded. If the domain is not well-founded, there can be multiple solutions. The standard semantics of recursive programs gives a particular solution, namely the least solution in a flat Scott domain, which may not be the one we want. Unfortunately, current programming languages provide no support for specifying alternatives. In this paper we give numerous examples in which it would be useful to do so and propose programming language constructs that would allow the specification of alternative solutions and methods to compute them. The programmer must essentially provide a way to solve equations in the codomain. We show how to implement these new constructs as an extension of existing languages. We also prove some theoretical results characterizing well-founded coalgebras that slightly extend results of Adamek, Luecke, and Milius (2007).

Shapiro M, Preguiça N, Moreno CB, Zawirsky M, others.  2011.  A comprehensive study of convergent and commutative replicated data types. :47. Abstracttechreport.pdf

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, montonic DAGs,
and sequences. It discusses some properties needed to implement non-trivial CRDTs.

Shapiro M, Preguiça N, Moreno CB, Zawirsky M.  2011.  Conflict-free replicated data types. Stabilization, Safety, and Security of Distributed Systems. :386–400. Abstractrr-7687.pdf

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in largescale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflictfree Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. :513-527. Abstractmdfu.pdf

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.

Jesus P, Moreno CB, Almeida PS.  2011.  A Survey of Distributed Data Aggregation Algorithms. Arxiv preprint arXiv:1110.0725. :45. Abstract1110.0725.pdf

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load, majorities and many others.
In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task.
This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Macedo N, Cunha A.  2011.  Automatic unbounded verification of Alloy specifications with Prover 9. CoRR. abs/1209.5773:17. Abstract1209.5773v1.pdf

Alloy is an increasingly popular lightweight specification language based on relational logic. Alloy models can be automatically verified within a bounded scope using off-the-shelf SAT solvers. Since false assertions can usually be disproved using small counter-examples, this approach suffices for most applications. Unfortunately, it can sometimes lead to a false sense of security, and in critical applications a more traditional unbounded proof may be required. The automatic theorem prover Prover9 has been shown to be particularly effective for proving theorems of relation algebras, a quantifier-free (or point-free) axiomatization of a fragment of relational logic. In this paper we propose a translation from Alloy specifications to fork algebras (an extension of relation algebras with the same expressive power as relational logic) which enables their unbounded verification in Prover9. This translation covers not only logic assertions, but also the structural aspects (namely type declarations), and was successfully implemented and applied to several examples.

Oliveira JN.  2011.  Pointfree Foundations for (Generic) Lossless Decomposition. :64. Abstracthaslab-tr201103.pdf

This report presents a typed, “pointfree” generalization of relational data depen- dency theory expressed not in the standard set-theoretic way, “a` la Codd”, but in the calculus of binary relations which, initiated by De Morgan in the 1860s, is the core of modern algebra of programming.
Contrary to the intuition that a binary relation is just a particular case of an n- ary relation, this report shows the effectiveness of the former in “explaining” and reasoning about the latter. Data dependency theory, which is central to relational database design, is addressed in pointfree calculational style instead of reasoning about (sets of) tuples in conventional “implication-first” logic style.
It turns out that the theory becomes more general, more structured and sim- pler. Elegant expressions replace lengthy formulæ and easy-to-follow calculations replace pointwise proofs with lots of “· · ·” notation, case analysis and natural lan- guage explanations for “obvious” steps. In particular, attributes are generalized to arbitrary (observation) functions and the principle of lossless decomposition is established for arbitrary such functions.
The report concludes by showing how the proposed generalization of data dependency theory paves the way to interesting synergies with other branches of computer science, namely formal modeling and transition systems theory. A number of open topics for research in the field are proposed as future work.

Bonchi F, Bonsangue M, Rutten J, Silva A.  2011.  Brzozowski's algorithm (co)algebraically. Software Engineering [SEN]. :1-13. Abstract18796d.pdf

We give a new presentation of Brzozowski's algorithm to minimize finite automata, using elementary facts from universal algebra and coalgebra, and building on earlier work by Arbib and Manes on the duality between reachability and observability. This leads to a simple proof of its correctness and opens the door to further generalizations.

Kozen D, Silva A.  2011.  Left-handed completeness. Computing and Information Science Technical Reports. :1-18. Abstractleft_1.pdf

We give a new, significantly shorter proof of the completeness of the left-handed star rule of Kleene algebra. The proof reveals the rich interaction of algebra and coalgebra in the theory.

Kozen D, Silva A.  2011.  On Moessner's theorem. :1-8. Abstractpascal.pdf

Moessner's theorem describes a procedure for generating a sequence of n integer sequences that lead unexpectedly to the sequence of nth powers 1n, 2n, 3n, ... Paasche's theorem is a generalization of Moessner's; by varying the parameters of the procedure, one can obtain the sequence of factorials 1!, 2!, 3!, ... or the sequence of superfactorials 1!!, 2!!, 3!!, ... Long's theorem generalizes Moessner's in another direction, providing a procedure to generate the sequence a1n-1, (a+d)2n-1, (a+2d)3n-1, ... Proofs of these results in the literature are typically based on combinatorics of binomial coefficients or calculational scans. In this note we give a short and revealing algebraic proof of a general theorem that contains Moessner's, Paasche's, and Long's as special cases. We also prove a generalization that gives new Moessner-type theorems.

Bonsangue M, Caltais G, Goriac E, Lucanu D, Rutten J, Silva A.  2011.  Automatic equivalence proofs for non-deterministic coalgebras (revised and extended). :1-38. Abstract18793d.pdf

A notion of generalized regular expressions for a large class of systems modeled as coalgebras, and an analogue of Kleene’s theorem and Kleene algebra, were recently proposed by a subset of the authors of this paper. Examples of the systems covered include infinite streams, deterministic automata, Mealy machines and labelled transition sytems. In this paper, we present a novel algorithm to decide whether two expressions are bisimilar or not. The procedure is implemented in the automatic theorem prover CIRC, by reducing coinduction to an entailment relation between an algebraic specification and an appropriate set of equations. We illustrate the generality of the tool with three examples: infinite streams of real numbers, Mealy machines and labelled transition systems.

Bonchi F, Bonsangue M, Boreale M, Rutten J, Silva A.  2011.  A coalgebraic perspective on linear weighted automata. :1-31. Abstract18069a.pdf

Weighted automata are a generalization of non-deterministic automata where each transition, in addition to an input letter, has also a quantity expressing the weight (e.g. cost or probability) of its execution. As for non-deterministic automata, their behaviours can be expressed in terms of either (weighted) bisimilarity or (weighted) language equivalence.

Coalgebras provide a categorical framework for the uniform study of state-based systems and their behaviours.
In this work, we show that coalgebras can suitably model weighted automata in two different ways: coalgebras on Set (the category of sets and functions) characterize weighted bisimilarity, while coalgebras on Vect (the category of vector spaces and linear maps) characterize weighted language equivalence.

Relying on the second characterization, we show three different procedures for computing weighted language equivalence. The first one consists in a generalization of the usual partition refinement algorithm for ordinary automata. The second one is the backward version of the first one. The third procedure relies on a syntactic representation of rational weighted languages.

Bonchi F, Hülsbusch M, König B, Silva A.  2011.  A coalgebraic perspective on minimization, determinization and behavioural metrics. :1-23. Abstract18079d_1.pdf

Coalgebra offers a unified theory of state based systems, including infinite streams, labelled transition systems and deterministic automata. In this paper, we use the coalgebraic view on systems to derive, in a uniform way, abstract procedures for checking behavioural equivalence in coalgebras, which perform (a combination of) minimization and determinization in the system. First, we show that for coalgebras on categories equipped with factorization structures, there exists an abstract procedure for equivalence checking. For instance, when considering epi-mono factorizations in the category of sets and functions, this procedure corresponds to the usual (coalgebraic) minimization algorithm and two states are behaviourally equivalent iff they are mapped to the same state in the minimized coalgebra. Second, motivated by several examples, we consider coalgebras on categories without suitable factorization structures: under certain conditions, it is possible to apply the above procedure after transforming coalgebras with reflections. This transformation can be thought of as some kind of determinization. Finally, we show that the result of the procedure also induces a pseudo-metric on the states, in such a way that the distance between each pair of states is minimized.