Recent Publications

Jorge AM, Azevedo PJ.  2012.  Optimal leverage association rules with numerical interval conditions. Intelligent Data Analysis. 16(1):25-47.

In this paper we propose a framework for defining and discovering optimal association rules involving a numerical attribute A in the consequent. The consequent has the form of an interval condition A ≤ x, A ≥ x, or A ∈ I, where I is an interval or a set of intervals of the form [x_l, x_u]. The optimality is with respect to leverage, a well-known association rule interest measure. The generated rules are called Maximal Leverage Rules (MLR) and are generated from Distribution Rules. The principle for finding the MLR is related to the Kolmogorov-Smirnov goodness-of-fit statistical test. We propose different methods for MLR generation, taking into account leverage optimality and readability. We theoretically demonstrate the optimality of the main exact methods, and measure the leverage loss of approximate methods. We show empirically that the discovery process is scalable.
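
Leverage here is the standard rule-interest measure leverage(X → Y) = P(X ∧ Y) − P(X)·P(Y). As a hedged illustration only (not the paper's MLR algorithm, which derives rules from Distribution Rules), the following sketch computes the leverage of a rule with an interval consequent and naively scans for the best A ≥ x threshold; the data and helper names are made up.

```python
# Illustrative sketch (not the paper's algorithm): leverage of a rule
# whose consequent is an interval condition on a numeric attribute A.
# leverage(X -> A in I) = P(X and A in I) - P(X) * P(A in I)

def leverage(rows, antecedent, lo, hi):
    """rows: list of (record, a_value); antecedent: predicate on record."""
    n = len(rows)
    p_x = sum(antecedent(r) for r, _ in rows) / n
    p_i = sum(lo <= a <= hi for _, a in rows) / n
    p_xi = sum(antecedent(r) and lo <= a <= hi for r, a in rows) / n
    return p_xi - p_x * p_i

def best_lower_threshold(rows, antecedent):
    """Naive scan for the consequent A >= x that maximises leverage."""
    candidates = sorted({a for _, a in rows})
    return max((leverage(rows, antecedent, x, float("inf")), x)
               for x in candidates)

# Example: rule "group == 'b' -> A >= x" on four toy records.
rows = [({"group": "b"}, 9.1), ({"group": "b"}, 8.7),
        ({"group": "a"}, 2.3), ({"group": "a"}, 3.0)]
print(best_lower_threshold(rows, lambda r: r["group"] == "b"))
```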

Castro N, Azevedo PJ.  2012.  Significant motifs in time series. Statistical Analysis and Data Mining. 5(1):35-53.

Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem within applications that range from finance to health. Many algorithms have been proposed for the task of efficiently finding motifs. Surprisingly, most of these proposals do not focus on how to evaluate the discovered motifs. They are typically evaluated by human experts. This is unfeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in the data mining communities to evaluate extracted patterns. In this work we present an approach to calculate the statistical significance of time series motifs. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif by using Markov chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution provides a means of applying a powerful technique (statistical tests) to a time series setting. This gives researchers and practitioners an important tool to automatically evaluate the degree of relevance of each extracted motif.
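
The recipe can be sketched as follows. This is a hedged reconstruction of the general idea, assuming a first-order Markov null model over the symbolic series and a normal approximation for the tail probability; the paper's exact models and tests may differ.

```python
# Hedged sketch: (1) fit a first-order Markov chain to a symbolic
# (e.g. SAX-discretised) series, (2) compute the expected probability
# of the motif word under that model, (3) test observed vs expected.
import math
from collections import Counter

def markov_word_probability(series, word):
    """P(word) under a first-order Markov chain fitted to `series`."""
    pairs = Counter(zip(series, series[1:]))
    firsts = Counter(series)
    p = firsts[word[0]] / len(series)
    for a, b in zip(word, word[1:]):
        out_of_a = sum(v for (x, _), v in pairs.items() if x == a)
        p *= pairs[(a, b)] / max(1, out_of_a)
    return p

def motif_p_value(series, word, observed):
    """Normal approximation to P(count >= observed) under the null."""
    n = len(series) - len(word) + 1           # candidate positions
    p = markov_word_probability(series, word)
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    z = (observed - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value

series = "abcabcabcabdabcabc"
print(motif_p_value(series, "abc", series.count("abc")))
```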

Barros JB, Cruz D, Henriques PR, Pinto JS.  2012.  Assertion-based slicing and slice graphs. Formal Aspects of Computing. 24:217–248.

This paper revisits the idea of slicing programs based on their axiomatic semantics, rather than using criteria based on control/data dependencies. We show how the forward propagation of preconditions and the backward propagation of postconditions can be combined in a new slicing algorithm that is more precise than the existing specification-based algorithms. The algorithm is based on (i) a precise test for removable statements, and (ii) the construction of a slice graph, a program control flow graph extended with semantic labels. It improves on previous approaches in two aspects: it does not fail to identify removable commands; and it produces the smallest possible slice that can be obtained (in a sense that will be made precise). The paper also reviews in detail, through examples, the ideas behind the use of preconditions and postconditions for slicing programs.
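
The removability test at the core of such algorithms can be pictured as a logical entailment check: a statement whose precondition already establishes its postcondition contributes nothing to the specification. The sketch below uses the z3 solver purely for illustration; the paper itself works with proof obligations over the axiomatic semantics, and the predicates here are toy examples.

```python
# Minimal sketch of the removability test behind assertion-based
# slicing: if precondition P already entails postcondition Q, the
# statement annotated with (P, Q) can be sliced away. z3 is our
# illustrative choice, not the paper's tooling.
from z3 import Int, Implies, Not, Solver, unsat

def removable(precondition, postcondition):
    """True iff P => Q is valid, i.e. Not(P => Q) is unsatisfiable."""
    s = Solver()
    s.add(Not(Implies(precondition, postcondition)))
    return s.check() == unsat

x = Int("x")
# With precondition x > 10, a statement only needed to establish
# x > 0 is removable: the precondition alone guarantees it.
print(removable(x > 10, x > 0))   # True
print(removable(x > 0, x > 10))   # False
```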

Fernandes JP, Hattum-Janssen NV, Ribeiro AN, Fonte V, Santos LP, Sousa P.  2012.  An integrated approach to develop professional and technical skills for informatics engineering students. European Journal of Engineering Education. 37(2):167–177.

Many of the current approaches to teaching and learning in engineering education are not the most appropriate to prepare students for the challenges they will face in their professional careers. The active involvement of students in their learning process facilitates the development of the technical and professional competencies they need as professionals. This article describes the organisation and impact of a mini-conference and project work - the creation of a software product and its introduction in the market - aimed at the development of professional competencies in general and writing skills in particular. The course was evaluated by assessing the students' perception of the development of a number of professional competencies through a questionnaire completed by 125 students across two consecutive editions. The results indicate that the project work and the mini-conference had a positive impact on students' perceptions of the development of professional competencies.

Moreno CB, Almeida PS, Menezes R, Jesus P.  2012.  Extrema Propagation: Fast Distributed Estimation of Sums and Network Sizes. IEEE Transactions on Parallel and Distributed Systems. 23(4):668–675.

Aggregation of data values plays an important role in distributed computations, in particular over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables, or estimates of the average system load to direct load balancing. Distributed aggregation using non-idempotent functions, like sums, is not trivial, as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate-insensitive messages and can be applied in flood and/or epidemic settings, where multipath routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps can be made just slightly above the theoretical minimum; and it is fully distributed, with no single point of failure and the result produced at every node.
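
The core idea admits a compact sketch: a node holding value v draws K exponential variates with rate v, the network merges vectors by pointwise minimum (idempotent, hence duplicate-insensitive), and the global sum is estimated from the K surviving minima. The parameters below are illustrative; the paper analyses the estimator and the choice of K.

```python
# Sketch of the Extrema Propagation idea. min of Exp(v_i) variates
# across nodes is Exp(sum of v_i), and min is idempotent, so repeated
# or multipath delivery does not bias the estimate.
import random

K = 200  # number of variates; more variates -> tighter estimate

def node_vector(value, k=K):
    return [random.expovariate(value) for _ in range(k)]

def merge(a, b):
    """Gossip/flooding merge step: pointwise minimum, safe to repeat."""
    return [min(x, y) for x, y in zip(a, b)]

values = [3.0, 5.0, 2.0, 10.0]             # local values, true sum = 20
state = node_vector(values[0])
for v in values[1:]:
    state = merge(state, node_vector(v))   # order/duplication irrelevant

estimate = (K - 1) / sum(state)            # unbiased estimator of the sum
print(estimate)
```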

Barbosa LS, Meng S.  2012.  A Calculus for Generic, QoS-Aware Component Composition. Mathematics in Computer Science. 6(4):475-497.

Software QoS properties, such as response time, availability, bandwidth requirement, and memory usage, among many others, play a major role in the processes of selecting and composing software components. This paper extends a component calculus to deal with them in an effective way. The calculus models components as generalised Mealy machines, i.e., state-based entities interacting along their lifetime through well-defined interfaces of observers and actions. QoS is introduced through an algebraic structure specifying the relevant QoS domain and how its values are composed under different disciplines. A major effect of introducing QoS-awareness is that a number of equivalences holding in the plain calculus become refinement laws. The paper also introduces a prototyper for the calculus, developed as a 'proof-of-concept' implementation.
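
As a purely illustrative rendering of such an algebraic structure (the domain, names and composition disciplines below are our own choices, not the paper's), consider response time: it composes additively in sequence and by worst case in parallel.

```python
# Illustrative only: one QoS domain with its composition disciplines.
# Sequential composition adds latencies, parallel composition waits
# for the slower branch, and choice takes the faster alternative.
from dataclasses import dataclass

@dataclass(frozen=True)
class ResponseTime:
    millis: float
    def seq(self, other):     # run one component after the other
        return ResponseTime(self.millis + other.millis)
    def par(self, other):     # run both, wait for the slower
        return ResponseTime(max(self.millis, other.millis))
    def choice(self, other):  # best-case alternative
        return ResponseTime(min(self.millis, other.millis))

a, b, c = ResponseTime(120.0), ResponseTime(80.0), ResponseTime(150.0)
print(a.seq(b).par(c))   # ResponseTime(millis=200.0)
```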

Cunha A, Pacheco H, Hu Z.  2012.  Delta Lenses over Inductive Types. Electronic Communications of the EASST. 49:1-17.

Existing bidirectional languages are either state-based or operation-based, depending on whether they represent updates as mere states or as sequences of edit operations. In-between both worlds are delta-based frameworks, where updates are represented using alignment relationships between states. In this paper, we formalize delta lenses over inductive types using dependent type theory and develop a point-free delta lens language with an explicit separation of shape and data. In contrast with the already known issue of data alignment, we identify the new problem of shape alignment and solve it by lifting standard recursion patterns such as folds and unfolds to delta lenses that use alignment to infer meaningful shape updates.
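
A hedged sketch of the delta-based idea on plain lists (the paper's language is point-free and typed over inductive types; the names here are ours): put receives an alignment mapping view positions to source positions, so surviving items keep their hidden data across insertions and reorderings.

```python
# Toy delta lens: the view shows only the names; the delta says which
# source item each view item descends from, so payloads survive
# reordering instead of being matched by position.
def get(source):
    """View a list of (name, payload) pairs as the list of names."""
    return [name for name, _ in source]

def put(source, view, delta, default=None):
    """delta[i] = j means view item i descends from source item j."""
    return [(view[i], source[delta[i]][1] if delta[i] is not None else default)
            for i in range(len(view))]

src = [("a", 1), ("b", 2), ("c", 3)]
# New view: "a" and "b" swapped, "c" deleted, "d" freshly inserted.
new_view, delta = ["b", "a", "d"], {0: 1, 1: 0, 2: None}
print(put(src, new_view, delta))   # [('b', 2), ('a', 1), ('d', None)]
```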

Bonchi F, Bonsangue M, Boreale M, Rutten J, Silva A.  2012.  A coalgebraic perspective on linear weighted automata. Information and Computation. 211(1):77–105.

Weighted automata are a generalisation of non-deterministic automata where each transition, in addition to an input letter, also has a quantity expressing the weight (e.g. cost or probability) of its execution. As for non-deterministic automata, their behaviours can be expressed in terms of either (weighted) bisimilarity or (weighted) language equivalence.

Coalgebras provide a categorical framework for the uniform study of state-based systems and their behaviours. In this work, we show that coalgebras can suitably model weighted automata in two different ways: coalgebras on Set (the category of sets and functions) characterise weighted bisimilarity, while coalgebras on Vect (the category of vector spaces and linear maps) characterise weighted language equivalence.

Relying on the second characterisation, we show three different procedures for computing weighted language equivalence. The first one consists in a generalisation of the usual partition refinement algorithm for ordinary automata. The second one is the backward version of the first one. The third procedure relies on a syntactic representation of rational weighted languages.
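
The linear-algebra view underlying the second characterisation can be sketched concretely: an automaton over the reals is an initial row vector, one matrix per input letter, and a final column vector, and the weight of a word is their product. The numpy example below merely enumerates short words for illustration; it is not one of the paper's three procedures.

```python
# Sketch of the linear (vector-space) view of a weighted automaton:
# weight(word) = init . M[a1] . ... . M[an] . final
import itertools
import numpy as np

init = np.array([1.0, 0.0])                 # start in state 0
final = np.array([0.0, 1.0])                # state 1 outputs weight 1
M = {"a": np.array([[0.5, 0.5],             # one transition-weight
                    [0.0, 1.0]]),           # matrix per input letter
     "b": np.array([[1.0, 0.0],
                    [0.0, 0.5]])}

def weight(word):
    v = init
    for letter in word:
        v = v @ M[letter]
    return float(v @ final)

for w in itertools.product("ab", repeat=2):
    print("".join(w), weight(w))
```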

Bonsangue M, Clarke D, Silva A.  2012.  A model of context-dependent component connectors. Science of Computer Programming. 77(6):685–706.

Recent approaches to component-based software engineering employ coordinating connectors to compose components into software systems. For maximum flexibility and reuse, such connectors can themselves be composed, resulting in an expressive calculus of connectors whose semantics encompasses complex combinations of synchronisation, mutual exclusion, non-deterministic choice and state-dependent behaviour. A more expressive notion of connector also includes context-dependent behaviour, namely, behaviour where the choices available to a connector change non-monotonically as the context, given by the pending activity on its ports, changes. Context dependency can express notions of priority and inhibition. Capturing context-dependent behaviour in formal models is non-trivial, as it is unclear how to propagate context information through composition. In this paper we present an intuitive automata-based formal model of context-dependent connectors, and argue that it is superior to previous attempts at such a model for the coordination language Reo.

Hofer B, Wotawa F, Abreu R.  2012.  AI for the win: improving spectrum-based fault localization. ACM SIGSOFT Software Engineering Notes. 37(6):1–8.

A considerable amount of time in software engineering is spent on debugging. In practice, debugging is mainly done with tools that allow executing a program step by step and setting breakpoints. This debugging method is, however, very time-consuming and cumbersome. There is a need for tools that undertake the task of narrowing down the most likely fault locations. These tools must complete this task with as little user interaction as possible, and the results they compute must be beneficial enough that such tools appeal to programmers. In order to come up with such tools, we present three variants of the well-known spectrum-based fault localization technique that are enhanced with methods from Artificial Intelligence. Each of the three combined approaches outperforms the underlying basic method in diagnostic accuracy. Hence, the presented approaches support the hypothesis that combining techniques from different areas is beneficial. In addition to introducing these techniques, we perform an empirical evaluation, discuss open challenges of debugging, and outline possible solutions.
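
For reference, the baseline being enhanced works like this: spectrum-based fault localization ranks program components by the similarity between each component's hit spectrum and the test error vector. The sketch below uses the Ochiai coefficient, a common similarity choice in this line of work; the matrix is a toy example.

```python
# Baseline spectrum-based fault localization: rank components by the
# Ochiai similarity between their hit spectra and the error vector.
import math

def ochiai(spectra, errors):
    """spectra[t][c]: test t executed component c; errors[t]: t failed."""
    scores = []
    for c in range(len(spectra[0])):
        n11 = sum(1 for t, row in enumerate(spectra) if row[c] and errors[t])
        n10 = sum(1 for t, row in enumerate(spectra) if row[c] and not errors[t])
        n01 = sum(1 for t, row in enumerate(spectra) if not row[c] and errors[t])
        denom = math.sqrt((n11 + n01) * (n11 + n10))
        scores.append(n11 / denom if denom else 0.0)
    return scores

# 3 components, 4 tests; component 1 is executed by every failing test.
spectra = [[1, 1, 0], [0, 1, 1], [1, 1, 1], [1, 0, 1]]
errors  = [1, 1, 0, 0]
print(ochiai(spectra, errors))  # component 1 gets the highest score
```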

Riboira A, Rodrigues R, Abreu R.  2012.  Integrating Interactive Visualizations of Automatic Debugging Techniques on an Integrated Development Environment. International Journal of Creative Interfaces and Computer Graphics (IJCICG). 3(2):42–59.

Automated debugging techniques based on statistical analysis of historical test execution data have recently received considerable attention due to their diagnostic capabilities. However, the tools that materialize such techniques suffer from a common, rather important shortcoming: the lack of effective visualizations of diagnostic reports. This limitation prevents the wide adoption of such tools, as it is difficult to understand the diagnostic reports they yield. To fill this gap, the authors propose a framework for integrating interactive visualizations of automatic debugging reports in a popular development environment, namely the Eclipse integrated development environment. The framework, coined GZoltar, provides several important features that help developers find the root cause of observed failures quickly, such as direct links to the source code editor. Furthermore, the authors report on the results of a user study conducted to assess GZoltar's effectiveness.

Wong PYH, Albert E, Muschevici R, Proença J, Schäfer J, Schlatte R.  2012.  The ABS tool suite: modelling, executing and analysing distributed adaptable object-oriented systems. STTT - Software Tools for Technology Transfer. 14(5):567–588.

Modern software systems must support a high degree of variability to accommodate a wide range of requirements and operating conditions. This paper introduces the Abstract Behavioural Specification (ABS) language and tool suite, a comprehensive platform for developing and analysing highly adaptable distributed concurrent software systems. The ABS language has a hybrid functional and object-oriented core, and comes with extensions that support the development of systems that are adaptable to diversified requirements, yet capable of maintaining a high level of trustworthiness. Using ABS, system variability is consistently traceable from the level of requirements engineering down to object behaviour. This facilitates temporal evolution, as changes to the required set of features of a system are automatically reflected by functional adaptation of the system's behaviour. The analysis capabilities of ABS stretch from debugging, observing and simulating to resource analysis of ABS models, and help ensure that a system will remain dependable throughout its evolutionary lifetime. We report on the experience of using the ABS language and the ABS tool suite in an industrial case study.

Avvenuti M, Bernardeschi C, Francesco ND, Masci P.  2012.  JCSI: A tool for checking secure information flow in Java Card applications. Journal of Systems and Software. 85:2479-2493.

This paper describes a tool for checking secure information flow in Java Card applications. The tool performs a static analysis of Java Card CAP files and includes a CAP viewer. The analysis is based on the theory of abstract interpretation and on a multi-level security policy assignment. Actual values of variables are abstracted into security levels, and bytecode instructions are executed over an abstract domain. The tool can be used for discovering security issues due to explicit or implicit information flows and for checking security properties of Java Card applications downloaded from untrusted sources.
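
The abstraction can be illustrated in miniature (this is a toy model of the idea, not the JCSI tool): values are replaced by security levels, an assignment joins the levels of its operands, and assignments under a HIGH guard inherit the guard's level, which captures implicit flows.

```python
# Toy two-level information-flow check in the spirit of the analysis:
# explicit flows join operand levels; implicit flows come from the
# security level of the enclosing guard (the "context").
LOW, HIGH = 0, 1

def check_assign(env, target, operand_levels, context=LOW):
    """Flag a leak when data above `target`'s clearance flows into it."""
    level = max([context, *operand_levels])
    if level > env[target]:
        raise ValueError(f"illegal flow into {target}")
    return level

env = {"pin": HIGH, "out": LOW}
check_assign(env, "pin", [env["out"]])            # LOW -> HIGH: fine
try:
    # `if pin == 1234: out = 1` leaks via the guard (implicit flow)
    check_assign(env, "out", [], context=env["pin"])
except ValueError as e:
    print(e)
```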

Jin D, Yang B, Moreno CB, Liu D, He D, Liu J.  2011.  A Markov random walk under constraint for discovering overlapping communities in complex networks. Journal of Statistical Mechanics: Theory and Experiment. 2011(P05031):21.

Detection of overlapping communities in complex networks has motivated recent research in the relevant fields. To address this problem, we propose a Markov dynamics based algorithm, called UEOC, which stands for 'unfold and extract overlapping communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, based on the corresponding annealed network (degree-conserving random network), is performed to unfold the community. Then, a cutoff criterion with the aid of a local community function, called conductance, which can be thought of as the ratio between the number of edges inside the community and those leaving it, is presented to extract this emerged community from the entire network. The UEOC algorithm depends on only one parameter whose value can easily be set, and it requires no prior knowledge of the hidden community structures. The proposed UEOC has been evaluated both on synthetic benchmarks and on some real-world networks, and was compared with a set of competing algorithms. Experimental results have shown that UEOC is highly effective and efficient for discovering overlapping communities.
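
The conductance function used for the cutoff criterion is easy to state concretely. The sketch below uses one common normalisation (edges leaving the community divided by its volume), which may differ in detail from the paper's exact definition.

```python
# Conductance of a candidate community S in an undirected graph:
# boundary edges divided by the total degree (volume) of S.
def conductance(adj, community):
    """adj: dict node -> set of neighbours; community: set of nodes."""
    inside = leaving = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                inside += 1        # counted twice, once per endpoint
            else:
                leaving += 1
    volume = inside + leaving      # sum of degrees inside S
    return leaving / volume if volume else 0.0

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(conductance(adj, {1, 2, 3}))   # 1 boundary edge / volume 7
```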

Shapiro M, Preguiça N, Moreno CB, Zawirski M, Fatourou P, others.  2011.  Convergent and Commutative Replicated Data Types. Bulletin of the European Association for Theoretical Computer Science. :67–88.

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state-based or operation-based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs and monotonic DAGs. It discusses some properties needed to implement non-trivial CRDTs.
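
A minimal member of the state-based family the paper formalises is the grow-only counter: its merge is a pointwise maximum, which is idempotent, commutative and associative, so replicas converge regardless of delivery order or duplication.

```python
# State-based G-Counter CRDT: state is a map replica_id -> local count;
# merge is pointwise max, so re-delivered or reordered states are harmless.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id, self.counts = replica_id, {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a); a.merge(b)   # duplicate merges change nothing
print(a.value(), b.value())          # 5 5
```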