TOPiCo: detecting most frequent items from multiple high-rate event streams

Citation:
Oliveira R, Matos M, Felber P, Sutra P, Rivière E, Schiavoni V.  2015.  TOPiCo: detecting most frequent items from multiple high-rate event streams. DEBS 2015 - .

Date Presented:

July

Abstract:

Systems such as social networks, search engines or trading platforms operate geographically distant sites that continuously generate streams of events at high-rate. Such events can be access logs to web servers, feeds of messages from participants of a social network, or financial data, among others. The ability to timely detect trends and popularity variations is of paramount importance in such systems. In particular, determining what are the most popular events across all sites allows to capture the most relevant information in near real-time and quickly adapt the system to the load. This paper presents TOPiCo, a protocol that computes the most popular events across geo-distributed sites in a low cost, bandwidth-efficient and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance to be among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results shows that (i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) our protocol is particularly efficient for skewed distributions.

Citation Key:

2593

DOI:

10.1145/2675743.2771838

PreviewAttachmentSize
p58-schiavoni.pdf1005.82 KB