Time Series Motifs Statistical Significance

Citation:
Castro N, Azevedo PJ. 2011. Time Series Motifs Statistical Significance. Proceedings of the Eleventh SIAM International Conference on Data Mining (SDM), pp. 687-698.

Date Presented:

April

Abstract:

Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem within applications that range from finance to health. Many algorithms have been proposed for the task of efficiently finding motifs. Surprisingly, most of these proposals do not focus on how to evaluate the discovered motifs. They are typically evaluated by human experts. This is unfeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in the bioinformatics and association rule mining communities to evaluate the extracted patterns. In this work we present an approach to calculate the statistical significance of time series motifs. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif by using Markov Chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution gives means to the application of a powerful technique - statistical tests - to a time series setting. This provides researchers and practitioners with an important tool to evaluate automatically the degree of relevance of each extracted motif.
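
The pipeline summarized in the abstract (symbolic motif, Markov chain estimate of the expected frequency, hypothesis test on observed versus expected) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the time series has already been discretized into a string of symbols, uses a first-order Markov chain fitted to that same string, and uses a binomial upper-tail test as one simple choice of hypothesis test, which ignores the dependence between overlapping occurrences. The abstract does not fix the model order or the test, and all function names below are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log, log1p


def count_occurrences(seq, word):
    """Count (possibly overlapping) occurrences of word in seq."""
    m = len(word)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == word)


def expected_count_markov1(seq, word):
    """Expected count of word under a first-order Markov chain fitted to seq.

    Uses the standard plug-in estimate
        E[N(w)] = prod_i N(w_i w_{i+1}) / prod_{interior i} N(w_i),
    which is informative for words of length >= 3 (assumption of this sketch).
    """
    unigrams = Counter(seq)
    bigrams = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    numerator = 1.0
    for i in range(len(word) - 1):
        numerator *= bigrams[word[i:i + 2]]
    denominator = 1.0
    for symbol in word[1:-1]:
        denominator *= unigrams[symbol]
    return numerator / denominator if denominator else 0.0


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def motif_p_value(seq, word):
    """Return (observed, expected, p_value) for a symbolic motif."""
    observed = count_occurrences(seq, word)
    expected = expected_count_markov1(seq, word)
    positions = len(seq) - len(word) + 1
    # Per-position occurrence probability implied by the expected count.
    p = min(expected / positions, 1.0) if positions > 0 else 0.0
    return observed, expected, binom_upper_tail(observed, positions, p)


if __name__ == "__main__":
    observed, expected, p_value = motif_p_value("aabab", "aba")
    print(observed, expected, p_value)
```

A small p-value indicates a motif that occurs more often than the Markov model predicts. When many motifs are tested at once, a multiple-testing correction would normally be applied before ranking them.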

Citation Key:

DBLP:conf/sdm/CastroA11

DOI:

10.1.1.225.9334

Attachment:

statmotifs.pdf (671.68 KB)