Title :
An indexing scheme for fast similarity search in large time series databases
Author :
Keogh, Eamonn J. ; Pazzani, Michael J.
Author_Institution :
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
Abstract :
Addresses the problem of similarity searching in large time-series databases. We introduce a novel indexing algorithm that allows faster retrieval. The index is formed by creating bins that contain time series subsequences of approximately the same shape. For each bin, we can quickly calculate a lower bound on the distance between a given query and the most similar element of the bin. This bound allows us to search the bins in best-first order, and to prune some bins from the search space without having to examine the contents. Additional speedup is obtained by optimizing the data within the bins such that we can avoid having to compare the query to every item in the bin. We call our approach STB (Shape To Bit-vector) indexing, and experimentally validate it on space telemetry, medical and synthetic data, demonstrating approximately an order-of-magnitude speedup
Keywords :
database indexing; database theory; medical information systems; query processing; software performance evaluation; space telemetry; statistical databases; time series; very large databases; STB indexing; best-first search; bins; data optimization; fast retrieval; indexing algorithm; large time series databases; lower bound; medical data; query distance; search space pruning; shape-to-bit vector indexing; similarity searching; space telemetry data; speedup; synthetic data; time series subsequences; Computer science; Databases; Electronic switching systems; Extraterrestrial measurements; Indexing; Medical diagnostic imaging; Read only memory; Shape; Telemetry; Time measurement;
Conference_Titel :
Scientific and Statistical Database Management, 1999. Eleventh International Conference on
Conference_Location :
Cleveland, OH
Print_ISBN :
0-7695-0046-3
DOI :
10.1109/SSDM.1999.787621