DocumentCode :
907876
Title :
Optimizing similarity search for arbitrary length time series queries
Author :
Kahveci, Tamer ; Singh, Ambuj K.
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA
Volume :
16
Issue :
4
fYear :
2004
fDate :
4/1/2004 12:00:00 AM
Firstpage :
418
Lastpage :
433
Abstract :
We consider the problem of finding similar patterns in a time sequence. Typical applications of this problem involve large databases consisting of long time sequences of different lengths. Current time sequence search techniques work well for queries of a prespecified length, but not for arbitrary length queries. We propose a novel indexing technique that works well for arbitrary length queries. The proposed technique stores index structures at different resolutions for a given data set. We prove that this index structure is superior to existing index structures that use a single resolution. We propose a range query and nearest neighbor query technique on this index structure and prove the optimality of our index structure for these search techniques. The experimental results show that our method is 4 to 20 times faster than the current techniques, including sequential scan, for range queries and 3 times faster than sequential scan and other techniques for nearest neighbor queries. Because of the need to store information at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part, we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the size of the index to one fifth, the total cost of our method is 3 to 15 times less than the current techniques.
Keywords :
database indexing; optimisation; query formulation; query processing; storage management; time series; very large databases; arbitrary length time series queries; data set; index structures; indexing technique; large databases; multiple resolutions; nearest neighbor query technique; range query technique; sequential scan; similarity search optimization; storage requirement; subsequence search; Bioinformatics; Biosensors; Costs; Databases; Economic forecasting; Genomics; Indexing; Nearest neighbor searches; Stock markets; Weather forecasting;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2004.1269667
Filename :
1269667
Link To Document :
بازگشت