Author_Institution :
Dept. of Comput. Sci., Vermont Univ., Burlington, VT, USA
Abstract :
In many applications, local or remote sensors send in streams of data, and the system needs to monitor the streams to discover relevant events/patterns and deliver instant reaction correspondingly. An important scenario is that the incoming stream is a continually appended time series, and the patterns are time series in a database. At each time when a new value arrives (called a time position), the system needs to find, from the database, the nearest or near neighbors of the incoming time series up to the time position. This paper attacks the problem by using fast Fourier transform (FFT) to efficiently find the cross correlations of time series, which yields, in a batch mode, the nearest and near neighbors of the incoming time series at many time positions. To take advantage of this batch processing in achieving fast response time, this paper uses prediction methods to predict future values. When the prediction length is long, FFT is used to compute the cross correlations of the predicted series (with the values that have already arrived) and the database patterns, and to obtain predicted distances between the incoming time series at many future time positions and the database patterns. If the prediction length is short, the direct computation method is used to obtain these predicted distances to avoid the overhead of using FFT. When the actual data value arrives, the prediction error together with the predicted distances is used to filter out patterns that are not possible to be the nearest or near neighbors, which provides fast responses. Experiments show that with reasonable prediction errors, the performance gain is significant. Especially, when the long term predictions are available, the proposed method can handle incoming data at a very fast streaming rate.
Keywords :
data mining; fast Fourier transforms; query processing; temporal databases; data mining; data stream processing; database patterns; fast Fourier transform; nearest neighbor search; similarity search; similarity-based query; time series analysis; time series streaming; Application software; Computer Society; Computerized monitoring; Databases; Delay; Fast Fourier transforms; Filters; Prediction methods; Remote monitoring; Stock markets; Index Terms- Similarity search; continuous query.; data mining; data stream processing; nearest neighbor search; time series analysis;