Author_Institution :
Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., Clear Water Bay, Hong Kong
Abstract :
Stream time series are now widely used, so efficient and effective similarity search over stream data has become essential for many applications. Although many approaches have been proposed for searching archived data, traditional methods may not work for stream time series because of the unique characteristics of streams, for example, frequent updates. In particular, when the arrival of data is delayed, for example, by communication congestion or batch processing, queries on such incomplete time series, or even on future time series, may return inaccurate results. Therefore, in this paper we propose two approaches, polynomial and probabilistic, to predict the unknown values that have not yet arrived at the system. We also present efficient indexes, namely a multidimensional hash index and a B+-tree, to facilitate prediction and similarity search on future time series, respectively. Extensive experiments demonstrate the efficiency and effectiveness of our methods in terms of I/O, prediction accuracy, and query accuracy.
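The abstract does not state how the polynomial predictor is built, so the following is only a generic sketch of polynomial extrapolation over a stream window, not the paper's method. It assumes equally spaced samples and fits the unique degree-d polynomial through the last d+1 values, using the fact that the (d+1)-th finite difference of such a polynomial is zero. The names predict_next and predict_ahead are hypothetical.

```python
from math import comb


def predict_next(window, degree):
    """Extrapolate one step ahead using the degree-`degree` polynomial
    through the last degree + 1 equally spaced samples (assumed setup)."""
    k = degree + 1
    if len(window) < k:
        raise ValueError("need at least degree + 1 samples")
    recent = window[-k:]
    # Setting the k-th finite difference to zero and solving for the
    # unknown next value gives an alternating binomial-weighted sum.
    return sum((-1) ** (i + 1) * comb(k, i) * recent[-i] for i in range(1, k + 1))


def predict_ahead(window, degree, steps):
    """Predict several not-yet-arrived values by feeding each prediction
    back into the window."""
    values = list(window)
    predicted = []
    for _ in range(steps):
        nxt = predict_next(values, degree)
        predicted.append(nxt)
        values.append(nxt)
    return predicted
```

With degree 1 this reduces to linear extrapolation from the last two samples. Higher degrees follow local curvature more closely but amplify noise, so any real use would need to choose the degree with care.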
Keywords :
database indexing; polynomials; probability; query formulation; time series; tree data structures; tree searching; B-tree; multidimensional hash index; query accuracy; similarity search; stream time series; Accuracy; Application software; Computer science; Costs; Data mining; Delay effects; Monitoring; Polynomials; Sensor phenomena and characterization; Sensor systems