DocumentCode
2334722
Title
Distance measures for effective clustering of ARIMA time-series
Author
Kalpakis, Konstantinos ; Gada, Dhiral ; Puttagunta, Vasundhara
Author_Institution
Dept. of Comput. Sci. & Electron. Eng., Maryland Univ., Baltimore, MD, USA
fYear
2001
fDate
2001
Firstpage
273
Lastpage
280
Abstract
Much environmental and socioeconomic time-series data can be adequately modeled using autoregressive integrated moving average (ARIMA) models. We call such time series "ARIMA time series". We propose the use of the linear predictive coding (LPC) cepstrum for clustering ARIMA time series, by using the Euclidean distance between the LPC cepstra of two time series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time series. For example, just a few LPC cepstral coefficients are sufficient in order to discriminate between time series that are modeled by different ARIMA models. In fact, this approach requires fewer coefficients than traditional approaches, such as DFT (discrete Fourier transform) and DWT (discrete wavelet transform). The proposed distance measure can be used for measuring the similarity between different ARIMA models as well. We cluster ARIMA time series using the "partition around medoids" method with various similarity measures. We present experimental results demonstrating that, using the proposed measure, we achieve significantly better clusterings of ARIMA time series data as compared to clusterings obtained by using other traditional similarity measures, such as DFT, DWT, PCA (principal component analysis), etc. Experiments were performed on both simulated and real data.
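The dissimilarity measure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the autoregressive coefficients of each series have already been estimated, assumes the sign convention A(z) = 1 + alpha_1 z^-1 + ... + alpha_p z^-p for the fitted model, and uses the standard recursion for the cepstrum of the all-pole filter 1/A(z). The function names `lpc_cepstrum` and `cepstral_distance` and the default of five coefficients are hypothetical choices made for this example.

```python
import math


def lpc_cepstrum(alpha, n_coeffs):
    """Return cepstral coefficients c_1..c_n of the all-pole model 1/A(z).

    Assumed convention: A(z) = 1 + alpha[0]*z^-1 + ... + alpha[p-1]*z^-p.
    """
    p = len(alpha)
    c = []  # c[k] holds c_{k+1}
    for n in range(1, n_coeffs + 1):
        # The -alpha_n term is present only while n <= p.
        acc = -alpha[n - 1] if n <= p else 0.0
        # Sum over m = 1 .. min(n - 1, p) of (1 - m/n) * alpha_m * c_{n-m}.
        for m in range(1, min(n - 1, p) + 1):
            acc -= (1.0 - m / n) * alpha[m - 1] * c[n - m - 1]
        c.append(acc)
    return c


def cepstral_distance(alpha_a, alpha_b, n_coeffs=5):
    """Euclidean distance between the first n_coeffs cepstral coefficients."""
    ca = lpc_cepstrum(alpha_a, n_coeffs)
    cb = lpc_cepstrum(alpha_b, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


if __name__ == "__main__":
    # Two hypothetical AR coefficient vectors of different orders.
    print(cepstral_distance([-0.5], [0.4, -0.2]))
```

Because the cepstrum is computed from the model coefficients alone, two series of different lengths, or models of different orders, can be compared through vectors of the same fixed size. A matrix of such pairwise distances could then be passed to a medoid-based clustering routine.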
Keywords
autoregressive moving average processes; cepstral analysis; data mining; economic cybernetics; environmental factors; linear predictive coding; pattern clustering; social sciences; socio-economic effects; temporal databases; time series; ARIMA time-series clustering; Euclidean distance; LPC cepstral coefficients; autoregressive integrated moving average; dissimilarity measure; distance measure; environmental data; indexing; linear predictive coding; partition-around-medoids method; similarity measures; socioeconomic data; Cepstral analysis; Cepstrum; Discrete Fourier transforms; Discrete wavelet transforms; Euclidean distance; Fourier transforms; Indexing; Linear predictive coding; Principal component analysis; Time measurement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location
San Jose, CA
Print_ISBN
0-7695-1119-8
Type
conf
DOI
10.1109/ICDM.2001.989529
Filename
989529