• DocumentCode
    2334722
  • Title
    Distance measures for effective clustering of ARIMA time-series
  • Author
    Kalpakis, Konstantinos ; Gada, Dhiral ; Puttagunta, Vasundhara
  • Author_Institution
    Dept. of Comput. Sci. & Electron. Eng., Maryland Univ., Baltimore, MD, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    273
  • Lastpage
    280
  • Abstract
    Much environmental and socioeconomic time-series data can be adequately modeled using autoregressive integrated moving average (ARIMA) models. We call such time series "ARIMA time series". We propose the use of the linear predictive coding (LPC) cepstrum for clustering ARIMA time series, by using the Euclidean distance between the LPC cepstra of two time series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time series. For example, just a few LPC cepstral coefficients are sufficient to discriminate between time series that are modeled by different ARIMA models. In fact, this approach requires fewer coefficients than traditional approaches, such as the DFT (discrete Fourier transform) and DWT (discrete wavelet transform). The proposed distance measure can be used for measuring the similarity between different ARIMA models as well. We cluster ARIMA time series using the "partition around medoids" method with various similarity measures. We present experimental results demonstrating that, using the proposed measure, we achieve significantly better clusterings of ARIMA time series data than those obtained using other traditional similarity measures, such as DFT, DWT, and PCA (principal component analysis). Experiments were performed on both simulated and real data.
  • Keywords
    autoregressive moving average processes; cepstral analysis; data mining; economic cybernetics; environmental factors; linear predictive coding; pattern clustering; social sciences; socio-economic effects; temporal databases; time series; ARIMA time-series clustering; Euclidean distance; LPC cepstral coefficients; autoregressive integrated moving average; dissimilarity measure; distance measure; environmental data; indexing; linear predictive coding; partition-around-medoids method; similarity measures; socioeconomic data; Cepstral analysis; Cepstrum; Discrete Fourier transforms; Discrete wavelet transforms; Euclidean distance; Fourier transforms; Indexing; Linear predictive coding; Principal component analysis; Time measurement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2001.989529
  • Filename
    989529
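
The dissimilarity measure described in the abstract (Euclidean distance between LPC cepstra) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each series has already been reduced to the coefficients alpha_1..alpha_p of an autoregressive model written as x[t] + alpha_1*x[t-1] + ... + alpha_p*x[t-p] = e[t], and it uses the standard recursion that converts such coefficients to cepstral coefficients. The function names and the default of five coefficients are hypothetical choices for this example; the record does not specify them.

```python
import math


def lpc_cepstrum(alpha, n_coeffs):
    """Return the first n_coeffs cepstral coefficients of the AR model
    x[t] + alpha[0]*x[t-1] + ... + alpha[p-1]*x[t-p] = e[t].

    Uses the standard recursion
        c_n = -alpha_n - sum_{m=1}^{n-1} (1 - m/n) * alpha_m * c_{n-m},
    with alpha_n taken as 0 for n > p.
    """
    p = len(alpha)
    c = []
    for n in range(1, n_coeffs + 1):
        # alpha_n vanishes beyond the model order p
        value = -alpha[n - 1] if n <= p else 0.0
        for m in range(1, min(n - 1, p) + 1):
            # c[n - m - 1] holds c_{n-m} (the list is 0-based)
            value -= (1.0 - m / n) * alpha[m - 1] * c[n - m - 1]
        c.append(value)
    return c


def cepstral_distance(alpha_a, alpha_b, n_coeffs=5):
    """Euclidean distance between the truncated LPC cepstra of two AR models.

    n_coeffs=5 is an arbitrary illustrative default, not a value from the paper.
    """
    ca = lpc_cepstrum(alpha_a, n_coeffs)
    cb = lpc_cepstrum(alpha_b, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
```

For a first-order model the recursion reduces to c_n = (-alpha_1)^n * (-1)^(n+1) ... more simply, with alpha_1 = -0.5 it yields c_n = 0.5^n / n, which is a convenient sanity check. The resulting pairwise distances could then be passed to any medoid-based clustering routine.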