• DocumentCode
    128750
  • Title

    Implementation of time series data clustering based on SVD for stock data analysis on hadoop platform

  • Author

    Yonghong Xie ; Wulamu, Aziguli ; Yantao Wang ; Zheng Liu

  • Author_Institution
    Sch. of Comput. & Commun. Eng., Univ. of Sci. & Technol. Beijing (USTB), Beijing, China
  • fYear
    2014
  • fDate
    9-11 June 2014
  • Firstpage
    2007
  • Lastpage
    2010
  • Abstract
    As data volumes grow, a viable solution is to process them in parallel on a cluster made up of a large number of computers, and the Hadoop parallel computing platform is a typical representative of this approach. Clustering analysis is one of the main methods for mining time series data; however, general clustering algorithms cannot cluster time series data directly because such data has a special structure. The time series clustering algorithm presented here combines the Canopy and K-means algorithms on the basis of SVD. Singular value decomposition is first used to extract features from the time series data, and the Canopy and K-means algorithms are then used to cluster the extracted feature data. Finally, the algorithm is implemented on the Hadoop platform with Mahout, yielding a new clustering method that can handle massive time series data. The method is successfully applied to real stock time series data with satisfactory results.
  • Keywords
    data mining; feature extraction; parallel processing; pattern clustering; singular value decomposition; time series; Canopy algorithm; Hadoop parallel computing platform; K-means algorithm; SVD; feature data clustering analysis; hadoop platform; massive time series data; stock data analysis; time series data clustering; time series data mining; Algorithm design and analysis; Clustering algorithms; Clustering methods; Matrix decomposition; Time series analysis; Vectors; clustering analysis; hadoop; k-means; mahout; stock data; time series data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Electronics and Applications (ICIEA), 2014 IEEE 9th Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4799-4316-6
  • Type

    conf

  • DOI
    10.1109/ICIEA.2014.6931498
  • Filename
    6931498
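
The pipeline summarized in the abstract (SVD feature extraction, then Canopy to seed K-means) can be illustrated with a minimal single-machine sketch. This is not the authors' Mahout/Hadoop implementation: it uses NumPy only, and the function names, the number of retained singular vectors `k`, the Canopy thresholds `t1` and `t2`, the choice of the canopy mean as a seed, and the synthetic "rising" and "falling" series are all assumptions made for illustration. The abstract does not state how the SVD features are formed, so projecting each centered series onto the top-`k` singular directions is one plausible reading.

```python
import numpy as np


def svd_features(series, k):
    """Reduce each time series (one row) to k SVD coordinates (assumed scheme)."""
    X = series - series.mean(axis=1, keepdims=True)  # center each series
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # row coordinates along the top-k singular directions


def canopy_centers(points, t1, t2):
    """One Canopy pass with loose threshold t1 and tight threshold t2 (t1 > t2 > 0)."""
    assert t1 > t2 > 0
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        candidates = points[remaining]
        dist = np.linalg.norm(candidates - candidates[0], axis=1)
        # Seed = mean of all points within t1 of the picked point (an assumption).
        centers.append(candidates[dist < t1].mean(axis=0))
        # Points within t2 can no longer start a new canopy.
        remaining = [i for i, d in zip(remaining, dist) if d >= t2]
    return np.array(centers)


def _nearest(points, centers):
    """Index of the closest center for every point."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(points, centers, n_iter=50):
    """Plain Lloyd iterations starting from the given (Canopy) centers."""
    for _ in range(n_iter):
        labels = _nearest(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _nearest(points, centers), centers


def demo():
    """Cluster ten synthetic series: five rising trends and five falling trends."""
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 30)
    rising = t + 0.01 * rng.standard_normal((5, 30))
    falling = -t + 0.01 * rng.standard_normal((5, 30))
    series = np.vstack([rising, falling])
    feats = svd_features(series, k=2)
    seeds = canopy_centers(feats, t1=2.0, t2=1.0)  # thresholds chosen by hand
    return kmeans(feats, seeds)


if __name__ == "__main__":
    labels, centers = demo()
    print(labels)
    print(centers)
```

In this sketch Canopy decides how many clusters K-means starts with, which is the usual motivation for combining the two; the thresholds would need tuning for real stock data, and a distributed run would replace these functions with the corresponding Mahout jobs.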