• DocumentCode
    2835775
  • Title

    Application of Compound Gaussian Mixture Model clustering in the data stream

  • Author

    Ming-ming Gao ; Ji-Zhen Liu ; Xiang-xiang Gao

  • Author_Institution
    Sch. of Control & Comput. Eng., North China Electr. Power Univ., Beijing, China
  • Volume
    7
  • fYear
    2010
  • fDate
    22-24 Oct. 2010
  • Abstract
    Data streams are characterized by unbounded data volume and high arrival rates. The modeling method underlies the effectiveness of any clustering technique, and a good modeling method improves the performance of a data stream mining system. This paper proposes a model named the Compound Gaussian Mixture Model (CGMM), together with a CGMM clustering algorithm that builds on the classical GMM clustering algorithm. The paper also proposes an EM algorithm based on the Compound Gaussian Mixture Model, in which added labeled samples help the initial parameters to be learned. The algorithm can also detect overlap between Gaussian distributions and merge the overlapping ones. EM is used to initialize the Compound Gaussian Mixture Model clustering algorithm. Semi-supervised clustering uses a portion of labeled data to assist the clustering analysis. The experimental results demonstrate that the algorithm increases the sample recognition rate compared with unsupervised learning and has good clustering ability. We compare the Compound Gaussian Mixture Model clustering algorithm with the CluStream algorithm. From the results, we conclude that the CGMM-based clustering algorithm performs better than the classic CluStream algorithm and that it is effective for data stream clustering.
  • Keywords
    Gaussian distribution; data mining; expectation-maximisation algorithm; learning (artificial intelligence); pattern clustering; CGMM; EM algorithm; clustream algorithm; compound Gaussian mixture model clustering; data stream mining system; semi-supervised clustering; Compounds; Probability; Compound Gaussian Mixture Model; Semi-supervised clustering; clustering; data stream;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Application and System Modeling (ICCASM), 2010 International Conference on
  • Conference_Location
    Taiyuan
  • Print_ISBN
    978-1-4244-7235-2
  • Electronic_ISBN
    978-1-4244-7237-6
  • Type

    conf

  • DOI
    10.1109/ICCASM.2010.5620507
  • Filename
    5620507