  • DocumentCode
    1928689
  • Title
    A Novel Clustering Algorithm for Prefix-Coded Data Stream Based upon Median-Tree
  • Author
    Feng, Guangsheng ; Wang, Huiqiang ; Zhao, Qian ; Liang, Ying
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin
  • fYear
    2008
  • fDate
    28-29 Jan. 2008
  • Firstpage
    79
  • Lastpage
    84
  • Abstract
    Real-world data streams contain a large amount of prefix-coded data, which is common across many applications. Traditional clustering algorithms do not treat this prefix-coded structure specially, and this leads to poor performance and unsatisfactory clustering results. To address the problem, this paper proposes the concept of a median-tree together with a method for calculating the coding distance. Building on these, it puts forward a simple algorithm, dfCluster, that handles prefix-coded data streams efficiently, and presents an in-depth analysis of the algorithm. Finally, experiments show that dfCluster clusters such data streams more efficiently than the naive algorithm and that, unlike k-means, its performance does not depend on a pre-specified value of k.
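    The abstract does not define the coding distance itself. As a purely illustrative sketch, and not the dfCluster definition, the Python below assumes hierarchical codes built from fixed-width, two-character segments (e.g. "01" → "0102" → "010203") and measures the number of edges between two codes in the code tree that their shared prefixes imply. The helper names `segments` and `tree_distance` and the `width` parameter are hypothetical.

```python
from typing import List


def segments(code: str, width: int = 2) -> List[str]:
    """Split a hierarchical prefix code into fixed-width levels.

    Assumption (not from the paper): every level uses `width` characters.
    """
    if len(code) % width != 0:
        raise ValueError(f"code {code!r} is not a multiple of {width} characters")
    return [code[i:i + width] for i in range(0, len(code), width)]


def tree_distance(a: str, b: str, width: int = 2) -> int:
    """Count edges between two codes in the implied code tree.

    The distance is depth(a) + depth(b) - 2 * depth(lowest common ancestor),
    where the common ancestor is the longest shared sequence of leading segments.
    """
    sa, sb = segments(a, width), segments(b, width)
    shared = 0
    for x, y in zip(sa, sb):
        if x != y:
            break
        shared += 1
    return (len(sa) - shared) + (len(sb) - shared)


if __name__ == "__main__":
    print(tree_distance("0101", "0102"))    # siblings under "01"
    print(tree_distance("0101", "0201"))    # different top-level branches
    print(tree_distance("01", "010203"))    # ancestor and descendant
```

    Under this assumed metric, siblings such as "0101" and "0102" (distance 2) are closer than codes in different top-level branches such as "0101" and "0201" (distance 4), whereas a character-wise Hamming distance would rate both pairs equally because each differs in exactly one character.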
  • Keywords
    pattern clustering; tree data structures; clustering algorithm; median-tree; prefix-coded data stream; prefix-coded data structure; Algorithm design and analysis; Clustering algorithms; Computer science; Data engineering; Data mining; Educational institutions; Internet; Noise shaping; Partitioning algorithms; Statistics; Clustering; Data Stream; Median-tree;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Internet Computing in Science and Engineering, 2008. ICICSE '08. International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-0-7695-3112-0
  • Electronic_ISBN
    978-0-7695-3112-0
  • Type
    conf
  • DOI
    10.1109/ICICSE.2008.103
  • Filename
    4548238