DocumentCode
1928689
Title
A Novel Clustering Algorithm for Prefix-Coded Data Stream Based upon Median-Tree
Author
Feng, Guangsheng ; Wang, Huiqiang ; Zhao, Qian ; Liang, Ying
Author_Institution
Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin
fYear
2008
fDate
28-29 Jan. 2008
Firstpage
79
Lastpage
84
Abstract
Prefix-coded data are common in real-world data streams and appear in many applications. Traditional clustering algorithms give the prefix-coded data structure no special treatment, which leads to poor performance and poor clustering results. To address this problem, this paper proposes a new concept, the median-tree, together with a method for calculating the coding distance. Based on these, a simple algorithm, dfCluster, is put forward that handles prefix-coded data streams efficiently. The algorithm is also analyzed in depth. Finally, the designed experiment demonstrates that dfCluster clusters such data streams more efficiently than the naive algorithm and that, unlike k-means, its performance is not limited by a specified value of k.
Keywords
pattern clustering; tree data structures; clustering algorithm; median-tree; prefix-coded data stream; prefix-coded data structure; Algorithm design and analysis; Clustering algorithms; Computer science; Data engineering; Data mining; Educational institutions; Internet; Noise shaping; Partitioning algorithms; Statistics; Clustering; Data Stream; Median-tree;
fLanguage
English
Publisher
IEEE
Conference_Titel
Internet Computing in Science and Engineering, 2008. ICICSE '08. International Conference on
Conference_Location
Harbin
Print_ISBN
978-0-7695-3112-0
Electronic_ISBN
978-0-7695-3112-0
Type
conf
DOI
10.1109/ICICSE.2008.103
Filename
4548238