DocumentCode :
3452973
Title :
New Cluster Detection Based on Multi-Representation Index Tree Text Clustering
Author :
Song, Hui ; Wang, Lifeng ; Li, Baiyan ; Liu, Xiaoqiang
Author_Institution :
Sch. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
fYear :
2010
fDate :
27-28 Nov. 2010
Firstpage :
1
Lastpage :
4
Abstract :
Traditional clustering is a powerful technique for revealing the "hot" topics among documents. However, it is hard to discover new types of events that emerge gradually. In this paper, we propose a novel model for detecting new clusters in time-streaming documents. It consists of three parts: a cluster definition based on the Multi-Representation Index Tree (MI-Tree), the new-cluster detection process, and metrics for measuring a new cluster. In contrast to the traditional method, we process the newly arriving data first and then merge the old clustering tree into the new one. This avoids the situation in which highly similar documents are assigned to different clusters. We designed and implemented a system for practical application; experimental results on a variety of domains demonstrate that our algorithm can recognize valuable new clusters during the iteration process and produce quality clusters.
Keywords :
pattern clustering; text analysis; trees (mathematics); iteration process; multirepresentation index tree; text clustering; time streaming document; Accuracy; Algorithm design and analysis; Clustering algorithms; Indexes; Measurement; Merging; Peer to peer computing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Technology and Applications (DBTA), 2010 2nd International Workshop on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6975-8
Electronic_ISBN :
978-1-4244-6977-2
Type :
conf
DOI :
10.1109/DBTA.2010.5659018
Filename :
5659018