DocumentCode :
2167973
Title :
Topic detection based on K-means
Author :
Zhang, Dan ; Li, Shengdong
Author_Institution :
Dept. of Comput. Sci. & Technol., Mudanjiang Normal Univ., Mudanjiang, China
fYear :
2011
fDate :
9-11 Sept. 2011
Firstpage :
2983
Lastpage :
2985
Abstract :
The essential difference between topic detection and text clustering lies in the distribution and the temporal characteristics of the news corpus. Topic detection should therefore be studied with respect to the news corpus, which itself requires in-depth and extensive research. The vector space model (VSM) is one of the simplest and most effective topic representation models, and K-means is a well-known and widely used partitional clustering method. We therefore conduct a topic detection experiment to study how the news corpus and K-means affect topic detection. From it we obtain the pattern by which they affect topic detection and summarize their optimal values. Finally, TDT evaluation methods show that the best overall topic detection performance on the large-scale corpus is 38.378% higher than that on the small-scale corpus. This experiment shows that topic detection based on K-means is well suited to large-scale data.
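The combination the abstract describes, news stories represented in a vector space model and grouped by K-means, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the smoothed TF-IDF weighting, the squared Euclidean distance, the fixed seed documents, and the four toy stories are chosen only for illustration, and the helper names `tfidf_matrix`, `sq_dist`, and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Map tokenized documents to unit-length TF-IDF rows (a simple VSM).

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1. The abstract
    does not say which term weighting the authors used.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        row = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seeds, max_iter=100):
    """Plain K-means; `seeds` are indices of the initial centroid documents.

    Assumption: fixed seeds keep the toy example deterministic. Real runs
    usually use random or k-means++ initialization.
    """
    centroids = [list(points[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "news corpus": two stories per topic (invented for illustration).
docs = [
    ["earthquake", "rescue", "quake", "victims"],
    ["earthquake", "quake", "aftershock", "rescue"],
    ["election", "vote", "candidate", "poll"],
    ["election", "candidate", "vote", "debate"],
]
vectors = tfidf_matrix(docs)
labels = kmeans(vectors, k=2, seeds=[0, 2])
print(labels)
```

Each cluster label stands for one detected topic. The number of clusters `k` and the size of the corpus are the kinds of quantities the experiment varies. The sketch does not reproduce the TDT evaluation or the reported figures.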
Keywords :
pattern clustering; text analysis; K-means; TDT evaluation method; news corpus; optimal topic detection; partitional clustering method; text clustering; topics representation model; vector space model; Algorithm design and analysis; Clustering algorithms; Computer architecture; Educational institutions; Feature extraction; Vectors; k-means; news corpus; tdt evaluation; topic detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electronics, Communications and Control (ICECC), 2011 International Conference on
Conference_Location :
Ningbo
Print_ISBN :
978-1-4577-0320-1
Type :
conf
DOI :
10.1109/ICECC.2011.6066301
Filename :
6066301