DocumentCode :
531655
Title :
Text Clustering via Term Semantic Units
Author :
Jing, Liping ; Yun, Jiali ; Yu, Jian ; Huang, Houkuan
Author_Institution :
Dept. of Comput. Sci., Beijing Jiaotong Univ., Beijing, China
Volume :
1
fYear :
2010
fDate :
Aug. 31 2010-Sept. 3 2010
Firstpage :
417
Lastpage :
420
Abstract :
How best to represent text data is an important problem in text mining tasks, including information retrieval, clustering, and classification. In this paper, we propose a compact document representation based on term semantic units, which are identified from both implicit and explicit semantic information. The implicit semantic information is extracted from syntactic content via statistical methods such as latent semantic indexing and the information bottleneck. The explicit semantic information is mined from an external semantic resource (Wikipedia). The proposed compact representation model maps a document collection into a low-dimensional space whose dimensions are the term semantic units, which are far fewer than the unique terms in the collection. Experimental results on real data sets show that the compact representation efficiently improves the performance of text clustering.
Keywords :
pattern clustering; statistical analysis; text analysis; Wikipedia; compact document representation model; explicit semantic information; external semantic resource; information retrieval; semantic information extraction; statistical methods; term semantic units; text clustering; compact representation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-8482-9
Electronic_ISBN :
978-0-7695-4191-4
Type :
conf
DOI :
10.1109/WI-IAT.2010.23
Filename :
5616629