Title :
Research on Text Clustering Based on Concept Weight
Author :
Li, Yuqin ; Lv, Xueqiang ; Liu, Yufang ; Shi, Shuicai
Author_Institution :
Chinese Inf. Process. Res. Center, Beijing Inf. Sci. & Technol. Univ., Beijing, China
Abstract :
By studying how feature-word weights are calculated in texts and how semantic similarity between words is measured, we propose a concept-weight-based method for calculating feature-word weights. The method addresses two problems in the text vector space model: semantic association among text features and high dimensionality. It reduces both the semantic loss of the feature set and the dimension of the text vector, which improves the text vector space model and, in turn, the quality of text clustering. Experimental results show that the method is feasible: concept-weight-based text clustering scores about 22 percentage points higher than non-concept-weight-based clustering on the final F1 index value.
Keywords :
feature extraction; pattern clustering; set theory; text analysis; word processing; concept weight-based text clustering; feature set; feature word; semantic association phenomenon; text vector space model; Data mining; Data models; Electronic mail; Feature extraction; Information processing; Information science; Semantics; Concept Document Frequency; Concept Frequency; Concept Weight; Text Clustering
Conference_Title :
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4244-8891-9
Electronic_ISBN :
978-0-7695-4281-2
DOI :
10.1109/ICGEC.2010.64