DocumentCode
2042101
Title
A feature selection method for document clustering based on part-of-speech and word co-occurrence
Author
Liu, Zitao ; Yu, Wenchao ; Deng, Yalan ; Wang, Yongtao ; Bian, Zhiqi
Author_Institution
Int. Sch. of Software, Wuhan Univ., Wuhan, China
Volume
5
fYear
2010
fDate
10-12 Aug. 2010
Firstpage
2331
Lastpage
2334
Abstract
Feature selection is a process that chooses a subset of the original feature set according to certain rules. The selected features retain their original physical meaning and provide a better understanding of the data and the learning process. However, few modern feature selection approaches take advantage of the context information of features. Based on this analysis, we propose a novel feature selection method based on part-of-speech and word co-occurrence. Taking into account the composition of Chinese document text, we use the part-of-speech attributes of words to filter out many meaningless terms. We then define co-occurrence words by their part-of-speech and use them to select features. For evaluation, we run experiments on the text corpus from Sogou Lab and use Entropy and Precision as criteria for an objective assessment of document clustering performance. The results show that our method selects better features and achieves better clustering performance.
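The abstract names the pipeline steps (part-of-speech filtering, co-occurrence-based selection, evaluation by Entropy and Precision) without giving their definitions. The following is a minimal Python sketch under stated assumptions, not the authors' method: the kept tag set KEEP_TAGS, the sentence-level co-occurrence score, and the helper names are all hypothetical, and the input is assumed to be already segmented and POS-tagged. Entropy and Precision use common clustering definitions (weighted within-cluster label entropy, and the fraction of documents matching their cluster's majority class); the paper's exact formulas may differ.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical tag set to keep (nouns, verbs, adjectives); the paper's
# actual part-of-speech rules are not stated in the abstract.
KEEP_TAGS = {"n", "v", "a"}


def pos_filter(tagged_sentences):
    """Keep only words whose POS tag is in KEEP_TAGS (drops particles etc.)."""
    return [[w for w, tag in sent if tag in KEEP_TAGS] for sent in tagged_sentences]


def cooccurrence_features(sentences, top_k):
    """Assumed scoring: count each unordered pair of distinct words sharing a
    sentence, credit both words with the pair count, return the top_k words."""
    pair_counts = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            pair_counts[(a, b)] += 1
    word_scores = Counter()
    for (a, b), c in pair_counts.items():
        word_scores[a] += c
        word_scores[b] += c
    return [w for w, _ in word_scores.most_common(top_k)]


def cluster_entropy(clusters, labels):
    """Size-weighted average entropy (bits) of true labels within each
    non-empty cluster; clusters are lists of document indices. Lower is better."""
    n = len(labels)
    total = 0.0
    for members in clusters:
        size = len(members)
        counts = Counter(labels[i] for i in members)
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += size / n * h
    return total


def cluster_precision(clusters, labels):
    """Fraction of documents belonging to the majority class of their cluster."""
    n = len(labels)
    return sum(max(Counter(labels[i] for i in m).values()) for m in clusters) / n


if __name__ == "__main__":
    # Toy pre-tagged sentences ("u" marks a particle that gets filtered out).
    tagged = [
        [("stock", "n"), ("rise", "v"), ("le", "u")],
        [("stock", "n"), ("fall", "v")],
        [("team", "n"), ("win", "v"), ("le", "u")],
    ]
    sents = pos_filter(tagged)
    print(sents)
    print(cooccurrence_features(sents, 1))
    labels = ["finance", "finance", "sports", "sports"]
    print(cluster_entropy([[0, 1, 2], [3]], labels))
    print(cluster_precision([[0, 1, 2], [3]], labels))
```

In the toy run, "stock" co-occurs with two different verbs and so scores highest. Scoring individual words by pair counts is only one simple way to turn co-occurrence into a feature ranking; a POS-aware variant (for example, counting only noun–verb pairs) would be an equally plausible reading of the abstract.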
Keywords
feature extraction; pattern clustering; speech synthesis; text analysis; unsupervised learning; word processing; Chinese document; Sogou lab; context information; document clustering; feature selection method; learning process; part of speech; text corpus; word co-occurrence; Context; Educational institutions; Entropy; Feature extraction; Machine learning; Software; Speech; document clustering; feature selection; part-of-speech; word co-occurrence;
fLanguage
English
Publisher
IEEE
Conference_Title
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location
Yantai, Shandong
Print_ISBN
978-1-4244-5931-5
Type
conf
DOI
10.1109/FSKD.2010.5569827
Filename
5569827