DocumentCode
2259881
Title
Document representation combining concepts and words in Chinese text categorization
Author
Che, Chao ; Teng, Hongfei
Author_Institution
Dalian Univ. of Technol., Dalian, China
fYear
2009
fDate
24-27 Sept. 2009
Firstpage
1
Lastpage
5
Abstract
Word-based representation is widely used in text categorization. However, its performance suffers from problems arising from language variation. In this paper, we investigate a document representation that combines words and concepts in order to integrate the advantages of both types of representation. For words that are error-prone in word sense disambiguation, the approach uses the part of speech as the concept, which reduces disambiguation mistakes. The approach employs three measures of how much each representation form contributes to classification and selects the most productive form as the feature. This drops concepts that are unsuitable for representation without losing lexical semantic information. We conduct experiments comparing the performance of different types of representation on the Chinese text categorization corpus of Fudan University, and the results confirm the validity of our combined representation.
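The abstract does not specify the concept inventory, which words count as error-prone for disambiguation, or what the three contribution measures are. The Python sketch below only illustrates the general idea under stated assumptions: `sense_map` is a hypothetical word-to-concept map, `error_prone` is a hypothetical set of words that fall back to their part-of-speech tag, and the chi-square statistic stands in for the paper's contribution measures. It is not the authors' implementation.

```python
from collections import Counter


def concept_feature(word, pos, sense_map, error_prone):
    """Return the concept feature for one token, or None if it has none.

    Assumption: words flagged as error-prone for word sense disambiguation
    fall back to their part-of-speech tag as the concept; other words use a
    sense label from a hypothetical word-to-concept map.
    """
    if word in error_prone:
        return "POS:" + pos
    if word in sense_map:
        return "CON:" + sense_map[word]
    return None  # no concept available; only the word form can be used


def chi_square(feature, docs, labels):
    """Maximum chi-square of a binary feature over all categories.

    `docs` is a list of feature sets; `labels` holds the matching categories.
    """
    n = len(docs)
    best = 0.0
    for cat in set(labels):
        a = sum(1 for doc, y in zip(docs, labels) if feature in doc and y == cat)
        b = sum(1 for doc, y in zip(docs, labels) if feature in doc and y != cat)
        c = sum(1 for doc, y in zip(docs, labels) if feature not in doc and y == cat)
        d = n - a - b - c
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        if denom:
            best = max(best, n * (a * d - c * b) ** 2 / denom)
    return best


def build_representation(tagged_docs, labels, sense_map, error_prone):
    """Map each (word, pos)-tagged document to a bag of selected features."""
    # Step 1: candidate features per document, both word forms and concepts.
    candidates = []
    for doc in tagged_docs:
        feats = set()
        for word, pos in doc:
            feats.add("W:" + word)
            con = concept_feature(word, pos, sense_map, error_prone)
            if con is not None:
                feats.add(con)
        candidates.append(feats)

    # Step 2: for each (word, pos) pair keep whichever form scores higher;
    # ties keep the word, so a concept must strictly improve the score.
    choice = {}
    for doc in tagged_docs:
        for word, pos in doc:
            if (word, pos) in choice:
                continue
            w_feat = "W:" + word
            c_feat = concept_feature(word, pos, sense_map, error_prone)
            use_concept = (
                c_feat is not None
                and chi_square(c_feat, candidates, labels)
                > chi_square(w_feat, candidates, labels)
            )
            choice[(word, pos)] = c_feat if use_concept else w_feat

    # Step 3: each document becomes a bag of the selected features.
    return [Counter(choice[(w, p)] for w, p in doc) for doc in tagged_docs]


# Toy example: English glosses stand in for segmented, POS-tagged Chinese words.
docs = [
    [("football", "n"), ("match", "v")],
    [("basketball", "n"), ("match", "v")],
    [("stock", "n"), ("market", "n"), ("rise", "v")],
    [("stock", "n"), ("bank", "n")],
]
labels = ["sports", "sports", "econ", "econ"]
sense_map = {"football": "ball_game", "basketball": "ball_game"}  # hypothetical
error_prone = {"match", "rise"}  # hypothetical WSD-error-prone words

reps = build_representation(docs, labels, sense_map, error_prone)
for rep in reps:
    print(dict(rep))
```

In this toy run, "football" and "basketball" collapse into the shared feature `CON:ball_game`, which separates the categories better than either word alone. The part-of-speech fallback `POS:v` for "match" is dropped because it also occurs in an economics document and scores lower than the word itself.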
Keywords
natural language processing; text analysis; Chinese text categorization; document representation; language variation; lexical semantic information; word sense disambiguation; word-based representation; Dictionaries; Frequency; Robustness; Speech; Text categorization; Thesauri; combination representation; concept-based representation
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4244-4538-7
Electronic_ISBN
978-1-4244-4540-0
Type
conf
DOI
10.1109/NLPKE.2009.5313771
Filename
5313771