DocumentCode :
2259881
Title :
Document representation combining concepts and words in Chinese text categorization
Author :
Che, Chao ; Teng, Hongfei
Author_Institution :
Dalian Univ. of Technol., Dalian, China
fYear :
2009
fDate :
24-27 Sept. 2009
Firstpage :
1
Lastpage :
5
Abstract :
Word-based representation is widely used in text categorization. However, the performance of this approach suffers from problems that arise from language variation. In this paper, we investigate a document representation that combines words and concepts to integrate the advantages of both types of representation. To reduce disambiguation mistakes, the approach uses the part of speech as the concept for words that are error-prone in word sense disambiguation. It also employs three ways to measure the contributions of different representation forms to classification and selects the most productive one as the feature, which drops the concepts that are not suitable for representation without losing lexical semantic information. We conduct experiments comparing the performance of different types of representations on the Chinese text categorization corpus of Fudan University. The results confirm the validity of our combination representation.
Keywords :
natural language processing; text analysis; Chinese text categorization; document representation; language variation; lexical semantic information; word sense disambiguation; word-based representation; Channel hot electron injection; Chaos; Dictionaries; Frequency; Robustness; Speech; Text categorization; Thesauri; combination representation; concept-based representation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
Type :
conf
DOI :
10.1109/NLPKE.2009.5313771
Filename :
5313771