DocumentCode
2859836
Title
Semantic Feature Selection Using WordNet
Author
Chua, Stephanie ; Kulathuramaiyer, Narayanan
Author_Institution
Universiti Malaysia Sarawak
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
166
Lastpage
172
Abstract
The web has caused an explosion of documents, creating the need for automated text categorization systems. This paper explores the notion of semantic feature selection by employing WordNet [Introduction to WordNet: An On-line Lexical Database], a lexical database. The proposed semantic approach employs noun synonyms and word senses to select terms that are semantically representative of a category of documents. The categorical sense disambiguation extends the use of WordNet, which has typically been used for text retrieval and word sense disambiguation [A WordNet-based Algorithm for Word Sense Disambiguation]. Our experiments on the Reuters-21578 dataset show that automated semantic feature selection performs better than the well-known statistical feature selection methods Information Gain and Chi-Square.
Keywords
Computer science; Explosions; Feature extraction; Frequency; Information technology; Mutual information; Performance gain; Spatial databases; Statistics; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10115
Filename
1410799