Title :
Semantic Feature Selection Using WordNet
Author :
Chua, Stephanie ; Kulathuramaiyer, Narayanan
Author_Institution :
Universiti Malaysia Sarawak
Abstract :
The web has caused an explosion of documents, requiring the need for an automated text categorization system. This paper explores the notion of semantic feature selection by employing WordNet [Introduction to WordNet: An On-line Lexical Database], a lexical database. The proposed semantic approach employs noun synonyms and word senses for feature selection to select terms that are semantically representative of a category of documents. The categorical sense disambiguation extends the use of WordNet, which has been typically used for text retrieval and word sense disambiguation [A WordNet-based Algorithm for Word Sense Disambiguation]. Our experiments on the Reuters-21578 dataset have shown that automated semantic feature selection is able to perform better than well known statistical feature selection methods, Information Gain and Chi-Square as a feature selection method.
Keywords :
Computer science; Explosions; Feature extraction; Frequency; Information technology; Mutual information; Performance gain; Spatial databases; Statistics; Text categorization;
Conference_Titel :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
DOI :
10.1109/WI.2004.10115