• DocumentCode
    2859836
  • Title

    Semantic Feature Selection Using WordNet

  • Author

    Chua, Stephanie ; Kulathuramaiyer, Narayanan

  • Author_Institution
    Universiti Malaysia Sarawak
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    166
  • Lastpage
    172
  • Abstract
    The web has caused an explosion of documents, creating the need for automated text categorization systems. This paper explores semantic feature selection using WordNet [Introduction to WordNet: An On-line Lexical Database], a lexical database. The proposed approach uses noun synonyms and word senses to select terms that are semantically representative of a category of documents. This categorical sense disambiguation extends the use of WordNet, which has typically been used for text retrieval and word sense disambiguation [A WordNet-based Algorithm for Word Sense Disambiguation]. Our experiments on the Reuters-21578 dataset show that automated semantic feature selection performs better than two well-known statistical feature selection methods, Information Gain and Chi-Square.
  • Keywords
    Computer science; Explosions; Feature extraction; Frequency; Information technology; Mutual information; Performance gain; Spatial databases; Statistics; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type

    conf

  • DOI
    10.1109/WI.2004.10115
  • Filename
    1410799