Title :
A comprehensive analysis of using semantic information in text categorization
Author :
Celik, Koray ; Gungor, Tunga
Author_Institution :
Dept. of Comput. Eng., Bogazici Univ., Istanbul, Turkey
Abstract :
Traditional text categorization methods deal only with the content of the documents and use statistics-based metrics to represent them. A machine learning approach then uses this representation to determine the document class. In this picture, the meaning of the document is missing. To add meaning to the text categorization process, we start with part-of-speech (POS) tagging. As expected, not every part-of-speech tag contributes the same amount of information to the meaning of a document. In addition to the POS information, we use WordNet to add semantic features such as synonyms, hypernyms, hyponyms, meronyms and topics to the classification process. Using WordNet's semantic features introduces ambiguity, and not all semantic features are actually related to the document content. To overcome this problem, we introduce a new method to eliminate the ambiguity. Various combinations of POS, WordNet and word sense disambiguation are applied, and the results show that using semantic features performs better than the traditional, context-based methods.
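The pipeline outlined in the abstract (POS-dependent weighting, WordNet-style feature expansion, then disambiguation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon stands in for WordNet, the per-tag weights are invented, and the sense selection uses a simple gloss-overlap heuristic in the spirit of Lesk, since the abstract does not describe the authors' own disambiguation method.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: word -> list of senses.
# Each sense carries a gloss (used only for disambiguation) and related terms.
TOY_LEXICON = {
    "bank": [
        {"gloss": {"money", "deposit", "loan", "account"},
         "synonyms": ["depository"], "hypernyms": ["financial_institution"]},
        {"gloss": {"river", "water", "slope", "shore"},
         "synonyms": ["riverside"], "hypernyms": ["slope"]},
    ],
    "loan": [
        {"gloss": {"money", "bank", "borrow", "interest"},
         "synonyms": ["credit"], "hypernyms": ["debt"]},
    ],
}

# Hypothetical per-tag weights: tags missing from this table contribute nothing.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.5, "ADJ": 0.25}


def pick_sense(word, context):
    """Return the sense whose gloss shares the most words with the context."""
    senses = TOY_LEXICON.get(word)
    if not senses:
        return None
    return max(senses, key=lambda sense: len(sense["gloss"] & context))


def semantic_features(tagged_tokens):
    """Build a weighted bag of features from (word, pos_tag) pairs."""
    context = {word for word, _ in tagged_tokens}
    features = Counter()
    for word, tag in tagged_tokens:
        weight = POS_WEIGHTS.get(tag, 0.0)
        if weight == 0.0:
            continue  # skip tags assumed to carry no meaning
        features[word] += weight
        sense = pick_sense(word, context)
        if sense is None:
            continue  # word not covered by the toy lexicon
        # Only the selected sense contributes semantic features.
        for term in sense["synonyms"]:
            features["syn:" + term] += weight
        for term in sense["hypernyms"]:
            features["hyper:" + term] += weight
    return features


doc = [("the", "DET"), ("bank", "NOUN"), ("approved", "VERB"),
       ("the", "DET"), ("loan", "NOUN"), ("deposit", "NOUN")]
features = semantic_features(doc)
```

In this sketch the financial sense of "bank" is selected because its gloss overlaps with "loan" and "deposit" in the document, so the river-related synonym is never added. The resulting weighted feature bag could then be passed to any standard classifier.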
Keywords :
learning (artificial intelligence); natural language processing; pattern classification; text analysis; POS information; WordNet semantic features; ambiguity elimination; classification process; comprehensive analysis; document class; document content; document meaning; document representation; hypernyms; hyponyms; machine learning approach; meronyms; part-of-speech tagging; semantic information; statistic-based metrics; synonyms; text categorization methods; topics; word sense disambiguation; Accuracy; Conferences; Context; Measurement; Semantics; Tagging; Text categorization; pos tagging; semantic; text categorization; word sense disambiguation; wordnet;
Conference_Title :
Innovations in Intelligent Systems and Applications (INISTA), 2013 IEEE International Symposium on
Conference_Location :
Albena
Print_ISBN :
978-1-4799-0659-8
DOI :
10.1109/INISTA.2013.6577651