Title :
Combining Bag-of-Words and Bag-of-Concepts representations for Arabic text classification
Author :
Alahmadi, Ahmed ; Joorabchi, Arash ; Mahdi, Abdulhussain E.
Author_Institution :
Dept. of Electron. & Comput. Eng., Univ. of Limerick, Limerick, Ireland
Abstract :
This paper introduces a set of new text representation approaches for the automatic classification of Arabic textual documents. These approaches combine the well-known Bag-of-Words (BOW) and Bag-of-Concepts (BOC) text representation schemes, using Wikipedia as a knowledge base. The proposed representations are used to generate a vector space model, which is then fed into a classifier to categorize a collection of Arabic textual documents. Three different machine-learning-based classifiers are used in this work. The performance of the proposed text representation models is evaluated against a standard BOW scheme, a concept-based scheme, and recently reported text representation schemes that augment the standard BOW with the BOC.
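As a rough illustration of the general idea of combining BOW and BOC into one vector space model, the sketch below concatenates word counts and concept counts into a single feature vector per document. It is not the authors' method: the concept lexicon `CONCEPT_LEXICON` is a hypothetical hand-written stand-in for Wikipedia-based concept mapping, the tokens are English placeholders rather than processed Arabic text, and all function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical stand-in for a Wikipedia-derived word-to-concept mapping
# (an assumption for illustration; real concept extraction is not shown).
CONCEPT_LEXICON = {
    "football": "Association_football",
    "goal": "Goal_(sport)",
    "election": "Election",
    "parliament": "Parliament",
}


def bag_of_words(tokens):
    """BOW: term-frequency counts over surface tokens."""
    return Counter(tokens)


def bag_of_concepts(tokens, lexicon):
    """BOC: counts of the concepts that tokens map to; unmapped tokens are skipped."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def combined_features(tokens, lexicon):
    """Merge BOW and BOC into one sparse feature dict. Keys are prefixed so a
    word and a concept with the same string remain distinct dimensions."""
    features = {f"w:{w}": c for w, c in bag_of_words(tokens).items()}
    features.update({f"c:{k}": c for k, c in bag_of_concepts(tokens, lexicon).items()})
    return features


def to_vector(features, vocabulary):
    """Project a sparse feature dict onto a fixed, ordered vocabulary (VSM row)."""
    return [features.get(key, 0) for key in vocabulary]


# Usage: two toy "documents" (placeholder English tokens, not Arabic text).
docs = [["goal", "football", "goal"], ["election", "parliament", "vote"]]
feats = [combined_features(d, CONCEPT_LEXICON) for d in docs]
vocab = sorted(set().union(*feats))  # shared feature space across documents
vectors = [to_vector(f, vocab) for f in feats]
print(vocab)
print(vectors)
```

The resulting rows could then be passed to any vector-based classifier; simple concatenation with raw counts is only one possible combination strategy, and weighting schemes such as TF-IDF are omitted here for brevity.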
Keywords :
learning (artificial intelligence); natural language processing; pattern classification; text analysis; Arabic text classification; Arabic textual documents; Wikipedia; bag-of-concepts representation; bag-of-words representation; machine learning based classifiers; text representation schemes; vector space model
Conference_Title :
Irish Signals & Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014). 25th IET
Conference_Location :
Limerick
DOI :
10.1049/cp.2014.0711