Title :
Cross-Lingual Short-Text Document Classification for Facebook Comments
Author :
Faqeeh, Mosab ; Abdulla, Nawaf ; Al-Ayyoub, Mahmoud ; Jararweh, Yaser ; Quwaider, Muhannad
Author_Institution :
Jordan Univ. of Sci. & Technol., Irbid, Jordan
Abstract :
Document Classification (DC) is one of the fundamental problems in text mining. Plenty of work exists on DC, with interesting approaches and excellent results; however, most of it focuses on long-text documents written in a single language, with English being the most studied. This work is concerned with the natural next step beyond such work: cross-lingual DC for short-text documents. Specifically, we consider two languages, Arabic and English, and compare the performance of some of the most popular document classifiers on two datasets of short Facebook comments. Apart from a limited number of attempts, this problem has not been studied well enough. The results are encouraging and new insights are obtained.
Keywords :
pattern classification; social networking (online); text analysis; DC; English; Facebook comments; cross-lingual short-text document classification; long-text documents; text mining; Accuracy; Facebook; Sentiment analysis; Support vector machines; Text categorization; cross-lingual text analysis; decision tree; document classification; k-nearest neighbor; naive Bayes; social network comments; support vector machine
Conference_Title :
Future Internet of Things and Cloud (FiCloud), 2014 International Conference on
Conference_Location :
Barcelona
DOI :
10.1109/FiCloud.2014.99