Title :
Weirdness Coefficient as a Feature Selection Method for Arabic Special Domain Text Classification
Author :
Al-Thubaity, Abdulmohsen ; Alanazi, Ayidh ; Hazzaa, I. ; Al-Tuwaijri, Haya
Author_Institution :
Comput. Res. Inst., King Abdulaziz City for Sci. & Technol., Riyadh, Saudi Arabia
Abstract :
Given the importance of organizing and managing the rapid growth in knowledge of Arabic electronic content, this study introduces the Weirdness Coefficient (W) as a new feature selection method for Arabic special domain text classification. The proposed method was used to classify a dataset comprising five Islamic topics using Naive Bayes (NB) and K-nearest neighbor (K-NN) classifiers, and three representation schemas. The results were also compared with a well-known feature selection method, Chi-squared. In addition to its simplicity in computation, the Weirdness Coefficient showed promising classification accuracy.
Keywords :
pattern classification; text analysis; Arabic electronic content; Arabic special domain text classification; Islamic topics; K-NN; K-nearest neighbor classifiers; NB; Naïve Bayes classifiers; feature selection method; weirdness coefficient; Accuracy; Classification algorithms; Computers; Educational institutions; Electronic mail; Niobium; Text categorization; Arabic text classification; feature selection
Conference_Title :
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4673-6113-2
Electronic_ISBN :
978-0-7695-4886-9
DOI :
10.1109/IALP.2012.64