DocumentCode :
2453241
Title :
Centroid-based Classification Enhanced with Wikipedia
Author :
Bawakid, Abdullah ; Oussalah, Mourad
Author_Institution :
Dept. of Electron., Electr. & Comput. Eng., Univ. of Birmingham, Birmingham, UK
fYear :
2010
fDate :
12-14 Dec. 2010
Firstpage :
65
Lastpage :
70
Abstract :
Most traditional text classification methods employ Bag of Words (BOW) approaches that rely on the frequencies of words in the training corpus and the testing documents. Recently, studies have examined using external knowledge to enrich the text representation of documents. Some have focused on WordNet, which suffers from several limitations, including the number of available words and synsets and its coverage. Other studies have used different aspects of Wikipedia instead. Depending on the features being selected and evaluated and the external knowledge being used, a balance has to be struck between recall, precision, noise reduction and information loss. In this paper, we propose a new centroid-based classification approach that relies on Wikipedia to enrich the representation of documents through the use of Wikipedia's concepts, category structure, links, and article text. We extract candidate concepts for each class with the help of Wikipedia and merge them with important features derived directly from the text documents. Different variations of the system were evaluated, and the results show improvements in the performance of the system.
Keywords :
Web sites; pattern classification; text analysis; Wikipedia; WordNet; bag of words approach; candidate concepts extraction; centroid-based classification; testing document; text classification method; text representation; training corpus; words frequency; Electronic publishing; Encyclopedias; Feature extraction; Internet; Support vector machine classification; Training; Categorization; Classification; Semantics; Wikipedia; text enrichment;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-9211-4
Type :
conf
DOI :
10.1109/ICMLA.2010.17
Filename :
5708814