DocumentCode :
2345213
Title :
Exploiting Negative Categories and Wikipedia Structures for Document Classification
Author :
Murugeshan, Meenakshi Sundaram ; Lakshmi, K. ; Mukherjee, Saswati
Author_Institution :
Dept. of Comput. Sci. & Eng., Guindy Anna Univ., Chennai, India
fYear :
2009
fDate :
27-28 Oct. 2009
Firstpage :
868
Lastpage :
872
Abstract :
This paper explores the effect of a profile-based method for classifying Wikipedia XML documents. Our approach builds two profiles, exploiting the whole content, the initial descriptions, and the links of the Wikipedia documents. To build the profiles, we use negative category information, which has been shown to perform well for classifying unstructured text. The performance of the cosine and fractional similarity metrics is also compared. Using two classifiers and taking their weighted average improves classification performance.
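The abstract describes the approach only at a high level, so the following Python sketch illustrates the general idea under stated assumptions and is not the authors' implementation. It assumes a Rocchio-style profile in which the centroid of all other (negative) categories, scaled by a hypothetical factor `beta`, is subtracted from a category's own centroid. It also assumes two document views (the split into a "content" view and a "links" view is hypothetical), cosine similarity, and a weighted average controlled by a hypothetical parameter `weight`. Fractional similarity is not reproduced because the abstract does not define it.

```python
import math

Vector = dict[str, float]


def tf_vector(tokens: list[str]) -> Vector:
    """Length-normalised term-frequency vector for one document view."""
    counts: dict[str, float] = {}
    for t in tokens:
        counts[t] = counts.get(t, 0.0) + 1.0
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def centroid(vectors: list[Vector]) -> Vector:
    """Average of a list of sparse vectors (empty dict if the list is empty)."""
    total: dict[str, float] = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def build_profiles(train: dict[str, list[list[str]]], beta: float = 0.25) -> dict[str, Vector]:
    """Assumed Rocchio-style profiles: own centroid minus beta * negative-category centroid.

    Negative weights are clipped to zero (an assumption, not taken from the paper).
    """
    vecs = {c: [tf_vector(d) for d in docs] for c, docs in train.items()}
    profiles: dict[str, Vector] = {}
    for c in vecs:
        pos = centroid(vecs[c])
        # Every document from every other category serves as negative evidence.
        neg = centroid([v for other, vs in vecs.items() if other != c for v in vs])
        prof: Vector = {}
        for t in set(pos) | set(neg):
            w = pos.get(t, 0.0) - beta * neg.get(t, 0.0)
            if w > 0:
                prof[t] = w
        profiles[c] = prof
    return profiles


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(views: tuple[list[str], list[str]], profiles_a: dict[str, Vector],
             profiles_b: dict[str, Vector], weight: float = 0.5) -> str:
    """Weighted average of the two view-specific cosine scores; returns the best category."""
    va, vb = tf_vector(views[0]), tf_vector(views[1])
    scores = {
        c: weight * cosine(va, profiles_a[c]) + (1 - weight) * cosine(vb, profiles_b[c])
        for c in profiles_a
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: view A = text tokens, view B = link targets.
train_a = {
    "sports": [["football", "goal", "match"], ["tennis", "match", "serve"]],
    "science": [["physics", "experiment", "theory"], ["chemistry", "experiment", "molecule"]],
}
train_b = {
    "sports": [["link:FIFA", "link:Ball"], ["link:ATP", "link:Ball"]],
    "science": [["link:CERN", "link:Lab"], ["link:IUPAC", "link:Lab"]],
}
profiles_a = build_profiles(train_a)
profiles_b = build_profiles(train_b)
print(classify((["goal", "match", "football"], ["link:FIFA"]), profiles_a, profiles_b))
```

On this toy data the query shares terms and links only with the sports training documents, so the sketch labels it `sports`. Subtracting the negative centroid mainly matters when categories share vocabulary: terms common to many categories lose weight in every profile, while terms distinctive to one category keep theirs.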
Keywords :
Web sites; XML; document handling; pattern classification; Wikipedia XML documents; Wikipedia structures; cosine metrics; document classification; fractional similarity metrics; initial descriptions; negative categories; negative category information; Communications technology; Computer science; Educational institutions; Paper technology; Radio frequency; Testing; Wikipedia; XML; Feature Selection; Multiple Classifiers; Negative Categories; Profile Creation; Similarity Measures; XML Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advances in Recent Technologies in Communication and Computing, 2009. ARTCom '09. International Conference on
Conference_Location :
Kottayam, Kerala
Print_ISBN :
978-1-4244-5104-3
Electronic_ISBN :
978-0-7695-3845-7
Type :
conf
DOI :
10.1109/ARTCom.2009.79
Filename :
5328383