DocumentCode :
3539084
Title :
Automatic genre classification of Web documents using discriminant analysis for feature selection
Author :
Maeda, Akira ; Hayashi, Yukinori
Author_Institution :
Coll. of Inf. Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
fYear :
2009
fDate :
4-6 Aug. 2009
Firstpage :
405
Lastpage :
410
Abstract :
In this paper, we propose a method to classify Web documents by genre (not by topic) based on features of words and HTML tags. For classification, we use SVM (support vector machine) and Naïve Bayes. To improve classification accuracy, we calculate the discriminant efficiency of each pair of a word and an HTML tag in order to identify the HTML tags that are effective for classification. The experimental results show that our method using discriminant efficiencies achieves an 8% increase in classification accuracy.
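The following is a minimal sketch of the kind of pipeline the abstract outlines. It rests on stated assumptions. The abstract does not define "discriminant efficiency", so the code substitutes the correlation ratio (eta squared: between-genre sum of squares divided by total sum of squares) of each (word, HTML tag) pair's per-document count. It keeps the top-k pairs and classifies with a multinomial Naïve Bayes model using add-one smoothing. The toy documents, the genres "blog" and "shop", and all function names are hypothetical. The SVM classifier is not shown.

```python
import math
from collections import Counter, defaultdict

# A feature is a hypothetical (word, html_tag) pair, e.g. ("price", "td")
# meaning the word "price" occurred inside a <td> element.

def correlation_ratio(docs, feature):
    """Eta squared of the feature's per-document count across genres.
    ASSUMPTION: used as a stand-in for the paper's 'discriminant efficiency'."""
    by_genre = defaultdict(list)
    for genre, counts in docs:
        by_genre[genre].append(counts.get(feature, 0))
    values = [v for vals in by_genre.values() for v in vals]
    grand_mean = sum(values) / len(values)
    between = sum(len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
                  for vals in by_genre.values())
    total = sum((v - grand_mean) ** 2 for v in values)
    return between / total if total > 0 else 0.0

def select_features(docs, k):
    """Keep the k (word, tag) pairs with the highest correlation ratio."""
    vocab = {f for _, counts in docs for f in counts}
    ranked = sorted(vocab, key=lambda f: (-correlation_ratio(docs, f), f))
    return ranked[:k]

def train_nb(docs, features):
    """Multinomial Naive Bayes with add-one smoothing over the selected features."""
    feats = set(features)
    priors = Counter(genre for genre, _ in docs)
    totals = defaultdict(Counter)
    for genre, counts in docs:
        for f, c in counts.items():
            if f in feats:
                totals[genre][f] += c
    model = {}
    for genre in priors:
        denom = sum(totals[genre].values()) + len(feats)
        log_lik = {f: math.log((totals[genre][f] + 1) / denom) for f in feats}
        model[genre] = (math.log(priors[genre] / len(docs)), log_lik)
    return model

def predict(model, counts):
    """Return the genre with the highest log posterior; unselected features are ignored."""
    def score(genre):
        log_prior, log_lik = model[genre]
        return log_prior + sum(c * log_lik[f] for f, c in counts.items() if f in log_lik)
    return max(model, key=score)

# Hypothetical toy training data: (genre, counts of (word, tag) pairs).
train = [
    ("blog", Counter([("posted", "div"), ("comments", "a"), ("posted", "div")])),
    ("blog", Counter([("posted", "span"), ("comments", "a")])),
    ("shop", Counter([("price", "td"), ("cart", "a"), ("price", "td")])),
    ("shop", Counter([("price", "td"), ("cart", "button")])),
]

selected = select_features(train, k=2)
model = train_nb(train, selected)
print(selected)  # [('comments', 'a'), ('price', 'td')]
print(predict(model, Counter([("price", "td"), ("cart", "a")])))  # shop
```

Ranking (word, tag) pairs rather than words alone lets the same word score differently depending on the element it appears in, which matches the abstract's goal of finding HTML tags that help discriminate genres; a real system would rank on held-out-validated features and could feed the same selected features to an SVM instead.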
Keywords :
Bayes methods; Internet; document handling; pattern classification; support vector machines; HTML tag; Naïve Bayes method; Web document; automatic genre classification; discriminant analysis; discriminant efficiency calculation; feature selection; support vector machine; Blogs; HTML; Portable media players; Search engines; Support vector machine classification; Support vector machines; Text analysis; Uniform resource locators; User-generated content; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location :
London
Print_ISBN :
978-1-4244-4456-4
Electronic_ISBN :
978-1-4244-4457-1
Type :
conf
DOI :
10.1109/ICADIWT.2009.5273844
Filename :
5273844