DocumentCode
3539084
Title
Automatic genre classification of Web documents using discriminant analysis for feature selection
Author
Maeda, Akira ; Hayashi, Yukinori
Author_Institution
Coll. of Inf. Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
fYear
2009
fDate
4-6 Aug. 2009
Firstpage
405
Lastpage
410
Abstract
In this paper, we propose a method to classify Web documents by genre (not by topic) based on features of words and HTML tags. For classification, we use an SVM (support vector machine) and Naïve Bayes. To improve classification accuracy, we calculate the discriminant efficiency of each pair of a word and an HTML tag in order to identify the HTML tags that are effective for classification. The experimental results show that our method using discriminant efficiencies achieves an 8% increase in classification accuracy.
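The feature-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for discriminant efficiency, so the score below is a stand-in (the variance of per-genre document rates for each pair), not the authors' measure. The names `TagWordExtractor`, `extract_pairs`, `score_pairs`, and `top_pairs` are hypothetical, and the selected pairs would then serve as features for an SVM or Naïve Bayes classifier, which is not shown here.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Tags without closing tags; they never enclose text, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagWordExtractor(HTMLParser):
    """Collect (enclosing tag, word) pairs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.pairs = []   # (tag, word) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else "#root"
        for word in data.lower().split():
            self.pairs.append((tag, word))


def extract_pairs(html):
    """Return the set of (tag, word) pairs occurring in one document."""
    parser = TagWordExtractor()
    parser.feed(html)
    parser.close()
    return set(parser.pairs)


def score_pairs(docs):
    """Score each pair over docs, a list of (html, genre) tuples.

    ASSUMPTION: the score is the variance across genres of the fraction of
    documents containing the pair. This is a stand-in for the paper's
    discriminant efficiency, whose definition is not in the abstract.
    """
    genre_sizes = Counter(genre for _, genre in docs)
    doc_freq = defaultdict(Counter)  # pair -> genre -> number of documents
    for html, genre in docs:
        for pair in extract_pairs(html):
            doc_freq[pair][genre] += 1
    scores = {}
    for pair, per_genre in doc_freq.items():
        rates = [per_genre[g] / n for g, n in genre_sizes.items()]
        mean = sum(rates) / len(rates)
        scores[pair] = sum((r - mean) ** 2 for r in rates) / len(rates)
    return scores


def top_pairs(docs, k):
    """Return the k highest-scoring pairs, with ties broken alphabetically."""
    scores = score_pairs(docs)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]
```

Under this stand-in score, a pair that appears equally often in every genre scores zero, while a pair concentrated in one genre scores highest, which matches the intuition of keeping only the word and tag combinations that separate genres.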
Keywords
Bayes methods; Internet; document handling; pattern classification; support vector machines; HTML tag; Naïve Bayes method; Web document; automatic genre classification; discriminant analysis; discriminant efficiency calculation; feature selection; support vector machine; Blogs; HTML; Portable media players; Search engines; Support vector machine classification; Support vector machines; Text analysis; Uniform resource locators; User-generated content; Web search
fLanguage
English
Publisher
IEEE
Conference_Titel
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location
London
Print_ISBN
978-1-4244-4456-4
Electronic_ISBN
978-1-4244-4457-1
Type
conf
DOI
10.1109/ICADIWT.2009.5273844
Filename
5273844