Title :
Binary Cybergenre Classification Using Theoretic Feature Measures
Author :
Dong, Lei ; Watters, Carolyn ; Duffy, Jack ; Shepherd, Michael
Author_Institution :
Dalhousie Univ., Halifax, NS
Abstract :
In this study, we investigated automatic genre classification for three common types of Web pages. In a set of controlled experiments, we examined how the accuracy of Web page classification is affected by three theoretic feature selection measures, a range of feature set sizes, and three machine classifiers. Our results are encouraging. We conclude that for binary classification tasks, at least for these Web page genres, it is possible to reach satisfactory results with small content-based feature sets generated with a sound feature selection measure. Furthermore, we found no evidence of interaction between these feature selection measures and the machine classifiers used.
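This record does not name the three feature selection measures. As a hedged illustration only, the sketch below shows one widely used information-theoretic measure, information gain, ranking content terms for a binary genre task and keeping a small top-k feature set. The function names, the set-of-tokens document representation, and the toy FAQ-versus-home-page data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical sketch: information gain (IG) as one example of a theoretic
# feature selection measure; not necessarily one of the paper's three measures.

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """IG of a binary 'term present in page' feature w.r.t. binary genre labels."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_class = entropy(list(Counter(labels).values()))
    # Conditional entropy H(C | term), weighted by the size of each split.
    h_cond = 0.0
    for subset in (with_term, without_term):
        if subset:
            h_cond += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_class - h_cond

def select_features(docs, labels, k):
    """Rank every vocabulary term by IG (ties broken alphabetically), keep top k."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]

# Toy data (hypothetical): each page is a set of tokens; 1 = FAQ page, 0 = home page.
docs = [{"faq", "question", "answer"}, {"faq", "question"},
        {"home", "welcome"}, {"home", "news"}]
labels = [1, 1, 0, 0]
print(select_features(docs, labels, 2))
```

On this toy data, "faq", "home", and "question" each split the two genres perfectly (IG of 1 bit), so the alphabetical tie-break keeps "faq" and "home". A small feature set selected this way would then be passed to a binary classifier.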
Keywords :
Internet; classification; feature extraction; information retrieval; search engines; support vector machines; automatic Web page genre classification; binary cybergenre classification; content-based feature sets; machine classifier; theoretic feature selection measure; Automatic control; HTML; Information retrieval; Lifting equipment; Robustness; Search engines; Size control; Size measurement; Uniform resource locators; Web pages
Conference_Title :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7