Title :
Rough Set-Aided Feature Selection for Automatic Web-Page Classification
Author :
Wakaki, Toshiko ; Itakura, Hiroyuki ; Tamura, Masaki
Author_Institution :
Shibaura Institute of Technology, Japan
Abstract :
The number of Web pages on the World Wide Web is growing explosively, and portal sites with directory-style search engines, such as Yahoo!, now need to classify Web pages into many categories automatically. This paper investigates how rough set theory can help select relevant features for Web-page classification. Our experimental results show that combining the rough set-aided feature selection method with a Support Vector Machine with a linear kernel is useful in practice for classifying Web pages into many categories: it gives acceptable accuracy, and it achieves a high degree of dimensionality reduction without depending on arbitrary thresholds for feature selection.
Keywords :
Decision trees; Humans; Itemsets; Kernel; Portals; Search engines; Support vector machine classification; Support vector machines; Web pages; Web sites;
Conference_Title :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
DOI :
10.1109/WI.2004.10109