Title :
Preprocessing and Feature Preparation in Chinese Web Page Classification
Author :
Huang, Weitong ; Xu, Luxiong ; Liu, Yanmin
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing
Abstract :
This paper describes the design and implementation of a Chinese Web-page classification system and proposes several methods for Chinese Web-page preprocessing and feature preparation. Experimental results on a Chinese Web-page dataset show that the proposed methods improve performance from 75.82% to 81.88%, a gain of 6.06 percentage points.
Keywords :
Web sites; classification; natural language processing; Chinese Web page classification; Chinese Web-page dataset; Chinese Web-page preprocessing; feature preparation; Application software; Computer applications; Computer science; Data mining; Design engineering; HTML; Navigation; Particle separators; Vocabulary; Web pages; Text classification
Conference_Title :
International Conference on Computer Engineering and Technology, 2009 (ICCET '09)
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-3334-6
DOI :
10.1109/ICCET.2009.72