Title :
Studies on Chinese Web page classification
Author :
Shen, Dou ; Cong, Yan ; Sun, Jian-tao ; Lu, W-chang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
This paper studies several key aspects of Chinese Web page classification: Web page representation, word segmentation, and feature selection. For the first two aspects, we test published techniques on our Chinese corpora and analyze their performance. For feature selection, we propose taking a word's part of speech (POS) into account during pre-processing, and the experimental results validate this idea.
Keywords :
Web sites; classification; Chinese Web page classification; Web page representation; data sets; feature selection; word segmentation; Computer science; Electronic mail; Explosives; Niobium; Performance analysis; Search engines; Sun; Testing; Web pages
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1264435