Title :
SVM based Chinese Web page automatic classification
Author_Institution :
Inst. of Comput. Sci., Zhejiang Normal Univ., Jinhua, China
Abstract :
This paper deals with Chinese web page classification based on the support vector machine (SVM). First, several methods are proposed for feature extraction and selection based on textual keywords. Then, specific issues in statistical learning theory, support vector machines, and their application to classification are discussed. A quadratic programming algorithm for constructing the SVM classifier is also described. In the experiments, a sample set of 5096 samples is drawn from the web version of the Chinese People's Daily. It is split into two sets: a training set with 3398 samples and a test set with 1698 samples. Two kinds of kernel function, polynomial and radial basis function, are considered in constructing the SVM classifier. The final classification accuracies are 89.81% and 86.51% for the polynomial and radial basis function classifiers, respectively.
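The two kernel families named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the parameter names `degree`, `coef0`, and `gamma`, their default values, and the toy keyword-count vectors are all assumptions, since the abstract does not state them.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: K(x, y) = (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical keyword-count feature vectors for two web pages
# (illustrative values only, not data from the paper).
page_a = [2.0, 0.0, 1.0]
page_b = [1.0, 1.0, 0.0]

print(polynomial_kernel(page_a, page_b))
print(rbf_kernel(page_a, page_b))
```

Either function could serve as the similarity measure inside an SVM trained by quadratic programming; which parameter values suit keyword-based page features would have to be determined empirically.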
Keywords :
Web sites; feature extraction; learning (artificial intelligence); polynomials; quadratic programming; radial basis function networks; statistical analysis; support vector machines; text analysis; Chinese Web page; SVM; automatic classification; feature extraction; kernel function; polynomial function; quadratic program algorithm; radial basis function; statistic learning theory; support vector machine; textual keyword; Classification algorithms; Feature extraction; Kernel; Machine learning; Polynomials; Statistics; Support vector machine classification; Support vector machines; Testing; Web pages;
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1259884