Title :
Web Page Classification Based on a Least Square Support Vector Machine with Latent Semantic Analysis
Author :
Zhang, Yong ; Fan, Bin ; Xiao, Long-bin
Author_Institution :
School of Computer and Communication, Lanzhou University of Technology, Lanzhou
Abstract :
Chinese Web page classification (WPC) is an active research area in data mining. To classify Web pages effectively, we present a Web page categorization method based on a least square support vector machine (LS-SVM) combined with latent semantic analysis (LSA). LSA applies singular value decomposition (SVD) to the original term-document matrix to obtain its latent semantic structure, which addresses the problem of polysemous and synonymous keywords. LS-SVM is an effective method for learning classification knowledge from massive data, especially when labeled examples are costly to obtain. We adopt a novel method of Web page expression and use a summarization algorithm to reduce the noise in Web pages. A preliminary experimental comparison shows encouraging results.
Keywords :
Web sites; data mining; singular value decomposition; support vector machines; Chinese Web page classification; latent semantic analysis; least square support vector machine; summarization algorithm; Equations; HTML; Hydrogen; Internet; Least squares methods; Runtime; Support vector machine classification; Web pages; noise reduction; web page classification; web page expression
Conference_Title :
Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '08), 2008
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.259
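
Illustrative sketch :
The abstract describes a two-stage pipeline: LSA projects a term-document matrix onto a low-rank latent space through SVD, and an LS-SVM then learns a classifier on the projected documents. The Python sketch below shows one common way such a pipeline can be put together. It is not the authors' implementation: the toy term-document matrix, the labels, the RBF kernel, and the values of k, gamma and sigma are all assumptions made for illustration, and the summarization step is omitted. The LS-SVM part follows the standard formulation in which training reduces to solving a single linear system.

```python
import numpy as np

def lsa_fit(term_doc, k):
    """Truncated SVD of a terms x documents matrix (the LSA step)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vectors = Vt[:k, :].T * s_k      # documents x k latent coordinates
    return U_k, doc_vectors

def lsa_fold_in(U_k, term_vectors):
    """Project new documents (terms x n) into the same latent space."""
    return term_vectors.T @ U_k

def rbf_kernel(A, B, sigma):
    """Gaussian kernel between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def lssvm_train(X, y, gamma, sigma):
    """Solve the LS-SVM linear system [[0, y^T], [y, Omega + I/gamma]]."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = y
    M[1:, 0] = y
    M[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]               # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma):
    """Sign of the kernel expansion sum_i alpha_i y_i K(x, x_i) + b."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Hypothetical toy data: 6 terms x 6 documents, two topics.
# Terms 0-2 occur only in documents 0-2, terms 3-5 only in documents 3-5.
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 1],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 1, 1, 3],
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

k, gamma, sigma = 2, 10.0, 1.0           # assumed hyperparameters
U_k, docs_latent = lsa_fit(term_doc, k)
b, alpha = lssvm_train(docs_latent, y, gamma, sigma)
pred_train = lssvm_predict(docs_latent, y, b, alpha, docs_latent, sigma)

# Two unseen documents, one per topic, folded into the latent space.
new_docs = np.array([[1, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=float).T
pred_new = lssvm_predict(docs_latent, y, b, alpha,
                         lsa_fold_in(U_k, new_docs), sigma)
print(pred_train, pred_new)
```

On this toy matrix the documents of each topic collapse onto a single point in the two-dimensional latent space, even though their term vectors differ, which is the effect LSA is meant to have on synonymous vocabulary. Because LS-SVM training is a linear solve rather than a quadratic program, the whole training step is one call to np.linalg.solve.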