DocumentCode :
3286257
Title :
Web Page Classification Based on a Least Square Support Vector Machine with Latent Semantic Analysis
Author :
Zhang, Yong ; Fan, Bin ; Xiao, Long-bin
Author_Institution :
Sch. of Comput. & Commun., Lanzhou Univ. of Technol., Lanzhou
Volume :
2
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
528
Lastpage :
532
Abstract :
Chinese Web page classification (WPC) is an active research area in data mining. To classify Web pages effectively, we present a Web page categorization method based on a least square support vector machine (LS-SVM) with latent semantic analysis (LSA). LSA applies singular value decomposition (SVD) to the original term-document matrix to obtain its latent semantic structure, which addresses the problem of polysemous and synonymous keywords. LS-SVM is an effective method for learning classification knowledge from massive data, especially when labeled examples are costly to obtain. We adopt a novel method of Web page expression and use a summarization algorithm to reduce the noise in Web pages. A preliminary experimental comparison shows encouraging results.
Keywords :
Web sites; data mining; singular value decomposition; support vector machines; Chinese Web page classification; latent semantic analysis; least square support vector machine; summarization algorithm; Equations; HTML; Hydrogen; Internet; Least squares methods; Runtime; Support vector machine classification; Web pages; noise reduction; web page classification; web page expression
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.259
Filename :
4666173