DocumentCode :
1973060
Title :
Features Discovery for Web Classification Using Support Vector Machine
Author :
Othman, M.S. ; Yusuf, L.M. ; Salim, Juhana
Author_Institution :
Fac. of Comput. Sci. & Inf. Syst., Univ. Teknol. Malaysia, Skudai, Malaysia
fYear :
2010
fDate :
22-23 June 2010
Firstpage :
36
Lastpage :
40
Abstract :
The fast-expanding body of web information poses a major challenge to internet users seeking the most relevant, up-to-date, and high-quality information. The sheer volume of web information has led to the restructuring of these resources. An appropriate web classification method therefore needs to be established so that quality web information can be accessed. This paper discusses the web document features used to classify web information resources. Six web document feature sets are identified: text, meta tag and title (A); title and text (B); title (C); meta tag and title (D); meta tag (E); and text (F). The Support Vector Machine (SVM) method is used to classify the web documents, and four kernels, namely the Radial Basis Function (RBF), linear, polynomial, and sigmoid kernels, were applied to test classification accuracy. The results show that the text, meta tag and title feature set (A) gives the best classification of web documents across all four kernels, followed by the title and text features (B) and the meta tag and title features (C). The study also found that the linear kernel classifies web documents better than the RBF, polynomial, and sigmoid kernels.
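As a rough illustration of the setup described in the abstract, the Python sketch below (standard library only) shows one way the six feature sets could be assembled from a page's title, meta tags, and body text, together with the textbook forms of the four kernels that were compared. The class `PageParts`, the helper `feature_text`, the restriction to `keywords`/`description` meta tags, and the kernel hyperparameter defaults (`gamma`, `coef0`, `degree`) are hypothetical assumptions, not the authors' implementation.

```python
import math
from html.parser import HTMLParser


class PageParts(HTMLParser):
    """Hypothetical extractor: collects <title> text, selected <meta> content, and visible text."""

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.text = [], [], []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>, whose content is ignored

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            a = dict(attrs)
            # Assumption: "meta tag" means the keywords and description meta tags.
            if (a.get("name") or "").lower() in ("keywords", "description"):
                self.meta.append(a.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            (self.title if self._in_title else self.text).append(data.strip())


# The six feature sets A-F as listed in the abstract.
FEATURE_SETS = {
    "A": ("text", "meta", "title"),
    "B": ("title", "text"),
    "C": ("title",),
    "D": ("meta", "title"),
    "E": ("meta",),
    "F": ("text",),
}


def feature_text(html, label):
    """Concatenate the page parts that make up feature set `label` (hypothetical helper)."""
    p = PageParts()
    p.feed(html)
    p.close()
    parts = {"title": p.title, "meta": p.meta, "text": p.text}
    return " ".join(s for part in FEATURE_SETS[label] for s in parts[part] if s)


# Standard kernel forms; the default hyperparameters are illustrative assumptions.
def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear(x, y):
    return _dot(x, y)


def polynomial(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * _dot(x, y) + coef0) ** degree


def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * _dot(x, y) + coef0)
```

In a full pipeline, each feature-set string would still have to be turned into a numeric vector (for example with TF-IDF weighting) before an SVM using one of these kernels is trained and its accuracy measured; those steps are omitted here.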
Keywords :
Internet; Web sites; document handling; pattern classification; support vector machines; Internet user; Web classification method; Web information resource; information retrieval; linear kernel; polynomial kernel; radial basis function kernel; sigmoid kernel; support vector machine; web document classification; web information quality; Accuracy; Feature extraction; HTML; Internet; Kernel; Support vector machines; Text categorization; Support Vector Machine (SVM); Web Classification; Web Document;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Computing and Cognitive Informatics (ICICCI), 2010 International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-6640-5
Electronic_ISBN :
978-1-4244-6641-2
Type :
conf
DOI :
10.1109/ICICCI.2010.16
Filename :
5566043