Title :
Web classification using extraction and machine learning techniques
Author :
Yusuf, L.M. ; Othman, M.S. ; Salim, Juhana
Author_Institution :
Fac. of Comput. Sci. & Inf. Syst., Univ. Teknol. Malaysia, Skudai, Malaysia
Abstract :
Easier access to Internet services has contributed to a drastic increase in the number of web pages. This growth makes it harder for Internet users to retrieve the latest, most relevant and high-quality web information, because the enormous volume of web content causes problems in restructuring that information. To keep the latest, relevant and high-quality web information optimally retrievable, web documents need to be classified. This paper discusses the results of classifying web documents using extraction and machine learning techniques. Four types of kernel, namely the Radial Basis Function (RBF), linear, polynomial and sigmoid kernels, are applied to test the accuracy of the classification. The results show that classification accuracy increases as more web documents are used. The results also show that the linear kernel performs best in web document classification compared with the RBF, polynomial and sigmoid kernels.
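For readers unfamiliar with the four kernels named in the abstract, the sketch below shows their standard textbook forms in plain Python. The abstract gives neither the formulas nor the parameter settings used, and it does not name the classifier (kernels of this kind are commonly used with support vector machines). The parameter names `gamma`, `coef0` and `degree`, their default values, and the toy feature vectors are therefore assumptions for illustration only, not the authors' configuration.

```python
import math

def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree  (parameter values assumed)
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)  (gamma value assumed)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)  (parameter values assumed)
    return math.tanh(gamma * linear_kernel(x, y) + coef0)

KERNELS = {
    "linear": linear_kernel,
    "polynomial": polynomial_kernel,
    "rbf": rbf_kernel,
    "sigmoid": sigmoid_kernel,
}

# Hypothetical term-frequency vectors standing in for two extracted web documents.
doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 1.0, 2.0]

for name, kernel in KERNELS.items():
    print(name, kernel(doc_a, doc_b))
```

Each function returns a similarity score for a pair of document vectors. A kernel-based classifier would evaluate one of these on pairs of training documents, so swapping the kernel changes only how similarity is measured and leaves the rest of the pipeline unchanged.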
Keywords :
Internet; document handling; information retrieval; learning (artificial intelligence); pattern classification; radial basis function networks; Internet services; Web document classification; Web information retrieval; Web pages; extraction technique; linear kernel; machine learning technique; polynomial kernel; radial basis function; sigmoid kernel; Accuracy; Astronomy; Biology; Europe; Finance; HTML; Testing; Extraction; Machine Learning; Web Classification; Web Document;
Conference_Title :
Information Technology (ITSim), 2010 International Symposium in
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-6715-0
DOI :
10.1109/ITSIM.2010.5561603