Title :
A Comparison Study: Web Pages Categorization with Bayesian Classifiers
Author :
Fu, Zengmei ; Chen, Chuanliang ; Gong, Yunchao ; Bie, Rongfang
Author_Institution :
Dept. of Comput. Sci., Beijing Normal Univ., Beijing
Abstract :
In recent years, web mining has become an active area of data mining as the Internet has grown. Web page classification is one of the essential techniques for web mining, since identifying the web pages that belong to a class of interest is often the first step in mining the web. The high-dimensional text vocabulary space is one of the main challenges in classifying web pages. In this paper, we study the capabilities of Bayesian classifiers for web page categorization. Several feature selection techniques, such as Chi-Squared, Information Gain, and Gain Ratio, are used to select relevant words in web pages. Results on a benchmark dataset show that Aggregating One-Dependence Estimators (AODE) and Hidden Naive Bayes (HNB) are both more competitive than the other, traditional methods.
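The three feature selection scores named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary term presence and a single target class (a 2x2 contingency table per term), and the function names and the toy corpus are hypothetical, chosen only for illustration.

```python
import math

def contingency(docs, term, target):
    """Count documents by (term present?, label == target?) for one term.

    docs is a list of (set_of_words, label) pairs (a hypothetical toy format).
    Returns (n11, n10, n01, n00): term&class, term&other, no-term&class, no-term&other.
    """
    n11 = n10 = n01 = n00 = 0
    for words, label in docs:
        has_term, in_class = term in words, label == target
        if has_term and in_class:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic of the 2x2 term/class table."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

def information_gain(n11, n10, n01, n00):
    """Class entropy minus class entropy conditioned on term presence."""
    n = n11 + n10 + n01 + n00
    h_cond = 0.0
    for in_cls, other in ((n11, n10), (n01, n00)):
        if in_cls + other:
            h_cond += (in_cls + other) / n * entropy([in_cls, other])
    return entropy([n11 + n01, n10 + n00]) - h_cond

def gain_ratio(n11, n10, n01, n00):
    """Information gain normalized by the entropy of the term split."""
    split_info = entropy([n11 + n10, n01 + n00])
    return information_gain(n11, n10, n01, n00) / split_info if split_info else 0.0

# Hypothetical toy corpus: each page is a set of words plus a label.
docs = [
    ({"course", "syllabus", "exam"}, "course"),
    ({"course", "lecture"}, "course"),
    ({"faculty", "research"}, "faculty"),
    ({"faculty", "course", "publications"}, "faculty"),
]
vocabulary = sorted(set().union(*(words for words, _ in docs)))
# Rank words by chi-squared score with respect to the "course" class.
ranked = sorted(vocabulary,
                key=lambda t: chi_squared(*contingency(docs, t, "course")),
                reverse=True)
```

In a pipeline of the kind the abstract describes, the top-ranked words would then form the reduced vocabulary passed to a Bayesian classifier; the sketch stops at the ranking step.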
Keywords :
Bayes methods; Internet; classification; data mining; Bayesian classifier; Web mining; Web page categorization; Web page classification; aggregating one-dependence estimators; feature selection technique; hidden naive Bayes; text vocabulary space; Bayesian methods; Computer science; Equations; High performance computing; Performance gain; Software performance; Web pages; Bayesian Classifiers; Web Pages Categorization
Conference_Title :
High Performance Computing and Communications, 2008. HPCC '08. 10th IEEE International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-0-7695-3352-0
DOI :
10.1109/HPCC.2008.80