DocumentCode :
3250620
Title :
Heterogeneous learner for Web page classification
Author :
Yu, Hwanjo ; Chang, Kevin Chen-Chuan ; Han, Jiawei
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2002
fDate :
2002
Firstpage :
538
Lastpage :
545
Abstract :
Classifying Web pages that belong to a particular class of interest is a problem of considerable interest. Typical machine learning algorithms for this problem require two classes of training data: positive and negative examples. However, in Web page classification, gathering an unbiased sample of negative examples is difficult. We propose a heterogeneous learning framework for classifying Web pages that (1) eliminates the need for negative training data and (2) increases classification accuracy by using two heterogeneous learners. The framework uses two complementary learners, a decision list and a linear separator, to eliminate the need for negative training data in the training phase and to increase accuracy in the testing phase. Our results show that the heterogeneous framework achieves high accuracy without requiring negative training data; in particular, it improves the accuracy of linear separators by reducing errors on "low-margin data". In short, it classifies more accurately while requiring less human effort in training.
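The abstract names the two learners but does not say how their outputs are combined. As a purely illustrative sketch, assuming a simple margin-threshold rule that is our own assumption and not necessarily the authors' method, the code below shows the two ingredients the abstract mentions: a linear separator whose score magnitude |w·x + b| serves as the margin, and an ordered decision list that is consulted only for low-margin pages. The names `combined_predict`, `margin_threshold`, `Rule`, and the example feature words and weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical sparse feature representation: term -> weight.
Features = Dict[str, float]


@dataclass
class Rule:
    """One decision-list rule: if `condition` holds, predict `label`."""
    condition: Callable[[Features], bool]
    label: int  # +1 = positive class, -1 = negative class


def decision_list_predict(rules: Sequence[Rule], x: Features, default: int = -1) -> int:
    # A decision list is an ordered rule sequence; the first matching rule decides.
    for rule in rules:
        if rule.condition(x):
            return rule.label
    return default


def linear_score(weights: Features, bias: float, x: Features) -> float:
    # Signed score w.x + b of a linear separator; |score| acts as the margin.
    return sum(weights.get(f, 0.0) * v for f, v in x.items()) + bias


def combined_predict(weights: Features, bias: float, rules: Sequence[Rule],
                     x: Features, margin_threshold: float = 0.5) -> int:
    # ASSUMPTION: trust the linear separator on high-margin pages and defer
    # to the decision list on low-margin pages. Illustrative only.
    score = linear_score(weights, bias, x)
    if abs(score) >= margin_threshold:
        return 1 if score > 0 else -1
    return decision_list_predict(rules, x)


# Hypothetical toy model and pages for a "resume page" classifier.
weights = {"resume": 2.0, "experience": 1.0, "cart": -2.0}
bias = -1.0
rules = [
    Rule(lambda x: x.get("education", 0.0) > 0, 1),
    Rule(lambda x: x.get("price", 0.0) > 0, -1),
]
page_a = {"resume": 1.0, "experience": 1.0}     # score 2.0: high margin
page_b = {"experience": 1.0, "education": 1.0}  # score 0.0: low margin
page_c = {"experience": 1.0, "price": 1.0}      # score 0.0: low margin
page_d = {"cart": 1.0}                          # score -3.0: high margin
```

Here `page_b` and `page_c` receive the same linear score of 0.0, so the separator alone cannot tell them apart; the decision list breaks the tie using features the linear model ignores. This is one simple way two learners can "complement each other", offered only to make the abstract's terms concrete.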
Keywords :
Web sites; document handling; feature extraction; learning (artificial intelligence); pattern classification; Web page classification; classification accuracy; decision list; heterogeneous learner; linear separator; low-margin data; machine learning algorithms; Computer science; Humans; Machine learning algorithms; Particle separators; Resumes; Search engines; Testing; Training data; Web pages; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183999
Filename :
1183999