DocumentCode :
2571412
Title :
A boosted semi-supervised learning framework for web page filtering
Author :
He, Zhu ; Li, Xi ; Hu, Weiming
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
fYear :
2009
fDate :
11-14 Oct. 2009
Firstpage :
2133
Lastpage :
2136
Abstract :
The World Wide Web makes it very convenient for users to obtain information. However, much harmful information exists on the Internet, such as pornographic content and information about prohibited drugs. Filtering harmful Web pages is therefore an important issue. In general, the problem of harmful Web page filtering is converted to one of Web page classification, which requires plenty of well-labeled training samples. However, labeling a large set of Web pages is very expensive. To address this problem, we adopt a semi-supervised framework for Web page filtering. In this framework, each Web page is represented by bags of different features extracted using its HTML structure. A semi-supervised learning strategy is then used to obtain well-labeled training samples efficiently. Finally, a boosting classifier is used for harmful Web page filtering. Experiments have demonstrated the effectiveness of our framework.
Keywords :
Internet; information filtering; learning (artificial intelligence); pattern classification; HTML structure; Web page classification; Web page filtering; World Wide Web; boosted semisupervised learning framework; boosting classifier; feature extraction; semisupervised learning strategy; Costs; Drugs; Feature extraction; Information filtering; Information filters; Labeling; Semisupervised learning; Web pages; Web sites; machine learning; semi-supervised learning; web page filtering
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on
Conference_Location :
San Antonio, TX
ISSN :
1062-922X
Print_ISBN :
978-1-4244-2793-2
Electronic_ISBN :
1062-922X
Type :
conf
DOI :
10.1109/ICSMC.2009.5346290
Filename :
5346290