DocumentCode :
2799700
Title :
Graph-based Semi-supervised Learning Algorithm for Web Page Classification
Author :
Liu, Rong ; Zhou, Jianzhong ; Liu, Ming
Author_Institution :
Digital Eng. Res. Center, Huazhong Univ. of Sci. & Technol., Wuhan
Volume :
2
fYear :
2006
fDate :
16-18 Oct. 2006
Firstpage :
856
Lastpage :
860
Abstract :
Many application domains, such as Web page classification, suffer from a shortage of labeled training examples: unlabeled examples are readily available, but labeled ones are expensive to obtain. As a result, there has been a great deal of work in recent years on semi-supervised learning. This paper proposes a graph-based semi-supervised learning algorithm and applies it to Web page classification. The algorithm uses a similarity measure between Web pages to construct a k-nearest-neighbor graph. Labeled and unlabeled Web pages are represented as nodes in this weighted graph, with edge weights encoding the similarity between pages. To exploit unlabeled data and achieve higher accuracy, the edge weights are computed by combining weighting schemes with the link information of Web pages. The learning problem is then formulated as label propagation on the graph: using probabilistic matrix methods and belief propagation, the labeled nodes push their labels out through the unlabeled nodes. Preliminary experiments on the WebKB dataset show that the algorithm can effectively exploit unlabeled data in addition to labeled data to achieve higher Web page classification accuracy.
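The pipeline outlined in the abstract (a k-nearest-neighbor graph over page similarities, edge weights augmented with link information, then label propagation from labeled to unlabeled nodes) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: pages are assumed to arrive as precomputed term-weight vectors, similarity is assumed to be cosine similarity, the link contribution is modeled by a hypothetical additive `link_bonus`, and propagation uses the common iterative update (row-normalized weights, labeled nodes clamped, in the style of Zhu and Ghahramani) in place of the paper's probabilistic-matrix and belief-propagation formulation. The function names `knn_graph` and `propagate` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(
    vectors: List[List[float]],
    k: int,
    links: Optional[Set[Tuple[int, int]]] = None,
    link_bonus: float = 0.5,  # hypothetical extra weight for hyperlinked page pairs
) -> List[List[float]]:
    """Symmetric k-NN graph: each page connects to its k most similar pages.

    Edge weight = cosine similarity, plus link_bonus if either page links to the other.
    An edge is kept if either endpoint selects the other as a neighbor.
    """
    n = len(vectors)
    links = links or set()
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scored = []
        for j in range(n):
            if i == j:
                continue
            s = cosine(vectors[i], vectors[j])
            if (i, j) in links or (j, i) in links:
                s += link_bonus
            scored.append((s, j))
        scored.sort(reverse=True)
        for s, j in scored[:k]:
            W[i][j] = W[j][i] = s  # s is symmetric in (i, j)
    return W


def propagate(
    W: List[List[float]], seeds: Dict[int, int], n_classes: int, iters: int = 100
) -> List[int]:
    """Iterative label propagation: F <- D^-1 W F, re-clamping labeled nodes each step."""
    n = len(W)
    F = [[0.0] * n_classes for _ in range(n)]
    for i, y in seeds.items():
        F[i][y] = 1.0
    for _ in range(iters):
        new_F = []
        for i in range(n):
            if i in seeds:
                row = [0.0] * n_classes
                row[seeds[i]] = 1.0  # clamp labeled node to its known class
            else:
                deg = sum(W[i])
                if deg == 0.0:
                    row = F[i][:]  # isolated node: nothing to propagate
                else:
                    row = [sum(W[i][j] * F[j][c] for j in range(n)) / deg
                           for c in range(n_classes)]
            new_F.append(row)
        F = new_F
    # Predicted class = highest propagated score (ties go to the lower class index)
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy example: six pages over four terms, two topical clusters, one seed per class.
pages = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [1.0, 1.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
    [0.0, 0.1, 1.0, 1.0],
]
W = knn_graph(pages, k=2, links={(1, 2)})
predicted = propagate(W, seeds={0: 0, 3: 1}, n_classes=2)
print(predicted)  # [0, 0, 0, 1, 1, 1]
```

On the toy data the 2-NN graph splits into two triangles, so each unlabeled page inherits the label of the seed in its cluster. Re-clamping the seeds at every iteration keeps the labeled nodes fixed while scores diffuse only through unlabeled nodes.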
Keywords :
Internet; Web sites; belief networks; classification; graph theory; learning (artificial intelligence); probability; Web page classification; belief propagation; graph-based semisupervised learning; k-nearest neighbor graph; probabilistic matrix; weighted graph; Application software; Classification algorithms; Computer science; Data mining; Inference algorithms; Machine learning; Semisupervised learning; Support vector machine classification; Support vector machines; Web pages; Semi-supervised learning; Graph; Web page classification; Link information
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on
Conference_Location :
Jinan
Print_ISBN :
0-7695-2528-8
Type :
conf
DOI :
10.1109/ISDA.2006.253724
Filename :
4021776