Title :
Document Classification with One-class Multiview Learning
Author :
Chen, Bin ; Li, Bin ; Pan, Zhisong ; Feng, Aimin
Author_Institution :
Dept. of Comput., Yangzhou Univ., Yangzhou, China
Abstract :
Recently, automatic document classification has attracted considerable attention due to the large quantity of Web documents. A special case is to determine whether a document belongs to a target class (directory) when only documents of the target class are given, which is a standard one-class classification problem. Moreover, unlike other data, Web pages have both intrinsic (text) and extrinsic (hyperlink) features, which makes them well suited to multiview learning. To tackle the task of one-class document classification, a multiview one-class classifier is proposed. It uses one-cluster clustering based data description (OCCDD) as the base one-class classifier, obtains a one-class classifier in each view by setting a membership threshold, and simultaneously achieves consensus among the different views through a regularization term. The different views thus boost each other, rather than having their results ensembled independently or performing document recognition in a single view. We conduct experiments on the standard WebKB dataset with OCCDD and the proposed multiview method. Experimental results show that the multiview method performs well in terms of effectiveness and stability with respect to its parameters.
Keywords :
Internet; document handling; pattern classification; pattern clustering; Web pages; automatic document classification; document recognition; one-class multiview learning; one-cluster clustering based data description; Aerospace industry; Clustering algorithms; Computer industry; Frequency; Information systems; Labeling; Learning systems; Object detection; Sparse matrices; Web pages;
Conference_Title :
Industrial and Information Systems, 2009. IIS '09. International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-3618-7
DOI :
10.1109/IIS.2009.15