Title :
Multi-labeled document classification using semi-supervised mixture model of Watson distributions on document manifold
Author :
Nguyen Kim Anh ; Ngo Van Linh ; Nguyen Khac Toi ; Nguyen The Tam
Author_Institution :
Sch. of Inf. & Commun. Technol., Hanoi Univ. of Sci. & Technol., Hanoi, Vietnam
Abstract :
Classification of multilabel documents is essential to information retrieval and text mining. Most existing approaches to multilabel text classification ignore the relationship between class labels and input documents and rely entirely on labeled data. In practice, unlabeled data is readily available, whereas generating labeled data is expensive and error-prone because it requires human annotation. In this paper, we propose a novel multilabel document classification approach based on a semi-supervised mixture model of Watson distributions on the document manifold, which explicitly considers the manifold structure of the document space in order to exploit both labeled and unlabeled data efficiently for classification. Our proposed approach models all labels within a dataset simultaneously, which lends itself well to capturing the relationships between these labels. The experimental results show that the proposed method outperforms state-of-the-art methods for multilabel text classification.
Keywords :
mixture models; pattern classification; statistical distributions; text analysis; Watson distributions; document manifold; labeled data; multilabeled document classification; multilabeled text classification; semisupervised mixture model; unlabeled data; Approximation methods; Art; Data models; Education; Manifolds; Support vector machines; Vectors; Laplacian Regularization; Mixture Models; Probabilistic Graphical Models; Semi-supervised Learning
Conference_Title :
Soft Computing and Pattern Recognition (SoCPaR), 2013 International Conference of
Conference_Location :
Hanoi
Print_ISBN :
978-1-4799-3399-0
DOI :
10.1109/SOCPAR.2013.7054113