DocumentCode :
2208467
Title :
Weighted Feature Subset Non-negative Matrix Factorization and Its Applications to Document Understanding
Author :
Wang, Dingding ; Li, Tao ; Ding, Chris
Author_Institution :
Sch. of Comput. & Inf. Sci., Florida Int. Univ., Miami, FL, USA
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
541
Lastpage :
550
Abstract :
Keyword (feature) selection improves many information retrieval (IR) tasks, such as document categorization and automatic topic discovery. The problem of keyword selection is usually solved using supervised algorithms. In this paper, we propose an unsupervised approach that combines keyword selection and document clustering (topic discovery). The proposed approach extends non-negative matrix factorization (NMF) by incorporating a weight matrix that indicates the importance of each keyword. The approach is further extended to a version in which each document is also assigned a weight that reflects its importance in its cluster. This work studies weighted feature subset selection for NMF both theoretically and empirically, and draws a connection between unsupervised feature selection and data clustering. We apply the proposed approaches to several document understanding tasks, including document clustering, summarization, and visualization. Experimental results demonstrate the effectiveness of our approach on these tasks.
Keywords :
document handling; information retrieval; matrix decomposition; pattern clustering; unsupervised learning; data clustering; document clustering; keyword selection; nonnegative matrix factorization; unsupervised feature selection; weighted feature subset selection; Non-negative matrix factorization; feature selection; weighted feature subset non-negative matrix factorization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.47
Filename :
5694008