Title :
Document Representation Using Nonnegative Matrix Factorization
Author :
Pei, XiaoBing ; Xiao, Laiyuan ; Chen, Changqing
Author_Institution :
Coll. of Software, HuaZhong University of Sci. & Technol., Wuhan, China
Abstract :
Non-negative matrix factorization (NMF) is an emerging technique for latent semantic analysis of a document corpus. Existing NMF algorithms do not use the intrinsic structure information of the original document corpus. To preserve this intrinsic structure information in the latent semantic space extracted by NMF, an NMF algorithm with intrinsic-structure-preserving properties is presented. The primary idea is to extend the original NMF by incorporating intrinsic structure information constraints into the NMF decomposition. Experimental results on the RCV1 and SECTOR data sets show that the proposed method is superior to NMF for document latent semantic analysis.
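For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of plain NMF using the standard multiplicative updates for the Frobenius-norm objective; it is not the authors' constrained algorithm, whose structure-preserving terms the abstract does not specify. The toy term-document matrix `v`, the function names, the rank, the iteration count, and the `eps` guard are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (terms x documents) as V ~ W H.

    W (terms x rank) holds latent topics; H (rank x documents) holds the
    document representations in the latent semantic space.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    # Strictly positive random initialization (an illustrative choice).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_docs)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_terms)]
    return w, h


# Hypothetical toy corpus: 4 terms x 4 documents with two obvious topics.
v = [
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 2],
]
w, h = nmf(v, rank=2)
```

Each column of `h` is the latent representation of one document. A structure-preserving variant of the kind the abstract describes would add a constraint or penalty term to this objective, which changes the update rules accordingly.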
Keywords :
document handling; information retrieval; matrix decomposition; SECTOR data sets; document representation; intrinsic structure information constraints; latent semantic analysis; nonnegative matrix factorization; vector space information retrieval; Data mining; Educational institutions; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Performance analysis; Space technology; Sparse matrices; Text analysis;
Conference_Title :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
DOI :
10.1109/CISE.2009.5362815