DocumentCode :
3262830
Title :
A new descriptive clustering algorithm based on Nonnegative Matrix Factorization
Author :
Li, Zhao ; Peng, Hong ; Wu, Xindong
Author_Institution :
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
fYear :
2008
fDate :
26-28 Aug. 2008
Firstpage :
407
Lastpage :
412
Abstract :
Nonnegative matrix factorization (NMF) provides a way to find a part-based representation of nonnegative data. An important property of NMF is that it can produce a sparse representation of the data; however, in some applications, especially text clustering, the sparse representation always consists of separated words, which cannot explicitly express the meaning of the basis vector. This paper presents a new descriptive clustering algorithm based on NMF, called DC-NMF, which can avoid this separated-word problem. In our proposed method, we incorporate a phrase-by-document matrix in addition to the commonly used term-by-document matrix. We then use conjunct gradient descent to minimize the mean squared error objective function. Finally, we describe each cluster by its highest-weighted element, which corresponds to one particular phrase. Our experimental results indicate that our method obtains more “pure” clusters than other methods.
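The pipeline described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation. It stacks a term-by-document matrix and a phrase-by-document matrix, factors the result by minimizing the squared error with Lee-Seung multiplicative updates (swapped in for the paper's conjunct gradient descent), assigns each document to the basis vector with the largest coefficient (an assumption; the abstract does not state the assignment rule), and labels each cluster with its highest-weighted phrase. The function names `nmf` and `describe_clusters`, the initialization scheme, and the toy data are all illustrative assumptions.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=200, eps=1e-9):
    """Approximate nonnegative V (m x n) as W (m x k) times H (k x n).

    Minimizes ||V - WH||^2 with Lee-Seung multiplicative updates.
    Stand-in only: the paper uses conjunct gradient descent instead.
    """
    m, n = len(v), len(v[0])
    # Simplistic init (assumption): seed basis vector c with document column
    # c*n//k and start H at all ones. Multiplicative updates never revive a
    # zero entry, so each basis vector keeps its seed document's support.
    w = [[float(v[i][c * n // k]) for c in range(k)] for i in range(m)]
    h = [[1.0] * n for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[c][j] * num[c][j] / (den[c][j] + eps) for j in range(n)]
             for c in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(m)]
    return w, h


def describe_clusters(term_doc, phrase_doc, phrases, k, iters=200):
    """Cluster documents and label each cluster with its top-weighted phrase."""
    v = term_doc + phrase_doc  # stack rows: terms first, then phrases
    w, h = nmf(v, k, iters)
    n_terms, n_docs = len(term_doc), len(v[0])
    # Hypothetical rule: each document joins the basis vector with the
    # largest coefficient in its column of H.
    assignment = [max(range(k), key=lambda c: h[c][j]) for j in range(n_docs)]
    labels = []
    for c in range(k):
        # Only the phrase rows of W are candidates for the description.
        weights = [w[n_terms + p][c] for p in range(len(phrases))]
        labels.append(phrases[max(range(len(phrases)), key=weights.__getitem__)])
    return assignment, labels


term_doc = [
    [3, 2, 0, 0],  # "neural"
    [2, 3, 0, 0],  # "network"
    [0, 0, 3, 2],  # "stock"
    [0, 0, 2, 3],  # "market"
]
phrase_doc = [
    [2, 2, 0, 0],  # "neural network"
    [0, 0, 2, 2],  # "stock market"
]
assignment, labels = describe_clusters(
    term_doc, phrase_doc, ["neural network", "stock market"], k=2)
print(assignment, labels)  # [0, 0, 1, 1] ['neural network', 'stock market']
```

Restricting the label search to the phrase rows of W is what makes each basis vector readable: searching every row could instead return a single word such as "neural", which is the separated-word problem the abstract describes.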
Keywords :
data mining; document handling; gradient methods; matrix decomposition; mean square error methods; pattern clustering; DC-NMF; conjunct gradient descent; descriptive clustering algorithm; mean squared error objective function; nonnegative matrix factorization; phrase-by-document matrix; term-by-document matrix; Clustering algorithms; Computer science; Data engineering; Gene expression; Image analysis; Linear approximation; Sparse matrices; Text mining; Vectors; Web pages;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Granular Computing, 2008. GrC 2008. IEEE International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-2512-9
Electronic_ISBN :
978-1-4244-2513-6
Type :
conf
DOI :
10.1109/GRC.2008.4664752
Filename :
4664752