DocumentCode
124259
Title
Text Document Latent Subspace Clustering by PLSA Factors
Author
Zhou, X.F. ; Liang, J.G. ; Hu, Ya ; Guo, Lisheng
Author_Institution
Inst. of Inf. Eng., Beijing, China
Volume
2
fYear
2014
fDate
11-14 Aug. 2014
Firstpage
442
Lastpage
448
Abstract
Text documents are often high-dimensional and sparse, so discovering clusters in unlabelled text data is a great challenge: common distance measures reveal no obvious clusters. In this paper we present a latent subspace clustering method for finding text clusters. Our algorithm uses latent factors extracted by probabilistic latent semantic analysis (PLSA) to generate latent clustering subspaces, and then uses the distance between a sample and each latent clustering subspace as the similarity measure for text clustering. On some text document datasets, our method proves effective for text clustering.
Keywords
data mining; pattern clustering; probability; semantic networks; text analysis; PLSA factors; common distance measure; latent factors; latent subspace clustering method; probability latent semantic analysis; text clustering similarity; text document datasets; text mining; unlabelled text data clusters; Accuracy; Clustering algorithms; Euclidean distance; Resource management; Semantics; Vectors; PLSA; Text clustering; subspace; text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location
Warsaw
Type
conf
DOI
10.1109/WI-IAT.2014.131
Filename
6927658
Link To Document