Title :
Text Document Latent Subspace Clustering by PLSA Factors
Author :
Zhou, X.F. ; Liang, J.G. ; Hu, Ya ; Guo, Lisheng
Author_Institution :
Inst. of Inf. Eng., Beijing, China
Abstract :
Text documents are often high-dimensional and sparse, so discovering clusters in unlabelled text data is a great challenge: common distance measures reveal no obvious clusters. In this paper we present a latent subspace clustering method for finding text clusters. Our algorithm uses latent factors extracted by probabilistic latent semantic analysis (PLSA) to generate latent clustering subspaces, and then uses the distance between each sample and each latent clustering subspace as the similarity measure for text clustering. On several text document datasets, our method clusters text effectively.
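The abstract does not say how the latent clustering subspaces are built from the PLSA factors, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each latent factor's word distribution P(w|z) spans a one-dimensional subspace, and that the distance from a document to a subspace is the Euclidean norm of the residual after orthogonal projection. The function names `plsa` and `subspace_distances` and the toy count matrix are hypothetical.

```python
import numpy as np

def plsa(counts, n_factors, n_iter=100, seed=0):
    """Fit PLSA by EM on a (docs x words) count matrix.

    Returns P(z|d) with shape (docs, factors) and P(w|z) with shape
    (factors, words).
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_factors))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_factors, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), shape (docs, factors, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_z_d, p_w_z

def subspace_distances(docs, factors):
    """Distance from each unit-normalised document to the line spanned
    by each factor (ASSUMPTION: one 1-D subspace per PLSA factor)."""
    docs = np.asarray(docs, dtype=float)
    factors = np.asarray(factors, dtype=float)
    x = docs / np.maximum(np.linalg.norm(docs, axis=1, keepdims=True), 1e-12)
    b = factors / np.maximum(np.linalg.norm(factors, axis=1, keepdims=True), 1e-12)
    proj = x @ b.T  # projection coefficient onto each factor direction
    resid = x[:, None, :] - proj[:, :, None] * b[None, :, :]
    return np.linalg.norm(resid, axis=2)  # shape (docs, factors)

# Toy corpus (made up for illustration): two groups of documents
# that use disjoint vocabularies.
counts = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 0, 0],
    [4, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 4],
    [0, 0, 0, 1, 4, 2],
    [0, 0, 0, 2, 3, 3],
])
_, p_w_z = plsa(counts, n_factors=2)
# Assign every document to its nearest factor subspace.
labels = subspace_distances(counts, p_w_z).argmin(axis=1)
```

Using one line per factor is the simplest choice that matches the wording of the abstract; grouping several factors into a higher-dimensional subspace would need a least-squares projection in place of the single dot product.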
Keywords :
data mining; pattern clustering; probability; semantic networks; text analysis; PLSA factors; common distance measure; latent factors; latent subspace clustering method; probability latent semantic analysis; text clustering similarity; text document datasets; text mining; unlabelled text data clusters; Accuracy; Clustering algorithms; Euclidean distance; Resource management; Semantics; Vectors; PLSA; Text clustering; subspace; text mining;
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
DOI :
10.1109/WI-IAT.2014.131