• DocumentCode
    124259
  • Title

    Text Document Latent Subspace Clustering by PLSA Factors

  • Author

    Zhou, X.F. ; Liang, J.G. ; Hu, Ya ; Guo, Lisheng

  • Author_Institution
    Inst. of Inf. Eng., Beijing, China
  • Volume
    2
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    442
  • Lastpage
    448
  • Abstract
    Text documents are often high-dimensional and sparse, so discovering clusters in unlabelled text data is a great challenge: common distance measures reveal no obvious clusters. In this paper we present a latent subspace clustering method for finding text clusters. Our algorithm uses latent factors extracted by probabilistic latent semantic analysis (PLSA) to generate latent clustering subspaces, and then uses the distance between a sample and each latent clustering subspace as the similarity measure for text clustering. On some text document datasets, our method proves effective for text clustering.
  • Keywords
    data mining; pattern clustering; probability; semantic networks; text analysis; PLSA factors; common distance measure; latent factors; latent subspace clustering method; probability latent semantic analysis; text clustering similarity; text document datasets; text mining; unlabelled text data clusters; Accuracy; Clustering algorithms; Euclidean distance; Resource management; Semantics; Vectors; PLSA; Text clustering; subspace; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2014.131
  • Filename
    6927658