• DocumentCode
    441840
  • Title

    TCBLHT: a new method of hierarchical text clustering

  • Author

    Xu, Jian-Suo ; Wang, Li

  • Author_Institution
    Sch. of Economy & Manage., Henan Normal Univ., Xinxiang, China
  • Volume
    4
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    2178
  • Abstract
    This paper presents a new method of hierarchical text clustering, called TCBLHT, based on a combination of latent semantic analysis (LSA) and hierarchical TGSOM. Text clustering results produced by traditional methods cannot show hierarchical structure, even though that structure is very important in text clustering. The TCBLHT method achieves hierarchical text clustering automatically and uses the theory of LSA to establish a vector space model (VSM) of term weights, so that semantic relations are included in the vector space model. Both theoretical analysis and experimental results confirm that the TCBLHT method decreases the number of vectors and enhances the efficiency and precision of text clustering.
  • Keywords
    computational linguistics; data mining; pattern clustering; text analysis; vectors; LSA; TCBLHT; hierarchical TGSOM; hierarchical text clustering; latent semantic analysis; vector space model; Clustering methods; Frequency; Functional analysis; Machine learning; Matrix decomposition; Singular value decomposition; Statistics; Technology management; Text mining; Text Clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type

    conf

  • DOI
    10.1109/ICMLC.2005.1527306
  • Filename
    1527306
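
The LSA step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy term-document matrix, the function names `lsa_document_vectors` and `cosine`, and the choice of `k = 2` are all assumptions made for illustration, and the hierarchical TGSOM clustering stage is not shown. The sketch only demonstrates how a truncated singular value decomposition yields lower-dimensional document vectors that reflect shared vocabulary.

```python
import numpy as np


def lsa_document_vectors(term_doc, k):
    """Return one k-dimensional latent vector per document (hypothetical helper).

    term_doc is an (n_terms, n_docs) matrix of term weights.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    # Keep the k largest singular values; scale the document coordinates by them.
    return (s[:k, None] * Vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors (hypothetical helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy term weights (assumed data): rows are terms, columns are documents.
A = np.array([
    [2, 1, 0, 0],  # "cluster"
    [1, 2, 0, 0],  # "text"
    [0, 0, 1, 2],  # "market"
    [0, 0, 2, 1],  # "price"
], dtype=float)

docs_2d = lsa_document_vectors(A, k=2)
same_topic = cosine(docs_2d[0], docs_2d[1])
different_topic = cosine(docs_2d[0], docs_2d[2])
```

In this toy case, the first two documents share their vocabulary and end up with a much higher cosine similarity in the reduced space than documents drawn from different vocabularies. A clustering stage would then operate on `docs_2d` rather than on the full term-weight vectors.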