• DocumentCode
    2331179
  • Title
    Using semantic similarity matrix for defining operations involved in NTSO for clustering 20NewsGroups
  • Author
    Jo, Taeho
  • Author_Institution
    Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this research, we propose a similarity-matrix-based version of NTSO as an approach to text clustering. Traditional approaches to text clustering require documents to be encoded into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. To solve these problems, we propose encoding documents into string vectors and using NTSO (Neural Text Self Organization), a string-vector-based neural network, for text clustering. By encoding documents in this alternative form, we attempt to avoid both problems completely. As empirical validation, the proposed approach will be compared with other approaches with respect to clustering performance and speed.
  • Keywords
    matrix algebra; neural nets; pattern clustering; text analysis; vectors; 20NewsGroups; NTSO; neural network; neural text self organization; numerical vector; semantic similarity matrix; string vector; text clustering; Artificial neural networks; Clustering algorithms; Encoding; Finite element methods; Semantics; Text categorization; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2010 IEEE Congress on
  • Conference_Location
    Barcelona
  • Print_ISBN
    978-1-4244-6909-3
  • Type
    conf
  • DOI
    10.1109/CEC.2010.5586335
  • Filename
    5586335