• DocumentCode
    120659
  • Title
    A frequent term based text clustering approach using novel similarity measure
  • Author
    Reddy, G. Shrikanth ; Rajinikanth, T.V. ; Rao, A. Ananda
  • Author_Institution
    Dept. of IT, VNRVJIET, Hyderabad, India
  • fYear
    2014
  • fDate
    21-22 Feb. 2014
  • Firstpage
    495
  • Lastpage
    499
  • Abstract
    Text clustering is an unsupervised process based solely on the similarity relationships between documents; its output is a set of clusters [14]. In this research, a commonality measure is defined between two text files and used as the similarity measure. The main idea is to apply an existing frequent-item-finding algorithm, such as Apriori or FP-tree, to the initial set of text files in order to reduce the dimensionality of the input. A document feature vector is formed for every document. Then a vector is formed for all the static text input files. The algorithm outputs a set of clusters from the initial set of input text files.
  • Keywords
    pattern clustering; text analysis; unsupervised learning; commonality measure; document feature vector; fp-tree; frequent item finding algorithm; frequent term based text clustering approach; input text file dimension reduction; novel similarity measure; similarity relationship; unsupervised learning process; Algorithm design and analysis; Clustering algorithms; Conferences; Itemsets; Text categorization; Vectors; Apriori; Clustering; Commonality measure; frequent item
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advance Computing Conference (IACC), 2014 IEEE International
  • Conference_Location
    Gurgaon
  • Print_ISBN
    978-1-4799-2571-1
  • Type
    conf
  • DOI
    10.1109/IAdCC.2014.6779374
  • Filename
    6779374