A frequent term based text clustering approach using novel similarity measure

Author

Reddy, G. Shrikanth ; Rajinikanth, T.V. ; Rao, A. Ananda

Author_Institution

Dept. of IT, VNRVJIET, Hyderabad, India

fYear

2014

fDate

21-22 Feb. 2014

Firstpage

495

Lastpage

499

Abstract

Text clustering is an unsupervised process based solely on the similarity relationships between documents; its output is a set of clusters [14]. In this research, a commonality measure between two text files is defined and used as the similarity measure. The main idea is to apply any existing frequent item finding algorithm, such as Apriori or FP-tree, to the initial set of text files in order to reduce the dimension of the input. A document feature vector is formed for all the documents. Then a vector is formed for all the static text input files. The algorithm outputs a set of clusters from the initial set of input text files.
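
The abstract does not give the formula for the commonality measure or the exact clustering rule, so the following Python sketch only illustrates the general pipeline it outlines. Everything specific here is an assumption rather than the authors' method: frequent single terms (by document support) stand in for a full Apriori or FP-tree pass, the feature vectors are binary, `commonality` simply counts the frequent terms two documents share, and documents are grouped by a hypothetical `threshold` on that count.

```python
from collections import Counter
from itertools import combinations


def frequent_terms(docs, min_support):
    """Terms that occur in at least `min_support` documents.

    Assumption: only frequent 1-itemsets are kept, as a stand-in for
    a full Apriori or FP-tree run over the documents.
    """
    counts = Counter(term for doc in docs for term in set(doc.split()))
    return sorted(t for t, c in counts.items() if c >= min_support)


def feature_vector(doc, terms):
    """Binary vector: 1 if the frequent term occurs in the document."""
    words = set(doc.split())
    return [1 if t in words else 0 for t in terms]


def commonality(u, v):
    """Assumed measure: number of frequent terms present in both documents."""
    return sum(a & b for a, b in zip(u, v))


def cluster(docs, min_support=2, threshold=2):
    """Group documents whose commonality reaches `threshold` (hypothetical rule)."""
    terms = frequent_terms(docs, min_support)
    vectors = [feature_vector(d, terms) for d in docs]

    # Union-find over document indices: linked documents share a cluster.
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if commonality(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    sample = [
        "data mining text clustering",
        "text clustering frequent terms",
        "stock market prices",
        "market prices rise",
    ]
    print(frequent_terms(sample, 2))
    print(cluster(sample))
```

Restricting the vectors to frequent terms is what shrinks the dimension: each document is described by the few terms that recur across the collection, not by the full vocabulary.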

Keywords

pattern clustering; text analysis; unsupervised learning; commonality measure; document feature vector; fp-tree; frequent item finding algorithm; frequent term based text clustering approach; input text file dimension reduction; novel similarity measure; similarity relationship; unsupervised learning process; Algorithm design and analysis; Clustering algorithms; Conferences; Itemsets; Text categorization; Vectors; Apriori; Clustering; Commonality measure; frequent item

fLanguage

English

Publisher

IEEE

Conference_Titel

Advance Computing Conference (IACC), 2014 IEEE International

Conference_Location

Gurgaon

Print_ISBN

978-1-4799-2571-1

Type

conf

DOI

10.1109/IAdCC.2014.6779374

Filename

6779374