DocumentCode
120659
Title
A frequent term based text clustering approach using novel similarity measure
Author
Reddy, G. Shrikanth ; Rajinikanth, T.V. ; Rao, A. Ananda
Author_Institution
Dept. of IT, VNRVJIET, Hyderabad, India
fYear
2014
fDate
21-22 Feb. 2014
Firstpage
495
Lastpage
499
Abstract
Text clustering is an unsupervised process based solely on the similarity relationships between documents; its output is a set of clusters [14]. In this research, a commonality measure is defined to quantify what two text files have in common, and it is used as the similarity measure. The main idea is to apply an existing frequent-item-finding algorithm, such as Apriori or FP-tree, to the initial set of text files in order to reduce the dimensionality of the input. A document feature vector is formed for all the documents. Then a vector is formed for all the static text input files. The algorithm outputs a set of clusters from the initial set of input text files.
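The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not define the commonality measure, so a Jaccard-style ratio over shared frequent terms is used as a stand-in; frequent terms are found with a single support-counting pass (the first step of an Apriori-style search) rather than a full Apriori or FP-tree run; and the greedy threshold clustering step, along with every function name and parameter here, is hypothetical.

```python
from collections import Counter


def frequent_terms(docs, min_support):
    """Return the sorted terms that occur in at least min_support documents."""
    counts = Counter(term for doc in docs for term in set(doc.lower().split()))
    return sorted(t for t, c in counts.items() if c >= min_support)


def feature_vector(doc, vocabulary):
    """Binary vector: 1 if the frequent term occurs in the document, else 0."""
    terms = set(doc.lower().split())
    return [1 if t in terms else 0 for t in vocabulary]


def commonality(u, v):
    """Assumed stand-in measure: shared frequent terms / terms in either vector."""
    shared = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return shared / either if either else 0.0


def cluster(vectors, threshold):
    """Greedy single pass: a document joins the first cluster whose seed
    (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if commonality(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy input, invented for illustration only.
docs = [
    "apriori finds frequent itemsets",
    "frequent itemsets from apriori",
    "clustering groups similar documents",
    "similar documents form clustering groups",
]
vocab = frequent_terms(docs, min_support=2)          # dimension reduction
vectors = [feature_vector(d, vocab) for d in docs]   # document feature vectors
clusters = cluster(vectors, threshold=0.5)
print(clusters)
```

Restricting the vocabulary to frequent terms is what reduces the dimensionality: terms that appear in only one toy document are dropped before any vector is built.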
Keywords
pattern clustering; text analysis; unsupervised learning; commonality measure; document feature vector; fp-tree; frequent item finding algorithm; frequent term based text clustering approach; input text file dimension reduction; novel similarity measure; similarity relationship; unsupervised learning process; Algorithm design and analysis; Clustering algorithms; Conferences; Itemsets; Text categorization; Vectors; Apriori; Clustering; Commonality measure; frequent item
fLanguage
English
Publisher
IEEE
Conference_Titel
Advance Computing Conference (IACC), 2014 IEEE International
Conference_Location
Gurgaon
Print_ISBN
978-1-4799-2571-1
Type
conf
DOI
10.1109/IAdCC.2014.6779374
Filename
6779374