DocumentCode :
120659
Title :
A frequent term based text clustering approach using novel similarity measure
Author :
Reddy, G. Shrikanth ; Rajinikanth, T.V. ; Rao, A. Ananda
Author_Institution :
Dept. of IT, VNRVJIET, Hyderabad, India
fYear :
2014
fDate :
21-22 Feb. 2014
Firstpage :
495
Lastpage :
499
Abstract :
Text clustering is an unsupervised process that relies solely on the similarity relationships between documents and outputs a set of clusters [14]. In this research, a commonality measure between two text files is defined and used as the similarity measure. The main idea is to apply an existing frequent-itemset mining algorithm, such as Apriori or FP-tree, to the initial set of text files in order to reduce their dimensionality. A document feature vector is formed for every document. Then a vector is formed for all the static text input files. The algorithm outputs a set of clusters over the initial set of input text files.
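The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for the commonality measure, so the `commonality` function below is a hypothetical stand-in (the share of frequent terms two documents have in common). The frequent-term step performs only the first Apriori pass (single terms meeting a minimum document support), and the greedy threshold clustering, the function names, the sample documents, and the parameter values are all assumptions made for illustration.

```python
from collections import Counter


def frequent_terms(docs, min_support):
    """Return terms that occur in at least `min_support` documents.

    This is only the first Apriori pass (frequent 1-itemsets).
    """
    counts = Counter(term for doc in docs for term in set(doc.split()))
    return sorted(t for t, c in counts.items() if c >= min_support)


def feature_vector(doc, terms):
    """Binary vector: 1 if the frequent term occurs in the document."""
    words = set(doc.split())
    return [1 if t in words else 0 for t in terms]


def commonality(u, v):
    """Hypothetical stand-in measure, not the paper's definition:
    frequent terms shared by both documents / terms present in either."""
    shared = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return shared / either if either else 0.0


def cluster(vectors, threshold):
    """Greedy single pass (assumed): a document joins the first cluster
    whose first member is similar enough, else it starts a new cluster."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if commonality(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up sample input, for illustration only.
docs = [
    "text clustering groups similar documents",
    "clustering documents by frequent terms",
    "stock prices fell on monday",
    "stock markets and prices rose",
]
terms = frequent_terms(docs, min_support=2)
vectors = [feature_vector(d, terms) for d in docs]
clusters = cluster(vectors, threshold=0.5)
print(terms)     # ['clustering', 'documents', 'prices', 'stock']
print(clusters)  # [[0, 1], [2, 3]]
```

Restricting the vectors to frequent terms is what reduces the dimensionality here: the four sample documents contain 17 distinct words, but each feature vector has only four components.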
Keywords :
pattern clustering; text analysis; unsupervised learning; commonality measure; document feature vector; fp-tree; frequent item finding algorithm; frequent term based text clustering approach; input text file dimension reduction; novel similarity measure; similarity relationship; unsupervised learning process; Algorithm design and analysis; Clustering algorithms; Conferences; Itemsets; Text categorization; Vectors; Apriori; Clustering; Commonality measure; frequent item
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advance Computing Conference (IACC), 2014 IEEE International
Conference_Location :
Gurgaon
Print_ISBN :
978-1-4799-2571-1
Type :
conf
DOI :
10.1109/IAdCC.2014.6779374
Filename :
6779374