DocumentCode :
144467
Title :
Pattern and Cluster Mining on Text Data
Author :
Agnihotri, Deepak ; Verma, K. ; Tripathi, Priyanka
Author_Institution :
Dept. of Comput. Applic., NIT Raipur, Raipur, India
fYear :
2014
fDate :
7-9 April 2014
Firstpage :
428
Lastpage :
432
Abstract :
Due to the heavy use of electronic devices nowadays, most information is available in electronic format, and a substantial portion of it is stored as text, such as in news articles, technical papers, books, digital libraries, email messages, blogs, and web pages. Mining knowledge from text, such as finding patterns or clustering similar words, is an important problem today. This paper focuses on mining important information from text data. For the experimental study, it uses William Shakespeare's stories from Project Gutenberg as the dataset. R is used as the text mining and statistical analysis tool on the Ubuntu 12.04 LTS Linux operating system. Frequent pattern mining is used to find the frequent terms appearing in the documents, and the association between two or more words is measured at a given threshold value. Our algorithm uses cosine similarity to measure the distance between words before clustering. The algorithm may be used to find the similarity between stories, news, and emails. In this paper, the k-means and hierarchical agglomerative clustering algorithms are used to form the clusters.
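The steps named in the abstract (frequent terms, word association at a threshold, cosine similarity between words, hierarchical agglomerative clustering) can be illustrated with a small sketch. This is not the authors' code: the paper used R, while this is plain Python under stated assumptions. The toy corpus, the stop-word list, and every function name are hypothetical. Association is assumed to be the Pearson correlation between term count vectors (our assumption about the paper's measure), no stemming is applied, raw counts stand in for TF-IDF weights, and only average-linkage agglomerative clustering is shown (k-means is omitted).

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the Shakespeare stories.
DOCS = [
    "The king loved his daughter and the daughter loved the king",
    "The prince met the king at the castle",
    "A storm struck the ship and the sailors feared the storm",
    "The sailors saw the ship sink in the storm",
]
# Hypothetical, deliberately tiny stop-word list (not the paper's list).
STOP = {"the", "and", "his", "at", "a", "in", "met", "saw"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]


def term_doc_matrix(docs):
    """Return (sorted vocabulary, {term: [count in each document]})."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, {t: [c[t] for c in counts] for t in vocab}


def frequent_terms(tdm, min_freq):
    """Terms whose total count over all documents is at least min_freq."""
    return sorted(t for t, v in tdm.items() if sum(v) >= min_freq)


def pearson(x, y):
    """Pearson correlation of two count vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def associations(tdm, term, threshold):
    """Other terms whose correlation with `term` is at least `threshold`."""
    scores = {t: pearson(tdm[term], v) for t, v in tdm.items() if t != term}
    return {t: round(r, 2) for t, r in scores.items() if r >= threshold}


def cosine(x, y):
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def agglomerative(tdm, k):
    """Average-linkage agglomerative clustering of terms into k clusters."""
    clusters = [[t] for t in tdm]

    def dist(a, b):
        # Mean pairwise cosine distance (1 - cosine similarity) between clusters.
        return sum(1 - cosine(tdm[x], tdm[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j) and merge j into i.
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    vocab, tdm = term_doc_matrix(DOCS)
    print(frequent_terms(tdm, 3))            # ['king', 'storm']
    print(associations(tdm, "storm", 0.9))   # {'sailors': 0.9, 'ship': 0.9}
    print(round(cosine(tdm["king"], tdm["loved"]), 3))
    print(agglomerative(tdm, 2))
```

On this toy corpus the two clusters separate the "royal" terms from the "sea" terms, because terms from different groups never co-occur and therefore have cosine similarity 0. Average linkage was chosen here only for simplicity; the linkage criterion used in the paper is not stated in this record.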
Keywords :
Linux; data mining; pattern clustering; statistical analysis; text analysis; Project Gutenberg William Shakespeare stories dataset; Ubuntu 12.04 LTS Linux Operating System; cluster mining; cosine similarity; documents; frequent pattern mining; frequent terms; hierarchical agglomerative clustering algorithm; information mining; k-means clustering algorithm; knowledge mining; pattern finding; statistical analysis tool; text data; text mining; Algorithm design and analysis; Clustering algorithms; Databases; Electronic mail; Text analysis; Text mining; Clustering; TF - IDF; Word Association; stemming; stop words; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on
Conference_Location :
Bhopal
Print_ISBN :
978-1-4799-3069-2
Type :
conf
DOI :
10.1109/CSNT.2014.92
Filename :
6821432