Title :
A real time clustering method using document index graph
Author :
Akthar, Nadeem ; Ahamad, Mohd Vasim ; Khan, Azeem Ush Shan
Author_Institution :
Dept. of Comput. Eng., Aligarh Muslim Univ., Aligarh, India
Abstract :
According to an earlier survey, 45% of users did not find what they were actually looking for on the web using any search engine. Suppose you have a million text files on your server or computer; you then need to categorize them by content in a very efficient way. IR (Information Retrieval) tools have been developed for this purpose, and they give users more effective ways to categorize relevant data. Most clustering algorithms, such as those based on the Vector Space Model, consider only single words and are not incremental, so they cannot be applied online. Another algorithm, STC (Suffix Tree Clustering), uses a trie to identify shared phrases and is suitable for online use, but its main problem is that it does not work for large data sets. In this paper, we introduce the DIGE clustering algorithm, which generates clusters based on common phrases as well as single terms. DIGE is based on the DIG (Document Index Graph) model for document representation. Because the DIG model is constructed incrementally, DIGE can produce clusters from online documents; it also does not occupy much memory, so it is applicable offline as well.
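The incremental construction described in the abstract can be illustrated with a small sketch. It is not the paper's DIGE algorithm: the class name DocumentIndexGraph, the method add_document, and the edge-table layout are assumptions made for illustration, and single-term matching and the clustering step itself are left out. It only shows how a graph of consecutive word pairs can report the phrases a new document shares with earlier ones at the moment the document is inserted.

```python
from collections import defaultdict


class DocumentIndexGraph:
    """Hypothetical, simplified index: words are nodes, consecutive word pairs are edges."""

    def __init__(self):
        # edges[(w1, w2)][doc_id] -> list of (sentence_no, position of w1)
        self.edges = defaultdict(lambda: defaultdict(list))

    def add_document(self, doc_id, sentences):
        """Insert one document and return {earlier_doc_id: set of shared phrases}."""
        shared = defaultdict(set)
        for s_no, sentence in enumerate(sentences):
            words = sentence.lower().split()
            # active: (other_doc, other_sentence, other_pos) -> start index in `words`
            active = {}
            for pos in range(len(words) - 1):
                edge = (words[pos], words[pos + 1])
                extended = {}
                for other, hits in self.edges[edge].items():
                    if other == doc_id:
                        continue  # ignore matches inside the same document
                    for o_s, o_pos in hits:
                        # Continue a match that ended one word earlier in the
                        # other document, otherwise start a new match here.
                        start = active.pop((other, o_s, o_pos - 1), pos)
                        extended[(other, o_s, o_pos)] = start
                # Matches left in `active` could not be extended: the phrase ends.
                for (other, _, _), start in active.items():
                    shared[other].add(" ".join(words[start:pos + 1]))
                active = extended
                self.edges[edge][doc_id].append((s_no, pos))
            # Matches still open at the end of the sentence run to its last word.
            for (other, _, _), start in active.items():
                shared[other].add(" ".join(words[start:]))
        return dict(shared)


dig = DocumentIndexGraph()
first = dig.add_document("d1", ["river rafting is fun", "wild river adventures"])
second = dig.add_document("d2", ["river rafting trips", "wild river rafting"])
print(first)
print(second)
```

Each call touches only the edges of the incoming document, so documents can be added one at a time as they arrive. A clustering step could then score a new document against existing clusters using the returned phrases.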
Keywords :
Internet; document handling; graph theory; information retrieval; pattern clustering; DIG model; DIGE clustering algorithm; IR tool; document index graph; document representation; information retrieval tool; online document; real time clustering method; search engine; Algorithm design and analysis; Clustering algorithms; Indexes; Merging; Rivers; Search engines; Vectors; Clustering; Document Index Graph; Incremental Algorithm; Phrase Cluster; Suffix Tree Clustering; Web-Snippets;
Conference_Title :
Data Mining and Intelligent Computing (ICDMIC), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-4675-4
DOI :
10.1109/ICDMIC.2014.6954222