Title :
Web Document Clustering Technique Using Case Grammar Structure
Author :
Supreethi, K.P. ; Prasad, E.V.
Author_Institution :
JNTUCEA, Anantapur
Abstract :
Most document clustering techniques, such as those built on the vector space model, rely on single-term analysis of the document data set. More informative features, including phrases and their weights, are particularly important for achieving more accurate document clustering. Document clustering is useful in many applications, such as automatic categorization of documents, grouping search engine results, and building taxonomies of documents. The motivation behind this work is our belief that document clustering should be based not only on single-word analysis but on phrases as well. Phrase-based analysis means that the similarity between documents should be based on matching phrases rather than on single words only. In this paper, we propose a system for Web clustering based on two key concepts. The first is the use of weighted phrases as an essential constituent of documents: similarity between documents is based on matching phrases and their weights. The second is the incremental clustering of documents, which maximizes the tightness of clusters by carefully watching the similarity distribution inside each cluster.
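The two concepts in the abstract can be illustrated with a minimal sketch. The abstract does not specify how phrases are extracted or weighted, nor the exact tightness criterion, so everything below is an assumption made for illustration: word bigrams stand in for phrases, raw counts stand in for phrase weights, a weighted Jaccard ratio stands in for the similarity measure, and a document joins a cluster only if its average similarity to the current members stays at or above a hypothetical `threshold`.

```python
from collections import Counter


def phrases(text, n=2):
    """Word n-grams as stand-in phrases; raw counts act as weights (assumption)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def phrase_similarity(a, b):
    """Weighted Jaccard: sum of min weights over sum of max weights (assumption)."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


def incremental_cluster(docs, threshold=0.2):
    """Assign documents one at a time to the most similar cluster.

    A document joins a cluster only if its average similarity to the
    members is at least `threshold`; otherwise it starts a new cluster.
    This is a simple proxy for keeping clusters tight, not the paper's rule.
    """
    feats = [phrases(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, f in enumerate(feats):
        best, best_sim = None, threshold
        for members in clusters:
            sim = sum(phrase_similarity(f, feats[j]) for j in members) / len(members)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


docs = [
    "web document clustering with phrases",
    "web document clustering with weights",
    "incremental search engine results",
]
print(incremental_cluster(docs))
```

In this toy run the first two documents share three of their five distinct bigrams and end up in one cluster, while the third shares none and starts its own. A single-word measure would behave differently on documents that reuse the same vocabulary in a different order, which is the case phrase matching is meant to separate.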
Keywords :
Internet; document handling; grammars; pattern clustering; search engines; Web clustering; Web document clustering technique; automatic document categorization; grammar structure; phrase based analysis; search engine; single term analysis; single word analysis; vector space model; Clustering methods; Computational intelligence; Functional analysis; Graph theory; HTML; Indexing; Search engines; System analysis and design; Taxonomy; Text analysis;
Conference_Title :
International Conference on Computational Intelligence and Multimedia Applications, 2007
Conference_Location :
Sivakasi, Tamil Nadu
Print_ISBN :
0-7695-3050-8
DOI :
10.1109/ICCIMA.2007.245