DocumentCode :
2724890
Title :
Distributed Document Clustering Using Word-clusters
Author :
Deb, Debzani ; Angryk, Rafal A.
Author_Institution :
Dept. of Comput. Sci., Montana State Univ., Bozeman, MT
fYear :
2007
fDate :
March 1 2007-April 5 2007
Firstpage :
376
Lastpage :
383
Abstract :
Document clustering has become an increasingly important task in analyzing huge numbers of documents distributed among various sites. The challenge is to analyze this enormous number of extremely high-dimensional distributed documents and to organize them in a way that improves search and knowledge extraction without introducing much extra cost or complexity. This paper presents a distributed document clustering approach called distributed information bottleneck (DIB). DIB adopts a two-stage agglomerative information bottleneck (aIB) algorithm to generate local clusters. In the first stage, the dimensionality of the document vectors is significantly reduced by finding word-clusters. These word-clusters are then used to obtain document-clusters in the second stage. DIB then extracts compact but informative local models from these document-clusters and transfers them to a central site. At the central site, local models that are likely to describe the same document set are first combined. The resulting local models are then clustered using the aIB algorithm to produce a hierarchical organization of all distributed documents. Our experimental results demonstrate the robustness, efficiency, and effectiveness of the DIB approach for clustering distributed documents.
Keywords :
distributed processing; document handling; pattern clustering; agglomerative information bottleneck; distributed document clustering; distributed information bottleneck; high dimensional distributed documents; high-dimensional document vector; knowledge extraction; local models; word-clusters; Clustering algorithms; Computational intelligence; Computer science; Costs; Data mining; Distributed computing; IEEE online publications; Robustness; Software libraries; USA Councils
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
Type :
conf
DOI :
10.1109/CIDM.2007.368899
Filename :
4221323