DocumentCode :
1550813
Title :
Decentralized Probabilistic Text Clustering
Author :
Papapetrou, Odysseas ; Siberski, Wolf ; Fuhr, Norbert
Author_Institution :
Technical University of Crete, Chania
Volume :
24
Issue :
10
Year :
2012
Firstpage :
1848
Lastpage :
1861
Abstract :
Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly distributed environments, such as peer-to-peer networks. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.
Keywords :
Clustering algorithms; Computational modeling; Frequency estimation; Indexing; Peer to peer computing; Probabilistic logic; Distributed clustering; Text clustering
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.120
Filename :
5871622