Title :
Self-organized hierarchical k-means file clustering algorithm based on P2P sharing directories and its application
Author :
Lei, Kai ; Han, Lu ; Chen, WenHan ; Yuan, Husheng ; Sun, Tao
Author_Institution :
Center for Internet Res. & Eng. (CIRE), Peking Univ., Shenzhen, China
Abstract :
To improve the recall of search results and to calculate file relevancy in P2P sharing systems, a file clustering algorithm based on a self-organized k-means method is proposed. It exploits the hierarchical structure of file-sharing directories and the classification information implied by file names. With uploaded file path information built into the indexes, a tree-structure-like vector space model was designed. After analyzing the advantages and shortcomings of the traditional k-means method, we implemented a revised, self-organized k-means algorithm. By adjusting two thresholds, called the "combined factor" and the "correlation factor", the algorithm can readily calculate the distances between file categories and the relevancy of files within the same category. Experimental evaluation indicates that the model finds more target files, raising the recall rate to 83.54% and the retrieval precision to 85%.
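The abstract does not give the algorithm's details, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' method. It assumes: (1) a path-derived vector in which tokens closer to the file name weigh more (weight `1/(1 + depth_from_leaf)`, an invented stand-in for the "tree-structure-like" model); (2) cosine similarity as the distance measure; (3) `correlation_factor` as the minimum similarity for a file to join an existing cluster during seeding; and (4) `combined_factor` as the centroid similarity above which two clusters are merged. All function names are illustrative.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def path_vector(path: str) -> Vector:
    """Hypothetical tree-aware weighting: tokens nearer the file name weigh more."""
    parts = [p for p in path.strip("/").split("/") if p]
    vec: Vector = defaultdict(float)
    for depth_from_leaf, part in enumerate(reversed(parts)):
        weight = 1.0 / (1 + depth_from_leaf)
        for tok in re.findall(r"[a-z0-9]+", part.lower()):
            vec[tok] += weight
    return dict(vec)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    acc: Vector = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            acc[t] += w / len(vectors)
    return dict(acc)


def assign(vecs: List[Vector], centroids: List[Vector]) -> List[List[int]]:
    """Standard k-means step: each file goes to its most similar centroid."""
    groups: List[List[int]] = [[] for _ in centroids]
    for i, v in enumerate(vecs):
        best = max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
        groups[best].append(i)
    return [g for g in groups if g]  # drop empty clusters


def self_organized_kmeans(paths: List[str], correlation_factor: float = 0.25,
                          combined_factor: float = 0.5, iterations: int = 5) -> List[List[str]]:
    vecs = [path_vector(p) for p in paths]
    # Seeding (assumed): open a new cluster when no seed is similar enough,
    # so k is determined by the threshold rather than fixed in advance.
    centroids: List[Vector] = []
    for v in vecs:
        if not centroids or max(cosine(v, c) for c in centroids) < correlation_factor:
            centroids.append(v)
    groups = assign(vecs, centroids)
    for _ in range(iterations):
        centroids = [centroid([vecs[i] for i in g]) for g in groups]
        groups = assign(vecs, centroids)
    centroids = [centroid([vecs[i] for i in g]) for g in groups]
    # Merge pass (assumed): combine clusters whose centroids are close.
    merged = True
    while merged:
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if cosine(centroids[a], centroids[b]) >= combined_factor:
                    groups[a] += groups.pop(b)
                    centroids.pop(b)
                    centroids[a] = centroid([vecs[i] for i in groups[a]])
                    merged = True
                    break
            if merged:
                break
    return [[paths[i] for i in g] for g in groups]


if __name__ == "__main__":
    demo = ["/music/jazz/miles_davis_so_what.mp3", "/music/jazz/coltrane_giant_steps.mp3",
            "/movies/scifi/matrix_1999.avi", "/movies/scifi/blade_runner.avi"]
    for cluster in self_organized_kmeans(demo):
        print(cluster)
```

With the default thresholds the four demo paths split into a music cluster and a movie cluster. Lowering `combined_factor` makes category merging more aggressive, while raising `correlation_factor` produces more, tighter seed clusters; this mirrors, in a simplified way, the abstract's point that the two thresholds control inter-category distance and intra-category relevancy.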
Keywords :
indexing; pattern classification; pattern clustering; peer-to-peer computing; query formulation; tree data structures; P2P sharing directories; classification implication; combined factor; correlation factor; file names; file relevancy; indexes; information retrieval; search result; self-organized hierarchical k-means file clustering algorithm; tree-structure like vector space model; uploaded file path information; Algorithm design and analysis; Cities and towns; Clustering algorithms; Content based retrieval; File servers; Information retrieval; Internet; Peer to peer computing; Sun; Topology; P2P Search; VSM; file clustering; self-organized K-means;
Conference_Title :
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-4754-1
Electronic_ISBN :
978-1-4244-4738-1
DOI :
10.1109/ICICISYS.2009.5357814