DocumentCode :
3726817
Title :
Malay document clustering using complete linkage clustering technique with Cosine Coefficient
Author :
Nurazzah Abd Rahman;Zainab Abu Bakar;Nurul Syeilla Syazhween Zulkefli
Author_Institution :
Department of Computer Science, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia
fYear :
2015
Firstpage :
103
Lastpage :
107
Abstract :
Finding useful and relevant information is a challenging task for the user. A retrieval system usually responds with a long list of documents that are not necessarily relevant to the user's need. Document clustering is a technique that groups documents so that documents in the same cluster are similar to each other and documents in different clusters are dissimilar. This paper focuses on document clustering for a Malay test collection consisting of 2028 Malay-translated Hadith documents from the book Sahih Bukhari. It presents results obtained with the Complete Linkage Clustering algorithm and the Cosine Coefficient on these documents. The experiments are evaluated using the Recall (R), Precision (P) and Effectiveness (E) measures, and are conducted with 100 clusters, 50 clusters and 20 clusters. The results show that the smaller the size of the clusters, the higher the Recall (R) but the lower the Precision (P). Comparing the Effectiveness (E) measure against the non-clustered documents shows that applying a clustering algorithm improves the effectiveness of the search process. In this experiment, 20 clusters is more effective than the other settings.
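The technique named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenization, raw term-frequency weighting, and the function names `cosine`, `complete_linkage` and `e_measure` are assumptions made for illustration, and the E measure is written in van Rijsbergen's common form with `beta = 1`, which the abstract does not confirm.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine coefficient between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def complete_linkage(docs, k):
    """Agglomerative clustering that stops when k clusters remain.

    Under complete linkage, the similarity of two clusters is the
    similarity of their LEAST similar pair of documents.
    """
    # Assumption: whitespace tokenization and raw term frequencies.
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = min(
                    cosine(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # Merge the most similar pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def e_measure(precision, recall, beta=1.0):
    """E measure (lower is better), assuming van Rijsbergen's form."""
    if precision == 0 or recall == 0:
        return 1.0
    b2 = beta * beta
    return 1.0 - (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

Calling `complete_linkage(docs, 20)` on a list of document strings would return 20 lists of document indices. The pairwise search here is cubic or worse in the number of documents, so it is only meant to show the linkage rule, not to scale to a full collection.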
Keywords :
"Clustering algorithms","Couplings","Conferences","Open systems","Partitioning algorithms","Search engines","Indexes"
Publisher :
IEEE
Conference_Titel :
Open Systems (ICOS), 2015 IEEE Conference on
Type :
conf
DOI :
10.1109/ICOS.2015.7377286
Filename :
7377286