DocumentCode :
594765
Title :
Efficient incremental phrase-based document clustering
Author :
Bakr, A.M. ; Yousri, Noha A. ; Ismail, Muhammad Ali
Author_Institution :
Comput. & Syst. Eng., Univ. of Alexandria, Alexandria, Egypt
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
517
Lastpage :
520
Abstract :
Document clustering has become essential for applications that aim to extract information from huge corpora. Such applications face two main challenges: the first is representing the documents efficiently and pairing that representation with an efficient similarity measure; the second is dealing with the dynamic nature of the corpus. In this paper, an efficient document clustering model is introduced for incrementally storing and updating the clusters of a dataset. A new phrase-based similarity method is developed along with the model to calculate the similarity between documents and clusters. Experimental results show that the new clustering model can achieve more accurate results than traditional algorithms.
Keywords :
information retrieval; pattern clustering; text analysis; corpus; dataset clustering; document representation; incremental phrase-based document clustering; information extraction; phrase-based similarity method; similarity measure; Accuracy; Clustering algorithms; Computational modeling; Equations; Indexes; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460185