Title of article :
Improved Suffix Tree Clustering for Efficient Document Clustering
Author/Authors :
Sonia, Niranjan Kumar
Issue Information :
Serial journal issue, year 2010
Abstract :
Document clustering is a technique that groups pages and is useful for categorizing, organizing, and refining search results. When clustering is based only on the documents themselves, Suffix Tree Clustering (STC) outperforms other clustering algorithms because it makes use of phrases and allows clusters to overlap. STC is a linear-time clustering algorithm based on identifying phrases that are common to groups of documents. It treats a document as a string, which lets it exploit proximity information between words, and it is incremental. Suffix Tree Clustering has been shown to be a good approach to document clustering. This paper introduces suffix-tree-based document clustering with a cluster ranking function; the ranked list of clusters produced by the algorithm is used to overcome the problems caused by overlapping clusters. Using this method, we obtain a better clustering result and a more effective number of clusters.
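The abstract describes STC only at a high level. The following sketch illustrates the generic STC steps it refers to: building a suffix structure over each document's word sequence, collecting phrases shared by several documents as base clusters, and merging base clusters whose document sets overlap. It is not the authors' improved algorithm. The naive suffix trie (quadratic rather than linear time), the hypothetical function names, the phrase-length weighting in phrase_score, the 0.5 merge threshold, and ranking merged clusters by summed score are all assumptions, since the abstract does not specify the paper's cluster ranking function.

```python
from collections import defaultdict


def build_suffix_trie(docs):
    """Naive generalized suffix trie over word sequences.

    Simplification (assumption): real STC uses a compact suffix tree built in
    linear time; inserting every word-level suffix here is quadratic per document.
    Each node records the ids of the documents whose suffixes pass through it.
    """
    root = {"children": {}, "docs": set()}
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node["children"].setdefault(word, {"children": {}, "docs": set()})
                node["docs"].add(doc_id)
    return root


def base_clusters(root, min_docs=2):
    """Return (phrase, doc_set) for every phrase shared by >= min_docs documents."""
    found = []
    stack = [((), root)]
    while stack:
        phrase, node = stack.pop()
        for word, child in node["children"].items():
            # A child's document set is a subset of its parent's, so pruning is safe.
            if len(child["docs"]) >= min_docs:
                longer = phrase + (word,)
                found.append((" ".join(longer), frozenset(child["docs"])))
                stack.append((longer, child))
    return found


def phrase_score(phrase, docs):
    """Assumed STC-style score: |docs| weighted by phrase length (single words penalised)."""
    n = len(phrase.split())
    weight = 0.5 if n == 1 else min(n, 6)
    return len(docs) * weight


def merge_and_rank(bases, threshold=0.5):
    """Merge base clusters whose document sets overlap by more than `threshold`
    in both directions, then rank by summed base-cluster score (an assumption)."""
    parent = list(range(len(bases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            a, b = bases[i][1], bases[j][1]
            common = len(a & b)
            if common / len(a) > threshold and common / len(b) > threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, base in enumerate(bases):
        groups[find(i)].append(base)

    clusters = []
    for members in groups.values():
        clusters.append({
            "phrases": sorted(p for p, _ in members),
            "docs": frozenset().union(*(d for _, d in members)),
            "score": sum(phrase_score(p, d) for p, d in members),
        })
    return sorted(clusters, key=lambda c: c["score"], reverse=True)


def stc(docs, min_docs=2):
    return merge_and_rank(base_clusters(build_suffix_trie(docs), min_docs))


if __name__ == "__main__":
    docs = [
        "suffix tree clustering",
        "suffix tree clustering for web search",
        "web search results",
        "ranked web search results",
    ]
    for c in stc(docs):
        print(c["score"], sorted(c["docs"]), c["phrases"])
```

On these four toy documents the sketch yields two clusters: a "web search" cluster covering documents 1, 2, and 3 (score 20.0) ranked above a "suffix tree clustering" cluster covering documents 0 and 1 (score 17.0). Document 1 appears in both, which shows the cluster overlap that the abstract mentions.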
Keywords :
Suffix Tree , document clustering , STC , Web Document , Ranked cluster
Journal title :
International Journal of Advanced Research in Computer Science