Title :
Clustering of web search results based on an Iterative Fuzzy C-means Algorithm and Bayesian Information Criterion
Author :
Cobos, Carlos ; Mendoza, M. ; Leon, Errol ; Manic, Milos ; Herrera-Viedma, Enrique
Author_Institution :
Comput. Sci. Dept., Univ. del Cauca, Popayan, Colombia
Abstract :
The clustering of web search results has become an area of considerable research interest among academic and scientific communities involved in information retrieval. Systems that cluster web search results, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review while reducing the time spent reviewing them. Several algorithms for web document clustering already exist, but results show there is room for improvement. This paper introduces a new description-centric algorithm for clustering web results, called IFCWR. IFCWR initially selects a maximum estimated number of clusters using Forgy's strategy, then iteratively merges clusters until the results cannot be improved. Every merge operation involves running Fuzzy C-Means to cluster the web search results and calculating the Bayesian Information Criterion (BIC) to automatically evaluate the best solution and number of clusters. IFCWR was compared against established web document clustering algorithms, among them Suffix Tree Clustering and Lingo. The comparison was carried out on the AMBIENT and MORESQUE datasets, using precision, recall, F-measure, SSLk, and other metrics. Results show a considerable improvement in clustering quality and performance.
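The abstract outlines IFCWR only at a high level. The Python sketch below illustrates the general loop it describes: Forgy initialization (choosing `k_max` randomly selected data points as initial centers), Fuzzy C-Means, and BIC-guided merging. Several details are assumptions not stated in the abstract: search results are taken to be already vectorized (for example, term-weight vectors of snippets), the merge rule fuses the two closest centers, and the score is a BIC-style formula `n ln(J_m/n) + k d ln(n)` with the fuzzy objective J_m standing in for a residual sum of squares. The function names (`memberships`, `fuzzy_c_means`, `bic`, `iterative_fcm`) are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m=2.0):
    """FCM membership matrix: u[i][j] = 1 / sum_l (d_ij / d_il)^(1/(m-1)),
    where d are squared distances (equivalent to the usual 2/(m-1) on plain distances)."""
    u = []
    for p in points:
        # Floor distances so a point lying exactly on a center does not divide by zero.
        d = [max(sq_dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((dj / dl) ** (1.0 / (m - 1)) for dl in d) for dj in d])
    return u


def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard Fuzzy C-Means, started from the given initial centers."""
    dim = len(points[0])
    centers = [list(c) for c in centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u_ij^m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[t] for wi, p in zip(w, points)) / total for t in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centers no longer move noticeably
            break
    return centers


def bic(points, centers, m=2.0):
    """Hypothetical BIC-style score (lower is better): n*ln(J_m/n) + k*dim*ln(n),
    using the fuzzy objective J_m in place of a residual sum of squares."""
    n, k, dim = len(points), len(centers), len(points[0])
    u = memberships(points, centers, m)
    j_m = sum(
        u[i][j] ** m * sq_dist(points[i], centers[j]) for i in range(n) for j in range(k)
    )
    return n * math.log(max(j_m, 1e-300) / n) + k * dim * math.log(n)


def iterative_fcm(points, k_max, m=2.0, seed=0):
    """IFCWR-style loop (sketch): Forgy init with k_max centers, then merge while BIC improves."""
    rng = random.Random(seed)
    # Forgy initialization: k_max distinct data points chosen at random as centers.
    best = fuzzy_c_means(points, rng.sample(points, k_max), m)
    best_score = bic(points, best, m)
    while len(best) > 1:
        k = len(best)
        # Hypothetical merge rule: fuse the two closest centers into their midpoint.
        a, b = min(
            ((i, j) for i in range(k) for j in range(i + 1, k)),
            key=lambda ij: sq_dist(best[ij[0]], best[ij[1]]),
        )
        merged = [(x + y) / 2 for x, y in zip(best[a], best[b])]
        seeds = [c for i, c in enumerate(best) if i not in (a, b)] + [merged]
        candidate = fuzzy_c_means(points, seeds, m)  # rerun FCM with k - 1 clusters
        score = bic(points, candidate, m)
        if score >= best_score:
            break  # the merge did not improve BIC, so keep the current solution
        best, best_score = candidate, score
    return best, best_score
```

Calling `iterative_fcm(vectors, k_max)` returns the surviving centers and their score. The fuzzy memberships of the final solution can be recovered with `memberships(vectors, centers)`, which lets a single search result belong to several clusters with different degrees. In this sketch the loop stops at the first merge that fails to lower the score, which is one simple reading of "merges clusters until the results cannot be improved".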
Keywords :
Internet; belief networks; document handling; iterative methods; pattern clustering; AMBIENT dataset; Bayesian information criterion; IFCWR; Lingo; MORESQUE dataset; SSLk metric; Web clustering engines; Web document clustering; Web search results clustering; clustering performance; clustering quality; description-centric algorithm; f-measure metric; iterative fuzzy C-means algorithm; merge operation; precision metric; recall metric; suffix tree clustering; Accuracy; Algorithm design and analysis; Bayes methods; Clustering algorithms; Educational institutions; Partitioning algorithms; Web search; bayesian information criterion; fuzzy c-means; web document clustering;
Conference_Title :
IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint
Conference_Location :
Edmonton, AB
DOI :
10.1109/IFSA-NAFIPS.2013.6608452