Title :
Web document clustering based on Global-Best Harmony Search, K-means, Frequent Term Sets and Bayesian Information Criterion
Author :
Cobos, Carlos ; Andrade, Jennifer ; Constain, William ; Mendoza, Martha ; León, Elizabeth
Author_Institution :
Univ. of Cauca, Popayan, Colombia
Abstract :
This paper introduces a new description-centric algorithm for web document clustering based on the hybridization of the Global-Best Harmony Search with the K-means algorithm, Frequent Term Sets and the Bayesian Information Criterion. The new algorithm defines the number of clusters automatically. The Global-Best Harmony Search provides a global strategy for searching the solution space, based on the Harmony Search and the concept of swarm intelligence. The K-means algorithm is used to find the optimum value in a local search space. The Bayesian Information Criterion is used as the fitness function, while FP-Growth is used to reduce the high dimensionality of the vocabulary. The resulting algorithm, called IGBHSK, was tested with data sets based on Reuters-21578 and DMOZ, obtaining promising results (better precision than a Singular Value Decomposition algorithm). It was also evaluated by a group of users.
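The role of the Bayesian Information Criterion as a fitness function for choosing the number of clusters can be illustrated with a minimal Python sketch. It is not the IGBHSK algorithm: the Global-Best Harmony Search is replaced here by a plain scan over candidate values of k, and FP-Growth is omitted. The BIC form n*ln(SSE/n) + k*ln(n) (lower is better), the farthest-point initialization, and all function names are assumptions made for illustration, since the abstract does not give them.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    # Deterministic farthest-point initialization (an assumption, used so the
    # sketch is reproducible; the paper relies on harmony search instead).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_sse(points, k, iters=20):
    # Plain K-means as the local search; returns the sum of squared errors.
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def bic(sse, n, k):
    # Assumed BIC form: fit term plus a penalty that grows with k.
    return n * math.log(max(sse, 1e-12) / n) + k * math.log(n)

def best_k(points, k_max):
    # Stand-in for the global search: try every k and keep the lowest BIC.
    n = len(points)
    return min(range(1, k_max + 1), key=lambda k: bic(kmeans_sse(points, k), n, k))

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(best_k(points, 4))
```

On these two well-separated groups the penalty term k*ln(n) outweighs the small SSE gain from splitting further, which is how a criterion of this kind can fix the number of clusters without user input.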
Keywords :
Bayes methods; Internet; document handling; particle swarm optimisation; pattern clustering; search problems; Bayesian information criterion; Web document clustering; description-centric algorithm; frequent term sets; global-best harmony search; k-means clustering; local search space; swarm intelligence; vocabulary; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Heuristic algorithms; Nickel; Partitioning algorithms; Vocabulary;
Conference_Title :
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6909-3
DOI :
10.1109/CEC.2010.5586109