Title :
Web Pages Classification and Clustering by Means of Genetic Algorithms: A Variable Size Page Representing Approach
Author :
Hossaini, Zahra ; Rahmani, Amir Masoud ; Setayeshi, Dr Saied
Author_Institution :
Sci. & Res. Branch, Islamic Azad Univ., Tehran, Iran
Abstract :
Arranging large amounts of data into related groups helps us make better decisions about them, and clustering and classification are two efficient methods for grouping huge volumes of data. Most clustering and classification methods applied to Web page grouping problems use fixed-size vectors in their learning algorithms. In the real world of the WWW, this assumption is not reliable. In this paper we use a genetic algorithm (GA) for classification and clustering that works on variable-size vectors. In the GA, we combine standard crossover and mutation operators with the K-means algorithm to improve the diversity and correctness of the results. With this method, more accurate classes are obtained, and their subclasses are defined as clusters. The method gives more accurate results than fixed-size methods, with an accuracy rate of about 90.7%, and it avoids the overhead of unnecessary elements in the vectors.
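The abstract does not give the chromosome encoding, fitness function, or operator settings, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each page is a sparse term-weight dictionary (so vector sizes vary per page), a chromosome is a list of cluster labels, fitness is the mean cosine similarity of each page to its cluster centroid, and each child is refined by one K-means-style reassignment. All names (`cosine`, `centroid`, `kmeans_step`, `fitness`, `crossover`, `mutate`, `evolve`, `PAGES`) and all parameter values are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two sparse term-weight dicts of any size."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(pages):
    """Average sparse vectors; only terms that occur are stored."""
    total = {}
    for page in pages:
        for term, w in page.items():
            total[term] = total.get(term, 0.0) + w
    return {t: w / len(pages) for t, w in total.items()}


def centroids_for(pages, labels, k):
    """Centroid of each cluster; an empty cluster gets an empty vector."""
    return [centroid([p for p, l in zip(pages, labels) if l == c])
            for c in range(k)]


def kmeans_step(pages, labels, k):
    """One K-means-style refinement: reassign pages to the closest centroid."""
    cents = centroids_for(pages, labels, k)
    return [max(range(k), key=lambda c: cosine(p, cents[c])) for p in pages]


def fitness(pages, labels, k):
    """Assumed fitness: mean similarity of pages to their own centroid."""
    cents = centroids_for(pages, labels, k)
    return sum(cosine(p, cents[l]) for p, l in zip(pages, labels)) / len(pages)


def crossover(a, b, rng):
    """Standard one-point crossover on two label lists."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(labels, k, rng, rate=0.1):
    """Reassign each gene to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else l for l in labels]


def evolve(pages, k, generations=20, pop_size=10, seed=0):
    """Elitist GA whose children are refined by one K-means step."""
    rng = random.Random(seed)
    population = [[rng.randrange(k) for _ in pages] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(pages, ind, k), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), k, rng)
            children.append(kmeans_step(pages, child, k))
        population = parents + children
    return max(population, key=lambda ind: fitness(pages, ind, k))


# Toy pages with different numbers of terms (illustrative data only).
PAGES = [
    {"goal": 2, "team": 1},
    {"team": 3, "match": 1, "goal": 1},
    {"stock": 2, "market": 1},
    {"market": 2, "bank": 1, "stock": 1, "rate": 1},
]

if __name__ == "__main__":
    print(evolve(PAGES, k=2))
```

Because each page stores only the terms it contains, no padding to a common dictionary size is needed; the similarity function simply skips terms that one of the two pages lacks.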
Keywords :
Internet; dictionaries; document handling; genetic algorithms; learning (artificial intelligence); mathematical operators; pattern classification; pattern clustering; K-means algorithm; WWW; Web pages classification; Web pages clustering; Web pages grouping problem; World Wide Web; fixed size vector; genetic algorithm; learning algorithm; mutation operator; standard crossover operator; variable size page representing approach; Web pages; classification; clustering; variable size vector
Conference_Title :
Computational Intelligence for Modelling Control & Automation, 2008 International Conference on
Conference_Location :
Vienna
Print_ISBN :
978-0-7695-3514-2
DOI :
10.1109/CIMCA.2008.151