DocumentCode :
2663649
Title :
Web Pages Classification and Clustering by Means of Genetic Algorithms: A Variable Size Page Representing Approach
Author :
Hossaini, Zahra ; Rahmani, Amir Masoud ; Setayeshi, Dr Saied
Author_Institution :
Sci. & Res. Branch, Islamic Azad Univ., Tehran, Iran
fYear :
2008
fDate :
10-12 Dec. 2008
Firstpage :
436
Lastpage :
440
Abstract :
Arranging large volumes of data into related groups helps us make better decisions about them, and clustering and classification are two efficient methods for grouping such data. Most clustering and classification methods applied to Web page grouping problems use fixed-size vectors in their learning algorithms. In the real world of the WWW, this assumption is not reliable. In this paper we use a genetic algorithm (GA) for classification and clustering; the algorithm works on variable-size vectors. In the GA part, we combine standard crossover and mutation operators with the K-means algorithm to improve the diversity and correctness of the results. With this method, more accurate classes are obtained, and their subclasses are defined as clusters. The method gives more accurate results than fixed-size methods, with an accuracy rate of about 90.7%, and it also avoids the overhead of unnecessary elements in the vectors.
Keywords :
Internet; dictionaries; document handling; genetic algorithms; learning (artificial intelligence); mathematical operators; pattern classification; pattern clustering; K-means algorithm; WWW; Web pages classification; Web pages clustering; Web pages grouping problem; World Wide Web; fixed size vector; genetic algorithm; learning algorithm; mutation operator; standard crossover operator; variable size page representing approach; Genetic algorithms; Web pages; classification; clustering; genetic algorithm; variable size vector;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence for Modelling Control & Automation, 2008 International Conference on
Conference_Location :
Vienna
Print_ISBN :
978-0-7695-3514-2
Type :
conf
DOI :
10.1109/CIMCA.2008.151
Filename :
5172665