Title :
Page clustering using a distance based algorithm
Author :
Mojica, Jairo Andrés ; Rojas, Diego Alexander ; Gómez, Jonatan ; González, Fabio
Author_Institution :
Intelligent Syst. Res. Lab., Nat. Univ. of Colombia, Colombia
Date :
31 Oct.-2 Nov. 2005
Abstract :
This paper presents an application of a clustering algorithm based on gravitational forces to the problem of Web page clustering in a dynamic environment. The proposed algorithm is a modification of the gravitational algorithm proposed by Gomez et al. that uses only distance measures, so a notion of space is not required. This approach is useful when similarities (and hence distances) between pages can be defined and computed quickly, but the definition of a space is computationally expensive. Experiments are performed with data representing real URLs and sessions, and the algorithm is compared with the incremental connected components algorithm, which has previously been used to solve this problem.
Keywords :
Internet; Web sites; data mining; pattern clustering; URL; Web page clustering; data representation; distance based algorithm; dynamic environment; gravitational algorithm; Clustering algorithms; Content based retrieval; Data mining; Extraterrestrial measurements; Information management; Information retrieval; Intelligent systems; Statistical analysis; Web mining; Web pages;
Conference_Title :
Web Congress, 2005. LA-WEB 2005. Third Latin American
Print_ISBN :
0-7695-2471-0
DOI :
10.1109/LAWEB.2005.27