Title :
Automatic topic identification using webpage clustering
Author :
He, Xiaofeng ; Ding, Chris H Q ; Zha, Hongyuan ; Simon, Horst D.
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Abstract :
Grouping Web pages into distinct topics is one way of organizing the large amount of information retrieved from the Web. In this paper, we report that an unsupervised clustering method, using a similarity metric that incorporates textual information, hyperlink structure, and co-citation relations, can automatically and effectively identify relevant topics, as shown in experiments on several retrieved sets of Web pages. The clustering method is a state-of-the-art spectral graph partitioning method based on the normalized cut criterion, first developed for image segmentation.
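The approach summarized in the abstract can be illustrated with a minimal sketch: build one symmetric similarity matrix from text, hyperlinks, and co-citations, then split the pages in two using the second eigenvector of the normalized-cut relaxation. The abstract does not give the paper's formula, so the blending rule in `combined_similarity`, the `alpha` weight, and the six-page toy data are all assumptions made for illustration. Only the eigenvector step follows the standard normalized cut formulation.

```python
import numpy as np


def combined_similarity(text_sim, links, alpha=0.5):
    """Blend text, hyperlink, and co-citation evidence into one symmetric matrix.

    ASSUMPTION: this weighting is illustrative only; the abstract does not
    state the paper's actual similarity formula.
    """
    text_sim = np.asarray(text_sim, dtype=float)
    links = np.asarray(links, dtype=float)      # links[i, j] = 1 if page i links to page j
    undirected = np.maximum(links, links.T)     # a hyperlink in either direction
    cocitation = links.T @ links                # [i, j] = number of pages linking to both i and j
    np.fill_diagonal(cocitation, 0.0)
    if cocitation.max() > 0:
        cocitation = cocitation / cocitation.max()
    w = alpha * text_sim * undirected + (1.0 - alpha) * cocitation
    w = (w + w.T) / 2.0                         # enforce exact symmetry
    np.fill_diagonal(w, 0.0)
    return w


def normalized_cut_bipartition(w):
    """Two-way split from the relaxed normalized cut problem (D - W) y = lambda * D y."""
    d = w.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every page needs at least one positive similarity")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Eigenvectors of D^{-1/2} W D^{-1/2}; eigh returns eigenvalues in ascending order.
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(m)
    # The second-largest eigenvector here corresponds to the second-smallest
    # generalized eigenvector of (D - W, D) after rescaling by D^{-1/2}.
    y = d_inv_sqrt * vecs[:, -2]
    return y >= 0                               # split by sign


# Hypothetical toy data: pages 0-2 form one topic, pages 3-5 another.
links = np.zeros((6, 6))
for src, dst in [(0, 1), (0, 2), (1, 0), (1, 2), (2, 3),
                 (3, 4), (3, 5), (4, 3), (4, 5), (5, 3), (5, 4)]:
    links[src, dst] = 1.0

text_sim = np.full((6, 6), 0.1)                 # low text similarity across topics
text_sim[:3, :3] = 0.9                          # high text similarity within each topic
text_sim[3:, 3:] = 0.9

labels = normalized_cut_bipartition(combined_similarity(text_sim, links))
print(labels)
```

Which group receives `True` depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. Splitting into more than two topics would need recursive bipartitioning or several eigenvectors, which this sketch does not attempt.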
Keywords :
information analysis; information resources; information retrieval; pattern clustering; Web page clustering; automatic topic identification; co-citation relations; hyperlink structure; normalized cut criterion; similarity metric; spectral graph partitioning method; textual information; unsupervised clustering method; Clustering algorithms; Clustering methods; Computer science; Image segmentation; Information retrieval; Laboratories; Organizing; Search engines; Taxonomy; Web sites;
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989518