Title :
Clustering Hyperlinks for Topic Extraction: An Exploratory Analysis
Author :
Villarreal, Sara Elena Gaza ; Elizalde, Lorena Martínez ; Viveros, Adriana Canseco
Author_Institution :
Tecnol. de Monterrey, Monterrey, Mexico
Abstract :
In a Web of increasing size and complexity, a key issue is automatic document organization, which includes topic extraction in collections. Since we consider topics to be document clusters with semantic properties, we explore clustering techniques suitable for identifying them in hyperlinked environments, where we consider only structural information. For this purpose, three algorithms (PDDP, k-means, and graph local clustering) were executed over a document subset of an increasingly popular corpus: Wikipedia. The resulting clusters were evaluated with unsupervised metrics (cosine similarity, semantic relatedness, Jaccard index), and the evaluation suggests that promising results can be obtained for this particular domain.
Keywords :
Web sites; document handling; PDDP; Wikipedia; automatic document organization; clustering techniques; document clusters; graph local clustering; hyperlinked environments; hyperlinks clustering; k-means; topic extraction; unsupervised metrics; Artificial intelligence; Clustering algorithms; Clustering methods; Data mining; Data visualization; Information retrieval; Partitioning algorithms; Semantic Web; Testing; principal direction divisive partitioning; topic detection
Conference_Title :
Eighth Mexican International Conference on Artificial Intelligence (MICAI 2009)
Conference_Location :
Guanajuato
Print_ISBN :
978-0-7695-3933-1
DOI :
10.1109/MICAI.2009.20