Regional Information Center for Science and Technology (RICEST) - Semiautomatic Extraction of Topic Maps from Web Pages Using Clustering with Web Contents and Structure

DocumentCode :

2732045

Title :

Semiautomatic Extraction of Topic Maps from Web Pages Using Clustering with Web Contents and Structure

Author :

Mase, Motohiro ; Yamada, Seiji ; Nitta, Katsumi

Author_Institution :

Tokyo Inst. of Technol., Yokohama

fYear :

2007

fDate :

5-12 Nov. 2007

Firstpage :

208

Lastpage :

211

Abstract :

In this paper, we describe a method to semi-automatically extract Topic Maps from a set of Web pages. We introduce the following two points to the existing clustering method: The first is merging only the linked Web pages, to extract the underlying relationship of the topics. The second is introducing the similarity by contents of Web pages and the types of links, and the distance between the directories in which the pages are located, to generate dense clusters. We generate the topic map by assuming the clusters as topics, the edges as associations, the Web pages related to the topic as occurrences from the result of clustering. We experimentally extracted the topic map and evaluated it.
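
The two ideas named in the abstract (merging only linked pages, and scoring pages by content plus directory distance) can be illustrated with a rough sketch. This is not the authors' algorithm: the cosine measure, the `alpha` weight, the `threshold`, the average-link merge rule, and all function names are assumptions made for illustration, and link types are left out because the abstract does not say which types are used.

```python
import math
from collections import Counter
from posixpath import dirname
from urllib.parse import urlparse


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def directory_distance(url_a, url_b):
    """Directory steps needed to walk from one page's folder to the other's."""
    pa = [p for p in dirname(urlparse(url_a).path).split("/") if p]
    pb = [p for p in dirname(urlparse(url_b).path).split("/") if p]
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def page_similarity(url_a, url_b, texts, alpha=0.7):
    """Blend content similarity with directory closeness (alpha is an assumed weight)."""
    content = cosine(Counter(texts[url_a].lower().split()),
                     Counter(texts[url_b].lower().split()))
    structure = 1.0 / (1.0 + directory_distance(url_a, url_b))
    return alpha * content + (1 - alpha) * structure


def cluster_linked_pages(texts, links, threshold=0.5):
    """Greedy agglomerative clustering that only merges clusters joined by a link."""
    clusters = {url: {url} for url in texts}  # cluster id -> member pages
    owner = {url: url for url in texts}       # page -> cluster id
    while True:
        best = None
        for a, b in links:
            ca, cb = owner[a], owner[b]
            if ca == cb:
                continue
            # Average-link similarity between the two linked clusters.
            sims = [page_similarity(x, y, texts)
                    for x in clusters[ca] for y in clusters[cb]]
            score = sum(sims) / len(sims)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, ca, cb)
        if best is None:
            break
        _, ca, cb = best
        for page in clusters.pop(cb):
            clusters[ca].add(page)
            owner[page] = ca
    # Links that still cross cluster boundaries become candidate associations.
    associations = {frozenset((owner[a], owner[b]))
                    for a, b in links if owner[a] != owner[b]}
    return clusters, associations
```

In the abstract's terms, each surviving cluster would play the role of a topic, each remaining cross-cluster edge an association, and the member pages the occurrences.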

Keywords :

Web sites; data mining; Web contents; Web pages; Web structure; clustering method; semiautomatic extraction; topic maps; Conferences; Intelligent agent; Web pages; information extraction; Topic Maps; clustering

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Web Intelligence and Intelligent Agent Technology Workshops, 2007 IEEE/WIC/ACM International Conferences on

Conference_Location :

Silicon Valley, CA

Print_ISBN :

0-7695-3028-1

Type :

conf

DOI :

10.1109/WI-IATW.2007.85

Filename :

4427573

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2732045