Regional Information Center for Science and Technology - Semi-automatic extraction and modeling of ontologies using Wikipedia XML Corpus

DocumentCode :

3539336

Title :

Semi-automatic extraction and modeling of ontologies using Wikipedia XML Corpus

Author :

de Silva, Lakdeepal ; Jayaratne, Lakshman

Author_Institution :

Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka

fYear :

2009

fDate :

4-6 Aug. 2009

Firstpage :

446

Lastpage :

451

Abstract :

This paper introduces WikiOnto, a system that assists in the semi-automatic extraction and modeling of topic ontologies using a preprocessed document corpus derived from Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies quickly and a modeling environment for refining them. Using natural language processing (NLP) and other machine learning (ML) techniques together with a very rich document corpus, the system proposes a solution to a task that is generally considered extremely cumbersome. Initial results from the prototype suggest that the system has strong potential for ontology extraction and modeling, and they motivate further research on extracting ontologies from other semi-structured document corpora.

Keywords :

XML; learning (artificial intelligence); natural language processing; ontologies (artificial intelligence); Wikipedia XML Corpus; machine learning techniques; ontologies; preprocessed document corpus; semi-automatic extraction; semi-structured document corpora; Buildings; Data mining; Machine learning; Prototypes; Relational databases; Web pages; Wikipedia

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Second International Conference on the Applications of Digital Information and Web Technologies (ICADIWT '09), 2009

Conference_Location :

London

Print_ISBN :

978-1-4244-4456-4

Electronic_ISBN :

978-1-4244-4457-1

Type :

conf

DOI :

10.1109/ICADIWT.2009.5273871

Filename :

5273871

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3539336