Regional Information Center for Science and Technology - Learning taxonomic relations from a set of text documents

DocumentCode :

3639667

Title :

Learning taxonomic relations from a set of text documents

Author :

Mari-Sanna Paukkeri;Alberto Pérez García-Plaza;Sini Pessala;Timo Honkela

Author_Institution :

Aalto University School of Science and Technology, Adaptive Informatics Research Centre, P.O. Box 15400, FI-00076, Finland

Year :

2010

Firstpage :

105

Lastpage :

112

Abstract :

This paper presents a methodology for learning taxonomic relations from a set of documents, each of which explains one concept. Three feature extraction approaches with varying degrees of language independence are compared. The first is a language-independent approach based on statistical keyphrase extraction. The second combines rule-based stemming with fuzzy logic-based feature weighting and selection. The third is the traditional tf-idf weighting scheme with commonly used rule-based stemming. The concept hierarchy is obtained by combining Self-Organizing Map clustering with agglomerative hierarchical clustering. Experiments are conducted for both English and Finnish. The results show that concept hierarchies can also be constructed automatically using statistical methods, without heavy language-specific preprocessing.
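
As a rough illustration of the third approach named in the abstract, the following minimal sketch weights terms with plain tf-idf and then builds a hierarchy by average-linkage agglomerative clustering over cosine similarity. It is not the authors' implementation: the Self-Organizing Map stage is omitted, whitespace tokenization stands in for rule-based stemming, and the function names (`tfidf_vectors`, `cosine`, `agglomerate`) and the four toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain tf-idf."""
    # Assumption: whitespace tokenization replaces the stemming step.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(vectors):
    """Average-linkage agglomerative clustering.

    Returns the merge history as (left_id, right_id, new_id) tuples.
    Ids below len(vectors) are documents; higher ids are merged clusters.
    """
    clusters = {i: [i] for i in range(len(vectors))}
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        # Merge the most similar pair into a new cluster id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges


# Toy documents (hypothetical), one per concept.
docs = [
    "cat feline pet animal",
    "dog canine pet animal",
    "car vehicle engine wheel",
    "truck vehicle engine cargo",
]
merges = agglomerate(tfidf_vectors(docs))
for left, right, new in merges:
    print(f"merge {left} + {right} -> {new}")
```

The merge history can be read as a binary concept hierarchy: each tuple is an internal node whose children are the two merged ids. On this toy input the two animal documents and the two vehicle documents are grouped before the final merge joins them at the root.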

Keywords :

"Ontologies","Feature extraction","Encyclopedias","Internet","Electronic publishing","Taxonomy"

Publisher :

IEEE

Conference_Title :

Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on

ISSN :

2157-5525

Print_ISBN :

978-1-4244-6432-6

Electronic_ISBN :

2157-5533

Type :

conf

DOI :

10.1109/IMCSIT.2010.5679865

Filename :

5679865

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3639667