DocumentCode :
3639667
Title :
Learning taxonomic relations from a set of text documents
Author :
Mari-Sanna Paukkeri;Alberto Pérez García-Plaza;Sini Pessala;Timo Honkela
Author_Institution :
Aalto University School of Science and Technology, Adaptive Informatics Research Centre, P.O. Box 15400, FI-00076, Finland
fYear :
2010
Firstpage :
105
Lastpage :
112
Abstract :
This paper presents a methodology for learning taxonomic relations from a set of documents, each of which explains one of the concepts. Three feature extraction approaches with varying degrees of language independence are compared in this study. The first feature extraction scheme is a language-independent approach based on statistical keyphrase extraction, and the second is based on a combination of rule-based stemming and fuzzy logic-based feature weighting and selection. The third approach is the traditional tf-idf weighting scheme with commonly used rule-based stemming. The concept hierarchy is obtained by combining Self-Organizing Map clustering with agglomerative hierarchical clustering. Experiments are conducted for both English and Finnish. The results show that concept hierarchies can also be constructed automatically using statistical methods, without heavy language-specific preprocessing.
Keywords :
"Ontologies","Feature extraction","Encyclopedias","Internet","Electronic publishing","Taxonomy"
Publisher :
IEEE
Conference_Title :
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
ISSN :
2157-5525
Print_ISBN :
978-1-4244-6432-6
Electronic_ISBN :
2157-5533
Type :
conf
DOI :
10.1109/IMCSIT.2010.5679865
Filename :
5679865