Title of article :
A statistical approach to knowledge discovery: Bootstrap analysis of language models for knowledge base population from unstructured text
Author/Authors :
Momtazi, Saeedeh Department of Computer Engineering and Information Technology - Amirkabir University of Technology, Tehran ; Moradiannasab, Omid Department of Computational Linguistics and Phonetics - Saarland University, Saarbrucken, Germany
Abstract :
In this paper, we propose a novel approach for knowledge discovery from textual data. The generated knowledge base can be used as one of the main components in the cognitive process of question answering systems. The proposed model automatically extracts relations between named entities in Persian. It is a bootstrapping approach based on the n-gram model that finds the representative textual patterns of relations as n-grams in order to extract new knowledge about given named entities. The main motivation for this work is the sentence structure of Persian, which, in contrast to English, follows subject-object-verb order. The proposed approach is purely statistical and requires no background knowledge of the target language. This makes our method applicable to any open-domain relation extraction task. However, as our test bed, we focus on the domain of biographical data of international poets and scientists to build a knowledge base about them. Qualitative evaluations based on human assessment provide evidence for the efficacy of our method.
Keywords :
Computational linguistics , information extraction , statistical language modeling , n-gram model , relation extraction , textual pattern acquisition
Journal title :
Scientia Iranica (Transactions D: Computer Science and Electrical Engineering)