DocumentCode :
3686724
Title :
Concepts extraction from unstructured Polish texts: A rule based approach
Author :
Piotr Szwed
Author_Institution :
AGH University of Science and Technology, Poland
fYear :
2015
Firstpage :
355
Lastpage :
364
Abstract :
We present a recently developed solution for extracting concepts from unstructured Polish texts, with a special focus on the correct morphological forms of the obtained concept names. As Polish is a highly inflected language, detected names need to be transformed following Polish grammar rules. We propose a user-friendly method for specifying transformation patterns, based on a simple annotation language. Annotations prepared by a user are compiled into transformation rules. During the concept extraction process, the input document is split into sentences and the rules are applied to the sequences of words in each sentence. Recognized strings forming concept names are aggregated at various levels and assigned scores. We also report the results of initial experiments performed on a medical text.
Keywords :
"Dictionaries","Speech","Compounds","Grammar","Libraries","Feature extraction","Ontologies"
Publisher :
IEEE
Conference_Titel :
Computer Science and Information Systems (FedCSIS), 2015 Federated Conference on
Type :
conf
DOI :
10.15439/2015F280
Filename :
7321465