Title :
Knowledge-based Named Entity Recognition in Polish
Author_Institution :
Jagiellonian Univ., Kraków, Poland
Abstract :
This document describes an algorithm for recognizing Named Entities in Polish text. The algorithm is powered by two knowledge sources: the Polish Wikipedia and the Cyc ontology. Besides assigning coarse types to the recognized entities, it links them to their Wikipedia pages and assigns precise semantic types taken from Cyc. The algorithm is verified against manually identified Named Entities in the one-million-word sub-corpus of the National Corpus of Polish.
Keywords :
Web sites; natural language processing; ontologies (artificial intelligence); pattern classification; text analysis; Cyc ontology; Polish National Corpus; Polish Wikipedia; Polish text; knowledge-based named entity recognition; Electronic publishing; Encyclopedias; Hidden Markov models; Internet; Ontologies; Semantics
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on