Title :
Text segmentation in Polish
Author_Institution :
Wroclaw Univ., Poland
Abstract :
This paper points out the importance of text segmentation in natural language engineering and in artificial intelligence systems. It shows that in Polish every punctuation mark that can end a sentence also serves other functions within sentences. In this context, various approaches to sentence boundary disambiguation are presented. Text tokenization is analysed with the features of Polish taken into account. Based on this analysis, a direction for empirical research on the segmentation of Polish texts is outlined. A list of Polish abbreviations that have the same spelling as common words is also presented.
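The ambiguity described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the function name `split_sentences`, the abbreviation set, and the splitting rule are all assumptions chosen for illustration, and the abbreviations are not taken from the list presented in the paper. The sketch treats a period as a sentence boundary only when it is followed by whitespace and an uppercase letter and the preceding word is not a known abbreviation.

```python
import re

# Hypothetical, illustrative abbreviation set (NOT the list from the paper).
# "ul." abbreviates "ulica" (street), but "ul" is also a common noun (beehive).
ABBREVIATIONS = {"np", "prof", "ul", "godz", "tzn"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and an uppercase letter.
BOUNDARY = re.compile(r"[.!?]+(?=\s+[A-ZĄĆĘŁŃÓŚŹŻ])")


def split_sentences(text):
    """Naive sentence splitter: skip periods that follow a known abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Find the word immediately before the punctuation mark.
        words = re.findall(r"\w+", text[start:match.start()])
        last_word = words[-1].lower() if words else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # treat the period as part of the abbreviation
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


address = "Spotkanie odbędzie się przy ul. Długiej. Przyjdzie prof. Nowak."
beehive = "Pszczoły opuściły ul. Potem wróciły."
print(split_sentences(address))
print(split_sentences(beehive))
```

On the first input the heuristic keeps "ul. Długiej" and "prof. Nowak" together and returns two sentences. On the second input, where "ul" is the common noun and really does end a sentence, the same rule fails to split and returns the text as a single sentence. This is the kind of case that makes abbreviations spelled like common words a problem for list-based approaches.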
Keywords :
grammars; natural languages; text analysis; Polish text segmentation; artificial intelligence system; natural language engineering; punctuation mark; Algorithm design and analysis; Artificial intelligence; Computer applications; Computer interfaces; Internet; Natural language processing; Natural languages; Paper technology; Text processing; User interfaces;
Conference_Title :
Intelligent Systems Design and Applications, 2005. ISDA '05. Proceedings. 5th International Conference on
Print_ISBN :
0-7695-2286-6
DOI :
10.1109/ISDA.2005.89