DocumentCode :
3059458
Title :
Text segmentation in Polish
Author :
Mazur, Pawel P.
Author_Institution :
Wroclaw Univ., Poland
fYear :
2005
fDate :
8-10 Sept. 2005
Firstpage :
43
Lastpage :
48
Abstract :
This paper highlights the importance of text segmentation in natural language engineering and in artificial intelligence systems. It shows that in Polish every punctuation mark that can end a sentence also serves other functions within sentences. In this context, various approaches to sentence boundary disambiguation are presented. Text tokenization is analysed with the features of Polish taken into account. Based on this analysis, a direction for empirical research on Polish text segmentation is outlined. A list of Polish abbreviations that are spelled the same as some common words is also presented.
Keywords :
grammars; natural languages; text analysis; Polish text segmentation; artificial intelligence system; natural language engineering; punctuation mark; Algorithm design and analysis; Artificial intelligence; Computer applications; Computer interfaces; Internet; Natural language processing; Natural languages; Paper technology; Text processing; User interfaces;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications, 2005. ISDA '05. Proceedings. 5th International Conference on
Print_ISBN :
0-7695-2286-6
Type :
conf
DOI :
10.1109/ISDA.2005.89
Filename :
1578758