Title :
Building a Rule-Based Malay Text Segmentation Tool
Author :
Ranaivo-Malançon, Bali
Author_Institution :
Fac. of Comput. Sci. & Inf. Technol., Univ. Malaysia Sarawak, Kuching, Malaysia
Abstract :
This paper presents the problems that must be taken into account when building a rule-based Malay text segmentation tool that splits a text into sentences and tokens. The tool was compared with English and Malay tokenisers to highlight the characteristics of Malay texts.
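The following is a minimal sketch of rule-based segmentation, included for illustration only. It is not the author's tool, and its rules are assumptions. It uses a small hypothetical abbreviation list (`ABBREVIATIONS`) so that a period after "Dr" or "Sdn" does not end a sentence. It keeps hyphenated reduplicated forms such as "buku-buku" as single tokens, keeps decimal numbers such as "3.50" whole, and treats a standalone ".", "?" or "!" as a sentence boundary. The function names `tokenise` and `split_sentences` are illustrative.

```python
import re

# Hypothetical, illustrative abbreviation list (not taken from the paper):
# a period directly after one of these words is kept with the word.
ABBREVIATIONS = {"dr.", "en.", "pn.", "prof.", "sdn.", "bhd.", "dll.", "cth."}

# Alternatives are tried in order:
#   1. numbers with internal separators ("3.50", "1,000")
#   2. words, optionally joined by hyphens ("buku-buku")
#   3. any single punctuation mark
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)+|\w+(?:-\w+)*|[^\w\s]")

SENTENCE_END = {".", "?", "!"}


def tokenise(text):
    """Split text into tokens, re-attaching periods to known abbreviations."""
    raw = TOKEN_RE.findall(text)
    tokens = []
    i = 0
    while i < len(raw):
        tok = raw[i]
        # "Dr" followed by "." becomes the single token "Dr."
        if i + 1 < len(raw) and raw[i + 1] == "." and (tok + ".").lower() in ABBREVIATIONS:
            tokens.append(tok + ".")
            i += 2
        else:
            tokens.append(tok)
            i += 1
    return tokens


def split_sentences(tokens):
    """Group tokens into sentences; only a standalone terminal mark ends one."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without a terminal mark
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    toks = tokenise("Dr. Ali membeli buku-buku itu. Adakah dia membacanya?")
    for sentence in split_sentences(toks):
        print(sentence)
```

Because an abbreviation absorbs its period, a sentence that ends with one (for example "... Sdn. Bhd.") is not closed by this sketch. This is the kind of ambiguity a real rule set has to resolve, for instance by checking whether the next token is capitalised.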
Keywords :
natural language processing; text analysis; English tokeniser; Malay text characteristics; Malay tokeniser; rule-based Malay text segmentation tool; text sentence; text token; Buildings; Cleaning; Compounds; Context; Tagging; Terminology; White spaces; Malay sentence splitter; Text segmentation
Conference_Title :
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4577-1733-8
DOI :
10.1109/IALP.2011.42