DocumentCode
2910973
Title
Building a Rule-Based Malay Text Segmentation Tool
Author
Ranaivo-Malançon, Bali
Author_Institution
Fac. of Comput. Sci. & Inf. Technol., Univ. Malaysia Sarawak, Kuching, Malaysia
fYear
2011
fDate
15-17 Nov. 2011
Firstpage
276
Lastpage
279
Abstract
This paper presents the different problems that need to be taken into account in building a rule-based Malay text segmentation tool that can split a text into sentences and tokens. The tool was compared to English and Malay tokenisers to highlight the characteristics of Malay texts.
Keywords
natural language processing; text analysis; English tokeniser; Malay text characteristics; Malay tokeniser; rule-based Malay text segmentation tool; text sentence; text token; Buildings; Cleaning; Compounds; Context; Tagging; Terminology; White spaces; Malay sentence splitter; Text segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location
Penang
Print_ISBN
978-1-4577-1733-8
Type
conf
DOI
10.1109/IALP.2011.42
Filename
6121520