• DocumentCode
    2910973
  • Title
    Building a Rule-Based Malay Text Segmentation Tool
  • Author
    Ranaivo-Malançon, Bali
  • Author_Institution
    Fac. of Comput. Sci. & Inf. Technol., Univ. Malaysia Sarawak, Kuching, Malaysia
  • fYear
    2011
  • fDate
    15-17 Nov. 2011
  • Firstpage
    276
  • Lastpage
    279
  • Abstract
    This paper presents the problems that must be addressed when building a rule-based Malay text segmentation tool that splits a text into sentences and tokens. The tool was compared with English and Malay tokenisers to highlight the characteristics of Malay texts.
  • Keywords
    natural language processing; text analysis; English tokeniser; Malay text characteristics; Malay tokeniser; rule-based Malay text segmentation tool; text sentence; text token; Buildings; Cleaning; Compounds; Context; Tagging; Terminology; White spaces; Malay sentence splitter; Text segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asian Language Processing (IALP), 2011 International Conference on
  • Conference_Location
    Penang
  • Print_ISBN
    978-1-4577-1733-8
  • Type
    conf
  • DOI
    10.1109/IALP.2011.42
  • Filename
    6121520