• DocumentCode
    618799
  • Title

    A hybrid approach to Lao word segmentation using longest syllable level matching with named entities recognition

  • Author

    Srithirath, Arounyadeth ; Seresangtakul, Pusadee

  • Author_Institution
    Dept. of Comput. Sci., Khon Kaen Univ., Khon Kaen, Thailand
  • fYear
    2013
  • fDate
    15-17 May 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The Lao language is written without word delimiters, which makes it difficult to process automatically. Automatic word segmentation is therefore an essential but challenging task in natural language processing for Lao. This paper proposes a Lao word segmentation approach that combines longest matching at the syllable level with named entities recognition. Syllables are first extracted from the input text; longest matching, a dictionary-based technique, is then applied together with named entities recognition to combine the syllables into words. The approach achieved a precision of 85.21% and a recall of 92.36%.
  • Keywords
    dictionaries; natural language processing; pattern matching; Lao language; Lao word segmentation; automatic word segmentation; dictionary based approach; longest syllable level matching; named entities recognition approach; syllable extraction; Educational institutions; Indexes; Nickel; dictionary based; longest matching; named entities recognition; tokenization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2013 10th International Conference on
  • Conference_Location
    Krabi
  • Print_ISBN
    978-1-4799-0546-1
  • Type

    conf

  • DOI
    10.1109/ECTICon.2013.6559585
  • Filename
    6559585
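
The abstract describes longest matching applied to syllables rather than characters. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes the text has already been split into syllables (the paper's syllable extraction rules are not given in this record), uses made-up Latin placeholder syllables instead of Lao, and treats named entities simply as extra lexicon entries, which is an assumption. The names `longest_syllable_match`, `lexicon`, `names`, and `max_len` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Entry = Tuple[str, ...]  # a word stored as a tuple of syllables


def longest_syllable_match(
    syllables: Sequence[str],
    lexicon: Set[Entry],
    max_len: int = 6,  # assumed upper bound on syllables per word
) -> List[str]:
    """Greedy left-to-right longest matching over a syllable sequence."""
    words: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink one syllable at a time.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            # A single syllable is always accepted as a fallback so that
            # out-of-lexicon input still advances.
            if n == 1 or candidate in lexicon:
                words.append("".join(candidate))
                i += n
                break
    return words


# Placeholder data: these syllables and entries are illustrative only.
lexicon = {("ka", "ta"), ("ka", "ta", "lo"), ("mi",)}
names = {("mi", "ka")}  # hypothetical named-entity list
syllables = ["ka", "ta", "lo", "mi", "ka"]

without_names = longest_syllable_match(syllables, lexicon)
with_names = longest_syllable_match(syllables, lexicon | names)
print(without_names)
print(with_names)
```

In this sketch, adding the named-entity entries lets the trailing two syllables be merged into one word instead of being emitted separately, which illustrates why a named-entity component can help a purely dictionary-based matcher.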