• DocumentCode
    2337360
  • Title
    A comparison of collation algorithm for Myanmar language
  • Author
    Yuzana ; Tun, Khin Marlar
  • Author_Institution
    Univ. of Comput. Studies, Yangon
  • fYear
    2008
  • fDate
    13-16 Nov. 2008
  • Firstpage
    538
  • Lastpage
    543
  • Abstract
    Written Myanmar has no white space between words and no explicit word boundaries. Unicode database applications lack support for operations such as collation and searching. The need for a robust collation strategy has motivated extensive research in natural language processing. We therefore propose a new collation algorithm for the Myanmar language, MyCollate2, which extends MyCollate1. The algorithm is based on a heuristic chart (table). It first segments names into syllables and then collates them according to the order of the traditional standard Myanmar dictionary. The proposed heuristic chart works well not only for syllable segmentation but also for collation of words. The algorithm can collate Myanmar names as well as Myanmar words with complex syllable structures, such as Pali, Pali loan styles, subscript styles and kinzi styles. It was tested on Myanmar names, Pali words from Damma books, and words from a dictionary. Experimental results show that syllable slicing accuracy reaches 99.55%; slicing performance is compared with that of other methods. Collation accuracy reaches 95.88%, higher than that of the previous collation algorithm, MyCollate1.
  • Keywords
    dictionaries; natural language processing; Damma books; MyCollate1; MyCollate2; Myanmar language dictionary; Myanmar name; Pali loan styles; Pali words; collation algorithm; collation strategy; dictionary words; heuristics chart; heuristics table; kinzi styles; natural language processing; subscript styles; syllable segmentation; syllable slicing; unicode database application; Books; Clustering algorithms; Databases; Dictionaries; Information retrieval; Libraries; Natural language processing; Natural languages; Sorting; White spaces;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2008. ICDIM 2008. Third International Conference on
  • Conference_Location
    London
  • Print_ISBN
    978-1-4244-2916-5
  • Electronic_ISBN
    978-1-4244-2917-2
  • Type
    conf
  • DOI
    10.1109/ICDIM.2008.4746740
  • Filename
    4746740