  • DocumentCode
    2541864
  • Title
    A new approach to sort Unicode Bengali text
  • Author
    Rahman, Md Ahsanur; Sattar, Md Abdus
  • Author_Institution
    Dept. of CSE, Bangladesh Univ. of Eng. & Technol., Dhaka
  • fYear
    2008
  • fDate
    20-22 Dec. 2008
  • Firstpage
    628
  • Lastpage
    630
  • Abstract
    Character order in Unicode for Bengali differs from the sorting order suggested by the governing authority. As a result, simple letter-by-letter comparison does not yield the correct order of Bengali words. The presence of modifier characters in Bengali makes the situation more complicated. The objective of our study is to adapt the suggested collation order for Unicode-represented Bengali text while achieving the maximum possible efficiency. Here we propose an algorithm for this purpose. The proposed algorithm is applicable to any chosen sorting order. It also compares words in O(1) time, irrespective of their lengths. Thus the complexity of sorting n words is always O(n log n).
  • Keywords
    computational complexity; natural language processing; text analysis; O(n log n); character order; unicode Bengali text; Dictionaries; Natural languages; Sorting; Standardization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Electrical and Computer Engineering, 2008. ICECE 2008. International Conference on
  • Conference_Location
    Dhaka
  • Print_ISBN
    978-1-4244-2014-8
  • Electronic_ISBN
    978-1-4244-2015-5
  • Type
    conf
  • DOI
    10.1109/ICECE.2008.4769285
  • Filename
    4769285
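The abstract does not spell out the algorithm itself. As a rough illustration of the underlying idea only (ordering words by a chosen collation order rather than by raw Unicode code point), here is a minimal Python sketch. The rank table `CUSTOM_ORDER`, the toy strings, and the helpers `build_rank_table`, `collation_key` and `sort_words` are all hypothetical. The order shown is abridged and made up (it places ৎ, U+09CE, right after ত) and is not the official order the paper refers to. Unlike the paper's claimed O(1) word comparison, comparing these tuple keys takes time proportional to word length, and unlisted characters such as vowel signs are simply ranked after every listed character.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical, abridged collation order for illustration only; it is NOT
# the official order referred to in the paper. Characters listed here sort
# in this sequence; any other character sorts after all of them.
CUSTOM_ORDER: List[str] = [
    "\u0985", "\u0986",  # অ আ (independent vowels, abridged)
    "\u0995", "\u0996",  # ক খ
    "\u09A4",            # ত
    "\u09CE",            # ৎ (khanda ta), placed right after ত by assumption
    "\u09A5", "\u09A6",  # থ দ
]


def build_rank_table(order: Sequence[str]) -> Dict[str, int]:
    """Map each character of the chosen order to its position in that order."""
    return {ch: i for i, ch in enumerate(order)}


def collation_key(word: str, ranks: Dict[str, int]) -> Tuple[Tuple[int, int], ...]:
    """Turn a word into a tuple of (rank, tiebreak) pairs.

    Listed characters get (rank, 0); unlisted ones get (len(ranks), code point),
    so they sort after every listed character and among themselves by code point.
    Tuples compare lexicographically, so a prefix sorts before its extensions.
    """
    fallback = len(ranks)
    return tuple(
        (ranks[ch], 0) if ch in ranks else (fallback, ord(ch)) for ch in word
    )


def sort_words(words: Sequence[str], order: Sequence[str]) -> List[str]:
    """Sort words by the chosen order instead of by raw code point."""
    ranks = build_rank_table(order)
    return sorted(words, key=lambda w: collation_key(w, ranks))


if __name__ == "__main__":
    words = ["\u0995\u09A5", "\u0995\u09CE", "\u0995\u09A4"]  # কথ, কৎ, কত
    print(sorted(words))                    # code-point order: কত, কথ, কৎ
    print(sort_words(words, CUSTOM_ORDER))  # chosen order:     কত, কৎ, কথ
```

With this toy order, কৎ sorts between কত and কথ, whereas plain code-point sorting puts it last because U+09CE lies after the Bengali consonant range. Because the order is just a list passed to `sort_words`, any other chosen order can be swapped in, which loosely mirrors the abstract's point that the approach is applicable to any chosen sorting order.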