DocumentCode
2541864
Title
A new approach to sort Unicode Bengali text
Author
Rahman, Md Ahsanur ; Sattar, Md Abdus
Author_Institution
Dept. of CSE, Bangladesh Univ. of Eng. & Technol., Dhaka
fYear
2008
fDate
20-22 Dec. 2008
Firstpage
628
Lastpage
630
Abstract
The character order in Unicode for Bengali differs from the sorting order suggested by the governing authority. As a result, simple letter-by-letter comparison does not yield the correct order of Bengali words. The presence of modifier characters in Bengali complicates the situation further. The objective of our study is to adapt the suggested collation order to Unicode-represented Bengali text while achieving the maximum possible efficiency. We propose an algorithm for this purpose. The proposed algorithm is applicable to any chosen sorting order. It also compares words in O(1) time, irrespective of their lengths, so the complexity of sorting texts is always O(n log n).
Keywords
computational complexity; natural language processing; text analysis; O(n log n); character order; Unicode Bengali text; Dictionaries; Natural languages; Sorting; Standardization
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Computer Engineering, 2008. ICECE 2008. International Conference on
Conference_Location
Dhaka
Print_ISBN
978-1-4244-2014-8
Electronic_ISBN
978-1-4244-2015-5
Type
conf
DOI
10.1109/ICECE.2008.4769285
Filename
4769285