Title :
Segmenting Bangla text for optical recognition
Author :
Sattar, Md A. ; Mahmud, Khaled ; Arafat, Humayun ; Zaman, A. F M Noor Uz
Author_Institution :
Dept. of Comput. Sci. & Eng., Bangladesh Univ. of Eng. & Technol. (BUET), Dhaka
Abstract :
One of the main causes of poor recognition rates in optical character recognition (OCR) systems is error in character segmentation. The presence of different types of characters in scanned documents makes it difficult to design an effective character segmentation procedure. This paper presents a new technique for identifying and segmenting printed Bengali characters so that they can be recognized efficiently. Line segmentation succeeded for 99.7% of the 1000 lines tested, and word segmentation succeeded for 99.8% of the 4900 words tested. In the experiments, isolated characters fell into the isolated group in 99.50% of cases. Most errors came from connected characters and from characters with tau in front of them, since segmenting tau relies on width. The experiments also showed that most errors came from components with multiple touching points between two characters.
Keywords :
natural language processing; optical character recognition; text analysis; Bengali printed characters; character segmentation procedure; line segmentation; text segmentation; word segmentation; Character recognition; Computer errors; Computer science; Handwriting recognition; Image segmentation; Optical character recognition software; Optical distortion; Optical noise; Testing; Text recognition; Bangla Language Processing; Bangla OCR; Bangla Text segmentation
Conference_Title :
Computer and Information Technology, 2007. ICCIT 2007. 10th International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4244-1550-2
Electronic_ISBN :
978-1-4244-1551-9
DOI :
10.1109/ICCITECHN.2007.4579373