DocumentCode :
258849
Title :
A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images
Author :
Angadi, S.A. ; Kodabagi, M.M.
Author_Institution :
Dept. of Comput. Sci. & Eng., Basaveshwar Eng. Coll., Bagalkot, India
fYear :
2014
fDate :
8-10 Jan. 2014
Firstpage :
42
Lastpage :
49
Abstract :
Reliable segmentation of text lines, words and characters is an important step in developing automated systems that understand the text in low resolution display board images. In this paper, a new approach for segmentation of text lines, words and characters from Kannada text in low resolution display board images is presented. The proposed method uses projection profile features and pixel distribution statistics for segmentation of text lines. The method also detects text lines containing consonant modifiers and merges them with the corresponding text lines, and it separates overlapped text lines as well. The character extraction process computes character boundaries using vertical profile features to extract character images from every text line. Further, the word segmentation process uses k-means clustering to group inter-character gaps into character-space and word-space clusters, which are used to compute thresholds for extracting words. The method also accommodates variations in character and word gaps. The proposed methodology is evaluated on a data set of 1008 low resolution images of display boards containing Kannada text, captured with 2-megapixel mobile phone cameras at sizes of 240x320, 600x800 and 900x1200. The method achieves text line segmentation accuracy of 97.17%, word segmentation accuracy of 97.54% and character extraction accuracy of 99.09%. The proposed method is tolerant to font variability, spacing variations between characters and words, absence of a free segmentation path due to consonant and vowel modifiers, noise and other degradations. Experiments on images containing overlapped text lines have given promising results.
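The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized image stored as a list of rows (1 = text pixel, 0 = background), and all function names are hypothetical. It leaves out the pixel distribution statistics, the merging of consonant-modifier lines and the separation of overlapped lines that the abstract mentions. Taking the midpoint of the two k-means centroids as the word-gap threshold is also an assumption, since the abstract does not state how the thresholds are derived from the clusters.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open (start, end) index range


def runs_of_ink(profile: List[int], min_ink: int = 1) -> List[Span]:
    """Return the index ranges where the projection profile is >= min_ink."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of text pixels begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # the run ends at a blank position
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: List[List[int]]) -> List[Span]:
    """Row ranges of text lines, from the horizontal projection profile."""
    return runs_of_ink([sum(row) for row in image])


def segment_characters(line_rows: List[List[int]]) -> List[Span]:
    """Column ranges of characters, from the vertical projection profile."""
    return runs_of_ink([sum(column) for column in zip(*line_rows)])


def gap_threshold(gaps: List[int], iterations: int = 20) -> float:
    """1-D k-means with k=2 on gap widths; returns the centroid midpoint."""
    small, large = float(min(gaps)), float(max(gaps))
    for _ in range(iterations):
        near_small = [g for g in gaps if abs(g - small) <= abs(g - large)]
        near_large = [g for g in gaps if abs(g - small) > abs(g - large)]
        if not near_large:                 # all gaps fall in one cluster
            break
        small = sum(near_small) / len(near_small)
        large = sum(near_large) / len(near_large)
    return (small + large) / 2.0


def group_words(chars: List[Span], threshold: float) -> List[List[Span]]:
    """Start a new word whenever the gap to the previous character exceeds the threshold."""
    words = [[chars[0]]]
    for prev, cur in zip(chars, chars[1:]):
        if cur[0] - prev[1] > threshold:
            words.append([cur])
        else:
            words[-1].append(cur)
    return words
```

To use it, call `segment_lines` on the whole image, pass the rows of each detected line to `segment_characters`, compute the gaps between consecutive character spans, and feed them to `gap_threshold` and `group_words`. Clustering the gaps per line, rather than using a fixed pixel threshold, is what lets such a scheme adapt to different fonts and image sizes.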
Keywords :
character recognition; feature extraction; image resolution; image segmentation; natural language processing; pattern clustering; statistics; text analysis; Kannada text; character extraction; character image extraction; k-means clustering; line extraction; low resolution display board images; pixel distribution statistics; reliable extraction; robust segmentation technique; vertical profile features; word extraction; Accuracy; Algorithm design and analysis; Equations; Feature extraction; Image resolution; Image segmentation; Vectors; Display Boards; K-Means Clustering; Low Resolution Images; Projection Profile Features; Segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal and Image Processing (ICSIP), 2014 Fifth International Conference on
Conference_Location :
Jeju Island
Type :
conf
DOI :
10.1109/ICSIP.2014.11
Filename :
6754849