Title :
Binarization of Color Characters in Scene Images Using k-means Clustering and Support Vector Machines
Author :
Kita, Kohei ; Wakahara, Toru
Author_Institution :
Fac. of Comput. & Inf. Sci., Hosei Univ., Koganei, Japan
Abstract :
This paper proposes a new technique for binarizing multicolored characters subject to heavy degradation. The key ideas are threefold. The first is the generation of tentatively binarized images via every dichotomization of the k clusters obtained by k-means clustering in the HSI color space. The total number of tentatively binarized images equals 2^k - 2. The second is the use of support vector machines (SVM) to determine whether, and to what degree, each tentatively binarized image represents a character or a non-character. We feed the SVM with mesh and weighted direction code histogram features, and it outputs the degree of “character-likeness.” The third is the selection of the single binarized image with the maximum degree of “character-likeness” as the optimal binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.
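The candidate-generation and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the k-means step has already produced a per-pixel cluster label map, and the `score` argument is a hypothetical stand-in for the SVM's "character-likeness" output. The function names `dichotomize` and `select_best` are likewise illustrative. Each of the k clusters is assigned to foreground or background, and the two assignments that put every cluster on one side are skipped, which leaves 2^k - 2 candidates.

```python
from itertools import product


def dichotomize(labels, k):
    """Yield (foreground_clusters, binary_image) for every split of the
    k clusters into a non-empty foreground and a non-empty background."""
    for bits in product((0, 1), repeat=k):
        # Skip the all-background and all-foreground assignments.
        if 0 < sum(bits) < k:
            foreground = frozenset(c for c in range(k) if bits[c])
            image = [[bits[c] for c in row] for row in labels]
            yield foreground, image


def select_best(labels, k, score):
    """Return the candidate with the highest score.

    `score` is a hypothetical callable standing in for the SVM's
    "character-likeness" output; it is an assumption of this sketch.
    """
    return max(dichotomize(labels, k), key=lambda cand: score(cand[1]))


# Toy 2x3 label map with k = 3 clusters (illustrative data only).
labels = [[0, 1, 2], [2, 1, 0]]
candidates = list(dichotomize(labels, 3))
print(len(candidates))  # 2**3 - 2 = 6
```

A split and its complement are counted as separate candidates, since deciding which side is the character (figure-ground discrimination) is part of the problem.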
Keywords :
document image processing; image colour analysis; pattern clustering; support vector machines; text analysis; color characters binarization; k-means clustering; multicolored characters; scene images; Character recognition; Feature extraction; Histograms; Image color analysis; Pixel; Training data; binarization of multicolored characters; figure-ground discrimination
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.779