Title :
A language independent text segmentation technique based on naive Bayes classifier
Author :
Bidgoli, A.M. ; Boraghi, M.
Author_Institution :
North Tehran Branch, Islamic Azad Univ., Tehran, Iran
Abstract :
One of the important stages of an optical character recognition (OCR) system is segmenting the text components of an input image from its non-text components. In this paper, a machine learning technique based on a naive Bayes classifier is developed for text component segmentation. In the training stage, a simple procedure is used to generate a large collection of training data sets for learning the classifier. A collection of handwritten and printed Persian and English pictorial images that had been manually segmented was used for training. An appropriate post-processing step is applied to improve the segmentation results. Several representative scanned images of handwritten and printed Persian, English, and Chinese documents are used to verify the effectiveness of the developed algorithm.
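The abstract does not state which features the classifier uses or which naive Bayes variant is applied. The following is a minimal sketch under two assumptions. First, each image component is described by two hypothetical numeric features: aspect ratio and foreground pixel density. Second, each feature is modeled by an independent per-class Gaussian, which is the "naive" conditional-independence assumption. The training values are invented toy numbers for illustration, not data from the paper.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Toy Gaussian naive Bayes: features are assumed independent given the class."""

    def fit(self, X, y):
        # Group feature vectors by their class label ("text" / "non-text").
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        n = len(X)
        self.priors = {}
        self.stats = {}
        for label, rows in groups.items():
            # Class prior P(c) = fraction of training samples in class c.
            self.priors[label] = len(rows) / n
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Per-feature variance; a small epsilon avoids division by zero.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                         for c, m in zip(cols, means)]
            self.stats[label] = (means, variances)
        return self

    def _log_posterior(self, features, label):
        # log P(c) + sum_i log N(x_i; mean_ci, var_ci), up to a shared constant.
        means, variances = self.stats[label]
        logp = math.log(self.priors[label])
        for x, m, v in zip(features, means, variances):
            logp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return logp

    def predict(self, features):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.stats, key=lambda lbl: self._log_posterior(features, lbl))


# Hypothetical component features: (aspect_ratio, pixel_density). Made-up values.
X = [(1.0, 0.35), (0.8, 0.40), (1.2, 0.30), (0.9, 0.45),   # text-like components
     (3.5, 0.85), (4.0, 0.90), (2.8, 0.80), (5.0, 0.95)]   # non-text components
y = ["text"] * 4 + ["non-text"] * 4

model = GaussianNaiveBayes().fit(X, y)
print(model.predict((1.1, 0.38)))  # near the text cluster
print(model.predict((4.2, 0.88)))  # near the non-text cluster
```

Working in log space keeps the product of many small per-feature likelihoods from underflowing. In a real pipeline, the features would come from connected components or image blocks. A post-processing stage, such as the one the abstract mentions but does not describe, would then clean up isolated misclassifications.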
Keywords :
Bayes methods; character recognition; document image processing; image segmentation; learning (artificial intelligence); Chinese handwritings; English pictorial Images; Persian pictorial Images; document images; language independent text segmentation technique; machine learning technique; naive bayes classifier; nontext components; optical character recognition system; text components segmentation; training data sets; Classification algorithms; Equations; Image edge detection; Image segmentation; Mathematical model; Training; Training data; Documents Image Analyses; Naive Bayes Classifier; OCR; Text Segmentation;
Conference_Title :
Signal and Image Processing (ICSIP), 2010 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4244-8595-6
DOI :
10.1109/ICSIP.2010.5697433