Title :
Integrating co-training and recognition for text detection
Author :
Wu, Wen ; Chen, Datong ; Yang, Jie
Author_Institution :
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Abstract :
Training a good text detector requires a large amount of labeled data, which can be very expensive to obtain. Co-training has been shown to be a powerful semi-supervised learning tool for problems where a large amount of unlabeled data is available. However, the augmented data produced by a co-training process can degrade classifier performance because of noise introduced with the unlabeled data. This paper makes two contributions by proposing a modified co-training scheme for text detection. First, to obtain cleaner augmented data, the new algorithm integrates authority knowledge about the unlabeled data into co-training: the text recognition output for each selected unlabeled image patch serves as the authority and is combined with the classifier prediction to decide whether the sample is added to the augmented set. Second, instead of combining the predictions of the two co-training classifiers evenly, a weighted combination is learned and used to produce the final prediction. Both contributions have been evaluated on a standard text detection dataset.
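The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the recognition output is compared with the classifier prediction, which confidence threshold is used, or how the combination weight is learned. The function names (`gate_with_authority`, `combine`, `learn_weight`), the 0.9 confidence threshold, and the grid search over the weight are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Patch = TypeVar("Patch")


def gate_with_authority(
    patches: Sequence[Patch],
    classifier_score: Callable[[Patch], float],
    recognizer_says_text: Callable[[Patch], bool],
    confidence: float = 0.9,  # assumed threshold, not from the paper
) -> List[Tuple[Patch, int]]:
    """Pick unlabeled patches for the augmented set.

    A patch is accepted only if the classifier is confident AND the
    text recognizer (the "authority") agrees with the predicted label.
    """
    accepted = []
    for patch in patches:
        p_text = classifier_score(patch)  # estimated P(text) in [0, 1]
        if p_text >= confidence:
            label = 1
        elif p_text <= 1.0 - confidence:
            label = 0
        else:
            continue  # classifier is not confident enough
        if recognizer_says_text(patch) == bool(label):
            accepted.append((patch, label))
    return accepted


def combine(p_a: float, p_b: float, w: float) -> float:
    """Weighted combination of the two classifiers' scores."""
    return w * p_a + (1.0 - w) * p_b


def learn_weight(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    labels: Sequence[int],
    steps: int = 10,
) -> float:
    """Choose w by grid search on labeled held-out data.

    Grid search is a stand-in here; the paper's learning procedure
    is not described in the abstract.
    """
    best_w, best_acc = 0.5, -1.0
    for i in range(steps + 1):
        w = i / steps
        correct = sum(
            int((combine(a, b, w) >= 0.5) == bool(y))
            for a, b, y in zip(scores_a, scores_b, labels)
        )
        acc = correct / len(labels)
        if acc > best_acc:  # keep the first weight that reaches the best accuracy
            best_w, best_acc = w, acc
    return best_w
```

In a full co-training loop, each classifier would be retrained on the labeled data plus the samples returned by `gate_with_authority`, and `learn_weight` would be run once on held-out labeled data to replace the even (w = 0.5) combination.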
Keywords :
character recognition; image classification; learning (artificial intelligence); text analysis; augmented data; authority knowledge; classifier prediction; modified cotraining scheme; semisupervised learning tool; standard text detection dataset; text recognition; unlabeled image patch; weighted combination; Computer science; Degradation; Detectors; Image edge detection; Semisupervised learning; Supervised learning; Testing; Text recognition; Training data; Videos
Conference_Title :
2005 IEEE International Conference on Multimedia and Expo (ICME 2005)
Print_ISBN :
0-7803-9331-7
DOI :
10.1109/ICME.2005.1521634