Regional Information Center for Science and Technology - A Bayesian framework for fusing multiple word knowledge models in videotext recognition

DocumentCode :

1643797

Title :

A Bayesian framework for fusing multiple word knowledge models in videotext recognition

Author :

Zhang, DongQing ; Chang, Shih-Fu

Author_Institution :

Dept. of Electr. Eng., Columbia Univ., New York, NY, USA

Volume :

Year :

2003

Abstract :

Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods enhanced recognition by using multiple-frame averaging, image interpolation, and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge sources using mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, we also present a back-off smoothing approach derived from the Bayesian model. We implemented a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed-caption model is based on a unique time distance distribution model of videotext words and closed-caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. The proposed methods also significantly reduce false videotext detection, achieving a false alarm rate of 8.2% without substantial loss of recall.
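The abstract describes fusing several word knowledge models through a mixture whose weights are learned with EM, then using the fused word prior in a Bayesian recognition decision. As a hedged illustration only, the Python sketch below assumes each knowledge source is a plain word-to-probability table and uses a generic linear-interpolation mixture. The function names (`fuse_prob`, `em_mixture_weights`, `recognize`), the probability floor for unseen words (a crude stand-in for, not a reproduction of, the paper's back-off smoothing), and the toy data are all hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch: fusing word models with an EM-learned mixture.

Assumption: each knowledge source is a dict mapping word -> probability.
This is a generic linear-interpolation mixture, not the authors' exact model.
"""
from typing import Dict, List

WordModel = Dict[str, float]


def fuse_prob(word: str, models: List[WordModel], weights: List[float],
              floor: float = 1e-6) -> float:
    # Mixture prior P(w) = sum_k lambda_k * P_k(w). The floor keeps words that
    # are unseen in a model from getting zero probability (hypothetical stand-in
    # for back-off smoothing).
    return sum(lam * m.get(word, floor) for lam, m in zip(weights, models))


def em_mixture_weights(words: List[str], models: List[WordModel],
                       iters: int = 50, floor: float = 1e-6) -> List[float]:
    k = len(models)
    weights = [1.0 / k] * k  # start from uniform mixture weights
    for _ in range(iters):
        totals = [0.0] * k
        for w in words:
            # E-step: posterior responsibility of each model for word w
            joint = [lam * m.get(w, floor) for lam, m in zip(weights, models)]
            z = sum(joint)
            for i in range(k):
                totals[i] += joint[i] / z
        # M-step: each new weight is the average responsibility over all words
        weights = [t / len(words) for t in totals]
    return weights


def recognize(ocr_likelihood: Dict[str, float], models: List[WordModel],
              weights: List[float]) -> str:
    # Bayesian decision: pick the candidate maximizing P(image | w) * P(w)
    return max(ocr_likelihood,
               key=lambda w: ocr_likelihood[w] * fuse_prob(w, models, weights))


if __name__ == "__main__":
    # Toy, made-up models standing in for a closed-caption model and a corpus model
    cc_model = {"senate": 0.5, "vote": 0.3, "news": 0.2}
    corpus_model = {"the": 0.4, "vote": 0.3, "senate": 0.2, "weather": 0.1}
    lams = em_mixture_weights(["senate", "vote", "senate", "news"],
                              [cc_model, corpus_model])
    print(lams)
    # A raw OCR result "sonate" is unseen in both models, so the prior favors "senate"
    print(recognize({"senate": 0.4, "sonate": 0.6}, [cc_model, corpus_model], lams))
```

In this sketch the mixture weights always sum to one, because each word's responsibilities sum to one before averaging. The final decision shows how a language prior can override a slightly stronger but implausible OCR hypothesis.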

Keywords :

belief networks; computer vision; image resolution; learning (artificial intelligence); optical character recognition; text analysis; video coding; Bayesian framework; Bayesian model; British National Corpus; EM; Expectation-Maximization; OCR; back-off smoothing approach; closed caption word; cluttered background; font style; image interpolation; learning approach; lexicon correction; mixture model; multimodality language model; multiple frame averaging; multiple knowledge combination; multiple word knowledge model; optical character recognition; unique time distance distribution model; videotext recognition; videotext word; word recognition; Bayesian methods; Character recognition; Dictionaries; Fuses; Image recognition; Interpolation; Layout; Optical character recognition software; Smoothing methods; Video sharing;

Language :

English

Publisher :

IEEE

Conference_Title :

Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on

ISSN :

1063-6919

Print_ISBN :

0-7695-1900-8

Type :

conf

DOI :

10.1109/CVPR.2003.1211512

Filename :

1211512

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1643797