Title :
Extended character defect model for recognition of text from maps
Author :
Pezeshk, Aria; Tutwiler, Richard L.
Author_Institution :
Applied Research Laboratory, Pennsylvania State University, State College, PA, USA
Abstract :
Topographic maps contain relatively little text compared with other printed documents. Moreover, text and graphical components often intersect, which makes extracting the text very difficult. Building a training set of adequate size from the characters actually found in maps would therefore require laboriously processing many maps with similar features and manually extracting character samples. This paper extends the types of defects represented by Baird's document image degradation model in order to create pseudo-randomly generated training sets that closely mimic the artifacts and defects found in characters extracted from maps. Two hidden Markov models (HMMs) are then trained and used to recognize the text. In tests on extracted street labels, the character recognition rate rises from 88.4% when training uses only Baird's original model to 93.2% when training uses the extended defect model.
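The core steps of a Baird-style degradation can be illustrated with a minimal sketch, assuming a binary glyph stored as a list of rows (1 = ink): blur with a Gaussian point-spread function, add per-pixel "sensitivity" noise, then threshold. This record does not list which defect types the extended model adds, so `add_crossing_line` is a hypothetical stand-in for a map graphic (such as a road edge) crossing a character. All function and parameter names are illustrative, and Baird's full model also has parameters (skew, scaling, jitter, and others) that are omitted here.

```python
import math
import random
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    if sigma <= 0:
        return [1.0]  # no blur: identity kernel
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(img: Image, sigma: float) -> Image:
    """Separable Gaussian blur (the PSF step); pixels outside the image count as background 0."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass over the intermediate result.
    tmp = [[sum(k[j + r] * img[y][x + j] for j in range(-r, r + 1) if 0 <= x + j < w)
            for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * tmp[y + j][x] for j in range(-r, r + 1) if 0 <= y + j < h)
             for x in range(w)] for y in range(h)]


def baird_degrade(glyph: Image, blur_sigma: float = 0.7, threshold: float = 0.5,
                  sensitivity: float = 0.1, seed: int = 0) -> List[List[int]]:
    """Blur, add per-pixel Gaussian noise (std = sensitivity), then binarize at threshold."""
    rng = random.Random(seed)  # seeded so each sample is reproducible
    smooth = blur(glyph, blur_sigma)
    return [[1 if v + rng.gauss(0.0, sensitivity) > threshold else 0 for v in row]
            for row in smooth]


def add_crossing_line(glyph: Image, row: int, thickness: int = 1) -> Image:
    """HYPOTHETICAL map-specific defect: a horizontal graphic line drawn across the glyph."""
    out = [list(r) for r in glyph]  # copy so the clean glyph is not modified
    for y in range(row, min(row + thickness, len(out))):
        out[y] = [1] * len(out[y])
    return out


if __name__ == "__main__":
    # A 5x5 letter "T"; a hypothetical line crosses it at row 3.
    t_glyph = [[1, 1, 1, 1, 1], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0],
               [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
    for s in range(3):
        sample = baird_degrade(add_crossing_line(t_glyph, 3), seed=s)
        print("\n".join("".join("#" if v else "." for v in row) for row in sample), end="\n\n")
```

Applying the hypothetical line overlay before `baird_degrade` lets the same blur and noise degrade both the stroke and the intersecting graphic; varying `seed` (and the other parameters) yields a pseudo-random set of training samples from a single clean glyph.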
Keywords :
cartography; document image processing; hidden Markov models; learning (artificial intelligence); text analysis; document image degradation model; extended character defect model; pseudo-randomly generated training sets; text recognition; topographic maps; Artificial neural networks; Character recognition; Data mining; Degradation; Feature extraction; Graphics; Image recognition; Optical character recognition software
Conference_Title :
2010 IEEE Southwest Symposium on Image Analysis & Interpretation (SSIAI)
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-7801-9
DOI :
10.1109/SSIAI.2010.5483913