Title :
Lexicon assistance reduces manual verification of OCR output
Author :
Hauser, S.E. ; Browne, A.C. ; Thoma, G.R. ; McCray, A.T.
Author_Institution :
Lister Hill Nat. Center for Biomed. Commun., Nat. Libr. of Med., Bethesda, MD, USA
Abstract :
An OCR system chosen for its high recognition rate and low percentage of false positives also assigns low confidence values to many characters that are actually correct. Human operators must verify all words containing low-confidence characters. We describe the creation of a lexicon optimized for automatically and selectively resetting confidence values to high, thus reducing operator verification time. Two word lists, OCR Correct and OCR Incorrect, were extracted from files that had already been processed and verified, and became the standard for comparing candidate lexicons. A lexicon was selected from several candidate word lists maintained by the National Library of Medicine (NLM). In operation for about six months, lexicon-assisted verification has reduced the number of words requiring operator verification by over 50%.
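The mechanism summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The confidence scale, the threshold, the names `OcrChar`, `apply_lexicon`, and `score_lexicon`, and the scoring rule used to compare candidate lexicons are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass

# Assumed scale: integer confidences 0-9, with values below THRESHOLD
# treated as "low". The real OCR system's scale is not given in the abstract.
HIGH = 9
THRESHOLD = 7


@dataclass
class OcrChar:
    char: str
    confidence: int


def needs_verification(word, threshold=THRESHOLD):
    """A word goes to an operator if any character has low confidence."""
    return any(c.confidence < threshold for c in word)


def apply_lexicon(word, lexicon, threshold=THRESHOLD, high=HIGH):
    """Reset confidences to high when a flagged word is found in the lexicon."""
    text = "".join(c.char for c in word).lower()
    if needs_verification(word, threshold) and text in lexicon:
        return [OcrChar(c.char, max(c.confidence, high)) for c in word]
    return word


def score_lexicon(lexicon, ocr_correct, ocr_incorrect):
    """Hypothetical comparison of a candidate lexicon against the two lists.

    Returns (fraction of OCR Correct words found in the lexicon,
             fraction of OCR Incorrect words found in the lexicon).
    A higher first value means more saved verification; a higher second
    value means more OCR errors would be wrongly accepted.
    """
    correct_hits = sum(w.lower() in lexicon for w in ocr_correct)
    incorrect_hits = sum(w.lower() in lexicon for w in ocr_incorrect)
    return (correct_hits / len(ocr_correct),
            incorrect_hits / len(ocr_incorrect))


def make_word(text, confidences):
    """Helper for building example words from parallel sequences."""
    return [OcrChar(ch, conf) for ch, conf in zip(text, confidences)]
```

Under these assumptions, a word such as "liver" with one low-confidence character would be cleared automatically if "liver" is in the lexicon, while a misread such as "1iver" would still be routed to an operator. Scoring each candidate against both lists makes the trade-off explicit: a larger lexicon clears more correct words but also risks accepting more misrecognized ones.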
Keywords :
glossaries; list processing; optical character recognition; National Library of Medicine; automatic selective value resetting; candidate word lists; confidence values; false positives; lexicon-assisted OCR output verification; low-confidence characters; manual verification; operator verification time; recognition rate; Abstracts; Biomedical communication; Biomedical imaging; Character recognition; Humans; Image converters; Image databases; Libraries; Optical character recognition software; Research and development
Conference_Title :
Computer-Based Medical Systems, 1998. Proceedings. 11th IEEE Symposium on
Conference_Location :
Lubbock, TX
Print_ISBN :
0-8186-8564-6
DOI :
10.1109/CBMS.1998.701267