Title :
The implementation methodology for a CD-ROM English document database
Author :
Phillips, Ihsin T. ; Ha, Jaekyu ; Haralick, Robert M. ; Dori, Dov
Author_Institution :
Dept. of Comput. Sci., Seattle Univ., WA, USA
Abstract :
Producing a database of scanned document images for developing or evaluating OCR and document image understanding algorithms is neither easy nor inexpensive. The authors first briefly describe the makeup of a database of scanned images of scientific and technical documents written in English, which is being produced in CD-ROM format. They then concentrate on the implementation methodology used to prepare the database. The methodology specifies the protocols for each step of database preparation. The error model used to estimate the ground-truth errors that may exist in the database is also discussed.
Keywords :
CD-ROMs; database management systems; document handling; image scanners; natural languages; optical character recognition; CD-ROM English document database; CD-ROM format; OCR; database preparation; document image understanding algorithms; error model; ground-truth errors; implementation methodology; scanned document images; technical documents; Character recognition; Computer science; Image databases; Image recognition; Intelligent systems; Optical character recognition software; Permission; Software algorithms; Software performance
Conference_Title :
Proceedings of the Second International Conference on Document Analysis and Recognition, 1993
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395690