DocumentCode :
3306869
Title :
Post-processing of OCR results for automatic indexing
Author :
Wiedenhöfer, Lars ; Hein, Hans-Günther ; Dengel, Andreas
Author_Institution :
German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
Volume :
2
fYear :
1995
fDate :
14-16 Aug 1995
Firstpage :
592
Abstract :
Indexing inaccurately recognized OCR text yields unsatisfactory results: the quality of the index terms decreases rapidly as the quality of the documents gets worse. Index terms of OCR-processed documents can be used for archiving or classification tasks. We present an indexing component whose input is a set of character hypothesis lattices. These are post-processed by a generate-and-test component that feeds word candidates to a morphology, a rule-based substitution system, and a trigram correction component. Stop words are filtered by a Levenshtein-based elimination routine. The recognized words are subsequently processed by our indexing component. Our system minimizes the number of generated index terms which are correct German words. Experiments have shown an increase in accuracy of close to 10%.
Keywords :
classification; document image processing; indexing; knowledge based systems; optical character recognition; vocabulary; German words; OCR result post-processing; automatic indexing; character hypothesis lattices; classification; document archiving; document quality; elimination routine; experiments; generate-and-test; inaccurately recognized OCR text; index terms; morphology; quality; rule based substitution system; trigram correction; word candidates; word recognition; Abstracts; Artificial intelligence; Character generation; Computer errors; Information retrieval; Lattices; Machine assisted indexing; Morphology; Optical character recognition software; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.601966
Filename :
601966