Title :
READLEX: a lexicon for the recognition and analysis of structured documents
Author_Institution :
German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
Abstract :
This paper describes the architecture of READLEX, a lexicon system that addresses the requirements of both text recognition and text analysis in document analysis. To meet these requirements, we have developed a concept for the automatic acquisition and generation of the lexicon. The heart of the lexicon system is based on redundant hash addressing techniques. Currently, the lexicon is used for the contextual post-processing of OCR results and for the categorization of texts within structured documents. Other document analysis components, such as the address parser and a text pattern matcher, also make use of the lexicon.
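The abstract names redundant hash addressing without showing it, so the following is only an illustrative sketch of the general idea, not the READLEX implementation. It assumes that each lexicon entry is indexed redundantly under all of its character trigrams, so that a word damaged by an OCR error can still be reached through its undamaged trigrams. The class name `RedundantHashLexicon`, the `trigrams` helper, the `#` padding character, and the `min_votes` threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of character trigrams of a lower-cased word.

    The word is padded with '#' on both sides (an assumption of this
    sketch) so that word boundaries also contribute features.
    """
    padded = f"#{word.lower()}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class RedundantHashLexicon:
    """Toy lexicon: every entry is reachable through several hash keys."""

    def __init__(self, words):
        # Hash table mapping each trigram to the entries that contain it.
        self.index = defaultdict(set)
        for word in words:
            for gram in trigrams(word):
                self.index[gram].add(word)

    def candidates(self, ocr_word, min_votes=2):
        """Return entries sharing at least `min_votes` trigrams with
        `ocr_word`, ordered by descending vote count, then alphabetically."""
        votes = defaultdict(int)
        for gram in trigrams(ocr_word):
            for word in self.index.get(gram, ()):
                votes[word] += 1
        ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, count in ranked if count >= min_votes]


if __name__ == "__main__":
    lexicon = RedundantHashLexicon(["invoice", "invoke", "involve", "address"])
    # "inv0ice" simulates an OCR confusion between the letter 'o' and the digit '0'.
    print(lexicon.candidates("inv0ice"))
```

In this sketch the misrecognized string "inv0ice" still shares four trigrams with "invoice", so that entry is ranked first. A contextual post-processing step could then verify the top candidates with a stricter string comparison before correcting the OCR result.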
Keywords :
document image processing; knowledge acquisition; optical character recognition; OCR; READLEX; address parser; contextual post-processing; document analysis; lexicon; redundant hash addressing; structured documents; text analysis; text pattern matcher; text recognition; Artificial intelligence; Character recognition; Dictionaries; Heart; Natural languages; Optical character recognition software; Pareto analysis; Space technology; Text analysis; Text recognition
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.601956