Title :
Document analysis: from pixels to contents
Author :
Schürmann, Jürgen ; Bartneck, Norbert ; Bayer, Thomas ; Franke, Jürgen ; Mandler, Eberhard ; Oberländer, Matthias
Author_Institution :
Daimler-Benz Inst. for Inf. Technol., Ulm, Germany
fDate :
7/1/1992
Abstract :
The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists of converting the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting at the pixel level, they describe the formation of elementary geometric objects, on which both layout analysis and the definition of character objects are based. Character recognition accomplishes the mapping from geometric object to character meaning in ASCII representation. On the next level of abstraction, words are formed and verified by contextual processing. Modeled knowledge about complete documents and about how their constituents are related to the application forms the highest level of abstraction. The various problems arising at each stage are discussed. The dependencies between the different levels are exemplified, and technical solutions are put forward.
Keywords :
document image processing; knowledge representation; optical character recognition; ASCII representation; OCR; character objects; conceptual framework; contextual processing; document analysis; elementary geometric objects; equivalent knowledge network representation; pixel representation; Availability; Character recognition; Context modeling; Humans; Information analysis; Information processing; Pattern analysis; Pattern recognition; Text analysis; Workstations;
Journal_Title :
Proceedings of the IEEE