DocumentCode :
778911
Title :
Document analysis - from pixels to contents
Author :
Schürmann, Jürgen ; Bartneck, Norbert ; Bayer, Thomas ; Franke, Jürgen ; Mandler, Eberhard ; Oberländer, Matthias
Author_Institution :
Daimler-Benz Inst. for Inf. Technol., Ulm, Germany
Volume :
80
Issue :
7
fYear :
1992
fDate :
7/1/1992 12:00:00 AM
Firstpage :
1101
Lastpage :
1119
Abstract :
The authors present a conceptual framework for document analysis, which in essence consists of converting the document's pixel representation into an equivalent knowledge network representation that holds the document's content and layout. Starting at the pixel level, the paper describes the formation of elementary geometric objects, on which both layout analysis and the definition of character objects are based. Character recognition maps each geometric object to its character meaning in ASCII representation. At the next level of abstraction, words are formed and verified by contextual processing. Modeled knowledge about complete documents, and about how their constituents relate to the application, forms the highest level of abstraction. The various problems arising at each stage are discussed, the dependencies between the levels are illustrated with examples, and technical solutions are put forward.
Keywords :
document image processing; knowledge representation; optical character recognition; ASCII representation; OCR; character objects; conceptual framework; contextual processing; document analysis; elementary geometric objects; equivalent knowledge network representation; pixel representation; Availability; Character recognition; Context modeling; Humans; Information analysis; Information processing; Pattern analysis; Pattern recognition; Text analysis; Workstations;
fLanguage :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/5.156473
Filename :
156473