DocumentCode
778911
Title
Document analysis-from pixels to contents
Author
Schürmann, Jürgen ; Bartneck, Norbert ; Bayer, Thomas ; Franke, Jürgen ; Mandler, Eberhard ; Oberländer, Matthias
Author_Institution
Daimler-Benz Inst. for Inf. Technol., Ulm, Germany
Volume
80
Issue
7
fYear
1992
fDate
7/1/1992 12:00:00 AM
Firstpage
1101
Lastpage
1119
Abstract
The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting at the pixel level, the authors describe the formation of elementary geometric objects, on which both layout analysis and the definition of character objects are based. Character recognition accomplishes the mapping from geometric object to character meaning in ASCII representation. On the next level of abstraction, words are formed and verified by contextual processing. Modeled knowledge about complete documents and about how their constituents are related to the application forms the highest level of abstraction. The various problems arising at each stage are discussed. The dependencies between the different levels are exemplified and technical solutions are put forward.
Keywords
document image processing; knowledge representation; optical character recognition; ASCII representation; OCR; character objects; conceptual framework; contextual processing; document analysis; elementary geometric objects; equivalent knowledge network representation; pixel representation; Availability; Character recognition; Context modeling; Humans; Information analysis; Information processing; Pattern analysis; Pattern recognition; Text analysis; Workstations;
fLanguage
English
Journal_Title
Proceedings of the IEEE
Publisher
IEEE
ISSN
0018-9219
Type
jour
DOI
10.1109/5.156473
Filename
156473