DocumentCode :
2629591
Title :
Text string extraction within mixed-mode documents
Author :
Hönes, Frank ; Lichter, Jürgen
Author_Institution :
German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
fYear :
1993
fDate :
20-22 Oct 1993
Firstpage :
655
Lastpage :
659
Abstract :
Digitized images of printed documents typically consist of a mixture of text, graphics, and image elements. For proper processing and efficient representation, these elements have to be separated. For most applications it is sufficient to distinguish between text and non-text, because text carries the most information. The authors describe the implementation and performance of a robust algorithm for text string extraction that is completely independent of text orientation and can handle text in various font styles and sizes. Text objects may be nested in non-text areas, and inverse printing can also be analyzed. No recognition of individual characters is performed; the classification is based only on rough image features.
Keywords :
document handling; document image processing; optical character recognition; string matching; font sizes; font styles; graphics; image elements; inverse printing; mixed-mode documents; printed documents; rough image features; text; text orientation; text string extraction; Artificial intelligence; Character recognition; Data mining; Filtering; Graphics; Image analysis; Independent component analysis; Noise reduction; Robustness; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
Type :
conf
DOI :
10.1109/ICDAR.1993.395652
Filename :
395652