Regional Information Center for Science and Technology (RICeST) - Text string extraction within mixed-mode documents

DocumentCode :

2629591

Title :

Text string extraction within mixed-mode documents

Author :

Hönes, Frank ; Lichter, Jürgen

Author_Institution :

German Research Center for Artificial Intelligence, Kaiserslautern, Germany

fYear :

1993

fDate :

20-22 Oct 1993

Firstpage :

655

Lastpage :

659

Abstract :

Digitized images of printed documents typically consist of a mixture of text, graphics, and image elements. For proper processing and efficient representation, these elements have to be separated. For most applications it is sufficient to distinguish text from non-text, because text carries the most information. The authors describe the implementation and performance of a robust algorithm for text string extraction which is completely independent of text orientation and can deal with text in various font styles and sizes. Text objects may be nested in non-text areas, and inverse printing can also be analyzed. No recognition of individual characters is performed; the classification is based only on rough image features.

Keywords :

document handling; document image processing; optical character recognition; string matching; font sizes; font styles; graphics; image elements; inverse printing; mixed-mode documents; printed documents; rough image features; text; text orientation; text string extraction; Artificial intelligence; Character recognition; Data mining; Filtering; Graphics; Image analysis; Independent component analysis; Noise reduction; Robustness; Text analysis

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on

Conference_Location :

Tsukuba Science City

Print_ISBN :

0-8186-4960-7

Type :

conf

DOI :

10.1109/ICDAR.1993.395652

Filename :

395652

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2629591