DocumentCode :
478631
Title :
Word Extraction Method by Generating Multiple Character Hypotheses
Author :
Takebe, Hiroaki ; Fujimoto, Katsuhito
fYear :
2008
fDate :
16-19 Sept. 2008
Firstpage :
299
Lastpage :
306
Abstract :
Recognizing the logical structure of form images requires precise extraction of the words in headers and data fields. However, word extraction often fails because errors in layout analysis or character recognition prevent the correct character hypotheses from being generated. We propose a word extraction method that generates multiple character hypotheses and extracts the combinations of hypotheses that match the character order of words. First, mutually overlapping character hypotheses are generated by combinatorial recognition of connected components, and the combinations that correspond to words are extracted by clique extraction from a graph. Then, character hypotheses are generated by recognition with limited targets, and the combinations that correspond to words are extracted by lattice matching based on local optima, which takes into account the variety of recognition results and regular expressions for words. We confirmed the effectiveness of the method in experiments on form images.
Keywords :
Character generation; Data mining; Discrete cosine transforms; Filters; Image edge detection; Image segmentation; Layout; Machine learning; Pulse modulation; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara, Japan
Print_ISBN :
978-0-7695-3337-7
Type :
conf
DOI :
10.1109/DAS.2008.35
Filename :
4669974