DocumentCode :
183279
Title :
Flexible Sequence Matching Technique: Application to Word Spotting in Degraded Documents
Author :
Mondal, Tanmoy ; Ragot, N. ; Ramel, Jean-Yves ; Pal, Umapada
Author_Institution :
Lab. d'Inf., Univ. François Rabelais, Tours, France
fYear :
2014
fDate :
1-4 Sept. 2014
Firstpage :
210
Lastpage :
215
Abstract :
In this paper, a new sequence-matching algorithm, called Flexible Sequence Matching (FSM), is proposed. FSM combines several capabilities of other sequence-matching algorithms (especially DTW, CDP and MVM), which can be configured depending on the application domain. Its generality and robustness come from its ability to find subsequences (as in CDP), to skip outliers inside the matched sequences (as in MVM) and to match multiple elements with a single one (as in CDP and DTW). These properties make it highly suitable for robust word spotting. More precisely, the FSM algorithm can retrieve a query inside a line or a piece of a line. This is useful when word segmentation is inaccurate or when only line segmentation information is available. Furthermore, its skipping capability makes FSM less sensitive to local variations in the spelling of words and to local degradation effects. Finally, its multiple-matching facilities (many-to-one and one-to-many matching) are useful when the target and query sequences differ in length because of variability in the scale factor. We demonstrate the superiority of the proposed FSM algorithm in specific cases such as incorrect word segmentation and word-level local variations. Experiments performed on the handwritten George Washington dataset and on historical typewritten document images gave quite promising results.
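The abstract names three capabilities that FSM combines: subsequence search (CDP), skipping outliers (MVM), and one-to-many / many-to-one matching (DTW). The sketch below shows one way such capabilities can coexist in a single dynamic-programming recurrence. It is an illustrative assumption, not the authors' FSM formulation: it assumes 1-D feature values, an absolute-difference cost, a fixed per-element skip penalty, and skipping of target elements only. The function name `flexible_match` and the parameters `skip_penalty` and `max_skip` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def flexible_match(query: Sequence[float], target: Sequence[float],
                   skip_penalty: float = 1.0, max_skip: int = 2) -> Tuple[float, int, int]:
    """Hypothetical sketch: subsequence DTW with bounded, penalised skips of target elements.

    Returns (cost, start, end): the best cost of matching the whole query against
    some target[start..end]. Not the published FSM recurrence.
    """
    n, m = len(query), len(target)
    if n == 0 or m == 0:
        raise ValueError("query and target must be non-empty")
    INF = math.inf
    # D[i][j]: best cost of aligning query[0..i] with query[i] matched to target[j].
    D = [[INF] * m for _ in range(n)]
    # S[i][j]: target index at which that best path starts.
    S = [[-1] * m for _ in range(n)]

    # Free start (CDP-style subsequence search): query[0] may match any target element.
    for j in range(m):
        D[0][j] = abs(query[0] - target[j])
        S[0][j] = j

    for i in range(1, n):
        for j in range(m):
            best, start = INF, -1
            # Vertical step: target[j] is also matched to query[i-1] (one target, many query).
            if D[i - 1][j] < best:
                best, start = D[i - 1][j], S[i - 1][j]
            # Horizontal step: query[i] is also matched to target[j-1] (one query, many target).
            if j > 0 and D[i][j - 1] < best:
                best, start = D[i][j - 1], S[i][j - 1]
            # Diagonal step, optionally skipping s target elements (MVM-style outlier skip).
            for s in range(max_skip + 1):
                k = j - 1 - s
                if k < 0:
                    break
                cand = D[i - 1][k] + s * skip_penalty
                if cand < best:
                    best, start = cand, S[i - 1][k]
            D[i][j] = abs(query[i] - target[j]) + best
            S[i][j] = start

    # Free end: the match may finish anywhere in the target.
    end = min(range(m), key=lambda j: D[n - 1][j])
    return float(D[n - 1][end]), S[n - 1][end], end


if __name__ == "__main__":
    # Query embedded in a longer "line": found at target[2..4] with zero cost.
    print(flexible_match([1, 2, 3], [9, 9, 1, 2, 3, 9, 9]))  # (0.0, 2, 4)
    # Outlier 7 inside the match is skipped for a penalty of 0.5.
    print(flexible_match([1, 2, 3], [9, 1, 2, 7, 3, 9], skip_penalty=0.5, max_skip=1))  # (0.5, 1, 4)
```

With `max_skip=0` the recurrence reduces to plain subsequence DTW, so the second example can no longer jump over the outlier and its best cost rises to 1.0. Making the skip budget and penalty explicit parameters mirrors, in a simplified way, the configurability the abstract attributes to FSM.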
Keywords :
document image processing; feature extraction; image matching; word processing; CDP algorithm; DTW algorithm; FSM algorithm; MVM algorithm; application domain; flexible sequence matching algorithm; handwritten George Washington dataset; historical typewritten document images; query retrieval; query sequences; scale factor; word segmentation process; word spotting; Algorithm design and analysis; Computer architecture; Dynamic programming; Feature extraction; Image segmentation; Pattern recognition; Robustness; Continuous dynamic programming (CDP); Degraded historical document; Dynamic time warping (DTW); Elastic matching; Handwritten documents; Minimal variance matching (MVM); Sequence alignment; Word spotting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location :
Heraklion
ISSN :
2167-6445
Print_ISBN :
978-1-4799-4335-7
Type :
conf
DOI :
10.1109/ICFHR.2014.43
Filename :
6981022