DocumentCode :
592008
Title :
Fast Keyword Searching Using ´BoostMap´ Based Embedding
Author :
Saabni, Raid ; Bronstein, Alexander
Author_Institution :
Triangle R&D Center, Tel-Aviv Univ., Kafr, Israel
fYear :
2012
fDate :
18-20 Sept. 2012
Firstpage :
734
Lastpage :
739
Abstract :
Dynamic Time Warping (DTW), is a simple but efficient technique for matching sequences with rigid deformation. Therefore, it is frequently used for matching shapes in general, and shapes of handwritten words in Document Image Analysis tasks. As DTW is computationally expensive, efficient algorithms for fast computation are crucial. Retrieving images from large scale datasets using DTW, suffers from the constraint of linear searching of all sample in the datasets. Fast approximation algorithms for image retrieval are mostly based on normed spaces where the triangle inequality holds, which is unfortunately not the case with the DTW metric. In this paper we present a novel approach for fast search of handwritten words within large datasets of shapes. The presented approach is based on the Boost-Map [1] algorithm, for embedding the feature space with the DTW measurement to an euclidean space and use the Local Sensitivity Hashing algorithm (LSH) to rank the k-nearest neighbors of a query image. The algorithm, first, processes and embeds objects of the large data sets to a normed space. Fast approximation of k-nearest neighbors using LSH on the embedding space, generates the top kranked samples which are examined using the real DTW distance to give final accurate results. We demonstrate our method on a database of 45; 800 images of word-parts extracted from the IFN/ENIT database [11] and images collected from 51 different writers. Our method achieves a speedup of 4 orders of magnitude over the exact method, at the cost of only a 2:2% reduction in accuracy.
Keywords :
approximation theory; document image processing; handwritten character recognition; image classification; image matching; image retrieval; learning (artificial intelligence); visual databases; Boost-Map algorithm; Boost-Map based embedding; DTW technique; Euclidean space; IFN-ENIT database; LSH algorithm; document image analysis; dynamic time warping; fast approximation algorithm; handwritten word; image retrieval; k-nearest neighbor; keyword search; linear searching constraint; local sensitivity hashing algorithm; query image; sequence matching; shape matching; triangle inequality; Approximation methods; Extraterrestrial measurements; Feature extraction; Shape; Training; Vectors; Adaboost; BoostMap; Dynamic Time Warping; Embedding; Nearest Neighbor; Word Searching;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location :
Bari
Print_ISBN :
978-1-4673-2262-1
Type :
conf
DOI :
10.1109/ICFHR.2012.204
Filename :
6424484
Link To Document :
بازگشت