Title :
Word image matching using dynamic time warping
Author :
Rath, Toni M. ; Manmatha, R.
Author_Institution :
Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA, USA
Abstract :
Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labor and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed: clusters with occurrences of the same word in a collection are established using image matching. By annotating "interesting" clusters, an index can be built automatically. We present an algorithm for matching handwritten words in noisy historical documents. The segmented word images are preprocessed to create sets of 1-dimensional features, which are then compared using dynamic time warping. We present experimental results on two different data sets from the George Washington collection. Our experiments show that this algorithm performs better and is faster than competing matching techniques.
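The matching step described in the abstract (per-column 1-dimensional features compared with dynamic time warping) can be illustrated with a minimal sketch. This is not the authors' implementation: the two features chosen here (ink fraction and upper profile per column), the squared-difference local cost, the absence of any path-length normalization or warping-band constraint, and the function names `column_features` and `dtw_distance` are all assumptions made for illustration.

```python
from math import inf


def column_features(image):
    """Turn a binary word image into one feature vector per pixel column.

    `image` is a list of rows, where 1 marks ink and 0 marks background.
    The two features used here are illustrative assumptions:
      - fraction of ink pixels in the column (a projection profile)
      - normalized distance from the top to the first ink pixel
        (an upper profile; 1.0 if the column holds no ink)
    """
    height = len(image)
    width = len(image[0])
    features = []
    for x in range(width):
        column = [image[y][x] for y in range(height)]
        ink = sum(column)
        upper = next((y for y, value in enumerate(column) if value), height)
        features.append((ink / height, upper / height))
    return features


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two feature sequences.

    `a` and `b` are sequences of equal-length feature tuples. The local
    cost is the squared Euclidean distance (an assumption), and the
    result is the unnormalized cost of the cheapest warping path.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again to b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

In such a sketch, two word images would be compared by calling `dtw_distance(column_features(img_a), column_features(img_b))`; a horizontally stretched copy of a word can align at low cost because the warping path may match one column to several.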
Keywords :
character recognition; computer vision; document image processing; feature extraction; handwriting recognition; image matching; image segmentation; indexing; library automation; pattern clustering; George Washington collection; cluster annotation; collection index; data set; dynamic time warping; electronic media; handwritten historical manuscript; handwritten word matching; image preprocessing; library; noisy historical document; scanned version; word image matching; word image segmentation; word occurrence cluster; word spotting; Character recognition; Clustering algorithms; Handwriting recognition; Image matching; Image recognition; Image segmentation; Indexing; Information retrieval; Libraries; Optical character recognition software
Conference_Title :
Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
Print_ISBN :
0-7695-1900-8
DOI :
10.1109/CVPR.2003.1211511