DocumentCode
1723414
Title
Document Retrieval with Unlimited Vocabulary
Author
Ranjan, Viresh ; Harit, Gaurav ; Jawahar, C.V.
Author_Institution
CVIT, IIIT, Hyderabad, India
fYear
2015
Firstpage
741
Lastpage
748
Abstract
In this paper, we describe a classifier-based retrieval scheme for efficiently and accurately retrieving relevant documents. We use SVM classifiers for word retrieval, and argue that classifier-based solutions can be superior to OCR-based solutions in many practical situations. Classifier-based solutions have two practical limitations: limited vocabulary support and limited availability of training data. To overcome these limitations, we design a one-shot learning scheme for dynamically synthesizing classifiers. Given a set of SVM classifiers, we appropriately join them to create novel classifiers. This extends the classifier-based retrieval paradigm to an unlimited number of classes (words) present in a language. We validate our method on multiple datasets, and compare it with popular alternatives such as OCR and word spotting. Even for a language like English, where OCR is fairly advanced, our method yields comparable or even superior results. These results are significant because we do not use any language-specific post-processing to obtain this performance. To improve the accuracy of the retrieved list, we use query expansion. This also allows us to seamlessly adapt our solution to new fonts, styles, and collections.
Keywords
document handling; learning (artificial intelligence); pattern classification; query processing; support vector machines; vocabulary; SVM classifiers; classifier based retrieval scheme; document retrieval; one-shot learning scheme; query expansion; unlimited vocabulary; word retrieval; Accuracy; Optical character recognition software; Strips; Support vector machines; Training; Training data; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on
Conference_Location
Waikoloa, HI
Type
conf
DOI
10.1109/WACV.2015.104
Filename
7045958