• Title of article

    Efficient segmentation-free keyword spotting in historical document collections

  • Author/Authors

    Rusiñol, Marçal and Aldavert, David and Toledo, Ricardo and Lladós, Josep

  • Issue Information
    Journal serial issue, year 2015
  • Pages
    11
  • From page
    545
  • To page
    555
  • Abstract
    In this paper we present an efficient segmentation-free word spotting method for historical document collections that follows the query-by-example paradigm. We use a patch-based framework in which local patches are described by a bag-of-visual-words model built on SIFT descriptors. By projecting the patch descriptors onto a topic space with latent semantic analysis and compressing them with product quantization, we can index the document information efficiently in terms of both memory and time. The method is evaluated on four collections of historical documents and performs well in both handwritten and typewritten scenarios, outperforming recent state-of-the-art keyword spotting approaches.
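The abstract describes a pipeline of bag-of-visual-words patch descriptors, an LSA projection to a topic space, and product quantization for compact indexing. The NumPy sketch below illustrates only the last two stages under stated assumptions: SIFT extraction and visual-word assignment are omitted and replaced by random word counts, LSA is a plain truncated SVD without tf-idf weighting, and ranking uses asymmetric distance computation (uncompressed query against compressed patches). The function names (lsa_fit, pq_train, pq_encode, pq_distances) and all dimensions (1000 visual words, 64 topics, 8 subspaces of 16 centroids) are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np


def lsa_fit(X, k):
    """Return a (k, vocab) projection onto the top-k LSA topics of BoVW matrix X."""
    # Truncated SVD of the (patch x visual-word) matrix; the leading right
    # singular vectors span the latent "topic" space. No tf-idf weighting here.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k]


def kmeans(X, k, iters, rng):
    """Plain Lloyd's k-means, used to learn one PQ sub-codebook."""
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids


def pq_train(Z, m, ks, iters=20, seed=0):
    """Split each vector into m sub-vectors and learn ks centroids per subspace."""
    assert Z.shape[1] % m == 0 and ks <= 256  # codes are stored as uint8
    rng = np.random.default_rng(seed)
    ds = Z.shape[1] // m
    return [kmeans(Z[:, i * ds:(i + 1) * ds], ks, iters, rng) for i in range(m)]


def pq_encode(Z, books):
    """Replace every sub-vector by the index of its nearest centroid (1 byte each)."""
    ds = books[0].shape[1]
    codes = np.empty((len(Z), len(books)), dtype=np.uint8)
    for i, C in enumerate(books):
        sub = Z[:, i * ds:(i + 1) * ds]
        codes[:, i] = ((sub[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    return codes


def pq_distances(q, codes, books):
    """Asymmetric distance: uncompressed query vs. every PQ-encoded patch."""
    ds = books[0].shape[1]
    # One lookup table per subspace: squared distance from the query
    # sub-vector to each centroid, shape (m, ks).
    tables = np.stack([((C - q[i * ds:(i + 1) * ds]) ** 2).sum(1)
                       for i, C in enumerate(books)])
    # Sum the table entries selected by each patch's codes: shape (n_patches,).
    return tables[np.arange(len(books)), codes].sum(1)


# Hypothetical stand-in data: 500 patches, 1000-word visual vocabulary.
rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(500, 1000)).astype(float)

P = lsa_fit(X, k=64)          # 64-dim topic space (illustrative choice)
Z = X @ P.T                   # LSA descriptors, float64
books = pq_train(Z, m=8, ks=16)
codes = pq_encode(Z, books)   # 8 bytes per patch instead of 64 floats

query = Z[0]                  # pretend patch 0 is the query example
dist = pq_distances(query, codes, books)
ranking = np.argsort(dist)    # candidate patches, best first
print(codes.nbytes, Z.nbytes)  # 4000 256000
```

In this toy setting each patch is stored as 8 one-byte codes instead of 64 float64 values, and ranking a query needs only one small lookup table per subspace plus additions per patch, which is the kind of memory and time saving the abstract attributes to product quantization.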
  • Keywords
    Historical documents, Segmentation-free, Dense SIFT features, Latent semantic analysis, Product quantization, Keyword spotting
  • Journal title
    PATTERN RECOGNITION
  • Serial Year
    2015
  • Record number

    1879918