  • DocumentCode
    2199324
  • Title
    A Full-Text Search System for Images of Hand-Written Cursive Documents
  • Author
    Imura, Hajime ; Tanaka, Yuzuru
  • Author_Institution
    Dept. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
  • fYear
    2010
  • fDate
    16-18 Nov. 2010
  • Firstpage
    640
  • Lastpage
    645
  • Abstract
    We propose a full-text search technique for image-scanned documents that does not recognize individual characters. The system is as fast as a full-text search of machine-readable documents. Such a system is important when working with historical handwritten manuscripts. The proposed method works independently of differences in language and font because it uses a new pseudo-coding scheme based on the statistical features of character shapes. We evaluated our method using recall-precision curves for n-gram-based query strings in Japanese manuscripts and word-based query strings in English manuscripts, with two types of image features and two different pseudo-coding schemes. Results demonstrate that precision exceeded 50% at a recall of 80% for 3-gram queries in the Japanese manuscripts. Results also indicate that our pseudo-code is suitable for applications that use machine-learning techniques. Combining an HMM-based filtering method with our pseudo-code can significantly improve retrieval precision.
  • Keywords
    document image processing; feature extraction; handwritten character recognition; hidden Markov models; image retrieval; learning (artificial intelligence); natural languages; statistical analysis; text analysis; word processing; English manuscript; HMM-based filtering; Japanese manuscript; character shape; full text search system; hand-written cursive document image; handwritten manuscript; image features; image scanned document; machine learning technique; machine readable document; n gram-based query string; pseudocoding scheme; recall precision curve; statistical feature; word-based query string; Full-text Search; Performance Evaluation; Word Spotting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
  • Conference_Location
    Kolkata
  • Print_ISBN
    978-1-4244-8353-2
  • Type
    conf
  • DOI
    10.1109/ICFHR.2010.105
  • Filename
    5693636