• DocumentCode
    3341187
  • Title

    Keyword Matching in Historical Machine-Printed Documents Using Synthetic Data, Word Portions and Dynamic Time Warping

  • Author

    Konidaris, T. ; Gatos, B. ; Perantonis, S.J. ; Kesidis, A.

  • Author_Institution
    Comput. Intell. Lab., Nat. Center for Sci. Res. "Demokritos", Athens
  • fYear
    2008
  • fDate
    16-19 Sept. 2008
  • Firstpage
    539
  • Lastpage
    545
  • Abstract
    In this paper we propose a novel and efficient technique for finding keywords typed by the user in digitised machine-printed historical documents using the dynamic time warping (DTW) algorithm. The method uses word portions located at the beginning and end of each segmented word of the processed documents and tries to estimate the position of the first and last characters in order to reduce the list of candidate words. Since DTW can become computationally intensive on large datasets, the proposed method significantly prunes the list of candidate words, thus speeding up the entire process. Word length is also used as a means of further reducing the data to be processed. Results are improved in terms of time and efficiency compared to those obtained when the list of candidate words is not pruned.
  • Keywords
    document handling; digitised machine-printed historical documents; dynamic time warping; historical machine-printed documents; keyword matching; synthetic data; word length; word portions; Character recognition; Computational intelligence; Histograms; Image segmentation; Informatics; Laboratories; Optical character recognition software; Optical feedback; Text analysis; Typesetting; Dynamic Time Warping; Historical Documents; Indexing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
  • Conference_Location
    Nara
  • Print_ISBN
    978-0-7695-3337-7
  • Type

    conf

  • DOI
    10.1109/DAS.2008.64
  • Filename
    4670004
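
The pruning-then-matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each word image has already been reduced to a 1-D numeric feature sequence, and the names `dtw_distance`, `prune_by_length`, `rank_candidates` and the `tolerance` parameter are hypothetical. Only the word-length pruning is shown; the estimation of first and last characters from word portions is not reproduced here.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two 1-D numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def prune_by_length(query, candidates, tolerance=0.2):
    """Keep candidates whose length is within `tolerance` of the query length.

    The relative-tolerance rule is an assumption made for illustration.
    """
    lo = len(query) * (1 - tolerance)
    hi = len(query) * (1 + tolerance)
    return [c for c in candidates if lo <= len(c) <= hi]


def rank_candidates(query, candidates, tolerance=0.2):
    """Prune first, then run the more expensive DTW only on the survivors."""
    kept = prune_by_length(query, candidates, tolerance)
    return sorted(kept, key=lambda c: dtw_distance(query, c))
```

Because DTW costs time proportional to the product of the two sequence lengths, discarding candidates with a cheap length check before calling `dtw_distance` reduces the number of quadratic comparisons, which is the kind of saving the abstract describes.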