• DocumentCode
    183247
  • Title

    Applications of Recurrent Neural Network Language Model in Offline Handwriting Recognition and Word Spotting

  • Author

    Nan Li ; Jinying Chen ; Huaigu Cao ; Bing Zhang ; Prem Natarajan

  • Author_Institution
    Raytheon BBN Technologies, Cambridge, MA, USA
  • fYear
    2014
  • fDate
    1-4 Sept. 2014
  • Firstpage
    134
  • Lastpage
    139
  • Abstract
    The recurrent neural network language model (RNNLM) is a discriminative, non-Markovian model that can capture long-span word history in natural language. It has been proved to be successful in automatic speech recognition and machine translation. In this work, we applied RNNLM to the n-best rescoring stage of the state-of-the-art BBN Byblos OCR (optical character recognition) system for handwriting recognition. With RNNLM scores as additional features, our system achieved significant improvement (p < 0.001), a 3.5% relative reduction on OCR word error rate, compared with a high baseline that uses n-gram language model for rescoring. We have also developed a novel method to integrate the OCR n-best RNNLM scores into the word posterior probabilities in OCR confusion networks, which resulted in consistent observable improvements in word spotting for OCR'ed handwritten documents, as measured by both mean average precision (MAP) and detection-error tradeoff (DET) curves.
  • Keywords
    document image processing; handwriting recognition; language translation; optical character recognition; recurrent neural nets; word processing; BBN Byblos OCR; DET curves; MAP; OCR confusion networks; OCR handwritten documents; OCR word error rate; RNNLM scores; automatic speech recognition; detection-error tradeoff curves; long-span word history; machine translation; mean average precision; nonMarkovian model; offline handwriting recognition; optical character recognition system; recurrent neural network language model; word posterior probabilities; word spotting; Character recognition; Handwriting recognition; Hidden Markov models; Lattices; Optical character recognition software; Recurrent neural networks; Training; information retrieval; keyword search; optical character recognition; recurrent neural networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
  • Conference_Location
    Heraklion
  • ISSN
    2167-6445
  • Print_ISBN
    978-1-4799-4335-7
  • Type

    conf

  • DOI
    10.1109/ICFHR.2014.30
  • Filename
    6981009