  • DocumentCode
    3488807
  • Title
    Detecting OOV Names in Arabic Handwritten Data
  • Author
    Chen, Jinying ; Prasad, Ranga ; Cao, Huaigu ; Natarajan, Prem
  • Author_Institution
    Dept. of Speech, Language & Multimedia, Raytheon BBN Technologies, Cambridge, MA, USA
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    994
  • Lastpage
    998
  • Abstract
    This paper presents a novel approach to detecting Arabic out-of-vocabulary (OOV) names in OCR'ed handwritten documents. In our approach, OOV names are searched for using approximate string matching on character consensus networks (cnets). The retrieved regions are reranked using novel features that represent the quality of the match and the likelihood that the detected region is an OOV name. Features that encode word boundary information into the approximate match algorithm significantly improve mean average precision (MAP) over the baseline system, by 12.2% absolute at rank cut-off 100 (48.2% vs. 36.0%) and by 11.9% absolute at cut-off 1000 (47.0% vs. 35.1%). Discriminative reranking based on maximum entropy classification, using novel features such as the probability that a retrieved region is an OOV name (called OOV name probability) from a conditional random field model, further improves MAP by 2.3% absolute at cut-off 100 and by 3.0% absolute at cut-off 1000. The improvements are consistent in the DET (Detection Error Tradeoff) curves. Our results show that character-cnet-based OOV name search clearly benefits from approximate matching with word boundary information and from the reranking algorithm. Our experiments also show that OOV name probability is very useful for reranking.
  • Keywords
    document image processing; handwritten character recognition; image matching; maximum likelihood estimation; object detection; Arabic OOV names detection; DET curve; MAP; OCR handwritten documents; OOV name probability; approximate string match; character consensus networks; conditional random field model; detected region likelihood; detection error tradeoff; feature reranking; maximum entropy classification; mean average precision; optical character recognition; out-of-vocabulary name detection; retrieved region probability; word boundary information; Data models; Feature extraction; Hidden Markov models; Lattices; Optical character recognition software; Speech; Training; OOV name detection; consensus network; handwritten; reranking; spotting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.200
  • Filename
    6628765