• DocumentCode
    3141226
  • Title
    Advances in the BBN BYBLOS OCR system
  • Author
    Lu, Zhidong ; Schwartz, Richard ; Natarajan, Premkumar ; Bazzi, Issam ; Makhoul, John
  • Author_Institution
    GTE Corp., Cambridge, MA, USA
  • fYear
    1999
  • fDate
    20-22 Sep 1999
  • Firstpage
    337
  • Lastpage
    340
  • Abstract
    We present some recent advances in the BBN BYBLOS OCR system. This OCR system can be used to recognize Arabic, Chinese, and English with high accuracy. A major change in the system is the use of continuous-density HMMs, which allow us to take advantage of a large amount of training data and to use unsupervised adaptation methods to improve accuracy in many cases, e.g., on degraded data. Another advance is the substantial increase in recognition speed. With this increased speed, the system is fast enough for practical use on Arabic and English data. The extension of the system to Chinese further demonstrated the language independence of this system and showed that this system can be used on languages with large character sets and complicated character structures. The Chinese OCR system yielded high accuracy on newspaper data.
  • Keywords
    character sets; document image processing; hidden Markov models; optical character recognition; Arabic; BBN BYBLOS; Chinese; English; OCR; continuous-density hidden Markov model; newspaper data; unsupervised adaptation methods; Character recognition; Degradation; Feature extraction; Linear discriminant analysis; Optical character recognition software; Robustness; Speech recognition; Training data; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-7695-0318-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1999.791793
  • Filename
    791793