• DocumentCode
    3020569
  • Title
    Identifying script on word-level with informational confidence
  • Author
    Jaeger, Stefan ; Ma, Huanfeng ; Doermann, David
  • Author_Institution
    Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA
  • fYear
    2005
  • fDate
    29 Aug.-1 Sept. 2005
  • Firstpage
    416
  • Abstract
    In this paper, we present a multiple classifier system for script identification. Applying a Gabor filter analysis of textures at the word level, our system identifies Latin and non-Latin words in bilingual printed documents. The classifier system comprises four different architectures based on nearest neighbors, weighted Euclidean distances, Gaussian mixture models, and support vector machines. We report results for Arabic, Chinese, Hindi, and Korean scripts. Moreover, we show that combining informational confidence values using the sum rule can consistently outperform the best single recognition rate.
  • Keywords
    Gabor filters; Gaussian processes; document image processing; image texture; natural languages; pattern classification; support vector machines; Gabor filter analysis; Gaussian mixture model; bilingual printed document; informational confidence; multiple classifier system; nearest neighbor method; script identification; support vector machine; weighted Euclidean distance; word-level texture; Dictionaries; Educational institutions; Euclidean distance; Gabor filters; Nearest neighbor searches; Neural networks; Optical character recognition software; Support vector machine classification; Support vector machines; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type
    conf
  • DOI
    10.1109/ICDAR.2005.134
  • Filename
    1575580