  • DocumentCode
    419649
  • Title
    Adaptive word style classification using a Gaussian mixture model
  • Author
    Ma, Huanfeng ; Doermann, David
  • Author_Institution
    Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA
  • Volume
    2
  • fYear
    2004
  • fDate
    23-26 Aug. 2004
  • Firstpage
    606
  • Abstract
    We present a new approach for detecting bold and italic words in scanned documents. Assuming that OCR results are available, the features used for classification are chosen automatically by a feature selection step. For each scanned page, a Gaussian mixture model is constructed for the characters that share a character code, and word styles are determined by a weighted majority vote. We applied the method to a variety of documents and compared the results with current commercial OCR software that provides style information. The experimental results show that our method performs better than the commercial software.
  • Keywords
    Gaussian processes; optical character recognition; text analysis; word processing; Gaussian mixture model; adaptive word style classification; feature selection; weighted majority vote; Application software; Character recognition; Dictionaries; Educational institutions; Gabor filters; Optical character recognition software; Pattern recognition; Printing; Text recognition; Voting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2128-2
  • Type
    conf
  • DOI
    10.1109/ICPR.2004.1334321
  • Filename
    1334321
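
The abstract states that word styles are determined by a weighted majority vote over character-level decisions. The sketch below illustrates only that voting step, under stated assumptions: the function name `classify_word_style`, the style labels, the example scores, and the default weighting (each character's top score) are all hypothetical and are not taken from the paper. The per-character scores stand in for the output of a per-character-code mixture model, which is not implemented here.

```python
from collections import defaultdict


def classify_word_style(char_posteriors, weights=None):
    """Return the style label with the largest weighted vote total.

    char_posteriors: one dict per character, mapping a style label to a
        probability-like score (hypothetical stand-in for the output of
        a per-character-code mixture model).
    weights: optional per-character weights. By default each character
        is weighted by its own top score, so confident characters count
        more. This weighting is an assumption, not the paper's scheme.
    """
    if not char_posteriors:
        raise ValueError("word has no characters")
    if weights is None:
        weights = [max(p.values()) for p in char_posteriors]
    if len(weights) != len(char_posteriors):
        raise ValueError("need exactly one weight per character")

    totals = defaultdict(float)
    for posterior, weight in zip(char_posteriors, weights):
        vote = max(posterior, key=posterior.get)  # this character's vote
        totals[vote] += weight                    # add its weight to that style
    return max(totals, key=totals.get)


# Made-up scores for a four-character word: two characters lean
# "normal" weakly, two lean "bold" strongly.
word = [
    {"normal": 0.55, "bold": 0.40, "italic": 0.05},
    {"normal": 0.10, "bold": 0.85, "italic": 0.05},
    {"normal": 0.20, "bold": 0.75, "italic": 0.05},
    {"normal": 0.60, "bold": 0.35, "italic": 0.05},
]
print(classify_word_style(word))
```

In this made-up example the character votes split two to two, so an unweighted vote would be tied; weighting each vote by its confidence lets the two strongly bold characters decide the word.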