• DocumentCode
    3488307
  • Title
    Greedy Search for Active Learning of OCR
  • Author
    Agarwal, Abhishek ; Garg, Radhika ; Chaudhury, Santanu
  • Author_Institution
    Dept. of Electr. Eng., Indian Inst. of Technol., Delhi, New Delhi, India
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    837
  • Lastpage
    841
  • Abstract
    Active learning and crowd sourcing are becoming increasingly popular in the machine learning community for fast, cost-effective generation of labels for large volumes of data. However, such labels may be noisy, so it becomes important to ignore the noisy labels when building a good classifier. We propose a framework for finding the best possible augmentation of a classifier for the character recognition problem using a minimum number of crowd-labeled samples. The approach inherently rejects the noisy data and tries to accept a subset of correctly labeled data to maximize classifier performance.
  • Keywords
    image classification; learning (artificial intelligence); optical character recognition; search problems; OCR; active learning; character recognition problem; classifier; crowd labeled samples; greedy search; noisy data rejection; Accuracy; Character recognition; Noise; Noise measurement; Optical character recognition software; Support vector machines; Training; Indian scripts; crowd sourcing; incremental SVM
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.171
  • Filename
    6628736