  • DocumentCode
    2147601
  • Title
    Document Image Classification and Labeling Using Multiple Instance Learning
  • Author
    Kumar, Jayant; Pillai, Jaishanker; Doermann, David
  • Author_Institution
    Inst. of Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    1059
  • Lastpage
    1063
  • Abstract
    Labeling large sets of images for training or testing analysis systems can be a very costly and time-consuming process. Multiple instance learning (MIL) is a generalization of traditional supervised learning that relaxes the need for exact labels on individual training instances; instead, labels are required only for sets of instances known as bags. In this paper, we apply MIL to the retrieval and localization of signatures and to the retrieval of images containing machine-printed text, and show that it achieves a 15-20% gain in performance over supervised learning with weak labeling. We also compare our approach to supervised learning with fully annotated training data and report competitive accuracy for MIL. Through experiments on real-world datasets, we show that MIL is a good alternative when the training data has only document-level annotation.
  • Keywords
    document image processing; image classification; image retrieval; learning (artificial intelligence); document image classification; document image labeling; document-level annotation; fully annotated training data; machine-printed text; multiple instance learning; signature localization; signature retrieval; testing analysis systems; traditional supervised learning; training analysis systems; weak-labeling; Feature extraction; Handwriting recognition; Histograms; Image segmentation; Supervised learning; Support vector machines; Training; Document Image Labeling; Machine-print Documents; Signature Detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.214
  • Filename
    6065472