• DocumentCode
    1488676
  • Title
    Active Learning With Sampling by Uncertainty and Density for Data Annotations
  • Author
    Zhu, Jingbo ; Wang, Huizhen ; Tsou, Benjamin K. ; Ma, Matthew
  • Author_Institution
    Natural Language Process. Lab., Northeastern Univ., Shenyang, China
  • Volume
    18
  • Issue
    6
  • fYear
    2010
  • Firstpage
    1323
  • Lastpage
    1331
  • Abstract
    To solve the knowledge bottleneck problem, active learning has been widely used for its ability to automatically select the most informative unlabeled examples for human annotation. One of the key enabling techniques of active learning is uncertainty sampling, which uses one classifier to identify unlabeled examples with the least confidence. Uncertainty sampling often presents problems when outliers are selected. To solve the outlier problem, this paper presents two techniques, sampling by uncertainty and density (SUD) and density-based re-ranking. Both techniques prefer not only the most informative example in terms of uncertainty criterion, but also the most representative example in terms of density criterion. Experimental results of active learning for word sense disambiguation and text classification tasks using six real-world evaluation data sets demonstrate the effectiveness of the proposed methods.
  • Keywords
    learning (artificial intelligence); natural language processing; text analysis; uncertainty handling; active learning; data annotations; density-based re-ranking; knowledge bottleneck problem; sampling by uncertainty and density; text classification; text classification tasks; uncertainty sampling; word sense disambiguation (WSD)
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2033421
  • Filename
    5272205