• DocumentCode
    2865681
  • Title
    Text classification with evolving label-sets
  • Author
    Godbole, Shantanu ; Ramakrishnan, Ganesh ; Sarawagi, Sunita
  • Author_Institution
    IIT Bombay, India
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    We introduce the evolving label-set problem encountered in building real-world text classification systems. This problem arises when a text classification system trained on a label-set encounters documents of unseen classes at deployment time. We design a class-detector module that monitors unlabeled data, detects new classes, and suggests them to the administrator for inclusion in the label-set. We propose abstractions that group together tokens under human-understandable concepts and provide a mechanism for assigning importance to unseen terms. We present generative algorithms leveraging the notion of support of documents in a model for (1) selecting documents of proposed new classes, and (2) automatically triggering detection of new classes. Experiments on three real-world taxonomies show that our methods select new-class documents with high precision, and trigger emergence of new classes with low false-positive and false-negative rates.
  • Keywords
    classification; text analysis; class-detector module; document selection; evolving label-sets; generative algorithm; text classification; Algorithm design and analysis; Australia; Buildings; Constitution; Data mining; Error analysis; Humans; Robustness; Taxonomy; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2005.143
  • Filename
    1565743