  • DocumentCode
    2129583
  • Title
    One-Class Classification of Text Streams with Concept Drift
  • Author
    Zhang, Yang; Li, Xue; Orlowska, Maria
  • Author_Institution
    Univ. of Queensland, Brisbane, QLD
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    116
  • Lastpage
    125
  • Abstract
    Research on streaming data classification has mostly been based on the assumption that data can be fully labelled. However, this is impractical. Firstly, it is impossible to label the data completely before all of it has arrived. Secondly, it is generally very expensive to obtain fully labelled data through manual effort. Thirdly, user interests may change over time, so labels issued earlier may be inconsistent with labels issued later; this represents concept drift. In this paper, we consider the problem of one-class classification on text streams with respect to concept drift, where a large volume of documents arrives at high speed and with changes in user interests and data distribution. In this case, only a small number of positively labelled documents are available for training. We propose a stacking-style ensemble-based approach and have compared it with all other window-based approaches, such as the single window, fixed window, and full memory approaches. Our experimental results demonstrate that the proposed ensemble approach outperforms all other approaches.
  • Keywords
    text analysis; data distribution; one-class classification; positively labelled documents; stacking style ensemble-based approach; streaming data classification; text streams; window-based approach; Conferences; Current measurement; Data mining; Feedback; Information retrieval; Information technology; Labeling; Natural languages; Stacking; Text categorization; Concept Drift; One-class Classification; Text Stream
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-0-7695-3503-6
  • Electronic_ISBN
    978-0-7695-3503-6
  • Type
    conf
  • DOI
    10.1109/ICDMW.2008.54
  • Filename
    4733929