• DocumentCode
    2772013
  • Title
    Mining Data Streams with Labeled and Unlabeled Training Examples
  • Author
    Zhang, Peng; Zhu, Xingquan; Guo, Li
  • Author_Institution
    Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    627
  • Lastpage
    636
  • Abstract
    In this paper, we propose a framework for building prediction models from data streams that contain both labeled and unlabeled examples. We argue that, because data collection capacity keeps growing while labeling resources remain limited, the stream data at hand may contain only a small number of labeled examples, whereas a large portion of the data remains unlabeled but can still be beneficial for learning. Unleashing the full potential of the unlabeled instances for stream data mining is, however, a significant challenge: even fully labeled data streams may suffer from concept drift, and inappropriate use of the unlabeled samples may make the problem even worse. To build prediction models, we first divide the stream data into four categories, each corresponding to a situation in which concept drift may or may not exist in the labeled and unlabeled data. We then propose a relational k-means based transfer semi-supervised SVM learning framework (RK-TS3VM), which leverages both labeled and unlabeled samples to build prediction models. Experimental results and comparisons on both synthetic and real-world data streams demonstrate that the proposed framework helps build prediction models that are more accurate than those produced by other simple approaches.
  • Keywords
    data mining; learning (artificial intelligence); RK-TS3VM; SVM learning framework; data stream mining; prediction model; relational k-means algorithm; transfer semisupervised learning; unlabeled training samples; Association rules; Australia; Availability; Computers; Data mining; Labeling; Predictive models; Support vector machines; Virtual manufacturing; Warehousing; data stream; support vector machines; unlabeled samples;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.76
  • Filename
    5360289