• DocumentCode
    1948139
  • Title
    A Reservoir Sampling Algorithm with Adaptive Estimation of Conditional Expectation
  • Author
    Malbasa, Vuk; Vucetic, Slobodan
  • Author_Institution
    Temple Univ., Philadelphia
  • fYear
    2007
  • fDate
    12-17 Aug. 2007
  • Firstpage
    2200
  • Lastpage
    2204
  • Abstract
    Resource-constrained data mining imposes many constraints when learning from large datasets. It is often impractical or impossible to keep the entire dataset in main memory, and the data can often be observed only once, in a single pass, in the order in which they are presented. Traditional reservoir-based approaches perform well in this setting. One drawback of these approaches is that examples not included in the final reservoir are often ignored. To remedy this, we propose a modification to the baseline reservoir algorithm. Instead of keeping the actual target values of reservoir examples, an estimate of their conditional expectation is kept and updated online as new data are observed from the stream. The estimate is obtained by averaging the target values of similar examples. The proposed algorithm uses a paired t-test to determine the similarity threshold. A thorough evaluation on synthetic two-dimensional data shows that the proposed algorithm produces reservoirs with considerably reduced target noise. This property allows training of significantly more accurate prediction models than the baseline reservoir-based approach.
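    The abstract describes the mechanism only at a high level. The Python sketch below illustrates the general idea under stated assumptions: a standard reservoir (Vitter's Algorithm R, assumed here as the baseline) in which each stored example also keeps a running mean of the targets of later stream examples that fall within a fixed Euclidean radius of it. The class name `AveragingReservoir`, the `radius` parameter, the distance measure, and the choice to start a newly admitted example's estimate from its own target are all hypothetical. In particular, the paper selects the similarity threshold with a paired t-test, which this sketch replaces with a fixed radius.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Slot:
    x: tuple[float, ...]  # input features of the stored example
    y_sum: float          # sum of targets averaged into this slot
    count: int            # number of targets averaged so far

    @property
    def y_est(self) -> float:
        # Running estimate of E[y | x] for this stored example
        return self.y_sum / self.count


class AveragingReservoir:
    """Hypothetical sketch: Algorithm R plus per-slot target averaging."""

    def __init__(self, capacity: int, radius: float, seed: int | None = None):
        self.capacity = capacity
        self.radius = radius  # fixed similarity threshold (assumption; not the paper's t-test)
        self.rng = random.Random(seed)
        self.slots: list[Slot] = []
        self.seen = 0

    def _update_similar(self, x: tuple[float, ...], y: float) -> None:
        # Fold the new target into every stored example within `radius` of x
        r2 = self.radius ** 2
        for s in self.slots:
            if sum((a - b) ** 2 for a, b in zip(s.x, x)) <= r2:
                s.y_sum += y
                s.count += 1

    def add(self, x, y: float) -> None:
        x = tuple(x)
        # Every stream example refines nearby estimates, even if it is not kept
        self._update_similar(x, y)
        self.seen += 1
        if len(self.slots) < self.capacity:
            self.slots.append(Slot(x, y, 1))
        else:
            # Algorithm R: keep the i-th example with probability capacity / i
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.slots[j] = Slot(x, y, 1)

    def sample(self) -> list[tuple[tuple[float, ...], float]]:
        # Reservoir contents with smoothed targets in place of raw ones
        return [(s.x, s.y_est) for s in self.slots]
```

    For example, streaming noisy samples of y = x1 + x2 on the unit square through `AveragingReservoir(capacity=50, radius=0.1)` yields stored targets that lie closer to the noise-free function than the raw targets did, because examples that survive in the reservoir have typically absorbed many nearby observations. A fixed radius trades variance for bias: a larger radius averages more targets but blurs regions where the function changes quickly, which is the trade-off the paper's adaptive threshold is meant to manage.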
  • Keywords
    adaptive estimation; data mining; learning (artificial intelligence); sampling methods; statistical testing; baseline reservoir-based approach; conditional expectation adaptive estimation; learning algorithm; paired t-test; prediction model; reservoir sampling algorithm; resource-constrained data mining; similarity threshold; Adaptive estimation; Capacity planning; Data mining; Neural networks; Noise generators; Noise reduction; Predictive models; Reservoirs; Sampling methods;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2007. IJCNN 2007. International Joint Conference on
  • Conference_Location
    Orlando, FL
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-1379-9
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2007.4371299
  • Filename
    4371299