DocumentCode
1948139
Title
A Reservoir Sampling Algorithm with Adaptive Estimation of Conditional Expectation
Author
Malbasa, Vuk ; Vucetic, Slobodan
Author_Institution
Temple Univ., Philadelphia
fYear
2007
fDate
12-17 Aug. 2007
Firstpage
2200
Lastpage
2204
Abstract
Resource-constrained data mining imposes many constraints on learning from large data sets. It is often impractical or impossible to keep the entire data set in main memory, and often the data can be observed only in a single pass, in the order in which they are presented. Traditional reservoir-based approaches perform well in this situation. One drawback of these approaches is that the examples not included in the final reservoir are often ignored. To remedy this, we propose a modification to the baseline reservoir algorithm. Instead of keeping the actual target values of reservoir examples, an estimate of their conditional expectation is kept and updated online as new data are observed from the stream. The estimate is obtained by averaging the target values of similar examples. The proposed algorithm uses a paired t-test to determine the similarity threshold. Thorough evaluation on generated two-dimensional data shows that the proposed algorithm produces reservoirs with considerably reduced target noise. This property allows training of significantly improved prediction models compared with the baseline reservoir-based approach.
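The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a fixed Euclidean radius as the similarity threshold, whereas the paper selects the threshold with a paired t-test, and it uses the standard reservoir step (Algorithm R) as the baseline. The class name AveragingReservoir and the parameters capacity, radius, and seed are hypothetical names introduced for illustration.

```python
import math
import random


class AveragingReservoir:
    """Reservoir sample whose stored targets are running means of the
    targets of similar stream examples (hypothetical illustration)."""

    def __init__(self, capacity, radius, seed=None):
        self.capacity = capacity
        self.radius = radius      # assumption: fixed threshold, not chosen by a t-test
        self.rng = random.Random(seed)
        self.items = []           # each entry is [x, mean_target, count]
        self.seen = 0             # number of stream examples observed so far

    def add(self, x, y):
        x = tuple(x)
        self.seen += 1
        # Every observed example, kept or not, refines the estimates
        # of the reservoir examples that lie within the radius.
        for item in self.items:
            if math.dist(item[0], x) <= self.radius:
                item[2] += 1
                item[1] += (y - item[1]) / item[2]   # incremental mean update
        # Baseline reservoir step: x is kept with probability capacity / seen.
        if len(self.items) < self.capacity:
            self.items.append([x, float(y), 1])
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = [x, float(y), 1]


if __name__ == "__main__":
    # Noisy two-dimensional stream: target is sin(x1) + cos(x2) plus Gaussian noise.
    stream_rng = random.Random(0)
    reservoir = AveragingReservoir(capacity=50, radius=0.2, seed=1)
    for _ in range(5000):
        x = (stream_rng.uniform(0, 3), stream_rng.uniform(0, 3))
        y = math.sin(x[0]) + math.cos(x[1]) + stream_rng.gauss(0, 0.5)
        reservoir.add(x, y)
    print(len(reservoir.items), reservoir.seen)
```

Each stored target is replaced by a running mean, so an example that enters the reservoir early and stays there accumulates more neighbors and therefore a less noisy estimate. A larger radius averages more examples but adds bias where the target function varies quickly, which is the trade-off the paper's t-test-based threshold selection is meant to address.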
Keywords
adaptive estimation; data mining; learning (artificial intelligence); sampling methods; statistical testing; baseline reservoir-based approach; conditional expectation adaptive estimation; learning algorithm; paired t-test; prediction model; reservoir sampling algorithm; resource-constrained data mining; similarity threshold; Adaptive estimation; Capacity planning; Data mining; Neural networks; Noise generators; Noise reduction; Predictive models; Reservoirs; Sampling methods
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2007. IJCNN 2007. International Joint Conference on
Conference_Location
Orlando, FL
ISSN
1098-7576
Print_ISBN
978-1-4244-1379-9
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2007.4371299
Filename
4371299