DocumentCode :
730314
Title :
Risk-averse online learning under mean-variance measures
Author :
Vakili, Sattar ; Zhao, Qing
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, Davis, Davis, CA, USA
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
1911
Lastpage :
1915
Abstract :
We study risk-averse multi-armed bandit problems under mean-variance measures. We consider two risk mitigation models. In the first model, the variations in the reward values obtained at different times are considered as risk, and the objective is to minimize the mean-variance of the observed rewards. In the second model, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance of the total reward. Under both models, we establish asymptotic as well as finite-time lower bounds on regret and develop online learning algorithms that achieve the lower bounds.
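The two risk models in the abstract differ in which samples the mean-variance is computed over. The record does not state the exact definition, so the sketch below assumes one convention used in the risk-averse bandit literature: mean-variance = variance - rho * mean, where lower is better and `rho` is a hypothetical risk-tolerance parameter. The function names are illustrative only and are not taken from the paper.

```python
from statistics import fmean, pvariance


def mean_variance(samples, rho):
    """Empirical mean-variance of a list of numbers (lower is better).

    ASSUMPTION: the definition variance - rho * mean and the name `rho`
    are illustrative; the abstract does not give the paper's definition.
    """
    return pvariance(samples) - rho * fmean(samples)


def model1_observed_rewards(rewards, rho):
    # Model 1: risk is the variation of the rewards obtained at different
    # time steps within a single run of length equal to the horizon.
    return mean_variance(rewards, rho)


def model2_total_reward(runs, rho):
    # Model 2: only the total reward at the end of the horizon matters,
    # so the mean-variance is taken over the per-run totals.
    totals = [sum(run) for run in runs]
    return mean_variance(totals, rho)


if __name__ == "__main__":
    # Each run has constant rewards (no variation over time), but the
    # totals differ across runs.
    runs = [[1.0, 1.0], [0.0, 0.0]]
    print([model1_observed_rewards(r, rho=1.0) for r in runs])
    print(model2_total_reward(runs, rho=1.0))
```

In this toy example neither run shows any variation over time, so the first model sees no temporal risk. The second model still penalizes the spread of the end-of-horizon totals, which shows why the two objectives can favor different arms.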
Keywords :
learning (artificial intelligence); minimisation; risk analysis; finite time lower bound; mean variance measure; mean variance minimisation; online learning; risk averse multi-armed bandit problem; risk mitigation model; time horizon; Multi-armed bandit; mean-variance; regret; risk-aversion;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178303
Filename :
7178303