DocumentCode :
1795686
Title :
Adaptive demand response: Online learning of restless and controlled bandits
Author :
Qingsi Wang ; Mingyan Liu ; Johanna L. Mathieu
Author_Institution :
Univ. of Michigan, Ann Arbor, MI, USA
fYear :
2014
fDate :
3-6 Nov. 2014
Firstpage :
752
Lastpage :
757
Abstract :
The capabilities of electric loads participating in load curtailment programs are often unknown until the loads have been told to curtail (i.e., deployed) and observed. In programs in which payments are made each time a load is deployed, we aim to pick the “best” loads to deploy in each time step. Our choice is a tradeoff between exploration and exploitation, i.e., curtailing poorly characterized loads in order to better characterize them in the hope of benefiting in the future versus curtailing well-characterized loads so that we benefit now. We formulate this problem as a multi-armed restless bandit problem with controlled bandits. In contrast to past work that has assumed all load parameters are known, allowing the use of optimization approaches, we assume the parameters of the controlled system are unknown and develop an online learning approach. Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed by the decision-maker is the total realized curtailment, not the curtailment of each load. We develop an adaptive demand response learning algorithm and an extended version that works with aggregate feedback, both aimed at approximating the Whittle index policy. We show numerically that the regret of our algorithms with respect to the Whittle index policy is of logarithmic order in time, and that our algorithms significantly outperform standard learning algorithms like UCB1.
Keywords :
demand side management; learning (artificial intelligence); optimisation; power engineering computing; probability; Whittle index policy; adaptive demand response learning algorithm; aggregate feedback; controlled bandits; decision-maker; electric loads; load curtailment programs; load parameters; multiarmed restless bandit problem; online learning approach; optimization approach; probabilistic laws; standard UCB1 learning algorithms; Aggregates; Heuristic algorithms; Indexes; Load management; Load modeling; Markov processes; Process control;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Smart Grid Communications (SmartGridComm), 2014 IEEE International Conference on
Conference_Location :
Venice
Type :
conf
DOI :
10.1109/SmartGridComm.2014.7007738
Filename :
7007738