Regional Information Center for Science and Technology - Adaptive demand response: Online learning of restless and controlled bandits

DocumentCode :

1795686

Title :

Adaptive demand response: Online learning of restless and controlled bandits

Author :

Qingsi Wang ; Mingyan Liu ; Johanna L. Mathieu

Author_Institution :

Univ. of Michigan, Ann Arbor, MI, USA

fYear :

2014

fDate :

3-6 Nov. 2014

Firstpage :

752

Lastpage :

757

Abstract :

The capabilities of electric loads participating in load curtailment programs are often unknown until the loads have been told to curtail (i.e., deployed) and observed. In programs in which payments are made each time a load is deployed, we aim to pick the “best” loads to deploy in each time step. Our choice is a tradeoff between exploration and exploitation, i.e., curtailing poorly characterized loads in order to better characterize them in the hope of benefiting in the future versus curtailing well-characterized loads so that we benefit now. We formulate this problem as a multi-armed restless bandit problem with controlled bandits. In contrast to past work that has assumed all load parameters are known, allowing the use of optimization approaches, we assume the parameters of the controlled system are unknown and develop an online learning approach. Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed by the decision-maker is the total realized curtailment, not the curtailment of each load. We develop an adaptive demand response learning algorithm and an extended version that works with aggregate feedback, both aimed at approximating the Whittle index policy. We show numerically that the regret of our algorithms with respect to the Whittle index policy is of logarithmic order in time, and that our algorithms significantly outperform standard learning algorithms like UCB1.
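For readers unfamiliar with the UCB1 baseline named in the abstract, the following is a minimal sketch of the standard UCB1 rule applied to a simplified load-selection setting. It is not the paper's adaptive demand response algorithm and does not model restless or controlled arms or aggregate feedback. The names `ucb1_index`, `run_ucb1`, and `curtail_probs` are hypothetical, and the sketch assumes one load is deployed per time step with an independent Bernoulli curtailment outcome.

```python
import math
import random


def ucb1_index(mean_reward, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(curtail_probs, horizon, seed=0):
    """Deploy one load per step using UCB1; return deployments per load.

    curtail_probs is an assumed, illustrative list of per-load success
    probabilities that the learner does not see directly.
    """
    rng = random.Random(seed)
    n = len(curtail_probs)
    pulls = [0] * n      # number of times each load was deployed
    totals = [0.0] * n   # total observed curtailment per load

    for t in range(1, horizon + 1):
        if t <= n:
            # Initialization: deploy every load once.
            arm = t - 1
        else:
            # Exploit high empirical means, explore rarely deployed loads.
            arm = max(
                range(n),
                key=lambda i: ucb1_index(totals[i] / pulls[i], pulls[i], t),
            )
        # Simplifying assumption: i.i.d. Bernoulli outcome for the deployed load.
        reward = 1.0 if rng.random() < curtail_probs[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward

    return pulls


if __name__ == "__main__":
    print(run_ucb1([0.2, 0.5, 0.9], horizon=2000))
```

The exploration bonus shrinks as a load is deployed more often, which is how this rule trades off characterizing poorly known loads against deploying well-characterized ones.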

Keywords :

demand side management; learning (artificial intelligence); optimisation; power engineering computing; probability; Whittle index policy; adaptive demand response learning algorithm; aggregate feedback; controlled bandits; decision-maker; electric loads; load curtailment programs; load parameters; multiarmed restless bandit problem; online learning approach; optimization approach; probabilistic laws; standard UCB1 learning algorithms; Aggregates; Heuristic algorithms; Indexes; Load management; Load modeling; Markov processes; Process control;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Smart Grid Communications (SmartGridComm), 2014 IEEE International Conference on

Conference_Location :

Venice

Type :

conf

DOI :

10.1109/SmartGridComm.2014.7007738

Filename :

7007738

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1795686