DocumentCode :
3388874
Title :
Can Reinforcement Learning Always Provide the Best Policy
Author :
Duan, Zhansheng ; Chen, Huimin
Author_Institution :
Department of Electrical Engineering, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148; College of Electronic and Information Engineering, Xi'an Jiaotong University.
fYear :
2007
fDate :
26-29 Aug. 2007
Firstpage :
224
Lastpage :
228
Abstract :
Reinforcement learning deals with how to find the best policy in an uncertain environment so as to maximize some notion of long-term reward. In sequential decision making, it is often expected that the best policy can be designed by choosing an appropriate reward or penalty for each action. In this paper, we provide a counterexample to show that the best sequential decision rule cannot be obtained by the choice of any reward function in the reinforcement learning framework. In fact, the best policy, namely the randomized sequential probability ratio test, can only be learned via a rather unconventional formulation of reinforcement learning. The implications for the design of classifier combining methods are also discussed.
Keywords :
Decision making; Detectors; Lakes; Learning; Probability; Sensor systems; Sequential analysis; Signal processing; Statistics; Target recognition; Reinforcement learning; classifier combining; sequential decision
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop on
Conference_Location :
Madison, WI, USA
Print_ISBN :
978-1-4244-1198-6
Electronic_ISBN :
978-1-4244-1198-6
Type :
conf
DOI :
10.1109/SSP.2007.4301252
Filename :
4301252