Title :
Can Reinforcement Learning Always Provide the Best Policy
Author :
Duan, Zhansheng ; Chen, Huimin
Author_Institution :
Department of Electrical Engineering, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148; College of Electronic and Information Engineering, Xi'an Jiaotong University.
Abstract :
Reinforcement learning deals with finding the best policy in an uncertain environment so as to maximize some notion of long-term reward. In sequential decision making, it is often expected that the best policy can be designed by choosing an appropriate reward or penalty for each action. In this paper, we provide a counterexample showing that the best sequential decision rule cannot be obtained through the choice of any reward function in the reinforcement learning framework. In fact, the best policy, namely the randomized sequential probability ratio test, can only be learned via a rather unconventional formulation of reinforcement learning. The implications for the design of classifier combining methods are also discussed.
Keywords :
Decision making; Detectors; Lakes; Learning; Probability; Sensor systems; Sequential analysis; Signal processing; Statistics; Target recognition; Reinforcement learning; Classifier combining; Sequential decision
Conference_Title :
Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop on
Conference_Location :
Madison, WI, USA
Print_ISBN :
978-1-4244-1198-6
Electronic_ISBN :
978-1-4244-1198-6
DOI :
10.1109/SSP.2007.4301252