Regional Information Center for Science and Technology - Can Reinforcement Learning Always Provide the Best Policy

DocumentCode :

3388874

Title :

Can Reinforcement Learning Always Provide the Best Policy

Author :

Duan, Zhansheng ; Chen, Huimin

Author_Institution :

Department of Electrical Engineering, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148; College of Electronic and Information Engineering, Xi'an Jiaotong University.

fYear :

2007

fDate :

26-29 Aug. 2007

Firstpage :

224

Lastpage :

228

Abstract :

Reinforcement learning deals with finding the best policy in an uncertain environment so as to maximize some notion of long-term reward. In sequential decision making, it is often expected that the best policy can be designed by choosing an appropriate reward or penalty for each action. In this paper, we provide a counterexample showing that the best sequential decision rule cannot be obtained through the choice of any reward function in the reinforcement learning framework. In fact, the best policy, namely the randomized sequential probability ratio test, can only be learned via a rather unconventional formulation of reinforcement learning. The implications for the design of classifier combining methods are also discussed.

Keywords :

Decision making; Detectors; Lakes; Learning; Probability; Sensor systems; Sequential analysis; Signal processing; Statistics; Target recognition; Reinforcement learning; classifier combining; sequential decision;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop on

Conference_Location :

Madison, WI, USA

Print_ISBN :

978-1-4244-1198-6

Electronic_ISBN :

978-1-4244-1198-6

Type :

conf

DOI :

10.1109/SSP.2007.4301252

Filename :

4301252

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3388874