Title :
A memory-based reinforcement learning algorithm for partially observable Markovian decision processes
Author :
Zheng, Lei ; Cho, Siu-Yeung ; Quek, Chai
Author_Institution :
School of Computer Engineering, Nanyang Technological University, Singapore
Abstract :
This paper presents a modified version of U-tree (A. K. McCallum, 1996), a memory-based reinforcement learning (RL) algorithm that uses selective perception and short-term memory to handle partially observable Markovian decision processes (POMDPs). Conventional RL algorithms rely on a set of pre-defined states to model the environment, even though they can learn the state transitions from experience. U-tree not only learns these transitions but also builds the state model itself from raw sensor inputs. This paper enhances U-tree's model-generation process. It also shows that, because of the simplified yet effective state model generated by U-tree, it is feasible and preferable to adopt the classical dynamic programming (DP) algorithm for average-reward MDPs to solve some difficult POMDP problems. The new U-tree is tested on a car-driving task with 31,224 world states, in which the agent has very limited sensory information and little knowledge of the dynamics of the environment.
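Example :
The record itself contains no equations or code. As a rough illustration of the classical average-reward DP step the abstract refers to, the following is a minimal relative value iteration sketch in Python; the function name, array shapes, and the reference-state convention are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def relative_value_iteration(P, R, ref_state=0, tol=1e-8, max_iter=10000):
        """Relative value iteration for an average-reward MDP (sketch).

        P: transition probabilities, shape (A, S, S); P[a, s, t] = Pr(t | s, a)
        R: expected one-step rewards, shape (A, S)
        Returns (estimated gain, bias values h, greedy policy).
        """
        A, S, _ = P.shape
        h = np.zeros(S)
        for _ in range(max_iter):
            # One-step Bellman lookahead for every (action, state) pair.
            Q = R + np.einsum('ast,t->as', P, h)   # shape (A, S)
            h_new = Q.max(axis=0)
            gain = h_new[ref_state]                # value at the reference state
            h_new = h_new - gain                   # keep values bounded (relative VI)
            if np.max(np.abs(h_new - h)) < tol:
                h = h_new
                break
            h = h_new
        return gain, h, Q.argmax(axis=0)

In the paper's setting, the DP states would presumably be the leaves of the learned U-tree rather than the 31,224 raw world states, which is what makes exact DP tractable.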
Keywords :
Markov processes; decision theory; dynamic programming; learning (artificial intelligence); mathematics computing; trees (mathematics); U-tree; car-driving task; dynamic programming algorithm; partially observable Markovian decision processes; reinforcement learning algorithm; selective perception; short-term memory; Learning; Neural networks; Average Reward; Dynamic Programming; Partially Observable Markovian Decision Processes; Reinforcement Learning Algorithm;
Conference_Titel :
2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008), IEEE World Congress on Computational Intelligence
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-1820-6
Electronic_ISSN :
1098-7576
DOI :
10.1109/IJCNN.2008.4633888