Title :
Opposition-Based Q(λ) with Non-Markovian Update
Author :
Shokri, Maryam ; Tizhoosh, Hamid R. ; Kamel, Mohamed S.
Author_Institution :
Dept. of Syst. Design Eng., Waterloo Univ.
Abstract :
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This technique combines the idea of opposition with eligibility traces to deal with large state-space problems in reinforcement learning applications. In our previous work, the comparison of the results of OQ(λ) and conventional Watkins's Q(λ) showed a remarkable increase in performance for the OQ(λ) algorithm. However, the Markovian update of opposition traces is an issue, which is investigated in this paper. It has been assumed that the opposite state can be presented to the agent. This may limit the usability of the technique to deterministic environments. To relax this assumption, the non-Markovian opposition-based Q(λ) (NOQ(λ)) is introduced in this work. The new method is a hybrid of a Markovian update for eligibility traces and a non-Markovian update for opposition traces. The experimental results show improvements in learning speed for the proposed technique compared to Q(λ) and OQ(λ). The new technique is faster than the OQ(λ) algorithm with the same success rate and can be employed for a broader range of applications, since it does not require determining state transition.
Keywords :
Markov processes; learning (artificial intelligence); Markovian update; eligibility trace; learning speed; nonMarkovian opposition trace; nonMarkovian update; opposite state; Design engineering; Dynamic programming; Laboratories; Learning; Machine intelligence; Pattern analysis; State-space methods; System analysis and design; Systems engineering and theory; Usability;
Conference_Title :
Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0706-0
DOI :
10.1109/ADPRL.2007.368201