DocumentCode :
716494
Title :
Reducing hardware experiments for model learning and policy optimization
Author :
Ha, Sehoon ; Yamane, Katsu
Author_Institution :
Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2015
fDate :
26-30 May 2015
Firstpage :
2620
Lastpage :
2626
Abstract :
Conducting hardware experiments is often expensive in various respects, such as potential damage to the robot and the number of people required to operate it safely. Computer simulation is used in place of hardware in such cases, but it suffers from so-called simulation bias, in which policies tuned in simulation do not work on hardware due to differences between the two systems. Model-free methods such as Q-learning, on the other hand, do not require a model and can therefore avoid this issue. However, these methods typically require a large number of experiments, which may not be realistic for tasks such as humanoid robot balancing and locomotion. This paper presents an iterative approach for learning hardware models and optimizing policies with as few hardware experiments as possible. Instead of learning the model from scratch, our method learns the difference between a simulation model and the hardware. We then optimize the policy in simulation based on the learned model. The iterative approach allows us to collect a wider range of data for model refinement while improving the policy.
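A minimal sketch of the loop the abstract describes, on a toy scalar system: the linear dynamics, the sinusoidal hardware offset, the proportional policy, and the random-search optimizer are all hypothetical stand-ins chosen for illustration, not the paper's actual models or optimizer. Only the overall structure matches the abstract: collect a few hardware trials, fit a model of the sim-to-hardware difference, and re-optimize the policy in the corrected simulation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def simulate(s, a):
    # Nominal simulation model (hypothetical linear dynamics).
    return 0.9 * s + 0.1 * a

def run_hardware(s, a):
    # Stand-in for a real hardware trial; the unmodeled offset is
    # exactly what the residual model must capture.
    return 0.9 * s + 0.1 * a - 0.05 * np.sin(s)

def rollout(step, gain, s0=1.0, horizon=20):
    # Roll out a proportional policy a = -gain * s; cost is the sum of s^2.
    s, cost, data = s0, 0.0, []
    for _ in range(horizon):
        a = -gain * s
        s_next = step(s, a)
        data.append((s, a, s_next))
        cost += s_next ** 2
        s = s_next
    return cost, data

gain, X, y = 0.5, [], []
residual = GaussianProcessRegressor()
for it in range(5):
    # 1) A few hardware experiments with the current policy.
    _, hw_data = rollout(run_hardware, gain)
    for s, a, s_next in hw_data:
        X.append([s, a])
        y.append(s_next - simulate(s, a))  # learn only the difference
    residual.fit(np.asarray(X), np.asarray(y))
    # 2) Corrected simulator = nominal model + learned residual.
    def corrected(s, a):
        return simulate(s, a) + residual.predict([[s, a]])[0]
    # 3) Optimize the policy in the corrected simulation
    #    (crude random search over the gain).
    candidates = gain + 0.2 * rng.standard_normal(32)
    gain = min(candidates, key=lambda g: rollout(corrected, g)[0])
print("final gain:", gain)

Each pass through the loop both improves the policy and, because the new policy visits new states, widens the data used to refine the residual model, which is the interplay the abstract highlights.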
Keywords :
humanoid robots; iterative methods; learning (artificial intelligence); Q-learning model-free methods; computer simulation; hardware experiment reduction; hardware model learning; humanoid robot balancing; iterative approach; locomotion; model learning; policy optimization; simulation bias; Computational modeling; Data models; Hardware; Noise; Optimization; Predictive models; Robots;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2015 IEEE International Conference on Robotics and Automation (ICRA)
Conference_Location :
Seattle, WA
Type :
conf
DOI :
10.1109/ICRA.2015.7139552
Filename :
7139552