
[2018.08] SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning


Model-based reinforcement learning (RL) has proven to be a data-efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy. We evaluate our approach on a range of robotics tasks, including manipulation with a real-world robotic arm directly from images. We find that our method produces substantially better final performance than other model-based RL methods while being significantly more efficient than model-free RL.

Figure 2. A high-level schematic of the proposed method.

URL: arxiv.org/abs/1808.09105
Author: Zhang et al.
Topic: Latent Dynamics
Conference: ICML 2019

1. Background

Model-based reinforcement learning (RL) methods use known or learned models in a variety of ways, such as planning through the model and generating synthetic experience (Sutton, 1990; Kober et al., 2013). On simple, low-dimensional tasks, model-based approaches have demonstrated remarkable data efficiency. In more complex domains, however, one of the main difficulties in applying model-based methods is modeling bias: if control or policy learning is performed against an imperfect model, performance in the real world will typically degrade with model inaccuracy (Deisenroth et al., 2014). Many model-based methods rely on accurate forward prediction for planning (Nagabandi et al., 2018; Chua et al., 2018), and for image-based domains this effectively rules out simple models, which would introduce significant modeling bias. Complex, expressive models, on the other hand, must typically be trained on very large datasets, corresponding to days or weeks of data collection, before they can generate accurate forward predictions of images (Finn & Levine, 2017; Pinto & Gupta, 2016; Agrawal et al., 2016).

How can we use model-based methods to learn from images with data efficiency similar to what we see in simpler domains? In our work, we focus on removing the need for accurate forward prediction by using what we term local model methods. These methods use simple models, typically linear models, to provide gradient directions for local policy improvement rather than for forward prediction and planning (Todorov & Li, 2005; Levine & Abbeel, 2014). Local model methods thus circumvent the need for accurate predictive models, but they cannot be applied directly to image-based tasks because image dynamics are highly non-linear even locally.
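To make the local-models idea concrete, here is a minimal sketch of fitting a time-varying linear dynamics model x_{t+1} ≈ A_t x_t + B_t u_t + c_t to on-policy rollouts by least squares; the fitted A_t and B_t are exactly the local gradient information an LQR-style policy update consumes. The function and variable names are illustrative assumptions, not from the paper.

```python
# Fit per-timestep local linear dynamics from a batch of rollouts.
import numpy as np

def fit_local_dynamics(states, actions):
    """states: (N, T+1, dx) rollouts; actions: (N, T, du) controls.
    Returns per-timestep (A_t, B_t, c_t) fit by linear regression."""
    N, T_plus_1, dx = states.shape
    du = actions.shape[-1]
    A, B, c = [], [], []
    for t in range(T_plus_1 - 1):
        # Regress x_{t+1} on [x_t, u_t, 1] across the N rollouts.
        X = np.hstack([states[:, t], actions[:, t], np.ones((N, 1))])
        Y = states[:, t + 1]
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (dx+du+1, dx)
        A.append(W[:dx].T)
        B.append(W[dx:dx + du].T)
        c.append(W[-1])
    return A, B, c

# Toy usage: noisy double-integrator rollouts under random controls.
rng = np.random.default_rng(0)
N, T, dx, du = 50, 10, 2, 1
states = np.zeros((N, T + 1, dx))
actions = rng.normal(size=(N, T, du))
for t in range(T):
    states[:, t + 1, 0] = states[:, t, 0] + 0.1 * states[:, t, 1]
    states[:, t + 1, 1] = states[:, t, 1] + 0.1 * actions[:, t, 0]
    states[:, t + 1] += 0.01 * rng.normal(size=(N, dx))

A, B, c = fit_local_dynamics(states, actions)
print(A[0], B[0])  # local A_t, B_t would feed an LQR-style update
```

Because the model only needs to be accurate in the neighborhood of the current policy's state distribution, a crude linear fit like this is enough for policy improvement, even when a globally accurate model would be far out of reach.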

Our main contribution is a representation learning and model-based RL procedure, which we term stochastic optimal control with latent representations (SOLAR), that jointly optimizes a latent representation and model such that inference produces local models that provide good gradient directions for policy improvement. SOLAR is able to learn policies directly from high-dimensional image observations in several domains, including a real robotic arm stacking blocks and pushing objects with only one to two hours of data collection.
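The core of this idea can be sketched as follows: encode each image observation into a low-dimensional latent state, then fit local linear models of the kind above in that latent space instead of in pixel space. The encoder below is a hedged, illustrative stand-in; SOLAR's actual model is a structured variational autoencoder trained jointly so that the inferred latents admit linear-Gaussian dynamics, which this sketch does not reproduce.

```python
# Illustrative image encoder producing latent states for local models.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),   # 64x64 -> 31x31
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),  # 31x31 -> 14x14
            nn.Flatten(),
        )
        # Predict a Gaussian over latents: mean and log-variance.
        self.head = nn.Linear(32 * 14 * 14, 2 * latent_dim)

    def forward(self, obs):
        mu, log_var = self.head(self.conv(obs)).chunk(2, dim=-1)
        # Reparameterized sample, as in a VAE-style latent model.
        return mu + torch.randn_like(mu) * (0.5 * log_var).exp()

encoder = ImageEncoder()
obs = torch.rand(4, 3, 64, 64)  # batch of image observations
z = encoder(obs)                # (4, 8) latent states
print(z.shape)  # fit_local_dynamics above would run on these latents
```

The key design choice is that the representation is not trained for pixel-accurate forward prediction; it is trained so that simple (linear-Gaussian) dynamics and cost models inferred in the latent space fit the current policy's data well, which is all the local policy update needs.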