Deep Learning/Reinforcement Learning (18)
[2018.02] Diversity is all you need: Learning skills without a reward function Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose “Diversity is All You Need” (DIAYN), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information-theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this s..
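For reference, the information-theoretic objective mentioned in this preview can be sketched in the paper's notation (skills z ~ p(z), states s, actions a; q_φ(z|s) is a learned skill discriminator, and the lower bound is where the pseudo-reward comes from):

```latex
\mathcal{F}(\theta) = I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S)
                    = \mathcal{H}[A \mid S, Z] + \mathcal{H}[Z] - \mathcal{H}[Z \mid S]
\ge \mathcal{H}[A \mid S, Z] + \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\!\left[ \log q_\phi(z \mid s) - \log p(z) \right]
```

In practice each skill's policy is trained with a maximum entropy RL algorithm on the pseudo-reward r_z(s, a) = log q_φ(z|s) − log p(z).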
[2019.03] Model-Based Reinforcement Learning for Atari Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction – substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works an..
[2020.05] Planning to explore via self-supervised world models Reinforcement learning allows solving complex tasks, however, the learning tends to be task-specific and the sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During expl..
[2018.08] SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning Model-based reinforcement learning (RL) has proven to be a data-efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observ..
[2017.05] Constrained Policy Optimization For many applications of reinforcement learning, it can be more convenient to specify a constraint along with a reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al., 2016; Levine et al., 2016) have enabled new capabilities in h..
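As background for "specifying a constraint along with a reward function": CPO works in the constrained MDP setting, where the policy maximizes the usual return subject to bounds on expected discounted auxiliary costs (a sketch; C_i are cost functions with limits d_i):

```latex
\pi^{*} = \arg\max_{\pi \in \Pi_{\theta}} J(\pi)
\quad \text{s.t.} \quad J_{C_i}(\pi) \le d_i, \;\; i = 1, \dots, m
```

where J(π) = 𝔼_{τ∼π}[Σ_t γ^t R(s_t, a_t)] and each J_{C_i} is defined analogously with C_i in place of R. Each CPO update then solves a trust-region version of this problem.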
[2018.03] Policy Optimization with Demonstrations Exploration remains a significant challenge for reinforcement learning methods, especially in environments where reward signals are sparse. Recent methods for learning from demonstrations have shown promise in overcoming exploration difficulties, but they typically require a considerable number of high-quality demonstrations, which are difficult to collect. We propose to effectively leverage available demonst..
[2018.02] IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures One of the main challenges in training a single agent on many tasks at once is scalability. Since the current state-of-the-art methods like A3C (Mnih et al., 2016) or UNREAL (Jaderberg et al., 2017b) can require as much as a billion frames and multiple days to master a single domain, training them on tens of domains at once is too slow to be practical. This paper introduces a new distributed age..
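The "importance weighted" part of the title refers to the V-trace off-policy correction that lets decoupled actors feed a central learner. A sketch of the n-step V-trace target for the value at state x_s (behavior policy μ, target policy π, truncation thresholds ρ̄ ≥ c̄; the paper's extra λ factor is omitted here):

```latex
v_s = V(x_s) + \sum_{t=s}^{s+n-1} \gamma^{\,t-s} \left( \prod_{i=s}^{t-1} c_i \right) \delta_t V,
\qquad \delta_t V = \rho_t \left( r_t + \gamma V(x_{t+1}) - V(x_t) \right)
```

with truncated importance ratios ρ_t = min(ρ̄, π(a_t|x_t)/μ(a_t|x_t)) and c_i = min(c̄, π(a_i|x_i)/μ(a_i|x_i)).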
[2017.05] Curiosity-driven exploration by self-supervised prediction In extremely sparse-reward settings, some form of reward shaping is usually required to provide the agent with a manually designed, denser reward signal. Such reward shaping demands domain-specific knowledge and can even work against the agent's long-term goal, so it is impractical in many cases. This paper proposes that curiosity can serve as an intrinsic reward signal to enable the ..
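Here the curiosity bonus is the prediction error of a learned forward model in a feature space shaped by an inverse model (the ICM). Below is a minimal PyTorch sketch of the reward computation only; the module sizes and names (ICM, intrinsic_reward, eta) are illustrative, not the authors' code:

```python
# Minimal ICM-style curiosity sketch (illustrative names and sizes, not the
# authors' reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, feat_dim: int = 32, eta: float = 0.5):
        super().__init__()
        self.eta = eta
        self.n_actions = n_actions
        # phi(s): observation encoder, trained via the inverse-model loss
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # inverse model: predicts a_t from (phi(s_t), phi(s_{t+1}))
        self.inverse = nn.Linear(2 * feat_dim, n_actions)
        # forward model: predicts phi(s_{t+1}) from (phi(s_t), one-hot a_t)
        self.fwd = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 64), nn.ReLU(), nn.Linear(64, feat_dim))

    def intrinsic_reward(self, obs, action, next_obs):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = F.one_hot(action, self.n_actions).float()
        phi_next_pred = self.fwd(torch.cat([phi, a_onehot], dim=-1))
        # r^i_t = (eta / 2) * || phi_hat(s_{t+1}) - phi(s_{t+1}) ||^2
        return 0.5 * self.eta * (phi_next_pred - phi_next).pow(2).sum(dim=-1)

# Usage: the bonus is added to the sparse extrinsic reward.
icm = ICM(obs_dim=4, n_actions=2)
obs, next_obs = torch.randn(8, 4), torch.randn(8, 4)
action = torch.randint(0, 2, (8,))
r_int = icm.intrinsic_reward(obs, action, next_obs)  # shape (8,)
```

The bonus r_int is simply added to the sparse extrinsic reward during policy optimization; training the encoder through the inverse-model loss keeps the features focused on what the agent can actually influence.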