[2018.08] SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning Model-based reinforcement learning (RL) has proven to be a data-efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observ..
[2021.03] A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions Previously related surveys have begun to classify existing work mainly based on the key components of NAS: search space, search strategy, and evaluation strategy. While this classification method is more intuitive, it is difficult for readers to grasp the challenges and the landmark work involved. Therefore, in this survey, we provide a new perspective: beginning with an overview of the characte..
[2017.05] Constrained Policy Optimization For many applications of reinforcement learning, it can be more convenient to specify a constraint along with a reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al., 2016; Levine et al., 2016) have enabled new capabilities in h..
[2019.05] NAS-Bench-101: Towards Reproducible Neural Architecture Search The paper introduces NAS-Bench-101, the first public architecture dataset for NAS research. 423k unique convolutional architectures were trained and evaluated three times on CIFAR-10 and saved as tabular data. This allows querying the precomputed dataset of a diverse range of models in milliseconds. Because NAS-Bench-101 exhaustively evaluates a search space, it permits, for the first time, a co..
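The core idea of a tabular benchmark like this can be sketched in a few lines: once every architecture in the space has been trained and its results stored, "evaluating" a candidate reduces to a table lookup. The data structure and accuracy values below are hypothetical stand-ins, not the actual NAS-Bench-101 API.

```python
# Sketch of a tabular NAS benchmark query (hypothetical data, not the
# real NAS-Bench-101 API): precomputed training results are keyed by a
# canonical architecture encoding, so evaluation is an O(1) lookup
# instead of a multi-hour training run.

# Precomputed table: architecture spec -> mean CIFAR-10 test accuracy
precomputed = {
    ("conv3x3", "conv3x3", "maxpool"): 0.934,
    ("conv1x1", "conv3x3", "conv3x3"): 0.941,
}

def query(arch):
    """Return the stored accuracy for an architecture in milliseconds."""
    return precomputed[tuple(arch)]

# Because the space is exhaustively evaluated, even brute-force search
# over all architectures is feasible.
best = max(precomputed, key=query)
```

This is what makes reproducible comparisons of search strategies possible: every method queries the same frozen table, so differences in results come from the search algorithm alone.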
[2018.03] Policy Optimization with Demonstration Exploration remains a significant challenge for reinforcement learning methods, especially in environments where reward signals are sparse. Recent methods for learning from demonstrations have shown promise in overcoming exploration difficulties but typically require a considerable number of high-quality demonstrations, which are difficult to collect. We propose to effectively leverage available demonst..
[2018.02] IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures One of the main challenges in training a single agent on many tasks at once is scalability. Since the current state-of-the-art methods like A3C (Mnih et al., 2016) or UNREAL (Jaderberg et al., 2017b) can require as much as a billion frames and multiple days to master a single domain, training them on tens of domains at once is too slow to be practical. This paper introduces a new distributed age..
[2017.05] Curiosity-driven exploration by self-supervised prediction In extremely sparse reward settings, some form of reward shaping is required to provide the agent with a manually designed, relatively denser reward signal. Reward shaping demands domain-specific knowledge and can sometimes work against the agent's long-term goal, making it impractical in some cases. This paper proposes that curiosity can serve as an intrinsic reward signal to enable the ..
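The curiosity signal described above can be sketched minimally: an intrinsic reward equal to the prediction error of a forward model on the next state. This is a simplification of the paper's ICM (the toy linear model and raw states below replace the learned feature encoder and are my own illustrative assumptions).

```python
import numpy as np

# Sketch of a curiosity-style intrinsic reward (simplified from ICM:
# a toy linear forward model on raw states, no learned feature space).
# The agent is rewarded where its forward model predicts poorly, i.e.
# in novel or hard-to-predict states.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) * 0.1   # toy forward-model weights

def intrinsic_reward(state, action, next_state):
    pred = W @ state + action                      # predicted next state
    return 0.5 * np.sum((pred - next_state) ** 2)  # prediction error

s = rng.normal(size=4)
a = rng.normal(size=4)
s_next = rng.normal(size=4)
r_i = intrinsic_reward(s, a, s_next)  # added to any extrinsic reward
```

In the full method this reward is added to the (sparse) extrinsic reward, so exploration is driven toward transitions the model has not yet learned, without any hand-designed shaping.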
Neural tangent kernel (NTK) and beyond Meanwhile, we see the evolving development of deep learning theory on neural networks. The NTK (neural tangent kernel) was proposed to characterize the gradient descent training dynamics of infinitely wide (Jacot et al., 2018) or finite-width deep networks (Hanin & Nica, 2019). Wide networks have also been proved to evolve as linear models under gradient descent (Lee et al., 2019). This is further leveraged ..
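The "wide networks evolve as linear models" claim rests on a first-order Taylor expansion of the network in its parameters around initialization: f(x; θ) ≈ f(x; θ₀) + ∇θ f(x; θ₀)·(θ − θ₀). A one-parameter toy model (my own illustration, not from the cited papers) makes the linearization easy to check numerically.

```python
import numpy as np

# Sketch of the linearized-network view behind NTK theory: near
# initialization theta0, the model is well approximated by its
# first-order Taylor expansion in the parameters.

def f(x, theta):
    return np.tanh(theta * x)  # toy nonlinear "network" with one parameter

def f_lin(x, theta, theta0):
    # df/dtheta at theta0 for f = tanh(theta * x)
    grad = x * (1.0 - np.tanh(theta0 * x) ** 2)
    return f(x, theta0) + grad * (theta - theta0)

x, theta0 = 0.5, 0.3
for dtheta in (0.1, 0.01):
    # linearization error shrinks quadratically in the parameter movement
    err = abs(f(x, theta0 + dtheta) - f_lin(x, theta0 + dtheta, theta0))
```

For very wide networks the parameters move so little during training that this linear approximation stays accurate throughout, which is what lets the training dynamics be described by a fixed kernel.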