1. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. >>
A. 0 >> Abstract
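1. Robotic applications of reinforcement learning often compromise the autonomy of the learning process >>
a) Deep reinforcement learning alleviates this limitation >>
(1) but is limited by high sample complexity, so results have been restricted to >>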
i) simple tasks >>
ii) simulated settings >>
2. this paper >>
a) deep Q-functions >>
(1) policy updates are pooled asynchronously >>
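A minimal sketch of asynchronous pooling, under stated assumptions. The notes do not say how the updates are pooled; this sketch assumes one common arrangement in which each robot worker sends its transitions to a single trainer that keeps a shared replay buffer and samples minibatches from it, regardless of which worker (or which policy version) produced them. The names `collect` and `run`, the toy one-dimensional dynamics, and the placeholder update step are all hypothetical and are not taken from the paper.

```python
import queue
import random
import threading
from collections import deque


def collect(worker_id, num_steps, outbox):
    """Hypothetical robot worker: acts in a toy 1-D task and sends transitions."""
    rng = random.Random(worker_id)
    state = 0.0
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)   # stand-in for an exploratory policy
        next_state = state + action       # toy dynamics, not a real robot
        reward = -abs(next_state)         # toy reward: stay near the origin
        outbox.put((worker_id, state, action, reward, next_state))
        state = next_state


def run(num_workers=3, steps_per_worker=50, batch_size=8):
    """Trainer loop: pools transitions from all workers as they arrive."""
    outbox = queue.Queue()                # thread-safe channel to the trainer
    replay = deque(maxlen=10_000)         # shared replay buffer
    workers = [
        threading.Thread(target=collect, args=(i, steps_per_worker, outbox))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()

    rng = random.Random(0)
    updates = 0
    total = num_workers * steps_per_worker
    for _ in range(total):
        replay.append(outbox.get())       # blocks until some worker reports
        if len(replay) >= batch_size:
            batch = rng.sample(list(replay), batch_size)
            assert len(batch) == batch_size
            updates += 1                  # placeholder for a Q-function update

    for w in workers:
        w.join()
    return len(replay), updates
```

Because the trainer never waits for any particular worker, a slow robot does not stall learning; this only works because the update is off-policy, so stale transitions remain usable.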
B. 1 >> I. INTRODUCTION
1. deep Q-functions >>
a) without user-provided demonstrations >>
2. challenges >>
high sample-complexity
a) Deep Deterministic Policy Gradient algorithm (DDPG) >>
b) Normalized Advantage Function algorithm (NAF) >>
c) Parallelization >> parallelizing the algorithm
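For reference, a small sketch of the quadratic form that makes NAF a Q-learning method for continuous actions: Q(x, u) = V(x) - 1/2 (u - mu(x))^T P(x) (u - mu(x)), with P = L L^T and L lower-triangular. In the algorithm, V, mu and L are network outputs for a state; here they are passed in as plain numbers, and the function name `naf_q_value` is hypothetical.

```python
def naf_q_value(u, mu, L, v):
    """Q(x, u) = V(x) - 0.5 * (u - mu)^T (L L^T) (u - mu).

    u  : action, list of floats
    mu : greedy action mu(x), list of floats
    L  : lower-triangular matrix as a list of rows
    v  : state value V(x)
    """
    n = len(u)
    d = [ui - mi for ui, mi in zip(u, mu)]
    # (u - mu)^T L L^T (u - mu) equals ||L^T (u - mu)||^2
    z = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    advantage = -0.5 * sum(zj * zj for zj in z)
    return v + advantage


mu = [0.5, -0.2]
L = [[1.0, 0.0], [0.5, 2.0]]
q_at_mu = naf_q_value(mu, mu, L, v=3.0)          # advantage is zero at u = mu
q_off = naf_q_value([1.5, -0.2], mu, L, v=3.0)   # any other action scores lower
```

Since the advantage term is never positive, the maximizing action is simply mu(x), which is why no separate actor network is needed.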
3. main contribution >>
a) a demonstration of asynchronous deep reinforcement learning using our parallel NAF algorithm across a cluster of robots >>
b) a simple and effective safety mechanism for constraining exploration at training time >>
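The notes do not describe the safety mechanism itself. As one plausible illustration only, the sketch below assumes exploration is constrained by capping each commanded joint velocity and shrinking any command that would push a joint past its position limit within one control step. The function `constrain_command` and its parameters are hypothetical, and joints are assumed to start inside their limits.

```python
def constrain_command(joint_pos, joint_vel_cmd, pos_limits, max_vel, dt):
    """Return velocity commands that respect a speed cap and position limits.

    Assumes every entry of joint_pos already lies within its (low, high) limit.
    """
    safe = []
    for q, v, (low, high) in zip(joint_pos, joint_vel_cmd, pos_limits):
        v = max(-max_vel, min(max_vel, v))   # cap the commanded speed
        q_next = q + v * dt                  # predicted position after one step
        if q_next < low:
            v = (low - q) / dt               # stop exactly at the lower limit
        elif q_next > high:
            v = (high - q) / dt              # stop exactly at the upper limit
        safe.append(v)
    return safe


commands = constrain_command(
    joint_pos=[0.0, 0.9],
    joint_vel_cmd=[5.0, 1.0],
    pos_limits=[(-1.0, 1.0), (-1.0, 1.0)],
    max_vel=1.0,
    dt=0.5,
)
```

A filter of this kind sits between the exploratory policy and the robot, so the learning algorithm itself does not need to change.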
C. 2 >> II. RELATED WORK
1. used low-dimensional policy representations >>
Many of the RL
a) high-dimensional systems >>
recently