Paper Notes: Socially Aware Motion Planning with Deep Reinforcement Learning

2020-06-06

Socially Aware Motion Planning with Deep Reinforcement Learning >>

A. 0 >> Abstract

1. important things >>

a) model subtle human behaviors >>

2. Traditional methods >> use feature-matching techniques to describe and imitate human paths

3. This paper >> specify what not to do (violations of social norms) rather than what to do

B. 1 >> INTRODUCTION

1. challenging >>

a) unknown >> pedestrians’ intents

2. Traditional approach >> treat pedestrians as dynamic obstacles and apply reactive rules for avoiding collision

a) disadvantage >> generate unsafe/unnatural movements

3. Other approaches >> reason about pedestrians’ hidden intents to generate predicted paths

a) collision-free path >>

(1) freezing robot problem >>

4. Solution >> account for cooperation

a) model/anticipate the impact of the robot’s motion on nearby pedestrians >>

a) Type 1 >> model-based

(1) extensions of multiagent collision avoidance >>

(2) Drawback 1 >> unclear whether humans follow such precise geometric rules

i) Drawback 2 >> oscillatory paths

b) Type 2 >> learning-based

(1) a policy that emulates human behaviors by matching feature statistics >>

i) Example >> Inverse Reinforcement Learning (IRL)

(2) Pro >> paths that more closely resemble human behaviors

(3) Con >> higher computational cost

i) applicability to different environments is questionable >>

a) approach human-like navigation by solving a cooperative collision avoidance problem >>

7. Main contributions of this paper >>

a) socially aware collision avoidance >>

b) a symmetrical neural network structure, generalized to multiagent scenarios >>

(1) multiagent (n > 2) scenarios >>

c) demonstration (hardware experiment) >>
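
To make contribution (b) concrete, here is a minimal sketch of one way a network layer can be made symmetric with respect to the other agents: apply the same weights to every neighbor and pool the results with an element-wise max, so that swapping two neighbors leaves the output unchanged. The function name `symmetric_features`, the weight values, and the two-number state layout are all hypothetical; this illustrates permutation invariance in general and is not the architecture from the paper.

```python
from typing import List, Sequence


def symmetric_features(
    ego: Sequence[float],
    others: Sequence[Sequence[float]],
    w_ego: Sequence[Sequence[float]],
    w_other: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Shared-weight ReLU layer per neighbor, then element-wise max pooling.

    Because every neighbor passes through the same weights and max() ignores
    ordering, the result does not depend on how the neighbors are listed.
    """
    per_agent = []
    for other in others:
        hidden = []
        for row_e, row_o, b in zip(w_ego, w_other, bias):
            pre = sum(w * x for w, x in zip(row_e, ego))
            pre += sum(w * x for w, x in zip(row_o, other))
            hidden.append(max(0.0, pre + b))  # ReLU
        per_agent.append(hidden)
    # Pool across neighbors, one value per hidden unit.
    return [max(column) for column in zip(*per_agent)]


# Hypothetical toy weights and states, for illustration only.
W_EGO = [[0.5, -0.2], [0.1, 0.3]]
W_OTHER = [[0.4, 0.1], [-0.3, 0.2]]
BIAS = [0.0, 0.1]
EGO = [1.0, 2.0]
NEIGHBORS = [[0.5, -1.0], [2.0, 0.3]]

features = symmetric_features(EGO, NEIGHBORS, W_EGO, W_OTHER, BIAS)
swapped = symmetric_features(EGO, NEIGHBORS[::-1], W_EGO, W_OTHER, BIAS)
```

Here `features` and `swapped` are identical, which is the property that matters once there are more than two agents and no natural ordering among the neighbors.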

C. 2 >> II. BACKGROUND

1. Collision avoidance (with DRL) >> A. Collision Avoidance with Deep Reinforcement Learning

a) a sequential decision making problem >>

b) Challenge >> a major challenge is finding the optimal value function

(1) the joint state $s^{jn}$ is a continuous, high-dimensional vector >>

i) it is impractical to discretize and enumerate the state space >>

c) Solution >> represent the value function with deep neural networks

d) This work extends the collision avoidance with deep reinforcement learning framework (CADRL) [14] to characterize and induce socially aware behaviors in multiagent systems >>

e) recent works >>

(1) in unknown static environments >>

(2) computing control inputs directly from raw sensor data >>

(1) cooperative >>

(2) time-efficient >>

(3) two properties >>

i) min-time reward function >>

ii) reciprocity assumption >>
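
A minimal sketch of how a learned value function can drive action selection in this sequential decision-making setup: each candidate velocity is scored by its immediate reward plus the discounted value of the predicted next joint state. Everything in it is a hypothetical illustration rather than the paper's implementation: the names `min_time_reward` and `greedy_action`, the reward constants, the constant-velocity prediction for the other agent, the agent radius, and the discount factor are all assumptions.

```python
import math
from typing import Callable, Sequence, Tuple

Vec = Tuple[float, float]


def min_time_reward(reached_goal: bool, clearance: float) -> float:
    """Sparse reward: only arrival is rewarded, so with discounting a faster
    arrival is worth more. All constants are illustrative assumptions."""
    if clearance < 0.0:  # bodies overlap: collision
        return -0.25
    if clearance < 0.2:  # uncomfortably close: small graded penalty
        return -0.1 * (1.0 - clearance / 0.2)
    return 1.0 if reached_goal else 0.0


def greedy_action(
    pos: Vec,
    goal: Vec,
    other_pos: Vec,
    other_vel: Vec,
    actions: Sequence[Vec],
    value_fn: Callable[[Vec, Vec, Vec], float],
    dt: float = 1.0,
    gamma: float = 0.97,
    radius: float = 0.3,
    goal_tol: float = 0.1,
) -> Vec:
    """One-step lookahead: pick the velocity with the best reward + value."""
    # Assumption: the other agent keeps its current velocity for one step.
    next_other = (other_pos[0] + other_vel[0] * dt,
                  other_pos[1] + other_vel[1] * dt)
    best_action, best_score = actions[0], -math.inf
    for vx, vy in actions:
        next_pos = (pos[0] + vx * dt, pos[1] + vy * dt)
        clearance = math.dist(next_pos, next_other) - 2.0 * radius
        reached = math.dist(next_pos, goal) < goal_tol
        score = (min_time_reward(reached, clearance)
                 + gamma * value_fn(next_pos, next_other, goal))
        if score > best_score:
            best_action, best_score = (vx, vy), score
    return best_action


# Stand-in for a trained network: value is the negative distance to the goal.
def toy_value(p: Vec, other: Vec, goal: Vec) -> float:
    return -math.dist(p, goal)


chosen = greedy_action((0.0, 0.0), (5.0, 0.0), (0.0, 10.0), (0.0, 0.0),
                       [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)], toy_value)
```

In practice `toy_value` would be replaced by the trained value network; the lookahead loop itself stays the same.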

D. 3 >> III. APPROACH

1. Two-agent to multiagent >> extend the two-agent formulation to multiagent scenarios

a) To induce a particular norm, a small bias can be introduced in the RL training process in favor of one set of behaviors over others. >>

b) how the penalty set $S_{norm}$ is defined affects the rate of convergence >>
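
The idea of biasing training toward one norm can be sketched as a small extra penalty whenever the joint configuration falls inside a penalty set. The sketch below assumes an ego-centric frame (x forward, y to the left) and a made-up penalty region that discourages passing an oncoming agent with that agent on the robot's right, which nudges the policy toward keeping right. The region bounds, `norm_penalty` value, and function names are hypothetical and not taken from the paper.

```python
import math


def in_penalty_set(
    rel_x: float,
    rel_y: float,
    rel_heading: float,
    max_range: float = 4.0,
    lateral_range: float = 2.0,
    heading_tol: float = math.pi / 4,
) -> bool:
    """Return True if the other agent's pose (in the ego frame) is penalized.

    rel_heading is the other agent's heading relative to ours, assumed to be
    normalized to [-pi, pi]; a value near +/-pi means it is coming toward us.
    """
    oncoming = abs(abs(rel_heading) - math.pi) < heading_tol
    ahead_and_close = 0.0 < rel_x < max_range
    on_our_right = -lateral_range < rel_y < 0.0
    return oncoming and ahead_and_close and on_our_right


def shaped_reward(
    base_reward: float,
    rel_x: float,
    rel_y: float,
    rel_heading: float,
    norm_penalty: float = 0.1,
) -> float:
    """Base reward minus a small bias when the norm is violated."""
    if in_penalty_set(rel_x, rel_y, rel_heading):
        return base_reward - norm_penalty
    return base_reward
```

Keeping `norm_penalty` small relative to the collision and goal rewards matches the note above that only a small bias is needed; mirroring the region (penalizing `rel_y > 0`) would instead favor the opposite handedness.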

3. B. Training a Multiagent Value Network >>

a) two important modifications >>

(1) 1 >> two experience sets, $E$ and $E_b$, are used to distinguish between trajectories that reached the goals and those that ended in a collision

(2) 2 >> during the training process, trajectories generated by SA-CADRL are reflected in the x-axis with some probability

b) This procedure exploits symmetry in the problem to explore different topologies more efficiently. >>

c) an n-agent network can be used to generate trajectories for scenarios with fewer agents >>

(1) The use of this parametrization avoids the need for training many different networks. >>
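
The two training modifications in (a) can be sketched as a small experience store: each new trajectory is mirrored about the x-axis with some probability, then filed into either the success set or the collision set. This is a sketch under stated assumptions, not the paper's training code: the `(x, y, vx, vy)` state layout, the `flip_prob` default, the function names, and the half-and-half minibatch sampling are all hypothetical.

```python
import random
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (x, y, vx, vy), assumed layout
Trajectory = List[State]


def reflect_in_x_axis(trajectory: Trajectory) -> Trajectory:
    """Mirror a trajectory about the x-axis: negate y and the y-velocity."""
    return [(x, -y, vx, -vy) for (x, y, vx, vy) in trajectory]


def store_trajectory(
    trajectory: Trajectory,
    collided: bool,
    success_set: List[Trajectory],
    collision_set: List[Trajectory],
    flip_prob: float = 0.5,
) -> None:
    """Optionally mirror the trajectory, then file it by outcome."""
    if random.random() < flip_prob:
        trajectory = reflect_in_x_axis(trajectory)
    if collided:
        collision_set.append(trajectory)
    else:
        success_set.append(trajectory)


def sample_batch(
    success_set: List[Trajectory],
    collision_set: List[Trajectory],
    batch_size: int,
) -> List[Trajectory]:
    """Assumed scheme: draw about half of the batch from each set."""
    from_success = (random.choices(success_set, k=batch_size // 2)
                    if success_set else [])
    from_collision = (random.choices(collision_set,
                                     k=batch_size - len(from_success))
                      if collision_set else [])
    return from_success + from_collision


# Usage with a toy two-step trajectory.
traj = [(0.0, 1.0, 1.0, 0.5), (1.0, 1.5, 1.0, 0.5)]
E, E_b = [], []
store_trajectory(traj, collided=False, success_set=E, collision_set=E_b,
                 flip_prob=0.0)
store_trajectory(traj, collided=True, success_set=E, collision_set=E_b,
                 flip_prob=1.0)
batch = sample_batch(E, E_b, batch_size=4)
```

Keeping collisions in their own set means rare failure cases are not crowded out by the much more common successful runs, and the mirroring gives left-passing and right-passing variants of the same encounter at no extra simulation cost.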

E. 4 >> IV. RESULTS

1. A. Computational Details >>

2. B. Simulation Results >>

3. C. Hardware Experiment >>

F. 5 >> V. CONCLUSION

1. SA-CADRL >>

2. multiagent >>
