- Decentralized Non-communicating Multiagent Collision Avoidance >>
- Problem >>
finding feasible, collision-free paths for multiagent systems
a) in non-communicating scenarios, each agent's intent is >>
unobservable to the others
- Solution in this paper >>
offload expensive online computation to
an offline learning procedure
a) value network that encodes the estimated time to the goal given an agent’s joint configuration (positions and velocities) with its neighbors (sketched below) >>
(1) efficient queries >>
that find a collision-free velocity vector
(2) considers >>
other agents’ motion
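To make the value-network idea concrete, here is a minimal numpy sketch of such a network; the 14-dimensional input layout, the layer sizes, and the (random) weights are illustrative assumptions standing in for a trained network, not the paper's exact architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ValueNetwork:
    """Illustrative MLP mapping a joint configuration to a scalar value
    (read as an estimate of discounted time-to-goal). Assumed input layout:
      [own px, py, vx, vy, radius, gx, gy, v_pref, heading,
       other px, py, vx, vy, radius]"""
    def __init__(self, input_dim=14, hidden=(64, 64), seed=0):
        rng = np.random.default_rng(seed)
        dims = (input_dim, *hidden, 1)
        # Randomly initialized weights stand in for trained parameters.
        self.weights = [rng.normal(0.0, 0.1, (i, o)) for i, o in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(o) for o in dims[1:]]

    def __call__(self, joint_state: np.ndarray) -> float:
        h = joint_state
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)
        return (h @ self.weights[-1] + self.biases[-1]).item()

# Example query: own full state plus one neighbor's observable state.
value_net = ValueNetwork()
joint = np.array([0.0, 0.0, 0.5, 0.0, 0.3, 4.0, 0.0, 1.0, 0.0,   # own agent
                  2.0, 0.5, -0.5, 0.0, 0.3])                     # neighbor (observable part)
print(value_net(joint))
```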
B. 1 >>
I. INTRODUCTION
- CORE >>
Collision avoidance
a) collision-free, time-efficient paths >>
(1) computationally tractable >>
- relies on reliable communication or >>
a centralized planner
collision avoidance with communication
a) Way1 >>
separation constraints
(1) Problem >>
computationally prohibitive
b) Way2 >>
distributed algorithms based on message-passing schemes
(1) advantage >>
without needing to form a joint optimization
- non-communicating >>
non-communicating collision avoidance
a) reaction-based >>
(1) example >>
reciprocal velocity obstacle (RVO)
i) shortcoming 1 >>
oscillatory
unnatural behaviors
ii) shortcoming 2 >>
short-sighted
b) trajectory-based >>
(1) explicitly account for evolution of the joint (agent and neighbors) future states by anticipating other agents’ motion >>
(2) freezing robot problem >>
i) solution >>
account for interactions
(a) causes a new problem >>
exacerbates the computational problem.
- Method: reinforcement learning >>
to offload the expensive online computation to an offline training procedure.
a) Explanation >>
a computationally efficient (i.e., real-time implementable) interaction rule, obtained by learning a value function that implicitly encodes cooperative behaviors
- main contributions >>
a) 1. a deep RL collision avoidance approach; 2. generalization to multiagent scenarios; 3. an extended formulation for kinematic constraints; 4. improved simulation results >>
C. 2 >>
II. PROBLEM FORMULATION
- 2.1 Sequential Decision Making >>
A. Sequential Decision Making
a) can be formulated as a partially observable sequential decision making problem >>
(1) s^o_t: the observable part of the state >>
(2) s^h_t: the part known to the agent itself but hidden from others >>
(3) a common assumption is reciprocity, that is π = π̃ (both agents follow the same policy) >>
i) the main difficulty is in handling the uncertainty in the other agent’s hidden intents >>
(4) reaction-based methods: Markovian policy, π(s_{0:t}, s̃^o_{0:t}) = π(s_t, s̃^o_t) >>
i) an agent chooses a collision-free velocity that is closest to its preferred velocity (i.e., toward its goal), as sketched below >>
(a) relies on a fast update rate to react quickly to the other agent’s motion >>
(b) drawback: short-sighted, which can produce unnatural trajectories >>
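A toy illustration of this reaction-based rule (not RVO itself, and heavily simplified): candidate velocities are sampled, those predicted to bring two constant-velocity agents within their combined radii over a short horizon are discarded, and the remaining velocity closest to the preferred one is chosen. All parameters and the sampling scheme are assumptions of this sketch.

```python
import numpy as np

def min_separation(p_rel, v_rel, horizon, steps=20):
    """Minimum distance between two constant-velocity agents over the horizon."""
    ts = np.linspace(0.0, horizon, steps)
    return min(np.linalg.norm(p_rel + v_rel * t) for t in ts)

def reactive_velocity(own_pos, other_pos, other_vel, v_pref,
                      combined_radius=0.6, horizon=2.0, n_samples=200, seed=0):
    """Pick the sampled velocity closest to v_pref that stays collision-free."""
    rng = np.random.default_rng(seed)
    speed_max = np.linalg.norm(v_pref)
    angles = rng.uniform(0, 2 * np.pi, n_samples)
    speeds = rng.uniform(0, speed_max, n_samples)
    candidates = np.stack([speeds * np.cos(angles), speeds * np.sin(angles)], axis=1)
    best, best_dist = np.zeros(2), np.inf   # fall back to stopping if all collide
    for v in candidates:
        sep = min_separation(other_pos - own_pos, other_vel - v, horizon)
        if sep < combined_radius:
            continue                         # predicted collision: discard
        d = np.linalg.norm(v - v_pref)
        if d < best_dist:
            best, best_dist = v, d
    return best

# Two agents heading toward each other; the preferred velocity points at the goal.
print(reactive_velocity(own_pos=np.array([0.0, 0.0]), other_pos=np.array([3.0, 0.0]),
                        other_vel=np.array([-1.0, 0.0]), v_pref=np.array([1.0, 0.0])))
```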
(5) trajectory-based methods >>
i) two steps >>
(a) step 1 >>
the other agent’s hidden state is inferred from its observed trajectory, giving an estimate ŝ^h_t = f(s̃^o_{0:t}), where f(·) is an inference function
(b) step 2 >>
a centralized path planning algorithm, π(s_{0:t}, s̃^o_{0:t}) = π_central(s_t, s̃^o_t, ŝ^h_t), is employed to find jointly feasible paths; computationally expensive
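A schematic stand-in for this two-step pipeline, with both steps deliberately trivialized: the hidden-state "inference" just extrapolates the other agent's latest velocity to guess a goal, and the "central planner" is a placeholder returning straight-line waypoints, whereas in a real system this second step is the expensive joint optimization. Function names and the trajectory format are made up for the sketch.

```python
import numpy as np

def infer_hidden_state(observed_traj, lookahead=5.0):
    """Step 1 (toy f(.)): guess the other agent's goal by extrapolating its velocity."""
    observed_traj = np.asarray(observed_traj)          # rows: [px, py, vx, vy]
    pos, vel = observed_traj[-1, :2], observed_traj[-1, 2:]
    return {"goal_guess": pos + lookahead * vel}

def central_plan(own_state, other_obs, other_hidden, n_waypoints=5):
    """Step 2 (placeholder): jointly plan paths given the inferred hidden state.
    A real planner would optimize both trajectories together; here we only
    return straight-line waypoints to keep the sketch self-contained."""
    own_path = np.linspace(own_state["pos"], own_state["goal"], n_waypoints)
    other_path = np.linspace(other_obs[-1][:2], other_hidden["goal_guess"], n_waypoints)
    return own_path, other_path

other_obs = [[3.0, 0.0, -1.0, 0.0], [2.5, 0.0, -1.0, 0.0]]
hidden = infer_hidden_state(other_obs)
own = {"pos": np.array([0.0, 0.0]), "goal": np.array([4.0, 0.0])}
print(central_plan(own, np.asarray(other_obs), hidden))
```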
(6) reinforcement learning-based methods >>
i) pre-computing a value function >>
- 2.2 Reinforcement Learning >>
B. Reinforcement Learning
a) solving sequential decision making problems with unknown state-transition dynamics >>
D. 3 >>
III. APPROACH
an algorithm for solving the two-agent RL problem
generalizes its solution (policy) to multiagent collision avoidance >>
A. Parametrization >>
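The notes don't record the parametrization details, so the sketch below reconstructs a plausible version from the paper's description (position, velocity, and radius observable; goal, preferred speed, and heading hidden); the exact field set and ordering are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AgentState:
    """One agent's full state; the split into observable and hidden parts follows
    the paper's description, but the exact layout here is an assumption."""
    px: float; py: float          # position (observable)
    vx: float; vy: float          # velocity (observable)
    radius: float                 # size (observable)
    gx: float; gy: float          # goal position (hidden from others)
    v_pref: float                 # preferred speed (hidden)
    heading: float                # heading angle (hidden)

    def observable(self) -> np.ndarray:
        return np.array([self.px, self.py, self.vx, self.vy, self.radius])

    def full(self) -> np.ndarray:
        return np.array([self.px, self.py, self.vx, self.vy, self.radius,
                         self.gx, self.gy, self.v_pref, self.heading])

def joint_state(own: AgentState, other: AgentState) -> np.ndarray:
    """Input to the value network: own full state plus the other's observable state."""
    return np.concatenate([own.full(), other.observable()])

a = AgentState(0, 0, 1, 0, 0.3, 4, 0, 1.0, 0.0)
b = AgentState(4, 0, -1, 0, 0.3, 0, 0, 1.0, np.pi)
print(joint_state(a, b).shape)   # (14,)
```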
B. Generating Paths Using a Value Network >>
a) paths toward the goal are generated by repeatedly maximizing a one-step lookahead value >>
b) the other agent is propagated using a filtered velocity >>
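A condensed sketch of this one-step lookahead decision rule: each candidate velocity propagates the own state for one step, the neighbor is propagated with its filtered (here: current observed) velocity, and the candidate maximizing immediate reward plus discounted value of the resulting joint state is selected. The discount form gamma**(dt*v_pref), the collision penalty, the state layout, and the stand-in value function in the usage line are assumptions of this sketch.

```python
import numpy as np

def propagate(pos, vel, dt):
    return pos + vel * dt

def lookahead_step(own_pos, own_goal, v_pref, other_pos, other_vel_filtered,
                   value_fn, dt=0.2, gamma=0.9, combined_radius=0.6, n_samples=100, seed=0):
    """Choose a velocity by maximizing reward + discounted value of the next joint state.
    The neighbor is assumed to keep its filtered velocity for this step."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0, 2 * np.pi, n_samples)
    speeds = rng.uniform(0, v_pref, n_samples)
    candidates = np.stack([speeds * np.cos(angles), speeds * np.sin(angles)], axis=1)

    other_next = propagate(other_pos, other_vel_filtered, dt)
    best_v, best_score = np.zeros(2), -np.inf
    for v in candidates:
        own_next = propagate(own_pos, v, dt)
        dist = np.linalg.norm(own_next - other_next)
        reward = -0.25 if dist < combined_radius else 0.0   # collision penalty (assumed value)
        next_joint = np.concatenate([own_next, v, own_goal, [v_pref],
                                     other_next, other_vel_filtered])
        score = reward + gamma ** (dt * v_pref) * value_fn(next_joint)
        if score > best_score:
            best_v, best_score = v, score
    return best_v

# Usage with a stand-in value function: prefer next states closer to the goal.
dummy_value = lambda js: -np.linalg.norm(js[0:2] - js[4:6])
v = lookahead_step(own_pos=np.array([0.0, 0.0]), own_goal=np.array([4.0, 0.0]), v_pref=1.0,
                   other_pos=np.array([3.0, 0.0]), other_vel_filtered=np.array([-1.0, 0.0]),
                   value_fn=dummy_value)
print(v)
```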
- C. Training a Value Network >>
a) First, the training trajectories do not have to be optimal; the value network is initialized by supervised training on a set of trajectories generated by a baseline policy >>
(1) ORCA (optimal reciprocal collision avoidance) >>
b) Second, the initialization training step is not simply emulating the ORCA policy; rather, it learns a time-to-goal estimate (value function), which is then used to generate new trajectories >>
(1) reinforcement learning >>
c) Third, this learned value function is likely to be suboptimal >>
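To make the supervised initialization concrete, here is a small sketch of how time-to-goal value targets could be built from one demonstration (e.g., ORCA-generated) trajectory; the target form gamma**((T - t) * v_pref) and the constants are assumptions of this sketch, and the regression fit plus the subsequent RL refinement are omitted.

```python
import numpy as np

def supervised_targets(joint_states, times, v_pref, gamma=0.9):
    """Build (state, value) training pairs from one demonstration trajectory.

    The value target for each visited state is a discounted time-to-goal:
    gamma ** ((T - t) * v_pref), where T is the time the goal was reached.
    (The form follows the time-to-goal idea; details are assumptions of this sketch.)"""
    T = times[-1]
    targets = np.array([gamma ** ((T - t) * v_pref) for t in times])
    return list(zip(joint_states, targets))

# Toy trajectory: three joint states logged at t = 0, 2, 4 s; goal reached at t = 4 s.
states = [np.zeros(14), np.ones(14), 2 * np.ones(14)]
pairs = supervised_targets(states, times=[0.0, 2.0, 4.0], v_pref=1.0)
for s, y in pairs:
    print(y)   # 0.6561, 0.81, 1.0 -> later states are worth more (closer to the goal)
```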
- Incorporating kinematic constraints >>
D. Incorporating Kinematic Constraints
a) When the agent's heading angle is not fully aligned with its goal, the agent can rotate in place; it may either travel at full speed while turning toward the goal, or rotate in place first and then move in a straight line >>
(1) CADRL balances between these two behaviors >>
i) with the thin lines showing heading angles, the red agent chooses to rotate in place first, then starts moving before its heading is fully aligned with its goal >>
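A small sketch of how such a rotational constraint could be enforced when selecting actions: only candidate velocities whose heading is reachable within one time step (plus stopping / rotating in place) are kept. The maximum turn rate and the helper name are assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def filter_by_turning_limit(candidates, current_heading, max_turn_rate, dt):
    """Keep only velocities whose heading is reachable within this time step,
    given a maximum rotational speed (an assumed parameter).
    Zero velocity (rotating in place) is always allowed."""
    kept = []
    for v in candidates:
        if np.linalg.norm(v) < 1e-6:
            kept.append(v)                       # stop / rotate in place
            continue
        desired = np.arctan2(v[1], v[0])
        # smallest signed angle between desired heading and current heading
        diff = (desired - current_heading + np.pi) % (2 * np.pi) - np.pi
        if abs(diff) <= max_turn_rate * dt:
            kept.append(v)
    return kept

candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0])]
print(filter_by_turning_limit(candidates, current_heading=0.0, max_turn_rate=1.0, dt=0.25))
# -> forward motion and stopping survive; the sharp 90-degree turn is filtered out
```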
- E. Multiagent Collision Avoidance >>
a) applies the two-agent value network pairwise to each neighbor >>
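A minimal sketch of one plausible way to reuse the two-agent value network for more than two agents: evaluate it against each neighbor and keep the most constraining (minimum) value when scoring a candidate next state. The aggregation by minimum and the stand-in value function are assumptions of this sketch.

```python
import numpy as np

def multiagent_value(own_next, neighbors_next, two_agent_value_fn):
    """Score a candidate next state against every neighbor with the two-agent
    value network and keep the worst case (most constraining neighbor)."""
    pairwise = [two_agent_value_fn(np.concatenate([own_next, nb])) for nb in neighbors_next]
    return min(pairwise)

# Stand-in two-agent value: penalize being close to the paired neighbor.
two_agent_v = lambda js: np.linalg.norm(js[0:2] - js[2:4])

own_next = np.array([0.0, 0.0])
neighbors_next = [np.array([1.0, 0.0]), np.array([0.2, 0.1])]
print(multiagent_value(own_next, neighbors_next, two_agent_v))   # limited by the nearest neighbor
```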
E. 4 >>
IV. RESULTS
- A. Computational Complexity >>
a) 5.7ms >>
b) 62ms >>
c) offline training (Algorithm 2) took less than three hours >>
- B. Performance Comparison on a Crossing Scenario >>
F. 5 >>
V. CONCLUSION
- A value network is generated by simulating a pair of agents navigating around each other; the two-agent solution is generalized to multiagent scenarios; path quality improves by more than 26% over ORCA. >>