- Decentralized Non-communicating Multiagent Collision Avoidance >>
- Problem >>
finding feasible, collision-free paths for multiagent systems
a) in non-communicating scenarios, each agent's intent is >>
unobservable to the others
- Solution in this paper >>
offload expensive online computation to
an offline learning procedure
a) value network that encodes the estimated time to the goal given an agent’s joint configuration (positions and velocities) with its neighbors (sketched below) >>
(1) efficient queries >>
that find a collision-free velocity vector
(2) considers >>
other agents’ motion
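To make the value-network idea concrete, here is a minimal numpy sketch of such a network; the 14-dimensional input layout, the layer sizes, and the (random) weights are illustrative assumptions standing in for a trained network, not the paper's exact architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ValueNetwork:
    """Illustrative MLP mapping a joint configuration to a scalar value
    (read as an estimate of discounted time-to-goal). Assumed input layout:
      [own px, py, vx, vy, radius, gx, gy, v_pref, heading,
       other px, py, vx, vy, radius]"""
    def __init__(self, input_dim=14, hidden=(64, 64), seed=0):
        rng = np.random.default_rng(seed)
        dims = (input_dim, *hidden, 1)
        # Randomly initialized weights stand in for trained parameters.
        self.weights = [rng.normal(0.0, 0.1, (i, o)) for i, o in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(o) for o in dims[1:]]

    def __call__(self, joint_state: np.ndarray) -> float:
        h = joint_state
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)
        return (h @ self.weights[-1] + self.biases[-1]).item()

# Example query: own full state plus one neighbor's observable state.
value_net = ValueNetwork()
joint = np.array([0.0, 0.0, 0.5, 0.0, 0.3, 4.0, 0.0, 1.0, 0.0,   # own agent
                  2.0, 0.5, -0.5, 0.0, 0.3])                     # neighbor (observable part)
print(value_net(joint))
```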
B. 1 >>
I. INTRODUCTION
- CORE >>
Collision avoidance
a) collision-free, time-efficient paths >>
(1) computationally tractable >>
- relies on reliable communication or >>
a centralized planner
collision avoidance with communication
a) Way1 >>
separation constraints
(1) Problem >>
computationally prohibitive
b) Way2 >>
distributed algorithms based on message-passing schemes
(1) advantage >>
without needing to form a joint optimization
- non-communicating >>
non-communicating collision avoidance
a) reaction-based >>
(1) example >>
reciprocal velocity obstacle (RVO)
i) shortcoming 1 >>
oscillatory
unnatural behaviors
ii) shortcoming 2 >>
short-sighted
b) trajectory-based >>
(1) explicitly account for evolution of the joint (agent and neighbors) future states by anticipating other agents’ motion >>
(2) freezing robot problem >>
i) solution >>
account for interactions
(a) causes a new problem >>
exacerbates the computational problem.
- Method: reinforcement learning >>
to offload the expensive online computation to an offline training procedure.
a) Explanation >>
a computationally efficient (i.e., real-time implementable) interaction rule, obtained by learning a value function that implicitly encodes cooperative behaviors
- main contributions >>
a) 1. a deep RL collision avoidance approach; 2. generalization to multiagent scenarios; 3. an extended formulation for kinematic constraints; 4. improved simulation results >>
C. 2 >>
II. PROBLEM FORMULATION
- 2.1 Sequential Decision Making >>
A. Sequential Decision Making
a) can be formulated as a partially observable sequential decision making problem >>
(1) s^o_t: the observable part of the state >>
(2) s^h_t: the part known to the agent itself but hidden from others >>
(3) a common assumption is reciprocity, that is π = π̃ (both agents follow the same policy) >>
i) the main difficulty is in handling the uncertainty in the other agent’s hidden intents >>
(4) reaction-based methods: Markovian policy, π(s_{0:t}, s̃^o_{0:t}) = π(s_t, s̃^o_t) >>
i) an agent chooses a collision-free velocity that is closest to its preferred velocity (i.e., toward its goal), as sketched below >>
(a) relies on a fast update rate to react quickly to the other agent’s motion >>
(b) drawback: short-sighted, which can produce unnatural trajectories >>
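A toy illustration of this reaction-based rule (not RVO itself, and heavily simplified): candidate velocities are sampled, those predicted to bring two constant-velocity agents within their combined radii over a short horizon are discarded, and the remaining velocity closest to the preferred one is chosen. All parameters and the sampling scheme are assumptions of this sketch.

```python
import numpy as np

def min_separation(p_rel, v_rel, horizon, steps=20):
    """Minimum distance between two constant-velocity agents over the horizon."""
    ts = np.linspace(0.0, horizon, steps)
    return min(np.linalg.norm(p_rel + v_rel * t) for t in ts)

def reactive_velocity(own_pos, other_pos, other_vel, v_pref,
                      combined_radius=0.6, horizon=2.0, n_samples=200, seed=0):
    """Pick the sampled velocity closest to v_pref that stays collision-free."""
    rng = np.random.default_rng(seed)
    speed_max = np.linalg.norm(v_pref)
    angles = rng.uniform(0, 2 * np.pi, n_samples)
    speeds = rng.uniform(0, speed_max, n_samples)
    candidates = np.stack([speeds * np.cos(angles), speeds * np.sin(angles)], axis=1)
    best, best_dist = np.zeros(2), np.inf   # fall back to stopping if all collide
    for v in candidates:
        sep = min_separation(other_pos - own_pos, other_vel - v, horizon)
        if sep < combined_radius:
            continue                         # predicted collision: discard
        d = np.linalg.norm(v - v_pref)
        if d < best_dist:
            best, best_dist = v, d
    return best

# Two agents heading toward each other; the preferred velocity points at the goal.
print(reactive_velocity(own_pos=np.array([0.0, 0.0]), other_pos=np.array([3.0, 0.0]),
                        other_vel=np.array([-1.0, 0.0]), v_pref=np.array([1.0, 0.0])))
```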
(5) trajectory-based methods >>
i) two steps >>
(a) step 1 >>
the other agent’s hidden state is inferred from its observed trajectory, giving an estimate ŝ^h_t = f(s̃^o_{0:t}), where f(·) is an inference function
(b) step 2 >>
a centralized path planning algorithm, π(s_{0:t}, s̃^o_{0:t}) = π_central(s_t, s̃^o_t, ŝ^h_t), is employed to find jointly feasible paths; computationally expensive
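A schematic stand-in for this two-step pipeline, with both steps deliberately trivialized: the hidden-state "inference" just extrapolates the other agent's latest velocity to guess a goal, and the "central planner" is a placeholder returning straight-line waypoints, whereas in a real system this second step is the expensive joint optimization. Function names and the trajectory format are made up for the sketch.

```python
import numpy as np

def infer_hidden_state(observed_traj, lookahead=5.0):
    """Step 1 (toy f(.)): guess the other agent's goal by extrapolating its velocity."""
    observed_traj = np.asarray(observed_traj)          # rows: [px, py, vx, vy]
    pos, vel = observed_traj[-1, :2], observed_traj[-1, 2:]
    return {"goal_guess": pos + lookahead * vel}

def central_plan(own_state, other_obs, other_hidden, n_waypoints=5):
    """Step 2 (placeholder): jointly plan paths given the inferred hidden state.
    A real planner would optimize both trajectories together; here we only
    return straight-line waypoints to keep the sketch self-contained."""
    own_path = np.linspace(own_state["pos"], own_state["goal"], n_waypoints)
    other_path = np.linspace(other_obs[-1][:2], other_hidden["goal_guess"], n_waypoints)
    return own_path, other_path

other_obs = [[3.0, 0.0, -1.0, 0.0], [2.5, 0.0, -1.0, 0.0]]
hidden = infer_hidden_state(other_obs)
own = {"pos": np.array([0.0, 0.0]), "goal": np.array([4.0, 0.0])}
print(central_plan(own, np.asarray(other_obs), hidden))
```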
(6) reinforcement learning-based methods >>
i) pre-computing a value function >>
- 2.2 Reinforcement Learning >>
B. Reinforcement Learning
a) solving sequential decision making problems with unknown state-transition dynamics >>
D. 3 >>
III. APPROACH
an algorithm for solving the two-agent RL problem
generalizes its solution (policy) to multiagent collision avoidance >>
A. Parametrization >>
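The notes don't record the parametrization details, so the sketch below reconstructs a plausible version from the paper's description (position, velocity, and radius observable; goal, preferred speed, and heading hidden); the exact field set and ordering are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AgentState:
    """One agent's full state; the split into observable and hidden parts follows
    the paper's description, but the exact layout here is an assumption."""
    px: float; py: float          # position (observable)
    vx: float; vy: float          # velocity (observable)
    radius: float                 # size (observable)
    gx: float; gy: float          # goal position (hidden from others)
    v_pref: float                 # preferred speed (hidden)
    heading: float                # heading angle (hidden)

    def observable(self) -> np.ndarray:
        return np.array([self.px, self.py, self.vx, self.vy, self.radius])

    def full(self) -> np.ndarray:
        return np.array([self.px, self.py, self.vx, self.vy, self.radius,
                         self.gx, self.gy, self.v_pref, self.heading])

def joint_state(own: AgentState, other: AgentState) -> np.ndarray:
    """Input to the value network: own full state plus the other's observable state."""
    return np.concatenate([own.full(), other.observable()])

a = AgentState(0, 0, 1, 0, 0.3, 4, 0, 1.0, 0.0)
b = AgentState(4, 0, -1, 0, 0.3, 0, 0, 1.0, np.pi)
print(joint_state(a, b).shape)   # (14,)
```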
B. Generating Paths Using a Value Network >>
a) paths toward the goal are generated by repeatedly maximizing a one-step lookahead value >>
b) the other agent is propagated using a filtered velocity >>
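A condensed sketch of this one-step lookahead decision rule: each candidate velocity propagates the own state for one step, the neighbor is propagated with its filtered (here: current observed) velocity, and the candidate maximizing immediate reward plus discounted value of the resulting joint state is selected. The discount form gamma**(dt*v_pref), the collision penalty, the state layout, and the stand-in value function in the usage line are assumptions of this sketch.

```python
import numpy as np

def propagate(pos, vel, dt):
    return pos + vel * dt

def lookahead_step(own_pos, own_goal, v_pref, other_pos, other_vel_filtered,
                   value_fn, dt=0.2, gamma=0.9, combined_radius=0.6, n_samples=100, seed=0):
    """Choose a velocity by maximizing reward + discounted value of the next joint state.
    The neighbor is assumed to keep its filtered velocity for this step."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0, 2 * np.pi, n_samples)
    speeds = rng.uniform(0, v_pref, n_samples)
    candidates = np.stack([speeds * np.cos(angles), speeds * np.sin(angles)], axis=1)

    other_next = propagate(other_pos, other_vel_filtered, dt)
    best_v, best_score = np.zeros(2), -np.inf
    for v in candidates:
        own_next = propagate(own_pos, v, dt)
        dist = np.linalg.norm(own_next - other_next)
        reward = -0.25 if dist < combined_radius else 0.0   # collision penalty (assumed value)
        next_joint = np.concatenate([own_next, v, own_goal, [v_pref],
                                     other_next, other_vel_filtered])
        score = reward + gamma ** (dt * v_pref) * value_fn(next_joint)
        if score > best_score:
            best_v, best_score = v, score
    return best_v

# Usage with a stand-in value function: prefer next states closer to the goal.
dummy_value = lambda js: -np.linalg.norm(js[0:2] - js[4:6])
v = lookahead_step(own_pos=np.array([0.0, 0.0]), own_goal=np.array([4.0, 0.0]), v_pref=1.0,
                   other_pos=np.array([3.0, 0.0]), other_vel_filtered=np.array([-1.0, 0.0]),
                   value_fn=dummy_value)
print(v)
```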
- C. Training a Value Network >>
a) First, the training trajectories do not have to be optimal; the value network is initialized by supervised training on a set of trajectories generated by a baseline policy >>
(1) ORCA (optimal reciprocal collision avoidance) >>
b) Second, the initialization training step is not simply emulating the ORCA policy; rather, it learns a time-to-goal estimate (value function), which is then used to generate new trajectories >>
(1) reinforcement learning >>
c) Third, this learned value function is likely to be suboptimal >>
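To make the supervised initialization concrete, here is a small sketch of how time-to-goal value targets could be built from one demonstration (e.g., ORCA-generated) trajectory; the target form gamma**((T - t) * v_pref) and the constants are assumptions of this sketch, and the regression fit plus the subsequent RL refinement are omitted.

```python
import numpy as np

def supervised_targets(joint_states, times, v_pref, gamma=0.9):
    """Build (state, value) training pairs from one demonstration trajectory.

    The value target for each visited state is a discounted time-to-goal:
    gamma ** ((T - t) * v_pref), where T is the time the goal was reached.
    (The form follows the time-to-goal idea; details are assumptions of this sketch.)"""
    T = times[-1]
    targets = np.array([gamma ** ((T - t) * v_pref) for t in times])
    return list(zip(joint_states, targets))

# Toy trajectory: three joint states logged at t = 0, 2, 4 s; goal reached at t = 4 s.
states = [np.zeros(14), np.ones(14), 2 * np.ones(14)]
pairs = supervised_targets(states, times=[0.0, 2.0, 4.0], v_pref=1.0)
for s, y in pairs:
    print(y)   # 0.6561, 0.81, 1.0 -> later states are worth more (closer to the goal)
```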
- Incorporating kinematic constraints >>
D. Incorporating Kinematic Constraints
a) When the agent's heading angle is not fully aligned with its goal, the agent can rotate in place; it may either travel at full speed while turning toward the goal, or rotate in place first and then move in a straight line >>
(1) CADRL balances between these two behaviors >>
i) with the thin lines showing heading angles, the red agent chooses to rotate in place first, then starts moving before its heading is fully aligned with its goal >>
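A small sketch of how such a rotational constraint could be enforced when selecting actions: only candidate velocities whose heading is reachable within one time step (plus stopping / rotating in place) are kept. The maximum turn rate and the helper name are assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

def filter_by_turning_limit(candidates, current_heading, max_turn_rate, dt):
    """Keep only velocities whose heading is reachable within this time step,
    given a maximum rotational speed (an assumed parameter).
    Zero velocity (rotating in place) is always allowed."""
    kept = []
    for v in candidates:
        if np.linalg.norm(v) < 1e-6:
            kept.append(v)                       # stop / rotate in place
            continue
        desired = np.arctan2(v[1], v[0])
        # smallest signed angle between desired heading and current heading
        diff = (desired - current_heading + np.pi) % (2 * np.pi) - np.pi
        if abs(diff) <= max_turn_rate * dt:
            kept.append(v)
    return kept

candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0])]
print(filter_by_turning_limit(candidates, current_heading=0.0, max_turn_rate=1.0, dt=0.25))
# -> forward motion and stopping survive; the sharp 90-degree turn is filtered out
```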
- E. Multiagent Collision Avoidance >>
a) applies the two-agent value network pairwise to each neighbor >>
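A minimal sketch of one plausible way to reuse the two-agent value network for more than two agents: evaluate it against each neighbor and keep the most constraining (minimum) value when scoring a candidate next state. The aggregation by minimum and the stand-in value function are assumptions of this sketch.

```python
import numpy as np

def multiagent_value(own_next, neighbors_next, two_agent_value_fn):
    """Score a candidate next state against every neighbor with the two-agent
    value network and keep the worst case (most constraining neighbor)."""
    pairwise = [two_agent_value_fn(np.concatenate([own_next, nb])) for nb in neighbors_next]
    return min(pairwise)

# Stand-in two-agent value: penalize being close to the paired neighbor.
two_agent_v = lambda js: np.linalg.norm(js[0:2] - js[2:4])

own_next = np.array([0.0, 0.0])
neighbors_next = [np.array([1.0, 0.0]), np.array([0.2, 0.1])]
print(multiagent_value(own_next, neighbors_next, two_agent_v))   # limited by the nearest neighbor
```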
E. 4 >>
IV. RESULTS
- A. Computational Complexity >>
a) 5.7ms >>
b) 62ms >>
c) offline training (Algorithm 2) took less than three hours >>
- B. Performance Comparison on a Crossing Scenario >>
F. 5 >>
V. CONCLUSION
- A value network is generated by simulating a pair of agents navigating around each other; the two-agent solution is generalized to multiagent scenarios; path quality improves by more than 26% over ORCA. >>