Paper reading: Infrastructure-Based Calibration of a Multi-Camera Rig

Abstract

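Online calibration is important.

However, current methods are all computationally expensive.

Our approach does not need calibration targets such as a chessboard.

Our approach assumes neither overlapping fields of view between the cameras nor any initial guess. When the camera rig drives through a previously mapped area, we match the synchronized camera images against the map. From these matches we obtain the camera poses and the inlier 2D-3D correspondences.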

1. Introduction

The widespread adoption of robotic systems raises a series of fundamental questions about the long-term autonomy of robotic systems.

We propose infrastructure-based calibration.

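A calibration is accurate only if the camera intrinsics and extrinsics allow 2D image points to be accurately associated with 3D points.

Environmental factors such as temperature changes and vibration make the camera extrinsics drift from their initial values much more easily than the intrinsics.

In this paper, we focus on estimating the camera extrinsics and assume that the camera intrinsics are constant.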

We point out that self-calibration algorithms require special motions to bootstrap the process.

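SLAM-based approaches do not need a prior map. However, they require an exhaustive search for inter-frame feature correspondences, as well as loop closure detection, which sometimes fails.

By relying on a prior map, we remove the need to find inter-frame correspondences and loop closures, and we also avoid a full bundle adjustment. Our approach is therefore simpler, more robust, and computationally lighter.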

"The world is a giant chessboard."

2. Platform

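We use the Kannala-Brandt camera model, which has 8 parameters: \(k_1, k_2, k_3, k_4, m_u, m_v, u_0, v_0\). A 3D point \(P = [X, Y, Z]^T\) in the camera frame is projected as follows: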
\[
\begin{aligned} \theta &=\arccos \frac{Z}{\|P\|} \\ \phi &=\arctan \frac{Y}{X} \\ r(\theta) &=\theta+k_{1} \theta^{3}+k_{2} \theta^{5}+k_{3} \theta^{7}+k_{4} \theta^{9} \end{aligned}
\]

\[
\begin{array}{l}{\left[\begin{array}{l}{x} \\ {y}\end{array}\right]=r(\theta)\left[\begin{array}{l}{\cos \phi} \\ {\sin \phi}\end{array}\right]} \\ {\left[\begin{array}{l}{u} \\ {v} \\ {1}\end{array}\right]=\left[\begin{array}{ccc}{m_{u}} & {0} & {u_{0}} \\ {0} & {m_{v}} & {v_{0}} \\ {0} & {0} & {1}\end{array}\right]\left[\begin{array}{l}{x} \\ {y} \\ {1}\end{array}\right]}\end{array}
\]

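Here, \(r(\theta)\) is the distance between the projected point and the principal point on the normalized image plane.

Conversely, given an image point, we can compute the corresponding ray. We first undo the intrinsic matrix, then solve the polynomial \(d = r(\theta)\) for \(\theta\) (numerically in general), and finally recover \(\phi\) and the unit ray: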
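To make the projection steps concrete, here is a minimal Python sketch of the forward Kannala-Brandt projection. The function name `kb_project` and its argument layout are hypothetical. It uses `atan2` instead of \(\arctan(Y/X)\) so that \(\phi\) falls in the correct quadrant.

```python
import math

def kb_project(P, k, mu, mv, u0, v0):
    """Project a camera-frame point P = (X, Y, Z) to a pixel (u, v) with the
    Kannala-Brandt model r(theta) = theta + k1 theta^3 + k2 theta^5 + k3 theta^7 + k4 theta^9.
    Hypothetical helper for illustration."""
    X, Y, Z = P
    norm = math.sqrt(X * X + Y * Y + Z * Z)
    # Angle between the incoming ray and the optical axis (clamped against rounding).
    theta = math.acos(max(-1.0, min(1.0, Z / norm)))
    # Azimuth of the ray; atan2 handles all quadrants and X == 0.
    phi = math.atan2(Y, X)
    k1, k2, k3, k4 = k
    r = theta + k1 * theta**3 + k2 * theta**5 + k3 * theta**7 + k4 * theta**9
    # Point on the normalized image plane.
    x, y = r * math.cos(phi), r * math.sin(phi)
    # Apply the generalized intrinsic matrix [[mu, 0, u0], [0, mv, v0], [0, 0, 1]].
    return mu * x + u0, mv * y + v0
```

For a point on the optical axis, `kb_project((0.0, 0.0, 1.0), (0, 0, 0, 0), 400, 400, 320, 240)` returns `(320.0, 240.0)`, which is the principal point.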
\[
\left[\begin{array}{l}{x} \\ {y} \\ {1}\end{array}\right]=\left[\begin{array}{ccc}{m_{u}} & {0} & {u_{0}} \\ {0} & {m_{v}} & {v_{0}} \\ {0} & {0} & {1}\end{array}\right]^{-1}\left[\begin{array}{l}{u} \\ {v} \\ {1}\end{array}\right]
\]

\[
\begin{aligned} d &=\sqrt{x^{2}+y^{2}} \\ &=\theta+k_{1} \theta^{3}+k_{2} \theta^{5}+k_{3} \theta^{7}+k_{4} \theta^{9} \end{aligned}
\]

\[
\begin{aligned} \phi &=\left\{\begin{array}{ll}{0} & {\text { if } d=0} \\ {\arctan \frac{y}{x}} & {\text { otherwise }}\end{array}\right.\\\left[\begin{array}{l}{X} \\ {Y} \\ {Z}\end{array}\right] &=\left[\begin{array}{c}{\sin \theta \cos \phi} \\ {\sin \theta \sin \phi} \\ {\cos \theta}\end{array}\right] \end{aligned}
\]
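Back-projection needs \(\theta\) from \(d = r(\theta)\), which has no closed form in general. Below is a minimal sketch that assumes Newton's method started at \(\theta = d\). That solver choice is mine for illustration and is not necessarily what the paper's implementation does. `kb_unproject` is a hypothetical name, and the sketch assumes \(r(\theta)\) is monotonic over the valid field of view.

```python
import math

def kb_unproject(u, v, k, mu, mv, u0, v0, iters=20):
    """Back-project pixel (u, v) to a unit ray (X, Y, Z) under the Kannala-Brandt model.
    Hypothetical helper; solves d = r(theta) with Newton's method."""
    # Undo the intrinsic matrix to get the point on the normalized plane.
    x = (u - u0) / mu
    y = (v - v0) / mv
    d = math.hypot(x, y)
    if d == 0.0:
        # Principal point: phi = 0 and theta = 0, i.e. the optical axis.
        return (0.0, 0.0, 1.0)
    k1, k2, k3, k4 = k
    theta = d  # initial guess, exact when all k_i are zero
    for _ in range(iters):
        f = theta + k1 * theta**3 + k2 * theta**5 + k3 * theta**7 + k4 * theta**9 - d
        df = 1 + 3 * k1 * theta**2 + 5 * k2 * theta**4 + 7 * k3 * theta**6 + 9 * k4 * theta**8
        step = f / df
        theta -= step
        if abs(step) < 1e-12:
            break
    phi = math.atan2(y, x)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```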

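For initialization, we set the following, where \(f\) is an initial focal-length value and \(w\) and \(h\) are the image width and height: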
\[
k_{1}=k_{2}=k_{3}=k_{4}=0, m_{u}=m_{v}=f
\]

\[
u_{0}=\frac{w}{2} \text { and } v_{0}=\frac{h}{2}
\]

4. Infrastructure-Based Calibration

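We build a sparse feature map using a camera system that has already been calibrated.

A non-linear optimization step then refines the camera-rig poses and the camera extrinsics.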

A. Building A Sparse Feature Map

B. Visual Localization

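Visual localization takes the images and the map as input.

For each image, we use the vocabulary tree to find the n most similar images in the map.
We then match 2D-2D features between the image and each retrieved map image. Because the matched map features have 3D positions, this yields 2D-3D correspondences.
We estimate the camera pose with EPnP. If there are too few inliers (more than 25 are required), we treat this camera's pose as unknown.
We store the camera poses when both of the following conditions are met:
- at least two camera poses are found
- the current camera pose and the previous camera pose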
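The acceptance logic for a localized frame can be sketched as follows. All names are hypothetical. The threshold of more than 25 inliers comes from the text above. Only the first storage condition is modeled, because the second one is incomplete in these notes.

```python
MIN_INLIERS = 25  # a camera pose counts as found only with more than 25 EPnP inliers

def filter_camera_poses(results):
    """results: dict camera_id -> (pose, num_inliers) from EPnP (hypothetical layout).
    Returns the cameras whose pose is considered known."""
    return {cam: pose for cam, (pose, n) in results.items() if n > MIN_INLIERS}

def satisfies_first_condition(known_poses):
    # Condition 1: at least two camera poses were found for this frame.
    # Condition 2 (current vs. previous pose) is incomplete in the notes and is not modeled here.
    return len(known_poses) >= 2
```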

C. Inferring Camera Extrinsics and Rig Poses

D. Non-Linear Refinement

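We jointly refine the camera extrinsics and the rig poses by minimizing the robustified reprojection error over all cameras \(c\), rig poses \(i\), and observed points \(p\):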
\[
\min _{P_{i}, T_{c}} \sum_{c, i, p} \rho\left(\left\|\pi\left(C_{c}, P_{i}, T_{c}, X_{p}\right)-p_{c i p}\right\|^{2}\right)
\]
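\(\pi\): the projection function; \(X_p\): a 3D map point \(p\) observed by camera \(c\); \(C_c\): the intrinsics of camera \(c\); \(T_c\): the extrinsics of camera \(c\); \(P_i\): the rig pose at time \(i\); \(\rho\): a robust loss function.

\(p_{cip}\): the observed image coordinates of point \(p\) in camera \(c\) at time \(i\).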
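The sketch below only evaluates the objective; the actual minimization is done with Ceres, as noted in Section 5. It rests on several assumptions of my own:

- \(P_i = (R_i, t_i)\) maps rig coordinates to world coordinates.
- \(T_c = (R_c, t_c)\) maps camera coordinates to rig coordinates.
- A plain pinhole projection stands in for the Kannala-Brandt \(\pi\).
- \(\rho\) is a Huber loss applied to the squared error, in the form used by Ceres' `HuberLoss`.

All function names are hypothetical.

```python
import math

def mat_T_vec(R, v):
    # Multiply v by the transpose of the 3x3 rotation R, i.e. apply the inverse rotation.
    return [sum(R[c][r] * v[c] for c in range(3)) for r in range(3)]

def world_to_camera(X, rig_pose, extrinsic):
    # Assumed conventions: rig_pose maps rig -> world, extrinsic maps camera -> rig.
    R_i, t_i = rig_pose
    R_c, t_c = extrinsic
    X_rig = mat_T_vec(R_i, [X[j] - t_i[j] for j in range(3)])
    return mat_T_vec(R_c, [X_rig[j] - t_c[j] for j in range(3)])

def pinhole(Xc, intr):
    # Stand-in for the Kannala-Brandt projection pi.
    f, cx, cy = intr
    return (f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy)

def huber(s, delta=1.0):
    # Robust loss on the squared residual s: quadratic near zero, linear in sqrt(s) beyond delta.
    return s if s <= delta * delta else 2.0 * delta * math.sqrt(s) - delta * delta

def total_cost(observations, points, rig_poses, extrinsics, intrinsics, delta=1.0):
    """observations: list of (c, i, p, (u, v)) tuples, one per observed image point p_cip."""
    cost = 0.0
    for c, i, p, (u, v) in observations:
        Xc = world_to_camera(points[p], rig_poses[i], extrinsics[c])
        pu, pv = pinhole(Xc, intrinsics[c])
        s = (pu - u) ** 2 + (pv - v) ** 2
        cost += huber(s, delta)
    return cost
```

The robust loss keeps a few wrong 2D-3D matches from dominating the sum. With `delta=1.0`, an observation that is 5 pixels off contributes 9 instead of 25.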

E. Hand-Eye Calibration

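If odometry is available, we can optionally also estimate the rig-odometry transform, and from it the camera-odometry transforms.

We compute the rig-odometry transform with a least-squares solution to the hand-eye calibration problem.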
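One common way to write the hand-eye constraint is \(A_k X = X B_k\):

- \(A_k\) is the rig motion between two frames.
- \(B_k\) is the corresponding odometry motion.
- \(X\) is the unknown rig-odometry transform.

The exact direction of the equation depends on frame conventions, which these notes do not fix. The sketch below only evaluates a least-squares residual that such a solver drives toward zero; it does not solve for \(X\). All names are hypothetical.

```python
import math

def make_T(yaw, t):
    # 4x4 rigid transform: rotation about z by `yaw`, then translation t.
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, t[0]], [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)] for r in range(4)]

def rigid_inverse(T):
    # Inverse of [R t; 0 1] is [R^T, -R^T t; 0 1].
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    ti = [-sum(Rt[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]], [0.0, 0.0, 0.0, 1.0]]

def hand_eye_residual(pairs, X):
    """Sum over motion pairs (A_k, B_k) of the squared Frobenius norm of A_k X - X B_k."""
    total = 0.0
    for A, B in pairs:
        AX = matmul(A, X)
        XB = matmul(X, B)
        total += sum((AX[r][c] - XB[r][c]) ** 2 for r in range(4) for c in range(4))
    return total
```

If \(A = X B X^{-1}\) holds exactly, the residual at the true \(X\) is zero up to rounding.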

5. Implementation

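We use SURF for keypoint detection and descriptor computation.

We use DBoW2 for the vocabulary tree.

The non-linear optimization is implemented with Ceres Solver.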

6. Experiments and Results

We tested the method in both an indoor parking garage and an outdoor urban environment.