Setup


This section introduces the data collection rig used to capture synchronized RGB, depth, and 360° video. The rig combines two main components, a ZED X stereo camera with an integrated IMU and an Insta360 X4 action camera, mounted together on a compact, portable frame.

Camera Overview

Recording Process

  1. Calibration Targets: We place calibration boards around the environment.
  2. Dual Recording: We start the 360° and stereo recordings in parallel. A laser flash visible to both cameras provides a shared event for synchronizing the two video streams.
  3. Two-Part Capture:
    • First, a full-scene trajectory is recorded using the 360° camera.
    • Then, a close-up pass of the calibration boards is captured to help refine the pose estimates and establish ground-truth positions.
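
The flash-based synchronization in step 2 can be sketched as follows. This is a minimal illustration, not the pipeline used for the dataset: it assumes the flash appears as a sudden jump in per-frame mean brightness, and that those brightness series have already been extracted from each video. The function names, the threshold `k`, and the frame rates are hypothetical.

```python
import numpy as np

def flash_frame(brightness, k=8.0):
    """Return the index of the first frame with an abrupt brightness jump.

    Assumption: the flash shows up as a frame-to-frame increase far above
    the typical variation, measured with a robust (median/MAD) threshold.
    """
    b = np.asarray(brightness, dtype=float)
    diff = np.diff(b)                              # change between consecutive frames
    med = np.median(diff)
    mad = np.median(np.abs(diff - med)) + 1e-9     # robust scale, never zero
    spikes = np.flatnonzero(diff > med + k * mad)
    if spikes.size == 0:
        raise ValueError("no flash detected")
    return int(spikes[0]) + 1                      # diff[i] compares frames i and i+1

def stream_offset_seconds(b_360, fps_360, b_stereo, fps_stereo):
    """Time of the flash in the 360 stream minus its time in the stereo stream."""
    return flash_frame(b_360) / fps_360 - flash_frame(b_stereo) / fps_stereo

# Synthetic brightness traces standing in for real video (illustrative values).
b_360 = 100 + 0.5 * np.sin(0.1 * np.arange(300))
b_360[90] += 100                                   # flash at frame 90 of a 30 fps stream
b_stereo = 100 + 0.5 * np.sin(0.1 * np.arange(240))
b_stereo[60] += 100                                # flash at frame 60 of a 60 fps stream

offset = stream_offset_seconds(b_360, 30.0, b_stereo, 60.0)
print(offset)  # 2.0: the 360 stream started 2 s before the stereo stream
```

Working in seconds rather than frame indices keeps the offset meaningful when the two cameras record at different frame rates. The result is only accurate to within one frame period of the slower stream.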

Camera Pose & Coordinate Frames

Understanding the spatial relationship between the different camera views is essential for sensor fusion and pose estimation.

Coordinate Frame Conventions:
All frames follow a standard right-handed coordinate system.
Red = X-axis, Green = Z-axis, Blue = Y-axis
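
As a sanity check on the right-handed convention, the axes of a frame can be tested numerically. The sketch below assumes each frame is given as three orthonormal axis vectors, or equivalently as a 3×3 rotation matrix whose columns are those axes. The helper names are hypothetical.

```python
import numpy as np

def is_right_handed(x, y, z, tol=1e-6):
    """True if orthonormal axes satisfy x × y = z (right-handed)."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    return bool(np.allclose(np.cross(x, y), z, atol=tol))

def is_rotation(R, tol=1e-6):
    """True if R is orthonormal with determinant +1, i.e. a proper rotation."""
    R = np.asarray(R, dtype=float)
    orthonormal = np.allclose(R.T @ R, np.eye(3), atol=tol)
    return bool(orthonormal and abs(np.linalg.det(R) - 1.0) < tol)

# The canonical axes form a right-handed frame; flipping one axis breaks it.
print(is_right_handed([1, 0, 0], [0, 1, 0], [0, 0, 1]))   # True
print(is_right_handed([1, 0, 0], [0, 1, 0], [0, 0, -1]))  # False
print(is_rotation(np.diag([1.0, 1.0, -1.0])))             # False (a reflection)
```

A matrix that fails `is_rotation` with determinant -1 usually indicates a handedness mismatch between two conventions, not a numerical problem.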

Relative Transformations

Each transformation below is represented as a 4×4 homogeneous matrix of the form:
\[ \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \]
where \( \mathbf{R} \in \mathbb{R}^{3 \times 3} \) is a rotation matrix, \( \mathbf{t} \in \mathbb{R}^{3 \times 1} \) is a translation vector, and \( \mathbf{0}^{\top} \) is a 1×3 row of zeros. We have the following transformations: