Setup

In this section we introduce the data collection rig used to capture synchronized RGB, depth, and 360° video. Our setup combines two main components: a ZED X stereo camera with an integrated IMU and an Insta360 X4 action camera, mounted together on a compact, portable rig.

Camera Overview

- Insta360 X4 Camera: Captures ultra-high-resolution 360° video at 60 FPS. It records immersive views of the scene and provides both a user-perspective view and a ground-truth calibration view from which ground-truth camera poses are derived.
- ZED X Stereo Camera: Provides high-resolution RGB-D frames (1920×1200 at 60 FPS) with per-pixel depth and confidence estimates. It also includes an IMU tightly synchronized with the stereo video stream.
- NVIDIA Jetson Orin NX: Mounted in a backpack with a custom power setup and an external SSD, this embedded device handles high-throughput data recording from the ZED camera.

Recording Process

- Calibration Targets: We place calibration boards around the environment.
- Dual Recording: We start the 360° and stereo recordings in parallel. A laser flash visible to both cameras is used to synchronize the video streams.
- Two-Part Capture:
  - First, a full-scene trajectory is recorded using the 360° camera.
  - Then, a close-up pass of the calibration boards is captured to aid pose-estimation optimization and establish ground-truth positions.
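The laser-flash synchronization step can be sketched as follows. This is a minimal illustration, not the actual pipeline: it assumes per-frame mean-brightness arrays have already been extracted from each video, and the function names and spike-detection heuristic are our own.

```python
import numpy as np

def find_flash_frame(brightness: np.ndarray) -> int:
    """Return the index of the largest frame-to-frame brightness jump.

    `brightness` holds the mean pixel intensity of each frame; the laser
    flash appears as a sudden spike, so the largest positive difference
    between consecutive frames marks its onset.
    """
    return int(np.argmax(np.diff(brightness)) + 1)

def sync_offset_seconds(b_360, b_zed, fps: float = 60.0) -> float:
    """Time offset (seconds) to add to ZED timestamps to align with the 360 video."""
    return (find_flash_frame(np.asarray(b_360)) - find_flash_frame(np.asarray(b_zed))) / fps

# Synthetic example: flash at frame 10 in the 360 stream, frame 7 in the ZED stream.
b360 = np.full(30, 50.0); b360[10:] += 100.0
bzed = np.full(30, 60.0); bzed[7:] += 100.0
offset = sync_offset_seconds(b360, bzed)  # (10 - 7) / 60 = 0.05 s
```

In practice the flash frame can be refined by inspecting the detected frames visually, since a single-frame detection already limits the residual sync error to one frame period (about 17 ms at 60 FPS).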
 

Camera Pose & Coordinate Frames

Understanding the spatial relationship between the different camera views is essential for sensor fusion and pose estimation.

Coordinate Frame Conventions: All frames follow a standard right-handed coordinate system, visualized as Red = X-axis, Green = Z-axis, Blue = Y-axis.
    
  
Relative Transformations

Each transformation below is represented as a 4×4 homogeneous matrix of the form:

\[
\begin{bmatrix}
\mathbf{R} & \mathbf{t} \\
\mathbf{0} & 1
\end{bmatrix}
\]

where \( \mathbf{R} \in \mathbb{R}^{3 \times 3} \) is a rotation matrix and \( \mathbf{t} \in \mathbb{R}^{3 \times 1} \) is a translation vector. All transformations below are expressed in this form.
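As a minimal numpy sketch of how such a matrix acts on a 3D point (the function names are illustrative, not part of the dataset tooling):

```python
import numpy as np

def make_transform(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble the 4x4 homogeneous matrix [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def apply_transform(T: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Map a 3D point through T using homogeneous coordinates."""
    ph = np.append(p, 1.0)        # -> [x, y, z, 1]
    return (T @ ph)[:3]

# Example: a 90° rotation about Z combined with a translation along X.
Rz = np.array([[0., -1., 0.],
               [1.,  0., 0.],
               [0.,  0., 1.]])
T = make_transform(Rz, np.array([1., 0., 0.]))
p_out = apply_transform(T, np.array([1., 0., 0.]))  # -> [1., 1., 0.]
```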
          
   
- 360 GT View ↔ 360 User View
  These are two perspectives from the 360° camera:
  - GT View faces downward toward the calibration board (the ground-truth reference).
  - User View faces outward, representing the user's visual experience.
  - The 4×4 relative transformation from the 360 GT View coordinate system to the 360 User View coordinate system is provided as relative_pose_gt_to_user.npy.
- 360 GT View ↔ ZED Left Camera
  The ZED camera is mounted above the 360° camera, slightly angled forward.
  - This transformation allows mapping between the depth/IMU data and the 360 user perspective.
  - It is essential for unified pose estimation and trajectory reconstruction.
  - The 4×4 relative transformation from the 360 GT View coordinate system to the ZED Left Camera coordinate system is provided as relative_pose_gt_to_zed.npy.
- ZED Left Camera ↔ ZED IMU
  The ZED X camera includes a built-in IMU that is factory-calibrated and tightly synchronized with the stereo camera.
  - The 4×4 transformation between the ZED Left camera and its IMU is provided as relative_pose_zed_to_imu.npy.
  - More information about the ZED IMU can be found in the official StereoLabs documentation.
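The provided matrices can be chained to relate frames that are not directly linked, e.g. mapping from the 360 User View into the ZED Left Camera frame. The sketch below uses placeholder matrices so it is self-contained; in practice each would be loaded with np.load from the files named above. It assumes each matrix maps points from the first-named frame into the second (GT → User, GT → ZED); verify this convention against the data before relying on it.

```python
import numpy as np

# In practice, load the dataset files:
#   T_gt_to_user = np.load("relative_pose_gt_to_user.npy")
#   T_gt_to_zed  = np.load("relative_pose_gt_to_zed.npy")
# Here we substitute simple placeholder transforms for illustration.

def translation(t):
    """4x4 homogeneous transform with identity rotation and translation t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

T_gt_to_user = translation([0.0, 0.1, 0.0])   # placeholder values
T_gt_to_zed  = translation([0.0, 0.3, 0.05])  # placeholder values

# User View -> ZED Left: invert back from User to GT, then forward from GT to ZED.
T_user_to_zed = T_gt_to_zed @ np.linalg.inv(T_gt_to_user)

p_user = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous point in the User View frame
p_zed = T_user_to_zed @ p_user
```

Because all transforms share the 4×4 homogeneous form, composition is plain matrix multiplication and inversion, which keeps frame bookkeeping explicit in the code.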