Chapter 2: Vision Systems: Cameras & Depth Sensing

Learning Objectives

After completing this chapter, you will be able to:

  • Understand different camera models and their applications in robotics
  • Perform camera calibration for accurate perception
  • Explain stereo vision principles and depth estimation techniques
  • Work with RGB-D sensors and their data processing
  • Implement traditional and modern computer vision approaches
  • Apply vision-based perception to robotics problems

Camera Models and Representation

Cameras are among the most important sensors for robot perception. Understanding how they work is crucial for effective vision-based systems.

Pinhole Camera Model

The pinhole camera model is the simplest mathematical model of a camera. It describes how a 3D point in the world is projected onto the 2D image plane:

  • 3D point in the camera frame: (X, Y, Z)
  • 2D image point: (u, v)
  • Relationship: u = fx * X/Z + cx, v = fy * Y/Z + cy

Where:

  • fx and fy are the focal lengths in pixels (often nearly equal)
  • (cx, cy) is the principal point, where the optical axis meets the image plane
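The projection above can be sketched as a small function. This is a minimal illustration, not a complete camera model; the intrinsic values used in the example (fx = fy = 800, cx = 320, cy = 240) are arbitrary but typical for a 640x480 sensor.

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Project a 3D point in the camera frame onto the image plane
    using the ideal pinhole model (no lens distortion)."""
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("Point must be in front of the camera (Z > 0)")
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return u, v

# A point 2 m in front of the camera, 0.5 m to the right, 0.25 m down
u, v = project_point((0.5, 0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

Note that the projection divides by Z: doubling the distance halves the offset from the principal point, which is why farther objects appear smaller.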

Distortion Models

Real cameras deviate from the ideal pinhole model due to lens distortion. Two main types are:

  • Radial distortion: Caused by the lens's curved surfaces; makes straight lines appear curved (barrel or pincushion distortion)
  • Tangential distortion: Caused by misalignment between the lens and the image sensor
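A common way to model both effects is the Brown–Conrady model, whose coefficients (k1, k2, p1, p2, k3) match the ones listed under camera calibration below. A minimal sketch, applied to normalized image coordinates x = X/Z, y = Y/Z:

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion
    to normalized image coordinates (Brown-Conrady model)."""
    r2 = x * x + y * y                          # squared radius from center
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d
```

With all coefficients zero the point is unchanged; a negative k1 pulls points toward the image center, producing barrel distortion.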

Camera Calibration

Camera calibration is the process of determining the intrinsic and extrinsic parameters of a camera.

Intrinsic Parameters

  • Focal length (fx, fy)
  • Principal point (cx, cy)
  • Skew coefficient (s)
  • Distortion coefficients (k1, k2, p1, p2, k3)

Extrinsic Parameters

  • Rotation matrix R (3x3)
  • Translation vector t (3x1)

Calibration Process

  1. Use a calibration pattern (e.g., checkerboard)
  2. Capture multiple images from different viewpoints
  3. Detect calibration pattern corners
  4. Estimate camera parameters using optimization

Stereo Vision and Depth Estimation

Stereo vision uses two or more cameras to estimate depth by triangulation.

Stereo Geometry

  • Baseline: Distance between camera centers
  • Epipolar constraint: A point in one image corresponds to a line (its epipolar line) in the other, reducing the correspondence search to 1D
  • Disparity: Difference in position of corresponding points

Depth from Disparity

depth = (baseline * focal_length) / disparity
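This formula can be applied directly to a disparity map; zero disparity corresponds to a point at infinity, so it must be handled explicitly. A minimal sketch (baseline and focal length values are illustrative):

```python
import numpy as np

def depth_from_disparity(disparity, baseline, focal_length):
    """Convert a disparity map (pixels) to depth, in the same
    units as the baseline. Zero disparity maps to infinity."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = baseline * focal_length / disparity[valid]
    return depth

# 0.12 m baseline, 700 px focal length: 42 px of disparity means 2 m away
d = depth_from_disparity([[42.0, 0.0]], baseline=0.12, focal_length=700.0)
```

Note the inverse relationship: depth resolution degrades quadratically with distance, which is why stereo rigs with short baselines struggle at long range.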

Stereo Matching Challenges

  • Occlusions: Parts visible in one camera but not the other
  • Textureless regions: Areas without distinctive features
  • Specular surfaces: Reflective surfaces that appear differently

RGB-D Sensors

RGB-D sensors provide both color (RGB) and depth (D) information simultaneously.

Common RGB-D Sensors

  • Microsoft Kinect: v1 uses structured light; v2 uses time-of-flight
  • Intel RealSense: Uses active infrared stereo or coded light, depending on the model
  • Apple LiDAR Scanner: Uses direct time-of-flight technology

RGB-D Data Processing

  • Registration: Aligning RGB and depth images
  • Point cloud generation: Converting depth images to 3D points
  • Filtering: Removing noise and outliers from depth data
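Point cloud generation inverts the pinhole projection: each pixel (u, v) with depth z back-projects to X = (u - cx) * z / fx, Y = (v - cy) * z / fy. A minimal sketch, assuming a depth image in meters and known intrinsics:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, meters) into an (N, 3)
    point cloud, dropping pixels with zero or negative depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # filter invalid (zero-depth) pixels
```

Registration matters here: if the depth image has not been aligned to the RGB frame, these intrinsics must be the depth camera's, not the color camera's.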

Traditional Computer Vision Approaches

Feature Detection and Description

  • SIFT (Scale-Invariant Feature Transform): Detects and describes local features
  • SURF (Speeded Up Robust Features): Faster alternative to SIFT
  • ORB (Oriented FAST and Rotated BRIEF): Efficient binary descriptor

Image Processing Techniques

  • Edge detection: Canny, Sobel operators
  • Corner detection: Harris corner detector
  • Template matching: Finding patterns in images

Modern Computer Vision Approaches

Deep Learning-Based Methods

  • Object detection: YOLO, R-CNN families
  • Semantic segmentation: U-Net, DeepLab
  • Pose estimation: Estimating object or human poses

Convolutional Neural Networks (CNNs)

  • Feature extraction through convolutional layers
  • Hierarchical feature learning
  • End-to-end training for perception tasks
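The core operation behind CNN feature extraction can be shown in a few lines of NumPy. This is a sketch of a single convolutional filter (technically cross-correlation, as CNN frameworks implement it), not a trained network; in a real CNN many such kernels are learned from data rather than hand-designed:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the core op of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the kernel-sized patch at (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-designed vertical-edge kernel responds strongly at an intensity step
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
img = np.zeros((5, 5))
img[:, 3:] = 1.0  # step edge between columns 2 and 3
response = conv2d(img, edge_kernel)
```

Stacking many such layers, with learned kernels and nonlinearities in between, is what produces the hierarchical features mentioned above: early layers respond to edges, later layers to object parts.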

Practical Exercise

Implement a simple camera calibration using OpenCV:

  1. Print a checkerboard pattern
  2. Capture at least 10 images from different angles
  3. Use OpenCV's cv2.findChessboardCorners and cv2.calibrateCamera to estimate the camera parameters
  4. Undistort an image with cv2.undistort using the computed parameters

Integration with ROS 2

In ROS 2, camera data is typically published as:

  • sensor_msgs/Image: Raw image data
  • sensor_msgs/CameraInfo: Camera calibration parameters
  • sensor_msgs/PointCloud2: 3D point cloud data from depth sensors

Summary

Vision systems are crucial for robot perception, providing rich information about the environment. Understanding both traditional and modern approaches allows you to choose the right technique for your application. In the next chapter, we'll explore how to combine multiple sensors for robust perception.