Chapter 2: Vision Systems: Cameras & Depth Sensing

Learning Objectives

After completing this chapter, you will be able to:

  • Understand different camera models and their applications in robotics
  • Perform camera calibration for accurate perception
  • Explain stereo vision principles and depth estimation techniques
  • Work with RGB-D sensors and their data processing
  • Implement traditional and modern computer vision approaches
  • Apply vision-based perception to robotics problems

Camera Models and Representation

Cameras are among the most important sensors for robot perception. Understanding how they work is crucial for effective vision-based systems.

Pinhole Camera Model

The pinhole camera model is the simplest mathematical model of a camera. It describes how a 3D point in the world is projected onto the 2D image plane:

  • 3D point in the camera frame: (X, Y, Z)
  • 2D image point: (u, v)
  • Relationship: u = fx * X/Z + cx, v = fy * Y/Z + cy

Where:

  • fx and fy are the focal lengths in pixels (often nearly equal)
  • (cx, cy) is the principal point, where the optical axis meets the image plane
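The projection above can be sketched as a small function. This is a minimal illustration, not a complete camera model; the intrinsic values used in the example (fx = fy = 800, cx = 320, cy = 240) are arbitrary but typical for a 640x480 sensor.

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Project a 3D point in the camera frame onto the image plane
    using the ideal pinhole model (no lens distortion)."""
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("Point must be in front of the camera (Z > 0)")
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return u, v

# A point 2 m in front of the camera, 0.5 m to the right, 0.25 m down
u, v = project_point((0.5, 0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

Note that the projection divides by Z: doubling the distance halves the offset from the principal point, which is why farther objects appear smaller.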

Distortion Models

Real cameras deviate from the ideal pinhole model due to lens distortion. Two main types are:

  • Radial distortion: Caused by the lens's curved surfaces; makes straight lines appear curved (barrel or pincushion distortion)
  • Tangential distortion: Caused by misalignment between the lens and the image sensor
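A common way to model both effects is the Brown–Conrady model, whose coefficients (k1, k2, p1, p2, k3) match the ones listed under camera calibration below. A minimal sketch, applied to normalized image coordinates x = X/Z, y = Y/Z:

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion
    to normalized image coordinates (Brown-Conrady model)."""
    r2 = x * x + y * y                          # squared radius from center
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d
```

With all coefficients zero the point is unchanged; a negative k1 pulls points toward the image center, producing barrel distortion.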

Camera Calibration

Camera calibration is the process of determining the intrinsic and extrinsic parameters of a camera.

Intrinsic Parameters

  • Focal length (fx, fy)
  • Principal point (cx, cy)
  • Skew coefficient (s)
  • Distortion coefficients (k1, k2, p1, p2, k3)

Extrinsic Parameters

  • Rotation matrix R (3x3)
  • Translation vector t (3x1)

Calibration Process

  1. Use a calibration pattern (e.g., checkerboard)
  2. Capture multiple images from different viewpoints
  3. Detect calibration pattern corners
  4. Estimate camera parameters using optimization

Stereo Vision and Depth Estimation

Stereo vision uses two or more cameras to estimate depth by triangulation.

Stereo Geometry

  • Baseline: Distance between camera centers
  • Epipolar constraint: A point in one image corresponds to a line (its epipolar line) in the other, reducing the correspondence search to 1D
  • Disparity: Difference in position of corresponding points

Depth from Disparity

depth = (baseline * focal_length) / disparity
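This formula can be applied directly to a disparity map; zero disparity corresponds to a point at infinity, so it must be handled explicitly. A minimal sketch (baseline and focal length values are illustrative):

```python
import numpy as np

def depth_from_disparity(disparity, baseline, focal_length):
    """Convert a disparity map (pixels) to depth, in the same
    units as the baseline. Zero disparity maps to infinity."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = baseline * focal_length / disparity[valid]
    return depth

# 0.12 m baseline, 700 px focal length: 42 px of disparity means 2 m away
d = depth_from_disparity([[42.0, 0.0]], baseline=0.12, focal_length=700.0)
```

Note the inverse relationship: depth resolution degrades quadratically with distance, which is why stereo rigs with short baselines struggle at long range.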

Stereo Matching Challenges

  • Occlusions: Parts visible in one camera but not the other
  • Textureless regions: Areas without distinctive features
  • Specular surfaces: Reflective surfaces that appear differently

RGB-D Sensors

RGB-D sensors provide both color (RGB) and depth (D) information simultaneously.

Common RGB-D Sensors

  • Microsoft Kinect: v1 uses structured light; v2 uses time-of-flight
  • Intel RealSense: Uses active infrared stereo or coded light, depending on the model
  • Apple LiDAR Scanner: Uses direct time-of-flight technology

RGB-D Data Processing

  • Registration: Aligning RGB and depth images
  • Point cloud generation: Converting depth images to 3D points
  • Filtering: Removing noise and outliers from depth data
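Point cloud generation inverts the pinhole projection: each pixel (u, v) with depth z back-projects to X = (u - cx) * z / fx, Y = (v - cy) * z / fy. A minimal sketch, assuming a depth image in meters and known intrinsics:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, meters) into an (N, 3)
    point cloud, dropping pixels with zero or negative depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # filter invalid (zero-depth) pixels
```

Registration matters here: if the depth image has not been aligned to the RGB frame, these intrinsics must be the depth camera's, not the color camera's.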

Traditional Computer Vision Approaches

Feature Detection and Description

  • SIFT (Scale-Invariant Feature Transform): Detects and describes local features
  • SURF (Speeded Up Robust Features): Faster alternative to SIFT
  • ORB (Oriented FAST and Rotated BRIEF): Efficient binary descriptor

Image Processing Techniques

  • Edge detection: Canny, Sobel operators
  • Corner detection: Harris corner detector
  • Template matching: Finding patterns in images

Modern Computer Vision Approaches

Deep Learning-Based Methods

  • Object detection: YOLO, R-CNN families
  • Semantic segmentation: U-Net, DeepLab
  • Pose estimation: Estimating object or human poses

Convolutional Neural Networks (CNNs)

  • Feature extraction through convolutional layers
  • Hierarchical feature learning
  • End-to-end training for perception tasks
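The core operation behind CNN feature extraction can be shown in a few lines of NumPy. This is a sketch of a single convolutional filter (technically cross-correlation, as CNN frameworks implement it), not a trained network; in a real CNN many such kernels are learned from data rather than hand-designed:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the core op of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the kernel-sized patch at (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-designed vertical-edge kernel responds strongly at an intensity step
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
img = np.zeros((5, 5))
img[:, 3:] = 1.0  # step edge between columns 2 and 3
response = conv2d(img, edge_kernel)
```

Stacking many such layers, with learned kernels and nonlinearities in between, is what produces the hierarchical features mentioned above: early layers respond to edges, later layers to object parts.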

Practical Exercise

Implement a simple camera calibration using OpenCV:

  1. Print a checkerboard pattern
  2. Capture at least 10 images from different angles
  3. Use OpenCV's cv2.findChessboardCorners and cv2.calibrateCamera to estimate the camera parameters
  4. Undistort an image with cv2.undistort using the computed parameters

Integration with ROS 2

In ROS 2, camera data is typically published as:

  • sensor_msgs/Image: Raw image data
  • sensor_msgs/CameraInfo: Camera calibration parameters
  • sensor_msgs/PointCloud2: 3D point cloud data from depth sensors

Summary

Vision systems are crucial for robot perception, providing rich information about the environment. Understanding both traditional and modern approaches allows you to choose the right technique for your application. In the next chapter, we'll explore how to combine multiple sensors for robust perception.