Chapter 2: Vision Systems: Cameras & Depth Sensing
Learning Objectives
After completing this chapter, you will be able to:
- Understand different camera models and their applications in robotics
- Perform camera calibration for accurate perception
- Explain stereo vision principles and depth estimation techniques
- Work with RGB-D sensors and their data processing
- Implement traditional and modern computer vision approaches
- Apply vision-based perception to robotics problems
Camera Models and Representation
Cameras are among the most important sensors for robot perception. Understanding how they work is crucial for effective vision-based systems.
Pinhole Camera Model
The pinhole camera model is the simplest mathematical model of a camera. It describes how a 3D point in the world is projected onto the 2D image plane:
- 3D point: (X, Y, Z)
- 2D image point: (u, v)
- Relationship: u = f * X/Z + cx, v = f * Y/Z + cy
Where:
- f is the focal length (in pixels; in practice calibration often estimates separate fx and fy)
- (cx, cy) is the principal point
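The projection equations above can be sketched directly (a minimal illustration in plain Python; the focal length and principal point values are arbitrary example numbers):

```python
def project_pinhole(point_3d, f, cx, cy):
    """Project a 3D point (X, Y, Z) in camera coordinates onto the image plane
    using the pinhole model: u = f*X/Z + cx, v = f*Y/Z + cy."""
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("Point must be in front of the camera (Z > 0)")
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return u, v

# A point 2 m in front of the camera, 0.5 m right and 0.25 m down
u, v = project_pinhole((0.5, 0.25, 2.0), f=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 340.0
```

Note that the result is in pixels only if f, cx, and cy are expressed in pixels; dividing by Z is what makes distant objects appear smaller.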
Distortion Models
Real cameras deviate from the ideal pinhole model due to lens distortion. Two main types are:
- Radial distortion: Caused by the curved lens surfaces
- Tangential distortion: Caused by misalignment of lens elements
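The two distortion types can be modeled together on normalized image coordinates (a sketch of the commonly used Brown-Conrady polynomial model, here truncated to k1, k2, p1, p2; higher-order terms such as k3 are omitted for brevity):

```python
def distort_normalized(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to
    normalized image coordinates (x, y) = (X/Z, Y/Z)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2          # radial polynomial in r^2
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# With all coefficients zero the model reduces to the ideal pinhole
print(distort_normalized(0.1, 0.2, 0.0, 0.0, 0.0, 0.0))  # (0.1, 0.2)
```

A positive k1 pushes points outward from the image center (pincushion-like); a negative k1 pulls them inward (barrel distortion), which is the more common case for wide-angle lenses.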
Camera Calibration
Camera calibration is the process of determining the intrinsic and extrinsic parameters of a camera.
Intrinsic Parameters
- Focal length (fx, fy)
- Principal point (cx, cy)
- Skew coefficient (s)
- Distortion coefficients (k1, k2, p1, p2, k3)
Extrinsic Parameters
- Rotation matrix R (3x3)
- Translation vector t (3x1)
Calibration Process
- Use a calibration pattern (e.g., checkerboard)
- Capture multiple images from different viewpoints
- Detect calibration pattern corners
- Estimate camera parameters using optimization
Stereo Vision and Depth Estimation
Stereo vision uses two or more cameras to estimate depth by triangulation.
Stereo Geometry
- Baseline: Distance between camera centers
- Epipolar constraint: A point in one image must lie on a known line (the epipolar line) in the other, reducing correspondence search to one dimension
- Disparity: Difference in position of corresponding points
Depth from Disparity
depth = (baseline * focal_length) / disparity
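The formula above assumes the baseline is in metric units and the focal length and disparity are both in pixels, so the pixel units cancel. A minimal sketch (the baseline and focal length values are illustrative):

```python
def depth_from_disparity(disparity_px, baseline_m, focal_length_px):
    """Triangulate depth in metres from disparity in pixels.
    Baseline is in metres, focal length in pixels."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return baseline_m * focal_length_px / disparity_px

# 10 cm baseline, 700 px focal length, 35 px disparity
print(depth_from_disparity(35.0, 0.10, 700.0))  # 2.0 m
```

Because disparity appears in the denominator, depth resolution degrades quadratically with distance: a one-pixel disparity error matters far more at 10 m than at 1 m.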
Stereo Matching Challenges
- Occlusions: Parts visible in one camera but not the other
- Textureless regions: Areas without distinctive features
- Specular surfaces: Reflective surfaces that appear differently
RGB-D Sensors
RGB-D sensors provide both color (RGB) and depth (D) information simultaneously.
Common RGB-D Sensors
- Microsoft Kinect: The original Kinect uses structured light; Kinect v2 uses time-of-flight
- Intel RealSense: Most models use active stereo (stereo matching aided by a projected infrared pattern)
- Apple LiDAR (iPad Pro/iPhone Pro): Uses time-of-flight technology
RGB-D Data Processing
- Registration: Aligning RGB and depth images
- Point cloud generation: Converting depth images to 3D points
- Filtering: Removing noise and outliers from depth data
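Point cloud generation is the pinhole projection run in reverse: each pixel (u, v) with depth Z is back-projected to X = (u - cx)·Z/fx, Y = (v - cy)·Z/fy. A minimal NumPy sketch (the intrinsics and the flat-wall test depth are illustrative values):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres, H x W) into an N x 3 point cloud
    using pinhole intrinsics; pixels with zero depth are dropped."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]            # per-pixel row (v) and column (u) indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]      # drop invalid (zero-depth) pixels

# A flat wall 1.5 m away, seen by a synthetic 320x240 camera
depth = np.full((240, 320), 1.5, dtype=np.float32)
pts = depth_to_points(depth, fx=300.0, fy=300.0, cx=160.0, cy=120.0)
print(pts.shape)  # (76800, 3)
```

For an RGB-D sensor, the same loop would also look up the registered color at each pixel to produce a colored point cloud.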
Traditional Computer Vision Approaches
Feature Detection and Description
- SIFT (Scale-Invariant Feature Transform): Detects and describes local features
- SURF (Speeded Up Robust Features): Faster alternative to SIFT
- ORB (Oriented FAST and Rotated BRIEF): Efficient binary descriptor
Image Processing Techniques
- Edge detection: Canny, Sobel operators
- Corner detection: Harris corner detector
- Template matching: Finding patterns in images
Modern Computer Vision Approaches
Deep Learning-Based Methods
- Object detection: YOLO, R-CNN families
- Semantic segmentation: U-Net, DeepLab
- Pose estimation: Estimating object or human poses
Convolutional Neural Networks (CNNs)
- Feature extraction through convolutional layers
- Hierarchical feature learning
- End-to-end training for perception tasks
Practical Exercise
Implement a simple camera calibration using OpenCV:
- Print a checkerboard pattern
- Capture at least 10 images from different angles
- Use OpenCV's calibration functions to estimate camera parameters
- Undistort an image using the computed parameters
Integration with ROS 2
In ROS 2, camera data is typically published as:
- sensor_msgs/Image: Raw image data
- sensor_msgs/CameraInfo: Camera calibration parameters
- sensor_msgs/PointCloud2: 3D point cloud data from depth sensors
Summary
Vision systems are crucial for robot perception, providing rich information about the environment. Understanding both traditional and modern approaches allows you to choose the right technique for your application. In the next chapter, we'll explore how to combine multiple sensors for robust perception.