MIT CSAIL Research Abstracts

CSAIL Publications and Digital Archive header

Research Abstracts Home

CSAIL Digital Archive

Research Activities

CSAIL Home

horizontal line

Research Abstracts - 2007
horizontal line

horizontal line

Perception and Navigation for the DARPA Urban Challenge

David C. Moore, John Leonard & Seth Teller

Overview

The DARPA Urban Challenge is a competition between autonomous vehicles to navigate a 60-mile course in a mock urban environment, obey traffic laws, plan routes, and visit a series of checkpoints, all without human intervention. The competition, which takes place on November 3, 2007, differs from the previous DARPA Grand Challenges in 2004 and 2005, in that it requires vehicles to rely primarily on local perception for navigation instead of GPS. This new emphasis on perception arises because driving situations such as navigating intersections and following lane markings require the vehicle to respond and make decisions in real time based on the perceived state of other vehicles and road conditions.

A number of CSAIL students and faculty are participating in the MIT Urban Challenge Team because these perception challenges offer opportunities for new research in the areas of computer vision, robotic navigation, and sensor fusion. Specifically, the work described herein focuses on two main contributions to the project: Lane following using computer vision and a novel strategy for managing sensor data in a GPS-deprived environment. This work has an emphasis on speed, system engineering, and robustness to enable it to perform under the real-world conditions required by the Challenge.

Coordinate Frames for Managing Sensor Data

Our perception system for the Challenge has numerous sensor modalities: Planar laser scanners (LIDAR), cameras, RADAR, GPS, an inertial measurement unit (IMU), and wheel odometry. The typical approach to dealing with this type of data is to project it all into a map of the environment and then perform reasoning about vehicle control directly on the map. This provides a convenient abstraction barrier between the different sensors and the planner. However, care must be taken when generating this map. Ideally, we would like the map to exactly represent the true state of the world, where each piece of data is registered with its position relative to the Earth. In reality, GPS is one of our most unreliable sensors due to multipath and obstruction. Thus, we construct the map in a local frame that is derived from purely relative observations and only translate to GPS coordinates as needed by mission requirements such as hitting checkpoints and parking (which occur infrequently).


Figure 1 The left figure shows three of our coordinate frames, the global frame (registered to GPS), the local frame (registered to perceptual data), and the body frame. On the right hand side, a GPS error is shown to corrupt the transformation between the local and global frame, but perceptual data (the observed line markings) remains properly registered to the vehicle's position.

Our perception software deals with four different coordinate frames. The local frame is nominally fixed relative to the Earth, and the vehicle's position in it is updated using dead-reckoning, fusing odometry and IMU data. In this frame, the vehicle's position is defined to be exact, with no covariance estimate. It has the property that over short time scales, the vehicle's motion will vary smoothly and be close to reality. Self-consistency over long time scales is not a goal of the local frame.

The global frame is fixed relative to GPS coordinates. We compute the transformation between the global and local frame as new GPS measurements arrive. It will be self-consistent over long time scales, but there may be errors and discontinuities over short time scales.

The body frame has its origin at the center of the vehicle's rear axle, and moves relative to the local frame as the vehicle moves. The chassis frame has an origin somewhere on the vehicle's rigid structure. It is very similar to the body frame, except moves slightly relative to the body frame because of the suspension system between wheels and chassis. The transformation between the two is measured over time using sensors in the suspension.

Sensor measurements from the cameras, LIDARs, and RADARs comes in relative to the chassis frame (we consider the sensors to be fixed relative to the chassis). We then transform from chassis to body, then from body to local, and then insert this information into the local map. It will remain accurate for short time scales because of the high accuracy of dead-reckoning. Over longer time scales, data will be discarded from the local map, since the vehicle will have moved where it is no longer relevant. In this way, perceptual information cannot be corrupted by GPS errors, because if a gross GPS error occurs, the perception information will stay fixed relative to the vehicle in the local frame, as shown in Figure 1.

From time to time, we will need to perform actions that are specified by DARPA in the GPS frame. This includes hitting checkpoints and parking. When this need arises, we will transform the target point from the global frame to the local frame, and then let the planner run. Although this transformation is sensitive to GPS errors, it is a necessary compromise since the DARPA specifies these mission goals only in the global frame.

Lane Finding with Computer Vision

One of the Challenge's new requirements beyond previous challenges is that the vehicle must be able to cope with distant GPS waypoints by following the natural curves of the road in between. We achieve this by detecting line markings using computer vision applied to camera data.

Figure 2 A preliminary result showing detected line markings (labeled with bright green) overlayed on a video sequence captured from the vehicle. Click to see full video.

Our algorithm consists of several steps:

1. Matched Filter: The image is filtered with a horizontal kernel tuned to the size of the expected lane width at that row in the image, based upon the expected location of the ground plane and the intrinsic and extrinsic parameters of the camera. This kernel greatly increases the signal-to-noise ratio of line-marking-like features.

2. Feature Detection: Line-like features are located by looking for peaks in the previous-steps filter response that coincide with large image gradients perpendicular to the expected direction of the lane projected into the image.

3. RANSAC: Random Sampling Consensus (RANSAC) is used to select groups of line-like features that are self-consistent in such a way that they indicate a single line marking according to a model of how a line marking should be shaped (a spline with some level of expected smoothness).

4. Hypothesis Tracking: The lines detected using RANSAC are tracked over time as the vehicle moves to ensure that the line remains consistent. This smooths our estimate of the lane position over time and provides good initial guesses for step 2 in future frames.

Figure 2 shows a preliminary result of the this algorithm with steps 1 and 2 implemented, and limited form of step 3.

Acknowledgements

Thanks to Albert Huang for assistance with the coordinate frame figures.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu