Research Abstracts - 2006
Vision-Based System for Occupancy and Posture Analysis

Matthew T. Farrell, Ichiro Masaki & Berthold Horn

Over the past few years, advances in computer vision have made vision-only safety systems plausible. Intelligent vehicles are an important application area for advances in human tracking and posture analysis. A simple automotive system, the airbag, deploys at 300 km/h regardless of who or what occupies the passenger seat. The system we are developing in the Intelligent Vehicles Group will recognize different types of occupancy and the posture of the occupant relative to the airbag, so that injury can be avoided whether the passenger is an infant, a child, or an adult.

Basic System Description

The basic method for recovering the posture and occupancy of the passenger seat is to project a pattern onto the target and use it to recover depth information about the passenger. The pattern is projected in the near-infrared. The pattern can be either Gaussian white noise, as in [1], or a known stripe pattern; each has advantages and drawbacks. For example, stripe patterns introduce a "phase ambiguity" during stereo matching, causing inaccurate depths to be calculated. This can be overcome by appropriately adjusting the wavelength of each stripe in the sine grating. This gives each "stripe" a unique width that improves stereo matching accuracy, especially in regions of low texture contrast.
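A sine grating with a swept spatial frequency, so that every stripe has a unique width, can be sketched as follows. This is a minimal illustration of the idea, not the pattern actually used by the system; the image size and frequency range are assumed values.

```python
import numpy as np

def chirped_grating(width=640, height=480, f_start=0.02, f_end=0.06):
    """Sine grating whose spatial frequency sweeps across the image,
    giving each stripe a unique width to disambiguate stereo matching.
    f_start and f_end are in cycles per pixel (assumed values)."""
    freq = np.linspace(f_start, f_end, width)      # instantaneous frequency
    phase = 2.0 * np.pi * np.cumsum(freq)          # phase is its integral
    row = 0.5 * (1.0 + np.sin(phase))              # intensities in [0, 1]
    return np.tile(row, (height, 1))

pattern = chirped_grating()
```

Because no two stripes have the same width, a correlation window covering one stripe sees a locally unique intensity profile, which is what resolves the phase ambiguity.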

A stereo pair of images of the occupant is captured under structured illumination: two stereo cameras on the outside flank an LCD projector in the center. The stereo pair is then captured and rectified.


Image Capture

In the current system, images are captured using two rectified black-and-white stereo CCD cameras that have been modified to respond to light in the near-infrared range (wavelength approximately 900 nm). The baseline of the cameras is kept small so that targets close to the sensor can be handled. Since the projected pattern is invisible, color CCD cameras would add nothing and would only increase the cost of the system. Furthermore, using only near-infrared light removes the difficulties most systems have with highly dynamic lighting. Together, these features yield a system that captures high-quality images of the occupant.

As mentioned, the use of structured lighting gives a high degree of control over where edges and other textures occur. This is ideal when extending the system to feature extraction and pattern classification. The important features of the object are depth and shape, rather than more highly varying features such as color, edges, or features extracted via Haar wavelets, as in [2].

Disparity Calculation


The stereo correspondence algorithm then computes the appropriate matches in each image, from which depth can be determined by triangulation.
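For rectified cameras, the triangulation step reduces to the standard relation Z = fB/d, where f is the focal length in pixels, B the baseline, and d the disparity. A minimal sketch, with illustrative numbers rather than the actual system parameters:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulation for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Illustrative numbers (not the real calibration): 800-pixel focal
# length, 6 cm baseline, 40-pixel disparity -> about 1.2 m of depth.
z = depth_from_disparity(40.0, 800.0, 0.06)
```

Note that depth resolution degrades as disparity shrinks, which is why the small baseline is paired with close-range targets.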

From a stereo pair of images, disparity is calculated by minimizing a cost function, in this case the sum of squared differences (SSD). The minimum SSD value along a reference pixel's search in the paired image gives the closest brightness match within a window of the other image.

This window, called the correlation window, is sized so that specific features show up in the search. In particular, since we are projecting structured light onto the scene, the correlation window should be about the size of the largest stripe; in the case of Gaussian noise, it should be larger than an average "speckle" in the scene. This makes the brightness distribution over the window unique, giving better matching. Furthermore, the stripes impose a known structure on the scene that aids the recovery of 3D information. This is an improvement on current stripe-pattern systems because we use both disparity and the known stripe sizes to recover additional information about the distance to the target.
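The SSD search described above can be sketched as a brute-force block matcher. This is a minimal, unoptimized illustration of the cost function, not the real-time implementation; the window size and disparity range are assumed values.

```python
import numpy as np

def ssd_disparity(left, right, window=9, max_disp=32):
    """Brute-force SSD block matching on a rectified grayscale pair.
    For each reference pixel in the left image, slide a window along the
    same row of the right image and keep the disparity of minimum SSD."""
    h, w = left.shape
    r = window // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
            costs = [
                np.sum((ref - right[y - r:y + r + 1,
                                    x - d - r:x - d + r + 1]
                              .astype(np.float64)) ** 2)
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))  # minimum-SSD disparity
    return disp
```

On a synthetic pair where the left image is the right image shifted by a known number of pixels, the interior of the returned map recovers that shift exactly, since random texture makes each window unique.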

Experiments & Future Directions

We expect in further tests that Gaussian white noise will give better depth results, and that improving the focus of the camera will improve them further. The algorithm is currently being implemented in real time so that data can be collected on how well it runs under dynamic lighting.


[1] Siebert & Marshall. Human Body 3D Imaging by Speckle Texture Projection Photogrammetry. Sensor Review, Vol. 20, No. 3, pp. 218--226, UK, 2000.

[2] Zhang, Kiselewich & Bauson. A Monocular Vision-Based Occupant Classification Approach for Smart Airbag Deployment. In IEEE Intelligent Vehicles Conference Proceedings 2005, pp. 632--637, US, 2005.



Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu