In the current system, images are captured by two rectified black-and-white stereo CCD cameras that have been modified to be sensitive to near-infrared light (wavelength approximately 900 nm). The stereo baseline is kept very small so that targets close to the sensor remain visible in both views and their disparities stay within the search range. Because the projected pattern is invisible, color CCD cameras would add no information and would only increase the cost of the system. Further, operating exclusively in the near infrared removes the difficulties most systems have in environments with highly dynamic lighting. Together, these features yield a system that captures high-quality images of the occupant.
As mentioned, the use of structured lighting gives a high degree of control over where edges and other texture occur. This is ideal when extending the system to feature extraction and pattern classification: the important features of the object are depth and shape, rather than more highly varying features such as color, edges, or features extracted via Haar wavelets, as in [2].
From a stereo pair of images, disparity is calculated by minimizing a cost function, in this case the Sum of Squared Differences (SSD). For each reference pixel, the minimum SSD value along its search range in the paired image identifies the window of the other image whose brightness pattern is the closest match.
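To make the matching step concrete, here is a minimal sketch of brute-force SSD block matching in Python. The function name and the parameter values (window size, disparity range) are illustrative assumptions rather than the system's actual implementation; a real-time version would reuse partial window sums or vectorize the search instead of looping per pixel.

    import numpy as np

    def ssd_disparity(left, right, window=11, max_disp=64):
        """Brute-force SSD block matching on a rectified stereo pair.

        left, right: 2-D float arrays (rectified grayscale images).
        window: side length of the square correlation window (odd).
        max_disp: largest disparity searched, in pixels.
        Returns a disparity map (image borders are left at zero).
        """
        h, w = left.shape
        half = window // 2
        disp = np.zeros((h, w), dtype=np.float32)
        for y in range(half, h - half):
            for x in range(half + max_disp, w - half):
                ref = left[y - half:y + half + 1, x - half:x + half + 1]
                best_cost, best_d = np.inf, 0
                # Slide the window along the epipolar line of the right
                # image and keep the disparity with the minimum SSD cost.
                for d in range(max_disp + 1):
                    cand = right[y - half:y + half + 1,
                                 x - d - half:x - d + half + 1]
                    cost = np.sum((ref - cand) ** 2)
                    if cost < best_cost:
                        best_cost, best_d = cost, d
                disp[y, x] = best_d
        return disp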
This window, called the correlation window, is sized so that distinguishing features appear within the search. In particular, since structured light is projected onto the scene, the correlation window should be roughly the size of the large stripe; with Gaussian noise, it should be larger than an average "speckle" in the scene. The brightness pattern computed over the window is then distinctive, giving better matching. Further, the stripes impose a known structure on the scene that allows additional 3D information to be recovered. This is an improvement over current stripe-pattern systems because both disparity and the known stripe sizes are used to recover information about the distance to the target.
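As a sketch of how disparity and a known stripe size can each give a depth estimate, consider the pinhole-model relations below. The calibration constants, the assumption that the projected stripe keeps a roughly constant physical width on the target, and the simple averaging used to fuse the two estimates are all hypothetical placeholders, not the system's actual combination rule.

    # Hypothetical calibration values, for illustration only.
    FOCAL_PX = 800.0       # focal length, in pixels
    BASELINE_M = 0.05      # small stereo baseline, in metres
    STRIPE_WIDTH_M = 0.01  # assumed physical stripe width on the target

    def depth_from_disparity(d_px: float) -> float:
        """Standard stereo triangulation: Z = f * B / d."""
        return FOCAL_PX * BASELINE_M / d_px

    def depth_from_stripe(w_px: float) -> float:
        """Depth from apparent stripe width, assuming the stripe has a
        roughly constant physical width W, so w_observed = f * W / Z."""
        return FOCAL_PX * STRIPE_WIDTH_M / w_px

    def fused_depth(d_px: float, w_px: float) -> float:
        """Naive fusion: average the two independent estimates.  A real
        system would weight them by their respective uncertainties."""
        return 0.5 * (depth_from_disparity(d_px) + depth_from_stripe(w_px))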
In further tests, we expect that Gaussian white noise will yield better depth results, and that improving the camera focus will improve them further. The algorithm is currently being implemented in real time so that data can be collected on how well it runs under dynamic lighting.
[1] Siebert and Marshall. Human Body 3D Imaging by Speckle Texture Projection Photogrammetry. Sensor Review, Vol. 20, No. 3, pp. 218--226, 2000.
[2] Zhang, Kiselewich, and Bauson. A Monocular Vision-Based Occupant Classification Approach for Smart Airbag Deployment. In IEEE Intelligent Vehicles Conference Proceedings, pp. 632--637, 2005.