CSAIL Research Abstract

Comparing Visual Features for Morphing Based Recognition Techniques

Jane Wu & Lior Wolf

Motivation

Accurate object classification remains a common difficulty in object recognition. In natural images, for instance, clutter can prevent an object from being detected against a noisy background. Variations in illumination and pose across images also confound accurate classification. Although various methods exist for object classification, [1] has recently demonstrated that correspondence and shape matching show promise for this task. So far, however, there has been no thorough study of the possible ways of capturing features in an image, or of the possible ways of comparing two images using this idea. This research therefore builds on the shape-correspondence work of [1], but investigates multiple alternatives for both the feature descriptors and the correspondence methods used in shape matching.

Main Ideas

The project centers on deformable shape matching. The technique involves three stages. First, find corresponding points between two shapes. Then, using this correspondence, calculate a transform for the rest of the points. Finally, calculate the error of the match.
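The three stages can be illustrated with a minimal sketch. It is not the method of [1]: it assumes 2D points stored as Python complex numbers, toy descriptor tuples, and a least-squares similarity transform (rotation, scale, translation) as a simple stand-in for a deformable transform. The function names `match_points`, `fit_similarity`, and `match_error` are hypothetical.

```python
import math

def match_points(desc_a, desc_b):
    """Stage 1: pair each descriptor in A with the index of its nearest descriptor in B."""
    return [min(range(len(desc_b)), key=lambda j: math.dist(d, desc_b[j]))
            for d in desc_a]

def fit_similarity(src, dst):
    """Stage 2: least-squares similarity transform z -> a*z + b.

    Points are complex numbers (x + y*1j). A similarity transform is an
    assumption of this sketch, standing in for a deformable transform.
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((s - src_mean).conjugate() * (d - dst_mean)
              for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    a = num / den
    b = dst_mean - a * src_mean
    return a, b

def match_error(src, dst, a, b):
    """Stage 3: mean distance between transformed source points and their targets."""
    return sum(abs(a * s + b - d) for s, d in zip(src, dst)) / len(src)
```

In use, the indices returned by `match_points` select the target points passed to `fit_similarity`, and `match_error` then scores how well the fitted transform explains the pairing.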

To demonstrate the effectiveness of the algorithm, [1] uses FeiFei101 [2], a large dataset containing various object categories. To reduce the computation involved, a shortlist narrows the set of images considered when determining the goodness of a match in the final stage. To calculate the shortlist, multiple points are first selected on an image and a feature descriptor is computed at each of those points. In particular, [1] uses geometric blur as the feature descriptor. The similarity of two images is evaluated based on the median of the minimum Euclidean distances between their descriptors. This project uses multiple types of feature descriptors, including geometric blur [1], C1 [4], and SIFT [3].
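The shortlist score described above can be sketched as follows. This is one reading of the description, not the implementation of [1]: it assumes descriptors are plain numeric tuples, takes the minimum over the candidate image's descriptors for each query descriptor (so the score is not symmetric), and keeps the `k` candidates with the smallest score. The names `image_distance` and `shortlist` are hypothetical.

```python
import math
from statistics import median

def image_distance(desc_a, desc_b):
    """Median, over descriptors in A, of the Euclidean distance to the
    closest descriptor in B. Smaller values mean more similar images."""
    return median(min(math.dist(a, b) for b in desc_b) for a in desc_a)

def shortlist(query_desc, candidates, k):
    """Return the names of the k candidates closest to the query.

    candidates maps an image name to its list of descriptors.
    """
    ranked = sorted(candidates,
                    key=lambda name: image_distance(query_desc, candidates[name]))
    return ranked[:k]
```

Using the median rather than the mean keeps a few badly matched descriptors, for example points that fall on background clutter, from dominating the score.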

After the shortlist is generated, a more refined method determines the best match among the top candidates it provides. There are also multiple ways to evaluate the goodness of a match. [1] uses integer quadratic programming: match quality (found using the shortlist) and distortion form a cost function, and an integer programming problem is solved to find the set of correspondences that minimizes the cost. A second approach used in this project is a multi-stage method that employs least median square distance error (LMSDE). This method relies on the point correspondences between two images found using the shortlist. Random subsets of these points are drawn to serve as the base points for computing thin-plate spline warpings. LMSDE is then computed between the warped points and the hypothesized locations of those points, and the best morph is taken to be the warping function that generates the least error.
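The random-subset search behind the LMSDE approach can be sketched as below. To keep the example short and dependency-free, it swaps the thin-plate spline for a similarity transform fitted exactly to two correspondences; that substitution, the complex-number point representation, the trial count, and the names `fit_from_pair` and `least_median_fit` are all assumptions of this sketch rather than details of the project.

```python
import random
from statistics import median

def fit_from_pair(p0, p1, q0, q1):
    """Similarity transform z -> a*z + b mapping p0 to q0 and p1 to q1.
    Stand-in for the thin-plate spline warp (assumption of this sketch)."""
    a = (q1 - q0) / (p1 - p0)
    return a, q0 - a * p0

def least_median_fit(src, dst, trials=200, seed=0):
    """Try warps fitted to random subsets of the correspondences and keep
    the one with the least median squared distance error.

    src[i] and dst[i] are corresponding points (complex numbers).
    Returns (median_squared_error, a, b) for the best warp found.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        i, j = rng.sample(range(len(src)), 2)
        if src[i] == src[j]:
            continue  # degenerate subset, no unique transform
        a, b = fit_from_pair(src[i], src[j], dst[i], dst[j])
        err = median(abs(a * s + b - d) ** 2 for s, d in zip(src, dst))
        if best is None or err < best[0]:
            best = (err, a, b)
    return best
```

Because the score is a median rather than a sum, a warp is still rated well when a minority of the correspondences are wrong, which is why the search can tolerate mismatches left over from the shortlist stage.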

Acknowledgments

This report describes research done at the Center for Biological & Computational Learning, which is in the McGovern Institute for Brain Research at MIT, as well as in the Dept. of Brain & Cognitive Sciences, and which is affiliated with the Computer Science and Artificial Intelligence Laboratory (CSAIL).

This research was sponsored by grants from: Office of Naval Research (DARPA) Contract No. MDA972-04-1-0037, Office of Naval Research (DARPA) Contract No. N00014-02-1-0915, National Science Foundation (ITR/SYS) Contract No. IIS-0112991, National Science Foundation (ITR) Contract No. IIS-0209289, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218693, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218506, and National Institutes of Health (Conte) Contract No. 1 P20 MH66239-01A1.

Additional support was provided by: Central Research Institute of Electric Power Industry (CRIEPI), Daimler-Chrysler AG, Compaq/Digital Equipment Corporation, Eastman Kodak Company, Honda R&D Co., Ltd., Industrial Technology Research Institute (ITRI), Komatsu Ltd., Eugene McDermott Foundation, Merrill-Lynch, NEC Fund, Oxygen, Siemens Corporate Research, Inc., Sony, Sumitomo Metal Industries, and Toyota Motor Corporation.

References

[1] A. C. Berg, T. L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence. In U.C. Berkeley Technical Report, Berkeley, CA, USA, Dec. 2004.

[2] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR, Workshop on Generative-Model Based Vision, 2004.

[3] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, pp. 1150-1157, 1999.

[4] T. Serre, L. Wolf, and T. Poggio. A new biologically motivated framework for robust object recognition. In CBCL Paper #243/AI Memo #2004-026, Massachusetts Institute of Technology, Cambridge, MA, Nov. 2004.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)