CSAIL Research Abstracts - 2005 link to http://publications.csail.mit.edu/abstracts/abstracts05/index.html link to http://www.csail.mit.edu
bullet Introduction bullet Architecture, Systems
& Networks
bullet Language, Learning,
Vision & Graphics
bullet Physical, Biological
& Social Systems
bullet Theory bullet

horizontal line

Probabilistic Geometric Grammars for Object Recognition

Meg Aycinena, Leslie Pack Kaelbling & Tomas Lozano-Perez

Introduction

We are researching a generative parts-based three-dimensional representation and recognition framework for classes of objects. The framework uses probabilistic grammars to represent object classes recursively in terms of their parts, thus exploiting the hierarchical and substitutive structure inherent to many types of objects. It models the 3D geometric characteristics of object parts using multivariate conditional Gaussians over dimensions, position, and rotation. We develop algorithms for learning geometric models and rule probabilities given parsed 3D examples and a fixed grammar. We are also working on a parsing algorithm for classifying unlabeled unparsed 3D examples given a geometric grammar.

Motivation

This work is novel in that it combines several approaches to the task of object classification and recognition: the use of three-dimensional models, a parts-based approach, and the use of probabilistic grammars to capture structural variability.

structural variability in chairs

The majority of current work in object recognition is largely image based, so we must motivate the use of the 3D models. There is great variation in the appearance of objects, both among different views of the same object instance, and among different instances within the object class. Variation due to viewpoint can be explained more compactly and accurately as the composition of the 3D shape of the object with the viewing projection than as a collection of image-based views. Variation within an object class is also most naturally represented using a three-dimensional model, because it allows the spatial relationship between parts to be modeled independent of viewpoint.

We use a parts-based approach because many object classes are too complex to be described well using a single shape or distribution over a single shape. However, they can be naturally modeled as a distribution over a collection of shapes and the relationships between them.

The variability within an object class, especially for objects made by humans such as furniture, is highly related to the parts and displays certain modular, hierarchical, and substitutive structure. Context-free grammars (CFGs) are ideal for capturing this structural variability. They model hierarchical groupings and substitution of subparts, and can naturally represent conditional independences between subgroups with the context-free assumption. An extension to basic CFGs, probabilistic context-free grammars (PCFGs) allow the specification of distributions over the combination of subparts by attaching probabilities to the rule expansions for a given head class [5],[6].

 

Figure 1: Man-made objects exhibit a large
amount of structural variability. [1]

Thus far, this work is focused on the relationship between parts and constrains the shape of parts to simple boxes. We are also focused on recognition and learning given three-dimensional input, although eventually recognition and learning must occur from two-dimensional images.

Approach

Probabilistic geometric grammars (PGGs) extend generic PCFGs by attaching conditional and root geometric models to various parts of the grammar. Formally, a PGG is a set of object or part classes, where each non-primitive class is defined by a set of rules. Each rule maps the head class to a set of rule parts with a certain probability.

simple PGG for chairs

Figure 2: A simple PGG for chairs, with geometric models omitted.

The geometric models in the grammar take two forms: part models and root models. A geometric part model is defined for a single part in a single rule, and is a probability distribution that describes how the geometric characteristics of that rule part vary conditioned on the characteristics of its parent part. A geometric root model, in contrast, is defined over an entire class, and is a probability distribution that describes how the geometric characteristics of the root part vary, independently of any other part. We use multivariate Gaussian distributions over dimensions, positions, and rotations for these models [2],[3],[4].

Assumptions

The PGG framework makes several assumptions of conditional independence between elements of the model. First, it assumes that the way a non-primitive class is expanded (i.e. the choice of rule used) is independent of the way its parent or sibling classes were expanded.

Second, the framework assumes that, in a fixed parsed instance, the geometric characteristics of a part are conditionally independent of those of its non-descendents, given its parents, and of those of its descendents, given its children. This means that a PGG model cannot directly specify geometric dependences between parts of the object that are not related in a child-parent relationship. In exchange, however, the model requires many fewer parameters than a fully connected model. This reduction can potentially increase the speed of learning and improve the learned model's ability to generalize to unseen instances.

Recognition and Learning

Given a PGG, the independence assumptions outlined above, and an object class, we can calculate the probability of a parsed labeled 3D instance. This is the product of three terms:

  • the product of the expansion probabilities of the rules that match each subtree (an internal node and its immediate children) in the instance tree;
  • the product of the the conditional probabilities of the geometric characteristics each non-root part given the characteristics of its parent and the geometric model of the matching rule part in the grammar;
  • the unconditional probability of the geometric characteristics of the root part given the root geometric model of the matching class in the grammar.

Based on this formulation, algorithms can be derived for:

  • learning the conditional and root geometric models of a fixed grammar, given a training set of parsed labeled 3D instances;
  • learning the probabilities on rules in the grammar, given a training set of parsed labeled 3D instances; and
  • classifying an unparsed and unlabeled 3D instance by parsing it against a PGG.

We are implementing and testing these algorithms on synthetic 3D data in order to investigate the chosen model representation. We are also considering how this work might be extended to learning and recognition from two-dimensional images.

Research Support

This research is supported in part by a National Science Foundation (NSF) Graduate Research Fellowship to Meg Aycinena. It is also supported by the Defense Advanced Research Projects Agency (DARPA), through the Department of the Interior, NBC, Acquisition Services Division, under Contract No. NBCHD030010.

References

[1] V. Blanz, B. Scholkopf, H. Bulthoff, C. Burges, V. Vapnik, and T. Vetter. Comparison of View-based Object Recognition Algorithms Using Realistic 3D Models. In Proc. of Artifical Neural Networks, ICANN pages 251-256, 1996.

[2] P. Thomas Fletcher, Sarang Joshi, Conglin LU, and Stephen Pizer. Gaussian Distributions on Lie Groups and Their Application to Statistical Shape Analysis. In Proc. of Information Processing in Medical Imaging pages 450-462, 2003.

[3] Michael Patrick Johnson. Exploiting Quaternions to Support Expressive Interactive Character Motion. PhD thesis, Massachusetts Institute of Technology, 2003.

[4] Michael Irwin Jordan. An Introduction to Probabilistic Graphical Models. To appear, 2005.

[5] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Prentice Hall, 2000.

[6] Christopher D. Manning and Hinrich Schutze. Foundations of Statistical Natural Language Processing. The MIT Press, 2002.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)