MIT CSAIL Research Abstracts

Efficient Object Localization Using Probabilistic Geometric Grammars

Meg Aycinena, Leslie Pack Kaelbling & Tomas Lozano-Perez

Introduction

We are developing a novel representation and recognition framework for visual object classes using probabilistic geometric grammars (PGGs). Our framework extends probabilistic context-free grammars with distributions over the spatial relationships, geometric properties, and image appearance of object parts. Thus our approach exploits the compact structure of grammars to capture structural variability within and among object classes. We have also designed and implemented an efficient algorithm for localizing an object in an image given a learned PGG model.

Motivation

Many object recognition systems are limited by their inability to share common part models or spatial structure among related object classes [1,2,3]. This capability to model shared part structure is desirable because it allows information about parts and relationships in one object class to be generalized to other classes for which it is relevant. For example, we might like to transfer knowledge about the relationships among the arms and back of a chair to all chair classes that have arms and backs, regardless of whether the base is composed of four legs or an axle and wheel-leg. Modeling structural variability and shared part structure can allow effective parameter learning from fewer examples and better generalization of the learned models to unseen instances.

Additionally, a system which models shared structure can exploit its knowledge to perform more efficient recognition. For example, we might search for regions of an image which look like four legs and a top without committing to whether we are looking for a table or chair. Furthermore, a representation which captures structural variability within object classes offers the potential to be generalized to model variability in scenes that share objects and arrangements of objects, just as objects share parts and part structures.

Figure 1: An example PGG structure for chairs. The icons represent root and conditional geometric models and appearance models.

Approach

With these goals in mind, we have designed a representation and recognition framework for classes of objects that captures structural variability within and among object classes. Probabilistic geometric grammars (PGGs) represent object classes recursively in terms of their parts, thereby exploiting the hierarchical and substitutive structure inherent to many types of objects. PGGs extend probabilistic context-free grammars (PCFGs), which were developed for sentence parsing in natural language processing. A rule in a PGG specifies that a composite part (e.g., the base of a chair) consists of a set of subparts (e.g., four legs). Figure 1 shows an example PGG structure.
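The following minimal Python sketch makes the rule structure concrete. It covers only the PCFG skeleton of a PGG: rules that map a composite part to a tuple of subparts with a probability, plus a function that scores the structure of a parse tree. The class and function names are hypothetical. So are the rule probabilities, and so are any part names that do not appear in this abstract. The geometric and appearance models that distinguish a PGG from a plain PCFG are deliberately left out here.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str      # composite part, e.g. "base"
    rhs: tuple    # its subparts, e.g. ("leg", "leg", "leg", "leg")
    prob: float   # P(lhs -> rhs), as in a PCFG


@dataclass
class Grammar:
    rules: list
    primitives: frozenset  # parts that expand no further (they carry appearance models)

    def rules_for(self, part):
        return [r for r in self.rules if r.lhs == part]

    def is_normalized(self, tol=1e-9):
        """Check that the rule probabilities for each composite part sum to 1."""
        totals = defaultdict(float)
        for r in self.rules:
            totals[r.lhs] += r.prob
        return all(abs(t - 1.0) < tol for t in totals.values())


def structure_log_prob(tree, grammar):
    """Log-probability of a parse tree's structure; tree = (part, [subtrees])."""
    part, children = tree
    if part in grammar.primitives:
        return 0.0
    rhs = tuple(child[0] for child in children)
    for r in grammar.rules_for(part):
        if r.rhs == rhs:
            return math.log(r.prob) + sum(structure_log_prob(c, grammar) for c in children)
    raise ValueError(f"no rule {part} -> {rhs}")


# Hypothetical chair grammar: the probabilities are illustrative, not learned values.
CHAIRS = Grammar(
    rules=[
        Rule("chair", ("top", "base"), 1.0),
        Rule("top", ("seat", "back"), 0.7),
        Rule("top", ("seat", "back", "arm", "arm"), 0.3),
        Rule("base", ("leg", "leg", "leg", "leg"), 0.6),
        Rule("base", ("axle", "wheel_leg"), 0.4),
    ],
    primitives=frozenset({"seat", "back", "arm", "leg", "axle", "wheel_leg"}),
)
```

Both bases share the same `top` rules in this sketch. That sharing is how the grammar lets knowledge about seats, backs, and arms transfer between four-legged chairs and wheeled chairs.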

We supplement the traditional PCFG representation with distributions over the relative spatial relationships, geometric characteristics, and image appearance properties of object parts, as well as the appearance and shape of primitive parts. The geometric models on each rule part describe how the shape and position of a subpart vary conditioned on the shape and position of its parent part. We use multivariate conditional Gaussians to represent these distributions. Each primitive part also has an appearance model, describing a distribution over the part's image characteristics. For now, each appearance model consists of a simple distribution over edge presence and orientation within image regions that have been detected using an extension of an algorithm by Jacobs [4].
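A common way to obtain a multivariate conditional Gaussian is to start from a joint Gaussian over parent and child geometry and condition it on the parent's observed values. This gives a mean of mu_c + S_cp S_pp^-1 (x_p - mu_p) and a covariance of S_cc - S_cp S_pp^-1 S_pc. The sketch below assumes that parameterization. The abstract does not say exactly how its distributions are parameterized, and the function names and geometry-vector layout here are illustrative assumptions.

```python
import numpy as np


def condition_gaussian(mu, sigma, child_idx, parent_idx, x_parent):
    """Condition a joint Gaussian N(mu, sigma) on the parent coordinates.

    Returns the mean and covariance of the child coordinates given x_parent.
    """
    mu_c, mu_p = mu[child_idx], mu[parent_idx]
    s_cc = sigma[np.ix_(child_idx, child_idx)]
    s_cp = sigma[np.ix_(child_idx, parent_idx)]
    s_pp = sigma[np.ix_(parent_idx, parent_idx)]
    # gain = S_cp S_pp^{-1}, computed with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(s_pp, s_cp.T).T
    mean = mu_c + gain @ (x_parent - mu_p)
    cov = s_cc - gain @ s_cp.T
    return mean, cov


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, e.g. for scoring a child's geometry."""
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + mahalanobis)
```

In a chair model, `x_parent` might hold the seat's position and log-scale, and the child coordinates could describe the back. Under this assumption, a back placed far from where the seat predicts gets a low log-density.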

To perform recognition using a learned PGG model, we have developed an approximate bottom-up dynamic programming algorithm for object localization in an image. It is a hierarchical extension of the localization algorithm for pictorial structures developed by Felzenszwalb and Huttenlocher [2], and depends on a discretization of the image into a fixed set of regions with varying locations and scales. This discretization allows the algorithm to consider a fixed candidate parent part while searching independently for each child part in a rule, thus avoiding the exponential search over ordered subsets.
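The bottom-up recursion over a fixed set of regions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- `appearance(part, r)` and `geometry(parent, child, rp, rc)` are hypothetical callbacks that return log-scores.
- Each rule part is identified by a distinct name, so it can carry its own geometric model.
- Regions are an arbitrary finite set of hashable candidates.

Each child is maximized independently once the parent region is fixed. As a result, the work per rule grows linearly with the number of children and quadratically with the number of regions, rather than exponentially with the number of children. Nothing in this sketch stops two children from selecting the same region, and the abstract does not say how the original algorithm handles that case.

```python
import math
from functools import lru_cache


def localize(rules, primitives, root, regions, appearance, geometry):
    """Bottom-up max-product parse over a fixed, discretized set of regions.

    rules:      dict mapping composite part -> list of (prob, [child parts])
    primitives: set of parts scored directly by `appearance`
    regions:    iterable of hashable candidate regions (locations and scales)
    appearance(part, r)             -> log-score of primitive `part` in region r
    geometry(parent, child, rp, rc) -> log-prob of child region rc given parent region rp
    Returns (log-score, tree) for the best root placement; tree = (part, region, subtrees).
    """
    regions = tuple(regions)

    @lru_cache(maxsize=None)
    def best(part, r):
        # Best score and subtree for `part` placed in region r (memoized).
        if part in primitives:
            return appearance(part, r), (part, r, ())
        best_score, best_tree = -math.inf, None
        for prob, children in rules[part]:
            total, subtrees = math.log(prob), []
            for child in children:
                # Parent region r is fixed, so each child's region is chosen independently.
                score, tree = max(
                    ((geometry(part, child, r, rc) + best(child, rc)[0], best(child, rc)[1])
                     for rc in regions),
                    key=lambda candidate: candidate[0],
                )
                total += score
                subtrees.append(tree)
            if total > best_score:
                best_score, best_tree = total, (part, r, tuple(subtrees))
        return best_score, best_tree

    return max((best(root, r) for r in regions), key=lambda candidate: candidate[0])
```

Memoizing `best` on (part, region) is what makes this dynamic programming. Each part-region score is computed once and then reused by every candidate parent region that considers it.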

Results and Future Work

Experimental results evaluating the accuracy of the PGG parsing algorithm are promising. Figure 2 shows examples of successful localization in images.

(Images: two examples of successful chair localization.)

Color key:
RED: chair
ORANGE: top
YELLOW: seat
GREEN: back
MAGENTA: base
PINK: front left leg
LIGHT GRAY: back left leg
GRAY: front right leg
DARK GRAY: back right leg

Figure 2: Some examples of successful localization by the PGG parsing algorithm after training on 10 examples.

However, the results also reveal directions for further improvement; in particular, they demonstrate that human-designed grammar structures often cannot accurately capture the variability of real image data. Therefore, we are investigating semi-supervised algorithms for learning the structure (as well as parameters) of PGG models that explain the image data well while minimizing model complexity and maximizing shared substructure among object classes. We are also exploring richer, more discriminative appearance models and image feature detectors.

Research Support

This research was supported in part by DARPA IPTO Contract FA8750-05-2-0249, "Effective Bayesian Transfer Learning". It was also supported by an NSF Graduate Research Fellowship to Meg Aycinena.

References:

[1] Rob Fergus, Pietro Perona, and Andrew Zisserman. A Sparse Object Category Model for Efficient Learning and Exhaustive Recognition. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

[2] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Pictorial Structures for Object Recognition. International Journal of Computer Vision (IJCV), 61(1), pp. 55--79, 2005.

[3] David J. Crandall and Daniel P. Huttenlocher. Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition. In Proc. European Conf. on Computer Vision (ECCV), 2006.

[4] David W. Jacobs. Robust and Efficient Detection of Salient Convex Groups. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 18, pp. 23--37, 1996.

[5] Margaret A. Aycinena. Probabilistic Geometric Grammars for Object Recognition. Master's thesis, Massachusetts Institute of Technology, 2005.