Probabilistic Geometric Grammars for Object Recognition

Meg Aycinena, Leslie Pack Kaelbling & Tomás Lozano-Pérez

Introduction

We are researching a generative, parts-based, three-dimensional representation and recognition framework for classes of objects. The framework uses probabilistic grammars to represent object classes recursively in terms of their parts, thus exploiting the hierarchical and substitutive structure inherent to many types of objects. It models the 3D geometric characteristics of object parts using multivariate conditional Gaussians over dimensions, position, and rotation. We develop algorithms for learning geometric models and rule probabilities given parsed 3D examples and a fixed grammar. We are also working on a parsing algorithm for classifying unlabeled, unparsed 3D examples given a geometric grammar.

Motivation

This work is novel in that it combines several approaches to the task of object classification and recognition: the use of three-dimensional models, a parts-based approach, and the use of probabilistic grammars to capture structural variability.
Thus far, this work focuses on the relationships between parts and constrains the shape of each part to a simple box. We also focus on recognition and learning given three-dimensional input, although eventually recognition and learning must occur from two-dimensional images.

Approach

Probabilistic geometric grammars (PGGs) extend generic PCFGs by attaching conditional and root geometric models to various parts of the grammar. Formally, a PGG is a set of object or part classes, where each non-primitive class is defined by a set of rules. Each rule maps the head class to a set of rule parts with a certain probability.

Figure 2: A simple PGG for chairs, with geometric models omitted.

The geometric models in the grammar take two forms: part models and root models. A geometric part model is defined for a single part in a single rule; it is a probability distribution describing how the geometric characteristics of that rule part vary, conditioned on the characteristics of its parent part. A geometric root model, in contrast, is defined over an entire class; it is a probability distribution describing how the geometric characteristics of the root part vary, independently of any other part. We use multivariate Gaussian distributions over dimensions, positions, and rotations for these models [2], [3], [4].

Assumptions

The PGG framework makes several assumptions of conditional independence between elements of the model. First, it assumes that the way a non-primitive class is expanded (i.e., the choice of rule used) is independent of the way its parent or sibling classes were expanded. Second, the framework assumes that, in a fixed parsed instance, the geometric characteristics of a part are conditionally independent of those of its non-descendants, given its parent, and of those of its descendants, given its children.
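The structure described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the class and rule names echo the chair grammar of Figure 2, but the specific rules, probabilities, and the reduction of a geometric model to Gaussian parameters over a pose vector are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Gaussian:
    """Parameters of a multivariate Gaussian over a part's geometric
    characteristics (dimensions, position, rotation)."""
    mean: list   # mean pose vector
    cov: list    # covariance matrix, row-major

@dataclass
class Rule:
    """Maps a head class to a set of rule parts with a certain probability."""
    prob: float                  # P(this rule | head class)
    parts: list                  # class names on the right-hand side
    part_models: dict = field(default_factory=dict)  # part -> conditional Gaussian

@dataclass
class PGGClass:
    """An object or part class; primitive classes have no rules."""
    name: str
    rules: list = field(default_factory=list)
    root_model: Gaussian = None  # unconditional model over the root part

# Hypothetical chair grammar (probabilities invented for illustration):
chair = PGGClass("Chair", rules=[
    Rule(0.7, ["Seat", "Back", "Leg", "Leg", "Leg", "Leg"]),  # four-legged chair
    Rule(0.3, ["Seat", "Back", "Leg"]),                       # pedestal chair
])

# Rule probabilities for a head class must form a distribution.
assert abs(sum(r.prob for r in chair.rules) - 1.0) < 1e-9
```

Primitive classes such as "Leg" would simply carry no rules, terminating the recursion.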
This means that a PGG model cannot directly specify geometric dependencies between parts of the object that are not related in a parent-child relationship. In exchange, however, the model requires far fewer parameters than a fully connected model. This reduction can potentially increase the speed of learning and improve the learned model's ability to generalize to unseen instances.

Recognition and Learning

Given a PGG, the independence assumptions outlined above, and an object class, we can calculate the probability of a parsed, labeled 3D instance. This is the product of three terms: the probability of each rule choice used in the parse, the likelihood of the root part under the class's geometric root model, and the likelihood of each remaining part under its geometric part model, conditioned on its parent part.
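Under the independence assumptions above, the log-probability of a parsed instance decomposes into a sum of these terms. The sketch below assumes a diagonal Gaussian for simplicity and an invented dictionary encoding of the parse tree; both are illustrative conventions, not the paper's representation.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log-density of a diagonal multivariate Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def parse_logprob(root, root_model):
    """log P(parse) = root-model term + sum over the tree of rule-choice
    terms and conditional part-model terms.

    Each node is a dict with 'pose', 'children', and (for non-primitive
    nodes) 'rule_prob' and 'child_models' giving each child's conditional
    Gaussian (mean, var) relative to its parent."""
    logp = gauss_logpdf(root["pose"], *root_model)     # geometric root model
    def recurse(n):
        lp = 0.0
        if n["children"]:
            lp += math.log(n["rule_prob"])             # rule-choice term
            for child, model in zip(n["children"], n["child_models"]):
                lp += gauss_logpdf(child["pose"], *model)  # part model | parent
                lp += recurse(child)
        return lp
    return logp + recurse(root)

# Toy one-child parse: a root part expanded by a rule with probability 0.5.
leaf = {"pose": [0.0], "children": []}
tree = {"pose": [0.0], "rule_prob": 0.5,
        "children": [leaf], "child_models": [([0.0], [1.0])]}
logp = parse_logprob(tree, ([0.0], [1.0]))
```

Because the terms factor by part, recognition can score candidate parses part by part rather than evaluating one joint density over all parts at once.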
Based on this formulation, algorithms can be derived for:
- learning the geometric models and rule probabilities from parsed, labeled 3D examples, given a fixed grammar; and
- parsing and classifying unlabeled, unparsed 3D examples, given a complete PGG.
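For the learning task, the rule probabilities have a closed-form maximum-likelihood estimate: the relative frequency with which each rule expands its head class across the parsed examples. The encoding of a parsed example as (head class, rule id) pairs below is an invented convention for the sketch.

```python
from collections import Counter

def estimate_rule_probs(parsed_examples):
    """Maximum-likelihood estimate of rule probabilities.

    Each parsed example is encoded as a list of (head_class, rule_id)
    pairs, one per rule application in its parse tree.
    Returns {head_class: {rule_id: probability}}."""
    counts = {}
    for example in parsed_examples:
        for head, rule_id in example:
            counts.setdefault(head, Counter())[rule_id] += 1
    return {head: {rid: n / sum(c.values()) for rid, n in c.items()}
            for head, c in counts.items()}

# Three parsed chairs: two four-legged (rule 0), one pedestal (rule 1).
examples = [[("Chair", 0)], [("Chair", 0)], [("Chair", 1)]]
probs = estimate_rule_probs(examples)
```

The Gaussian geometric models can likewise be fit in closed form, from the sample mean and covariance of each part's pose relative to its parent.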
We are implementing and testing these algorithms on synthetic 3D data in order to investigate the chosen model representation. We are also considering how this work might be extended to learning and recognition from two-dimensional images.

Research Support

This research is supported in part by a National Science Foundation (NSF) Graduate Research Fellowship to Meg Aycinena. It is also supported by the Defense Advanced Research Projects Agency (DARPA), through the Department of the Interior, NBC, Acquisition Services Division, under Contract No. NBCHD030010.

References

[1] V. Blanz, B. Schölkopf, H. Bülthoff, C. Burges, V. Vapnik, and T. Vetter. Comparison of View-based Object Recognition Algorithms Using Realistic 3D Models. In Proc. of Artificial Neural Networks, ICANN, pages 251-256, 1996.
[2] P. Thomas Fletcher, Sarang Joshi, Conglin Lu, and Stephen Pizer. Gaussian Distributions on Lie Groups and Their Application to Statistical Shape Analysis. In Proc. of Information Processing in Medical Imaging, pages 450-462, 2003.
[3] Michael Patrick Johnson. Exploiting Quaternions to Support Expressive Interactive Character Motion. PhD thesis, Massachusetts Institute of Technology, 2003.
[4] Michael Irwin Jordan. An Introduction to Probabilistic Graphical Models. To appear, 2005.
[5] Daniel Jurafsky and James H. Martin. Speech and Language Processing. Prentice Hall, 2000.
[6] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 2002.