Creating realistic facial animation is a difficult and time-intensive process that requires highly detailed models and skilled animators. The dominant approach is to vary a three-dimensional geometric model with a basis set of deformations. Creating these models, adapting them to target characters, and controlling them are all major bottlenecks in the production process. We propose to address all of these problems with multilinear models of facial variation: models that describe how faces vary along multiple attributes simultaneously. In particular, we want to capture how the face deforms as one varies expression and/or identity.
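To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a bilinear face model is evaluated: a core tensor, with one mode for vertex coordinates and one mode per attribute, is contracted with an identity weight vector and an expression weight vector to produce a mesh. The tensor shapes, function names, and random placeholder core are assumptions for illustration only.

```python
# Hedged sketch: evaluating a bilinear (identity x expression) face model.
# Assumes a precomputed core tensor of shape (n_coords, n_identity, n_expression),
# where n_coords = 3 * number of mesh vertices.
import numpy as np

def synthesize_face(core, w_identity, w_expression):
    """Contract the core tensor with one weight vector per attribute mode.

    core         : (n_coords, n_identity, n_expression) array
    w_identity   : (n_identity,) identity weights
    w_expression : (n_expression,) expression weights
    returns      : (n_vertices, 3) array of vertex positions
    """
    coords = np.einsum('cie,i,e->c', core, w_identity, w_expression)
    return coords.reshape(-1, 3)

# Example: blend two identities while keeping a fixed expression setting.
core = np.random.rand(3 * 1000, 5, 4)           # placeholder core tensor
w_id = 0.5 * np.eye(5)[0] + 0.5 * np.eye(5)[1]  # halfway between persons 0 and 1
w_ex = np.eye(4)[2]                             # third expression basis setting
verts = synthesize_face(core, w_id, w_ex)
```

Because each attribute has its own weight vector, changing identity leaves the expression weights untouched, and vice versa; this separability is what makes the representation attractive for animation.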
Multilinear models offer two properties of special interest to animators: each attribute (such as identity or expression) can be controlled independently of the others, and attribute settings can be mixed and transferred between faces and performances.
We introduce methods that make multilinear models a practical tool for animating faces (or other deforming surfaces). The key obstacle in constructing a multilinear model is the vast amount of data (in full correspondence) needed to account for every possible combination of attribute settings. The key problem in using one is devising an intuitive control interface.
For the data-acquisition problem, we show how to estimate a detailed multilinear model from an incomplete set of high-quality face scans. For the control problem, we show how to connect this model to cheaper sources of data such as video, so that the model can be applied to performances by actors who are not in the original database and not available for detailed measurement. Pose, performance, and identity parameters can be extracted from video, along with a performance-driven texture function. With this information in hand, one can transfer a performance or identity from one actor to another; drive character animation from video; rewrite video or animation with modified expressions or facial features; or write new texture (e.g., make-up) into video. The model also gives a representation in which one can compute distances between faces in identity space or expression space, to pick similar or different actors.
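The control step can be viewed as an inverse problem: given aligned feature observations from video, recover the attribute weights that best reproduce them. Below is a hedged sketch of one standard way to do this, alternating linear least-squares solves over the identity and expression weights of the bilinear model sketched above; the actual tracker also estimates pose and a performance-driven texture function, which this sketch omits, and all names and shapes here are assumptions.

```python
# Hedged sketch: recovering identity and expression weights from observed,
# already posed and aligned vertex coordinates, by alternating least squares.
import numpy as np

def fit_weights(core, observed, n_iters=20):
    """core     : (n_coords, n_id, n_ex) multilinear face model
       observed : (n_coords,) stacked target vertex coordinates
       returns  : (w_id, w_ex) estimated attribute weights
    """
    n_id, n_ex = core.shape[1], core.shape[2]
    w_id = np.ones(n_id) / n_id
    w_ex = np.ones(n_ex) / n_ex
    for _ in range(n_iters):
        # With expression weights fixed, the model is linear in identity weights.
        A_id = np.einsum('cie,e->ci', core, w_ex)
        w_id, *_ = np.linalg.lstsq(A_id, observed, rcond=None)
        # With identity weights fixed, the model is linear in expression weights.
        A_ex = np.einsum('cie,i->ce', core, w_id)
        w_ex, *_ = np.linalg.lstsq(A_ex, observed, rcond=None)
    return w_id, w_ex
```

Once weights have been recovered for two videos, swapping their identity (or expression) weights and re-synthesizing the geometry yields the kind of transfer described above.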
We have acquired a number of 3D face scans and built two face models: a bilinear model with separate controls for identity (shape) and expression, and a trilinear model that varies with identity, mood (smile, frown, ...), and articulation (visemes). We have shown that we can build these models even when some attribute combinations are missing from the dataset, and can predict likely candidates for the missing scans. We have used the two models to track a number of videos, extracting pose and performance parameters. Finally, we have demonstrated the power and simplicity of our approach with several video-editing applications: modifying the performance in a video, changing the identity of an actor, transferring a performance from one video to another, and combining attributes from multiple videos (identity from one, mood from another, and articulation from a third), as shown in the figure.
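For completeness, here is a rough sketch of how such a model can be estimated from a registered tensor of scans via truncated N-mode SVD (higher-order SVD). It assumes a complete data tensor; handling missing attribute combinations, as reported above, requires an estimation procedure beyond this sketch, and the variable names and ranks are placeholders.

```python
# Hedged sketch: building a bilinear model from a *complete* tensor of
# registered scans with truncated N-mode SVD (keeps the vertex mode full).
import numpy as np

def nmode_svd(data, ranks):
    """data  : (n_coords, n_identity, n_expression) tensor of registered scans
       ranks : (r_id, r_ex) number of basis vectors kept per attribute mode
       returns (core, U_id, U_ex) such that
       data is approximated by core contracted with U_id and U_ex."""
    n_c, n_i, n_e = data.shape
    # Identity-mode unfolding followed by a truncated SVD.
    U_id, _, _ = np.linalg.svd(data.transpose(1, 0, 2).reshape(n_i, -1),
                               full_matrices=False)
    U_id = U_id[:, :ranks[0]]
    # Expression-mode unfolding followed by a truncated SVD.
    U_ex, _, _ = np.linalg.svd(data.transpose(2, 0, 1).reshape(n_e, -1),
                               full_matrices=False)
    U_ex = U_ex[:, :ranks[1]]
    # Project the data onto the attribute bases to obtain the core tensor.
    core = np.einsum('cie,ij,ek->cjk', data, U_id, U_ex)
    return core, U_id, U_ex
```

The resulting core tensor plays the role of `core` in the earlier sketches: any new face is synthesized by choosing one weight vector per attribute mode and contracting.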
An intriguing prospect is that, by scanning faces of different ages, we can build a multilinear model that gives us control over an actor's apparent age (identity × expression × age). With our technique, we can do this without scanning each person at every age.
[1] Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popović. Face Transfer with Multilinear Models. ACM Transactions on Graphics 24(3):426–433, 2005.