CSAIL Research Abstract


Face Transfer with Multilinear Models

Daniel Vlasic, Matt Brand, Hanspeter Pfister & Jovan Popovic

Figure: Multilinear face models give animators decoupled control over facial attributes such as identity, mood, and articulation. To illustrate one of many possible uses, from left to right: (1) an original video provides the identity; (2) a second video provides mood (raised eyebrows); (3) a third video provides articulation; (4) the shape combining identity, mood, and articulation from the three videos; and (5) the resulting textured blend.

Introduction

Creating realistic facial animation is a difficult and time-intensive process, requiring highly detailed models and skillful animators. The dominant approach is to vary a three-dimensional geometric model with a basis set of deformations. Creating these models, adapting them to target characters, and controlling them are all major bottlenecks in the production process. We propose to address all of these problems via multilinear models of facial variation, that is, models that describe how faces vary along multiple attributes. In particular, we want to capture how the face deforms as one varies the expression, the identity, or both.

Multilinear models offer dual properties of special interest to animators:

Separability: Attributes on different tensor axes are decoupled. E.g., expression can be varied while identity stays constant, or vice versa.
Consistency: Encodings are consistent across attributes. E.g., expression parameters encoding a smile for one person will encode a smile for every person spanned by the model, appropriate to their facial geometry and style of smiling.
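
As a rough illustration of these two properties, the sketch below treats a bilinear face model as a three-way core tensor (vertex coordinates x identity x expression) and synthesizes a face by contracting the identity and expression axes with weight vectors. The tensor sizes, the random values, and the names `mode_multiply`, `synthesize`, `alice`, `bob`, `neutral`, and `smile` are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def mode_multiply(tensor, matrix, mode):
    """Mode-n product: contract the columns of `matrix` with axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def synthesize(core, w_identity, w_expression):
    """Return the flattened vertex vector for one identity/expression pair."""
    out = mode_multiply(core, w_identity[None, :], 1)   # collapse the identity axis
    out = mode_multiply(out, w_expression[None, :], 2)  # collapse the expression axis
    return out[:, 0, 0]

# Hypothetical toy model: 4 vertices (12 coordinates), 3 identity and 4 expression dimensions.
rng = np.random.default_rng(0)
core = rng.standard_normal((12, 3, 4))
alice, bob = rng.standard_normal(3), rng.standard_normal(3)
neutral, smile = rng.standard_normal(4), rng.standard_normal(4)

# Separability: change the expression weights while the identity weights stay fixed.
alice_neutral = synthesize(core, alice, neutral)
alice_smiling = synthesize(core, alice, smile)

# Consistency: the same expression weights are reused, unchanged, for another identity.
bob_smiling = synthesize(core, bob, smile)
```

With identity held fixed, the output is linear in the expression weights (and vice versa), which is what makes each attribute independently controllable in this toy setup.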

We introduce methods that make multilinear models a practical tool for animating faces (or other deforming surfaces). The key obstacle in constructing a multilinear model is the vast amount of data (in full correspondence) needed to account for every possible combination of attribute settings. The key problem in using a multilinear model is devising an intuitive control interface.

For the data-acquisition problem, we show how to estimate a detailed multilinear model from an incomplete set of high-quality face scans. For the control problem, we show how to connect this model to cheaper sources of data such as video, so that the model can be applied to performances by actors who are not in the original database and not available for detailed measurement. Pose, performance, and identity parameters can be extracted from video, along with a performance-driven texture function. With this information in hand, one can transfer a performance or identity from one actor to another; drive character animation from video; rewrite video or animation with modified expressions or facial features; or write new texture (e.g., make-up) into video. The model also gives a representation in which one can compute distances between faces in identity space or expression space, to pick similar or different actors.
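
To give a feel for how identity and expression parameters might be recovered for an unseen face, here is a minimal sketch that fits both weight vectors to an observed shape by alternating least squares. It is a deliberate simplification and not the tracking procedure described above: it fits directly to 3D vertex coordinates and ignores pose, camera projection, and texture. The function name `fit_weights`, the tensor sizes, and the synthetic target are all hypothetical.

```python
import numpy as np

def fit_weights(core, target, n_iters=50, seed=0):
    """Alternately solve for identity and expression weights that reproduce `target`."""
    rng = np.random.default_rng(seed)
    w_id = rng.standard_normal(core.shape[1])
    w_ex = rng.standard_normal(core.shape[2])
    residuals = []
    for _ in range(n_iters):
        # With expression fixed, the model is linear in the identity weights.
        basis_id = np.einsum("vie,e->vi", core, w_ex)
        w_id = np.linalg.lstsq(basis_id, target, rcond=None)[0]
        # With identity fixed, the model is linear in the expression weights.
        basis_ex = np.einsum("vie,i->ve", core, w_id)
        w_ex = np.linalg.lstsq(basis_ex, target, rcond=None)[0]
        residuals.append(float(np.linalg.norm(basis_ex @ w_ex - target)))
    return w_id, w_ex, residuals

# Hypothetical toy model and a synthetic observation generated from it.
rng = np.random.default_rng(1)
core = rng.standard_normal((12, 3, 4))
true_id, true_ex = rng.standard_normal(3), rng.standard_normal(4)
target = np.einsum("vie,i,e->v", core, true_id, true_ex)

w_id, w_ex, residuals = fit_weights(core, target)
print("first and last residual:", residuals[0], residuals[-1])
```

Each half-step is an exact least-squares solve, so the residual cannot increase from one iteration to the next, although convergence to the true parameters is not guaranteed. The fitted weights are also only determined up to a reciprocal scale (scaling one vector by c and the other by 1/c gives the same shape), so a normalization convention would be needed before mixing identity weights from one fit with expression weights from another.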

Progress

We have acquired a number of 3D face scans and built two face models: a bilinear model with separate control for identity (shape) and expression, and a trilinear model that varies with identity, mood (smile, frown, ...) and articulation (visemes). We have shown that we can build these models even if a number of attribute combinations are missing from the dataset, as well as predict the likely candidates for the missing scans. We have used the two models to track a number of videos, extracting the pose and performance parameters. Finally, we have demonstrated the power and simplicity of our approach with several video-editing applications: modifying the performance in a video, changing the identity of an actor, transferring the performance from one video to another, and combining attributes from multiple videos (identity from one, mood from another, and articulation from a third one) as shown in the figure.
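
The idea of building a model from an incomplete set of scans and predicting the missing ones can be sketched with a generic technique: fill the missing identity/expression combinations with the mean face, then repeatedly replace them with the values predicted by a low-rank multilinear approximation (a truncated higher-order SVD). This is a stand-in chosen for brevity, not the estimation procedure used in this work, and the function names, ranks, tensor sizes, and synthetic data are assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index axis `mode`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Mode-n product: contract the columns of `matrix` with axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)

def low_rank_approx(data, ranks):
    """Project each axis onto its leading singular vectors (truncated HOSVD)."""
    approx = data
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(data, mode), full_matrices=False)
        basis = u[:, :rank]
        approx = mode_multiply(approx, basis @ basis.T, mode)
    return approx

def impute_missing(data, observed, ranks, n_iters=100):
    """`observed` is a boolean (identity x expression) mask; True marks an existing scan."""
    filled = data.copy()
    mean_face = data[:, observed].mean(axis=1)
    filled[:, ~observed] = mean_face[:, None]          # initial guess for missing scans
    for _ in range(n_iters):
        approx = low_rank_approx(filled, ranks)
        filled[:, ~observed] = approx[:, ~observed]    # observed scans are never modified
    return filled

# Hypothetical synthetic data: 30 coordinates, 5 identities, 4 expressions.
rng = np.random.default_rng(0)
core = rng.standard_normal((30, 2, 2))
truth = mode_multiply(mode_multiply(core, rng.standard_normal((5, 2)), 1),
                      rng.standard_normal((4, 2)), 2)
observed = np.ones((5, 4), dtype=bool)
observed[1, 2] = observed[3, 0] = False                # two combinations were never scanned
scans = truth.copy()
scans[:, ~observed] = np.nan

filled = impute_missing(scans, observed, ranks=(4, 2, 2))
print("error on the held-out scans:", np.linalg.norm(filled[:, ~observed] - truth[:, ~observed]))
```

The observed scans act as fixed constraints while only the missing combinations are updated; how well the predictions match real held-out scans depends on the chosen ranks and on how much of the tensor is observed.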

Future

An intriguing prospect is that, by scanning faces of different ages, we can build a multilinear model that gives us control over an actor's apparent age (identity × expression × age). With our technique, we can do this without scanning each person at all ages.

References:

[1] Vlasic D, Brand M, Pfister H, Popovic J. Face Transfer with Multilinear Models. To appear in ACM Transactions on Graphics. 2005.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)