CSAIL Research Abstracts - 2005

Motion Magnification

Ce Liu, Antonio Torralba, William T. Freeman, Frédo Durand & Edward H. Adelson

Motivation

Visual motion can occur at many different amplitudes, and over different temporal and spatial frequency scales. Small motions are difficult to observe, yet may reveal important information about the world: small deformations of structures, minute adjustments in an equilibrium process, or the small movements of a system in response to some forcing function. We want a machine which will reveal and clarify those motions, much as a microscope can reveal small and invisible structures.

System Overview

To find small motions in a video and magnify them, we model the appearance of the input video as translations of the pixel intensities observed in a reference frame. Naively, one might simply (a) compute each pixel's translation from frame to frame, and (b) re-render the video with those small motions amplified. Unfortunately, such an approach would lead to artifactual transitions between amplified and unamplified pixels within a single structure. Most of the steps of motion magnification therefore concern reliably estimating motions and clustering pixels whose motions should be magnified as a group. Below we motivate and summarize each step of the motion magnification processing. The steps are illustrated with the swing set images in the figure below. Details can be found in our publication [1] or on the webpage: http://people.csail.mit.edu/~celiu/motionmag/motionmag.html
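
To make the appearance model concrete, the following sketch renders one frame by displacing the reference-frame intensities by a per-pixel translation; a magnification factor alpha simply scales that translation. This is an illustration only (the array names and the nearest-neighbor sampling are assumptions, not our implementation):

import numpy as np

def warp_from_reference(ref, dx, dy, alpha=1.0):
    """Render one frame as the reference image translated per pixel.

    ref    : (H, W) reference-frame intensities
    dx, dy : (H, W) per-pixel translations measured from the reference frame
    alpha  : magnification factor (1.0 reproduces the input motion)
    """
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Sample the reference frame at the displaced (and possibly magnified)
    # location; nearest-neighbor inverse sampling keeps the sketch short.
    src_x = np.clip(np.round(xs - alpha * dx).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - alpha * dy).astype(int), 0, h - 1)
    return ref[src_y, src_x]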

Figure 1. Overview of the system, illustrated with the swing set example.

(1) Register input video

Since we are magnifying small motions between frames, performing a robust registration and intensity normalization of each frame is essential. For this step we require that the input image sequence depicts a predominantly static scene. We perform an initial tracking of detected feature points and find the affine warp which best removes the motions of the set of tracked feature points, ignoring outliers.
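
As an illustration of this step, a minimal sketch using standard OpenCV primitives follows; the particular routines, feature counts, and thresholds are assumptions for illustration rather than those of our implementation:

import cv2

def register_to_reference(ref_gray, frame_gray):
    # Detect trackable corner features in the reference frame.
    pts_ref = cv2.goodFeaturesToTrack(ref_gray, maxCorners=500,
                                      qualityLevel=0.01, minDistance=7)
    # Track them into the current frame with pyramidal Lucas-Kanade.
    pts_cur, status, _err = cv2.calcOpticalFlowPyrLK(ref_gray, frame_gray,
                                                     pts_ref, None)
    good = status.ravel() == 1
    # Robustly fit a global affine warp, ignoring outliers such as points
    # that sit on genuinely moving structures.
    affine, _inliers = cv2.estimateAffine2D(pts_cur[good], pts_ref[good],
                                            method=cv2.RANSAC,
                                            ransacReprojThreshold=2.0)
    h, w = ref_gray.shape
    # Warp the current frame into the reference frame's coordinates so the
    # static scene stays fixed across the sequence.
    return cv2.warpAffine(frame_gray, affine, (w, h))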

(2) Track and cluster feature point trajectories

So that motion magnification does not break apart coherent objects, we seek to group regions that move with correlated (not necessarily identical) motions. To achieve this, we robustly track feature points throughout the sequence, then cluster their trajectories into K sets of correlated motions. Feature points that show no translation over the frames form a special background cluster. The trajectory correlations are computed in a manner invariant to the overall scale of the motions, so very small motions can be grouped with the larger motions to which they are correlated. All motions are specified as translations from the feature point's position in a reference frame.
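
A rough sketch of scale-invariant trajectory grouping follows. Normalizing each trajectory to unit energy before clustering lets a tiny motion group with a large motion of the same temporal shape; the use of k-means here, the stillness threshold, and the array layout are assumptions for illustration, not our actual clustering:

import numpy as np
from sklearn.cluster import KMeans

def cluster_trajectories(trajectories, n_clusters, still_thresh=0.5):
    """trajectories : (N, T, 2) per-frame (dx, dy) translations of each
    feature point, measured from its reference-frame position."""
    flat = trajectories.reshape(len(trajectories), -1)
    energy = np.linalg.norm(flat, axis=1)
    # Feature points that barely move form the special background cluster,
    # labeled -1 here.
    moving = energy > still_thresh
    labels = np.full(len(flat), -1)
    # Normalize away the overall motion scale, then cluster the motion "shapes".
    normalized = flat[moving] / energy[moving, None]
    labels[moving] = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(normalized)
    return labels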

(3) Segmentation: assign each pixel to a motion cluster

We then assign each pixel of each frame to one of the motion clusters. For each cluster, we interpolate a dense motion field, giving K candidate motion vectors at every pixel. Our chosen domain of analysis -- very small, almost imperceptible motions -- presents a unique set of challenges for segmenting pixels into coherent motions: the motions are generally too small to assign pixels to motion layers based on motion alone, as is done in [2, 3]. To overcome this, we use pixel color and position as well as the interpolated motions to estimate each pixel's cluster assignment, defining a Markov random field which we solve using graph cuts [4]. We then impose an additional requirement of temporal consistency: if a pixel's cluster assignment varies along its trajectory through time, the pixel is treated as an outlier.
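
The sketch below illustrates only the per-pixel (unary) part of such an assignment: each cluster's interpolated motion field is scored by how well it predicts the next frame, combined with a color term, and the cheapest cluster is chosen independently per pixel. The spatial smoothness term and the graph-cut optimization [4] are omitted, and the names and weights are assumptions:

import numpy as np

def assign_pixels(frame, next_frame, flows, cluster_colors, color_weight=0.5):
    """frame, next_frame : (H, W, 3) float images
    flows          : (K, H, W, 2) interpolated motion field for each cluster
    cluster_colors : (K, 3) mean color of each cluster's feature points"""
    h, w, _ = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    costs = np.empty((len(flows), h, w))
    for k, flow in enumerate(flows):
        # Data term: how well does cluster k's motion explain the next frame?
        tx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
        ty = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
        motion_cost = np.abs(next_frame[ty, tx] - frame).sum(axis=2)
        # Color term: how close is this pixel to the cluster's typical color?
        color_cost = np.abs(frame - cluster_colors[k]).sum(axis=2)
        costs[k] = motion_cost + color_weight * color_cost
    return costs.argmin(axis=0)   # (H, W) cluster label per pixel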

This gives us a layered motion representation such as that proposed by [2], but with layer membership generalized to include correlated motions, not just similar ones. Our model of the video is a set of temporally constant pixel intensities, clustered into layers, which translate over the video sequence according to interpolated trajectories that are unique to each pixel. The layer ordering can be specified by the user or computed using the methods of [5]. (It is sufficient to randomly assign the ordering of non-background layers if the magnified layer has minimal occlusions with other layers, as is often the case.) At each stage, pixels which do not fit the model are relegated to a special “outlier layer”. The other layer that is treated specially is the background layer: regions of the background that were never seen in the original video sequence may be made visible by the amplified motions of layers above it. We therefore fill in all holes in the background layer using the texture synthesis method of Efros and Leung [6].
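
For the hole-filling step, a compact (and deliberately slow) sketch in the spirit of Efros and Leung [6] is shown below: each unknown pixel is synthesized by finding the fully known neighborhood that best matches its partially known neighborhood and copying that neighborhood's center pixel. This is a toy illustration, not our implementation:

import numpy as np

def efros_leung_fill(image, known, patch=7):
    """Naive single-pixel-at-a-time texture synthesis fill.
    image : (H, W) float grayscale background layer
    known : (H, W) bool, True where the background was actually observed."""
    r = patch // 2
    h, w = image.shape
    img = np.pad(image.astype(float), r)
    mask = np.pad(known.astype(bool), r)      # padded border counts as unknown
    # Candidate source neighborhoods: windows that are fully known.
    sources = [(cy, cx) for cy in range(r, r + h) for cx in range(r, r + w)
               if mask[cy - r:cy + r + 1, cx - r:cx + r + 1].all()]
    while not mask[r:r + h, r:r + w].all():
        filled_any = False
        for y in range(r, r + h):
            for x in range(r, r + w):
                if mask[y, x]:
                    continue
                nbr_mask = mask[y - r:y + r + 1, x - r:x + r + 1]
                if not nbr_mask.any():
                    continue                  # no context yet; revisit later
                tgt = img[y - r:y + r + 1, x - r:x + r + 1]
                best_val, best_cost = 0.0, np.inf
                # Compare against every candidate window on known pixels only.
                for cy, cx in sources:
                    src = img[cy - r:cy + r + 1, cx - r:cx + r + 1]
                    cost = (((src - tgt) ** 2) * nbr_mask).sum()
                    if cost < best_cost:
                        best_val, best_cost = src[r, r], cost
                img[y, x], mask[y, x] = best_val, True
                filled_any = True
        if not filled_any:
            break                             # nothing left that can be filled
    return img[r:r + h, r:r + w]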

(4) Magnify motions of selected cluster

The user then specifies a layer for motion magnification. Presently, the magnification consists of amplifying all translations from the reference position by a constant factor, typically between 4 and 40, but more general motion modification functions are possible.
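
The amplification itself reduces to scaling the selected layer's translations by the chosen factor; a minimal sketch (the array layout is an assumption):

import numpy as np

def magnify_layer_motion(displacements, factor=10.0):
    """displacements : (T, H, W, 2) per-frame translations of the selected
    layer's pixels from their reference-frame positions."""
    # Typical magnification factors are between 4 and 40; more general
    # functions of the trajectory could be substituted here.
    return np.asarray(displacements, dtype=float) * factor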

(5) Interactive editing of the layer

The automatic segmentation cannot yet produce pixel-accurate layer boundaries, so we also developed interactive tools that let the user edit the layer assignment. Starting from the automatic segmentation, however, this editing is much easier than specifying the layers from scratch.

(6) Render video

Following motion magnification, we render the modified video sequence. We first render the pixels of the background layer, which is constant for all frames. Then the pixels assigned to the outlier layer are copied as they appeared in the registered input frames. Finally, the pixels of the remaining layers are written into the output sequence: the intensities are those of the reference frame, and the displacements are those of the measured or magnified motions, as appropriate to each layer.
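
A simplified sketch of this compositing order follows: the filled-in background first, then outlier pixels copied from the registered input frame, then each remaining layer's reference-frame pixels forward-mapped by its (possibly magnified) translations, back to front. Nearest-neighbor splatting is used here for brevity, and the names are illustrative:

import numpy as np

def render_frame(background, registered_frame, ref_frame, outlier_mask, layers):
    """Compose one output frame.
    background       : (H, W, 3) hole-filled background layer
    registered_frame : (H, W, 3) current input frame after registration
    ref_frame        : (H, W, 3) reference frame supplying layer intensities
    outlier_mask     : (H, W) bool mask of outlier pixels
    layers           : back-to-front list of (mask, dx, dy) for this frame;
                       mask selects a layer's pixels in the reference frame,
                       dx, dy give their translations (already magnified for
                       the selected layer)."""
    h, w = background.shape[:2]
    out = background.copy()
    # Outlier pixels are copied as they appeared in the registered input.
    out[outlier_mask] = registered_frame[outlier_mask]
    for mask, dx, dy in layers:
        ys, xs = np.nonzero(mask)
        # Forward-map each layer pixel from its reference-frame position.
        tx = np.clip(np.round(xs + dx[ys, xs]).astype(int), 0, w - 1)
        ty = np.clip(np.round(ys + dy[ys, xs]).astype(int), 0, h - 1)
        out[ty, tx] = ref_frame[ys, xs]
    return out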

Applications

We foresee broad application of this algorithm in fields related to visualization, such as education, physical diagnosis, pre-measurement planning for precise physical measurements, and surveillance.

References

[1] C. Liu, A. Torralba, W. T. Freeman, F. Durand, and E. H. Adelson. Motion magnification. In Proceedings of ACM SIGGRAPH, 2005.

[2] J. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625–638, 1994.

[3] N. Jojic and B. Frey. Learning flexible sprites in video layers. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’01), Kauai, Dec. 2001.

[4] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.

[5] G. Brostow and I. Essa. Motion based decompositing of video. In IEEE International Conference on Computer Vision (ICCV'99), 1999.

[6] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In IEEE International Conference on Computer Vision (ICCV'99), pages 1033–1038, Corfu, Greece, September 1999.
