Abstracts - 2006
Visual Attention Models for Far-Field Scene Analysis
Tomas Izo & W. Eric L. Grimson
One of the central questions any vision system must answer is that of recognition: "What am I looking at?" However, because visual information is so rich and complex in nature, another important question to ask is: "Where should I be looking?" Recently, as cameras have become more readily available and the amount of visual data to be analyzed has rapidly increased, this question has become critical in machine vision applications such as far- and mid-field scene analysis.
Consider a system of cameras observing a rich moving scene. Analyzing the entire scene in detail would require an enormous amount of resources. It might be better instead to perform a coarse analysis, on the basis of which a person or an algorithm can decide where the system's attention is best directed for further analysis. In a security setting, this might involve selecting the view of the interior or exterior of a building that contains a suspicious activity in progress. In a television broadcast of a sports event, it might involve selecting the view that contains a foul or the scoring of a point. In a home monitoring setting, it could mean selecting the view that shows an elderly person having health difficulties. The common denominator of all these tasks is selecting the most visually interesting part of the scene in order to direct further, detailed attention there.
The goal of this project is to investigate models of attention that would enable a machine observing an unfamiliar moving scene to learn where to look in order to increase the likelihood of spotting an unusual or visually interesting event.
The substrate for our experiments is a working focus-of-attention tracking system observing a far-field outdoor scene. It consists of a stationary overview camera and a high-resolution pan-tilt-zoom camera designed to capture detailed video of scene activity (see Migdal, Izo and Stauffer, 2005, for a more detailed description). The system tracks all moving objects in the stationary view and records their trajectories along with other data such as appearance, size and velocity. Accumulated over a long period of time, this corpus of data essentially describes what usually happens in the scene, and it can be used to direct the attention of the high-resolution camera to unusual activity.
What is Unusual?
Defining what constitutes an unusual event is an open problem. In this project, we focus on events involving a single moving object, i.e., we do not consider interactions between objects. The goal of our attention model is to assign to each object, as it moves through the scene, a value reflecting how unusual that object's motion is. One possible approach applies the information-theoretic concept of surprise (see Itti and Baldi, 2005) to the trajectories of moving objects. We first construct a set of object trajectory models by clustering the trajectories observed thus far. As a new object enters the scene, we match its developing trajectory against this set of models. Denote this trajectory at time $t$ as $T_t$. We can then calculate the posterior distribution $P(M \mid T_t)$ over the trajectory models $M$. We define surprise at time $t$ as a quantity proportional to the change in this posterior distribution between times $t-1$ and $t$, for instance as measured by the relative entropy (Kullback-Leibler divergence):
$$S(t) \propto D_{\mathrm{KL}}\left(P(M \mid T_t) \,\middle\|\, P(M \mid T_{t-1})\right) = \sum_{M} P(M \mid T_t) \log \frac{P(M \mid T_t)}{P(M \mid T_{t-1})}$$
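To make the surprise computation concrete, here is a minimal Python sketch built on simplifying assumptions that are ours, not the project's: each trajectory model is represented by a hypothetical mean path (for instance a cluster centre of past trajectories resampled to common time steps), observed positions are isotropic Gaussian around that path with a shared standard deviation `sigma`, and the prior over models is uniform. The clustering step itself is not shown, and the function names `posterior_over_models`, `relative_entropy` and `surprise_sequence` are illustrative only. At each step the sketch recomputes the posterior over models from the trajectory observed so far and reports the relative entropy between the current and the previous posterior.

```python
import numpy as np


def posterior_over_models(points, model_means, sigma=1.0, prior=None):
    """Posterior over trajectory models given the trajectory observed so far.

    Hypothetical model representation (an assumption for illustration):
    each model is a mean path of shape (L, 2), sampled at the same time steps
    as the tracked object, and observed positions are isotropic Gaussian
    around that path with a shared standard deviation `sigma`.
    `points` has shape (t, 2) with t <= L.
    """
    k = len(model_means)
    if prior is None:
        prior = np.full(k, 1.0 / k)  # uniform prior over models
    t = len(points)
    log_post = np.log(prior)
    for m, mean in enumerate(model_means):
        diff = points - mean[:t]
        # Gaussian log-likelihood up to a constant shared by all models
        log_post[m] += -0.5 * np.sum(diff ** 2) / sigma ** 2
    log_post -= log_post.max()  # guard against underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


def relative_entropy(p, q, eps=1e-12):
    """KL divergence D(p || q) in nats; terms with p == 0 contribute 0."""
    q = np.clip(q, eps, None)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def surprise_sequence(trajectory, model_means, sigma=1.0):
    """Surprise at each time step: KL divergence between successive posteriors."""
    k = len(model_means)
    previous = np.full(k, 1.0 / k)  # before any observation: the uniform prior
    surprises = []
    for t in range(1, len(trajectory) + 1):
        current = posterior_over_models(trajectory[:t], model_means, sigma)
        surprises.append(relative_entropy(current, previous))
        previous = current
    return surprises


# Toy example: two hypothetical models that share a path heading east
# and diverge after five steps (model_b turns north).
steps = np.arange(8, dtype=float)
model_a = np.stack([steps, np.zeros(8)], axis=1)
model_b = np.stack([np.minimum(steps, 4.0), np.maximum(steps - 4.0, 0.0)], axis=1)

traj = model_b.copy()  # an object that follows model_b exactly
s = surprise_sequence(traj, [model_a, model_b])
print(np.round(s, 3))
```

In this toy run the surprise is zero for the first five steps, where the observations cannot distinguish the two models, becomes positive once the object turns north, and drops again after the posterior has moved almost all of its mass to `model_b`. A Gaussian likelihood is just one convenient choice here; any per-model trajectory likelihood could be substituted without changing the surprise step.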
Other possible attention models can be based on low-level saliency, high-level scene statistics or a combination thereof.
Comparison and Evaluation
We plan to experiment with and evaluate several different attention mechanisms. We believe that the most successful model will be one that combines high-level information, such as the history of object tracks in the scene, with low-level information, such as image saliency. Ideally, the outcome will be insight into what kind of attention model could assist a person whose task is to watch video feeds and manually select interesting activity for further analysis.
This project is supported in part by a grant from DARPA.
Joshua Migdal, Tomas Izo and Chris Stauffer. Moving Object Segmentation Using Super-Resolution Background Models. In Workshop on Omnidirectional Vision, Camera Networks and Non-Classical Cameras, Beijing, China, October 2005.
Laurent Itti and Pierre Baldi. A principled approach to detecting surprising events in video. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, June 2005.