Research Abstracts - 2007
Visuospatial Reasoning

Sajit Rao


Consider the following examples of everyday tasks: (a) picking out the shortest checkout line in a supermarket; (b) looking at the hands of your watch to tell the time; (c) map understanding: you learn that the Khyber Pass in the Hindu Kush mountains is a strategically important natural gateway used by Alexander's armies in 326 B.C. to pass from the northwest frontier of Afghanistan into the plains of India, and you look up a map of the region to see and understand this for yourself; (d) language understanding and common-sense inference: you hear a news report that the road leading to the marketplace of a town has been blocked by insurgents, and conclude that a vehicle that needs to reach the marketplace must either clear the blockage (and possibly face resistance) or find an alternative route.

There are literally hundreds of such tasks that one might perform during the course of a single day without any conscious effort (even though each task may involve dozens of lower-level operations). We also routinely leverage our visuospatial representations by projecting even non-spatial domains, such as time or graph structures (e.g., org charts and process flows), into diagrams that make certain inferences ``obvious'' or make them ``pop out''.

The ease with which even a child solves and understands such problems hints at the sophistication of the underlying visuospatial analysis machinery we must have. Evidence from fMRI and infant-development studies [1][2] shows that in humans, visual processes and representations are involved not only in perception and action but also in more ``abstract'' cognitive tasks such as doing mathematics or making an inference. Vision thus appears to be part of our ``thinking machinery'' as well.


Our goal is to build a system in which visuospatial perceptual mechanisms serve both perception and abstract inference. We expect robust visuospatial reasoning to emerge from three complementary, interacting competencies:

  1. Scene/Event Analysis: The ability to extract very specific spatial relations from a scene/event on demand (e.g., Is that vehicle blocking the road?). The Visual Routines proposal [3] is the best candidate we have for spatial analysis. Its basic premise is that a basis set of operations can be composed in different ways to extract different spatial relations.

  2. Concept Learning/Hypothesis Formation: Learning event signatures and inventing new hypotheses to explain what is observed.

  3. Imagining ``what-if'' scenarios: The capacity to synthesize a visuospatial description using learned spatial concepts (inside, next-to, ..) and then, if needed, apply the analysis mechanisms to the synthesized, imagined scene.
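
To make the first competency concrete, here is a minimal Python sketch of the Visual Routines idea: a small basis set of primitive operations over a 2D grid, composed on demand into a routine that answers ``is the road blocked?''. The grid encoding and the names `region_fill` and `is_blocking` are illustrative assumptions, not the project's actual implementation.

```python
def region_fill(grid, seed, passable):
    """Primitive op: spread activation from `seed` over cells where passable(cell)."""
    h, w = len(grid), len(grid[0])
    frontier, filled = [seed], {seed}
    while frontier:
        r, c = frontier.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in filled \
                    and passable(grid[nr][nc]):
                filled.add((nr, nc))
                frontier.append((nr, nc))
    return filled

def is_blocking(grid, start, goal):
    """Routine: compose primitives to test a spatial relation on demand.
    Fill free space from `start`; the road is blocked iff the fill never
    reaches `goal`."""
    reachable = region_fill(grid, start, passable=lambda cell: cell != '#')
    return goal not in reachable

# Toy scene: '.' = road, '#' = obstacle (the blockage in example (d) above)
scene = ["....",
         "####",
         "...."]
grid = [list(row) for row in scene]
print(is_blocking(grid, (0, 0), (2, 3)))   # True: the wall severs the road
```

The point of the sketch is the composition: the same `region_fill` primitive could be reused in routines for other relations (inside, reachable-from, same-region), which is the premise of the basis-set proposal [3].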


For the spatial-analysis component we are working on a real-time implementation of the Visual Routines architecture described in [4], in which vision modules run in parallel on multiple machines to execute a visual routine. For the learning component we are initially testing the system's ability to index and learn from a test set of simulated blocks-world events. Learned spatial patterns form the templates for both analysis and imagination.
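
One way to picture the indexing step is sketched below: an event signature is the sequence of distinct qualitative relation sets an event passes through, and an observed blocks-world event is classified by the learned signature it matches. The relation vocabulary and the matching rule are hypothetical assumptions for illustration, not the system's actual representation.

```python
def signature(frames):
    """Collapse a sequence of per-frame relation sets into the sequence of
    distinct relation sets, ignoring how long each state persists."""
    sig, prev = [], None
    for rels in frames:
        key = frozenset(rels)
        if key != prev:
            sig.append(key)
            prev = key
    return tuple(sig)

# Learned signatures for two simulated blocks-world events (assumed vocabulary)
LEARNED = {
    signature([{"apart(A,B)"}, {"touching(A,B)"}, {"on(A,B)"}]): "stack",
    signature([{"on(A,B)"}, {"touching(A,B)"}, {"apart(A,B)"}]): "unstack",
}

def classify(frames):
    """Index an observed event by its signature."""
    return LEARNED.get(signature(frames), "unknown")

# An observation with a repeated frame still matches the learned "stack" event
observed = [{"apart(A,B)"}, {"apart(A,B)"}, {"touching(A,B)"}, {"on(A,B)"}]
print(classify(observed))   # stack
```

Because a signature abstracts away timing, the same learned pattern can serve both roles mentioned above: as an analysis template matched against perceived events, and as a generative template for imagining a ``what-if'' event.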

Research Support

This project is funded by a seedling grant from DARPA.


References

[1] S. Dehaene, E. Spelke, P. Pinel, R. Stanescu, and S. Tsivkin. Sources of Mathematical Thinking: Behavioral and Brain-Imaging Evidence. Science, vol. 284, May 1999.

[2] S. Carey. Bootstrapping and the Origin of Concepts. Daedalus, Winter 2004.

[3] S. Ullman. Visual Routines. Cognition, vol. 18, 1984.

[4] S. Rao. Visual Routines and Attention. Ph.D. thesis, MIT EECS, 1998.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu