CSAIL Publications and Digital Archive


Research Abstracts - 2006

Object Recognition in Clutter: Selectivity and Invariance Properties in Monkey Inferior Temporal Cortex

D. Zoccolan, D. Cox, & M. Kouh

The Problem

A major challenge for current theories of vision is to understand how the visual system performs object recognition in cluttered conditions, typical of natural visual scenes, where objects of interest appear not in isolation but together with background objects. Object recognition in primates is thought to depend on neuronal activity in the inferotemporal cortex (IT) [1], the last stage of the ventral visual stream. Neurons in monkey IT fulfill two essential requirements for visual recognition: selectivity and invariance. They are selectively tuned to views of complex objects such as faces, and their responses show significant invariance (tolerance) to stimulus transformations such as scale and position changes [2, 3]. A few studies have also reported some degree of tolerance of IT neurons to clutter [4, 5]. However, a systematic understanding of the relationship between clutter tolerance and shape selectivity in IT is still lacking, and no models have been proposed to explain the response of IT neurons to multiple objects based on the responses to those same objects presented in isolation in their receptive field (RF).

Previous Work

To this end, we completed a study [6] in which we examined IT responses to pairs and triplets of objects in three passively viewing monkeys. To probe neurons with objects spanning a broad range of effectiveness, we focused on neurons whose responses were selectively tuned across different sets of parametrized or geometric shapes. Our recordings showed that a large fraction of IT neurons are not clutter tolerant; that is, their responses to a pair of simultaneously presented objects are weaker than their responses to the more effective object of the pair presented alone. More specifically, we found that IT responses to pairs and triplets of stimuli could be reliably predicted as the average of the responses to the individual stimuli composing the pairs or triplets. These findings are consistent with a mechanistic model in which the output of each IT neuron is normalized by the total amount of activation in the population of IT cells co-activated by the stimulus pair or triplet. Other potential explanations for the observed responses in clutter could rely simply on the feed-forward connections from V4 (or posterior IT) neurons to postsynaptic targets in anterior IT. For instance, the normalization mechanism responsible for rescaling IT responses could take place at the level of the V4 afferents rather than requiring a population of co-activated IT neurons. Finally, computer simulations have shown that a hierarchical model of object recognition [3], with its iterated MAX and Gaussian tuning operations, can also produce average-like effects in simulated IT neurons.
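The link between the averaging rule and population normalization described above can be illustrated with a small numerical sketch. This is a toy construction, not the model fitted in the study: the function names, the three-neuron population, and the drive values are all hypothetical, and the example assumes that every object evokes the same total drive across the population. Under that assumption, dividing a neuron's summed drive by the summed population drive gives a pair response equal to the average of the two single-object responses.

```python
# Illustrative sketch only: names and numbers are hypothetical, not taken from the study.

def average_model(single_responses):
    """Predict the response to several objects as the mean of the
    responses to each object presented alone."""
    return sum(single_responses) / len(single_responses)


def normalized_response(drives, neuron, objects):
    """Divisively normalized response of one neuron to a set of objects.

    drives[obj][j] is the assumed unnormalized drive of neuron j to object obj.
    The neuron's summed drive is divided by the summed drive of the whole
    population over all objects in the display.
    """
    numerator = sum(drives[obj][neuron] for obj in objects)
    pool = sum(sum(drives[obj]) for obj in objects)
    return numerator / pool


# Toy population of three neurons; each object evokes the same total drive (10.0).
drives = {"A": [8.0, 1.0, 1.0], "B": [2.0, 5.0, 3.0]}

r_a = normalized_response(drives, 0, ["A"])          # neuron 0, object A alone
r_b = normalized_response(drives, 0, ["B"])          # neuron 0, object B alone
r_pair = normalized_response(drives, 0, ["A", "B"])  # neuron 0, A and B together
predicted = average_model([r_a, r_b])

print(r_a, r_b, r_pair, predicted)
```

In this toy case the normalized pair response and the average-model prediction agree up to floating-point rounding, and both fall below the response to the more effective object alone. If the objects evoked unequal total population drive, the normalized response would become a weighted rather than a simple average.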


We recently started a new series of experiments aimed at probing the generality of this "average model" over a population of IT neurons spanning a broader range of shape selectivity. Instead of focusing on highly selective neurons tuned over small sets of similar shapes, as in the previous study [6], we are currently measuring the selectivity of IT neuronal responses over a large set (~200) of natural objects. For each neuron, clutter tolerance, receptive field size, contrast sensitivity and tolerance to size changes are also measured. Preliminary recordings show that some IT neurons have responses that are more robust to clutter than predicted by the "average model". We are now characterizing these deviations, the relevant parameters and the underlying mechanisms with the help of computational models of object recognition that we are developing [3].
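One simple way to express how far a neuron departs from the "average model" is to place its measured pair response on a scale running from the average prediction to the response to the more effective object alone. The index below is a hypothetical illustration, not a measure reported in this abstract; the function name and the firing rates in the example are assumptions.

```python
# Illustrative sketch only: this index and the example rates are hypothetical.

def clutter_tolerance_index(r_first, r_second, r_pair):
    """Place a measured pair response between two reference predictions.

    0.0 means the pair response equals the average of the single-object
    responses (the "average model"); 1.0 means it equals the response to
    the more effective object alone (full clutter tolerance).
    Returns None when both objects are equally effective, because the two
    reference predictions then coincide and the index is undefined.
    """
    average = (r_first + r_second) / 2.0
    best = max(r_first, r_second)
    if best == average:
        return None
    return (r_pair - average) / (best - average)


# Hypothetical firing rates in spikes/s: 40 and 10 alone, 32 for the pair.
index = clutter_tolerance_index(40.0, 10.0, 32.0)
print(index)
```

With these assumed rates the index lies between 0 and 1, describing a neuron that is more robust to clutter than the average prediction but not fully tolerant.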

Research Support

This report describes research done at the Center for Biological & Computational Learning, which is in the McGovern Institute for Brain Research at MIT, as well as in the Dept. of Brain & Cognitive Sciences, and which is affiliated with the Computer Science & Artificial Intelligence Laboratory (CSAIL). This research was sponsored by grants from: Office of Naval Research (DARPA) Contract No. MDA972-04-1-0037, Office of Naval Research (DARPA) Contract No. N00014-02-1-0915, National Science Foundation-NIH (CRCNS) Contract No. EIA-0218506, and National Institutes of Health (Conte) Contract No. 1 P20 MH66239-01A1. Additional support was provided by: Central Research Institute of Electric Power Industry (CRIEPI), Daimler-Chrysler AG, Eastman Kodak Company, Honda Research Institute USA, Inc., Komatsu Ltd., Merrill-Lynch, NEC Fund, Oxygen, Siemens Corporate Research, Inc., Sony, Sumitomo Metal Industries, Toyota Motor Corporation, and the Eugene McDermott Foundation.


[1] K. Tanaka. Inferotemporal cortex and object vision. Annu. Rev. Neurosci., 19: 109-139, 1996.

[2] N. K. Logothetis, J. Pauls, and T. Poggio. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol., 5: 552-563, 1995.

[3] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nat. Neurosci., 2: 1019-1025, 1999.

[4] T. Sato. Interactions of visual stimuli in the receptive fields of inferior temporal neurons in awake macaques. Exp. Brain Res., 77: 23-30, 1989.

[5] E. Rolls and M. Tovee. The responses of single neurons in the temporal visual cortical areas of the macaque when more than one stimulus is present in the receptive field. Exp. Brain Res., 103: 409-420, 1995.

[6] D. Zoccolan, D. D. Cox, and J. J. DiCarlo. Multiple object response normalization in monkey inferotemporal cortex. J. Neurosci., 25: 8150-8164, 2005.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu