CSAIL Research Abstracts - 2005

Image-based Object Search on a Mobile Platform

Tom Yeh & Trevor Darrell


Finding information based on an object's visual appearance is useful when specific keywords for the object are not known. We have developed a mobile content-based search system that takes images of objects as queries and finds relevant web pages by matching them to similar images on the web. Image-based search works best when the object is segmented from the background, but performing such segmentation explicitly is difficult on a mobile terminal. To solve this problem, we propose an interactive segmentation paradigm in which the user specifies the segmentation by providing multiple images of the object or scene. Candidate segmentations are offered to the user in real time, and the user can adjust the viewpoint until an appropriate segmentation mask is found. User studies were carried out to compare this segmentation paradigm to a traditional contour-drawing approach, and to demonstrate the overall utility of the image-based object search concept.



Providing information to a user at the right time and place can be critical in a variety of situations. Today it is generally possible to access the entire Internet from a mobile terminal, but finding a particular relevant web page can be tedious. Considerable effort has been devoted to bandwidth and screen-resolution issues (cf. the Opera browser), but comparatively little has been done to alleviate the difficulty of initiating and refining an information query within the constraints of a hand-held form factor. When querying about an object that has unique visual features, what can describe these properties better than a visual description? An image of the object can serve as a query that faithfully represents the visual properties of the item. By enabling searches on object appearance, we can offer a new, more convenient and direct means of finding information about objects encountered in everyday life.


With conventional mobile terminal interfaces it can be daunting to form complex textual queries and to browse through matches to find the right information. However, if a hand-held terminal has a camera, as do increasing numbers of cell phones and PDAs, a search can proceed using the appearance of the object directly. Such content-based queries have been the subject of much research, and systems for desktop searching of media libraries have been deployed (e.g., QBIC [1] and Webseek [2]). For the majority of these applications, however, image search has been less popular than traditional keyword search. Rather than matching based on appearance, images are typically pre-labeled with keywords, or matching is performed on image captions or filenames (e.g., Google Image Search).



Automatically discerning the object's shape is nontrivial when the image contains other objects or background structures. This is the problem of image segmentation, widely considered an open problem in the computer vision literature. However, it is possible to drastically improve the segmentation performance of simple, fast algorithms by leveraging human perception through a suitable interactive environment. The system in [3] presents an interface on a mobile device that lets the user specify the object of interest simply by taking two images, one with the object and one without. By comparing these two images and using relatively simple computer vision techniques for foreground/background estimation, a segmented object image can be extracted. This is an interactive human-aided segmentation paradigm in which human input is obtained online while the image of the object is being taken with the camera, as opposed to working offline on a static image with a mouse and an editing tool such as Photoshop. It can serve as an ideal front end for the mobile image-based object search system.
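The two-image idea can be illustrated with a minimal sketch. The following is not the algorithm of [3], only a simplified stand-in: it assumes the two shots are aligned and estimates the foreground mask by per-pixel differencing and thresholding; the function name and threshold value are illustrative.

```python
import numpy as np

def segment_by_difference(with_object, without_object, threshold=30):
    """Illustrative two-image segmentation sketch (not the system in [3]).

    Both inputs are H x W x 3 uint8 arrays assumed to be aligned:
    one photo with the object present, one of the background alone.
    Returns a boolean H x W foreground mask.
    """
    # Difference in a signed type to avoid uint8 wrap-around.
    diff = np.abs(with_object.astype(np.int16) - without_object.astype(np.int16))
    # Sum the per-channel differences and threshold to get a binary mask.
    return diff.sum(axis=-1) > threshold
```

In practice the real system must cope with camera motion and lighting changes between the two shots, which is why the interactive loop (letting the user adjust the viewpoint until the mask looks right) matters.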

The user takes an image of the object on the phone (1), and it is interactively segmented with our method (2). The segmented object image is used to query the database to find the most relevant object (3), which in turn retrieves the relevant web pages for the user's perusal (4).
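The four-stage flow above can be sketched as a simple pipeline. The function names here are hypothetical placeholders for the system's components, not its actual API; each stage is passed in as a callable so the outline stays self-contained.

```python
def image_search_pipeline(capture_image, segment, match_object, fetch_pages):
    """Hypothetical outline of the search flow; stage names are illustrative."""
    photo = capture_image()   # (1) user photographs the object on the phone
    obj = segment(photo)      # (2) interactive segmentation of the object
    best = match_object(obj)  # (3) match against the image database
    return fetch_pages(best)  # (4) retrieve the associated web pages
```

For example, wiring in trivial stand-in stages shows how data flows from capture to retrieved pages:

```python
pages = image_search_pipeline(
    capture_image=lambda: "photo",
    segment=lambda p: p + "/segmented",
    match_object=lambda o: o + "/matched",
    fetch_pages=lambda m: [m + "/page"],
)
```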


We will continue our efforts in developing a mobile image-based object search system by building a more extensive image database and improving the image matching algorithm to incorporate color and texture information. We will also explore alternative user-aided segmentation methods on different hardware.

Research Support

This research was carried out in the Vision Interface Group, which is supported in part by DARPA, Project Oxygen, NTT, Ford, CMI, and ITRI. This project was supported by one or more of these sponsors, and/or by external fellowships.


[1] Niblack W., Barber R., Equitz W., Flickner M., Glasman E., Petkovic D., Yanker P., Faloutsos C., and Taubin G. The QBIC project. In SPIE Storage and Retrieval for Image and Video Databases, pages 173-187, 1993.

[2] Smith J.R. and Chang S.F. Image and video search engine for the World Wide Web. In Proc. of SPIE, pages 84-95, 1997.

[3] Yeh T. and Darrell T. DoubleShot: an Interactive User-aided Segmentation Tool. UI 2005.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)