Abstracts - 2006
LabelMe: a database and web-based tool for image annotation
Bryan C. Russell, Antonio Torralba, Samuel Davies, Kevin P. Murphy & William T. Freeman
Research in object detection and recognition in cluttered scenes requires large image and video collections with ground truth labels. The labels should provide information about the object classes present in each image, as well as their shape and locations, and possibly other attributes such as pose. Such data is useful for testing, as well as for supervised learning. Even algorithms that require little supervision need large databases with ground truth to validate the results. New algorithms that exploit context for object recognition require databases with many labeled object classes embedded in complex scenes. Such databases should contain a wide variety of environments with annotated objects that co-occur in the same images.
Building a large database of annotated images with many objects is a costly and lengthy enterprise. Traditionally, databases are built by a single research group and are tailored to solve a specific problem (e.g., face detection). Many currently available databases contain only a small number of classes, such as faces, pedestrians, and cars. A notable exception is the Caltech 101 database, with 101 object classes. Unfortunately, the objects in this set are generally of uniform size and orientation within an object class, and lack rich backgrounds.
Currently, the database contains more than 77,000 objects labeled within 23,000 images covering a large range of environments and several hundred object categories (Figure 2a). The images are high resolution and cover a wide field of view, providing rich contextual information. Pose information is also available for a large number of objects. Since the annotation tool was made available online, the database has grown steadily, with about 7,500 new labels added every month on average.
One important concern when data is collected using web-based tools is quality control. Currently quality control is provided by the users themselves. Polygons can be deleted and the object names can be corrected using the annotation tool online. Despite the lack of a more direct mechanism of control, the annotations are of quite good quality (Figure 2). Another issue is the complexity of the polygons provided by the users - do users provide simple or complex polygon boundaries? Figure 2b illustrates the average number of points used to define each polygon for four object classes that were introduced using the web annotation tool. These object classes are among the most complicated. These polygons provide a good idea of the outline of the object, which is sufficient for most object detection and segmentation algorithms.
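The polygon-complexity statistic discussed above (average number of points per polygon, grouped by object class) can be sketched as follows. This is a minimal illustration only: the record layout (a "name" string plus a "polygon" list of (x, y) vertices) and the sample values are assumptions made for the example, not the LabelMe storage format or real data.

```python
from collections import defaultdict

# Hypothetical annotation records: an object name plus a polygon given as a
# list of (x, y) vertices. This layout is assumed for illustration only.
annotations = [
    {"name": "car", "polygon": [(10, 40), (60, 40), (60, 70), (10, 70)]},
    {"name": "car", "polygon": [(5, 5), (30, 2), (55, 5), (55, 25), (30, 30), (5, 25)]},
    {"name": "tree", "polygon": [(0, 0), (4, 9), (8, 0)]},
]


def mean_points_per_class(records):
    """Return {class name: average number of polygon vertices}."""
    totals = defaultdict(lambda: [0, 0])  # name -> [vertex sum, polygon count]
    for rec in records:
        entry = totals[rec["name"]]
        entry[0] += len(rec["polygon"])
        entry[1] += 1
    return {name: vertex_sum / count for name, (vertex_sum, count) in totals.items()}


stats = mean_points_per_class(annotations)
```

A higher average suggests that users trace outlines more carefully for that class; a value near 4 would suggest that they mostly draw box-like shapes.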
Another issue is what to label. For example, should you label a whole pedestrian, just the head, or just the face? What if it is a crowd of people: should you label all of them? Currently we leave these decisions up to each user. In this way, we hope the annotations will reflect what various people think are "natural" ways to segment an image. A third issue is the label itself. For example, should you call this object a "person", "pedestrian", or "man/woman"? An obvious solution is to provide a drop-down menu of standard object category names. However, we currently prefer to let people use their own descriptions, since these may capture some nuances that will be useful in the future. The Matlab toolbox allows querying the database using a list of possible synonyms.
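Because labels are free text, a query for one concept has to match several spellings. The sketch below shows the general idea of a synonym-list query in Python. It is not the Matlab toolbox's implementation or API: the function name `query_by_synonyms`, the record fields, and the sample file names are hypothetical, and matching is assumed to be a simple case-insensitive comparison.

```python
def query_by_synonyms(records, synonyms):
    """Return records whose free-text name equals any synonym, ignoring
    letter case and surrounding whitespace."""
    wanted = {s.strip().lower() for s in synonyms}
    return [r for r in records if r["name"].strip().lower() in wanted]


# Hypothetical records with user-chosen labels (illustrative values only).
records = [
    {"name": "Person", "image": "street_01.jpg"},
    {"name": "pedestrian", "image": "street_02.jpg"},
    {"name": "car", "image": "street_02.jpg"},
    {"name": "woman ", "image": "park_07.jpg"},
]

people = query_by_synonyms(records, ["person", "pedestrian", "man", "woman"])
```

Keeping the original user-supplied names and resolving synonyms only at query time preserves the nuances in the free-text descriptions, which a fixed drop-down menu would discard.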
For more details, please refer to the LabelMe technical report (Russell et al., 2005) or visit the project page: http://labelme.csail.mit.edu.
Figure 1: Screenshot of the labeling tool in use. The user is shown an image, possibly with one or more existing annotations. The user can annotate a new object by clicking around its boundary, or edit an existing annotation. The user can annotate as many objects in the image as they wish. Once finished, the user clicks the "Show New Image" button to see a new image.
Figure 2: a) Examples of annotated images in the database. The images cover a large range of scenes and object categories. b) These polygons correspond to the average quality of the annotations for four object categories.
Financial support was provided by the National Geospatial-Intelligence Agency, NEGI-1582-04-0004, and a grant from BAE Systems. Kevin Murphy was supported in part by a Canadian NSERC Discovery Grant.
A. Torralba. Contextual priming for object detection. International Journal of Computer Vision, 53(2):153-167, 2003.
L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In Proc. ICCV, 2003.
L. von Ahn and L. Dabbish. Labeling images with a computer game. In Proc. SIGCHI Conference on Human Factors in Computing Systems, 2004.
Flickr. http://www.flickr.com.
B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Technical report, MIT AI Lab Memo AIM-2005-025, 2005.
D. G. Stork. The Open Mind Initiative. IEEE Intelligent Systems and Their Applications, 14(3):19-20, 1999.