MIT CSAIL Research Abstracts


Research Abstracts - 2006

Incremental Individual Recognition for a Sociable Robotic Creature

Lijin Aryananda

Introduction

The primary objective of this project is to develop a robotic creature that 'lives' around people on a regular basis and incrementally learns from its daily social experience [1]. In particular, we are using MERTZ, a humanoid head robot platform, to investigate incremental individual recognition through social interaction. A substantial amount of research has been done on person identification using various modalities, such as face and speech [2,3]. Most of this work addresses the supervised identification problem: given a set of labeled training data, find the correct person label for the test data. Unfortunately, manually entering a labeled data set into the robot imposes limitations, such as a fixed number of people that the robot can recognize. We are interested in a flexible person identification framework in which the robot automatically learns to "get to know" different people as it interacts with them.
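
The difference from supervised, closed-set identification can be made concrete with a small open-set memory. The sketch below is illustrative only and is not the method used on MERTZ: it assumes each face has already been reduced to a fixed-length feature vector, and the class names, the Euclidean nearest-centroid rule, and the `threshold` value are all hypothetical. A face farther than `threshold` from every stored identity is enrolled as a new person, so the number of people the robot can recognize is not fixed in advance.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Identity:
    label: int
    centroid: list[float]  # running mean of the feature vectors matched so far
    count: int = 1         # number of observations merged into this identity


@dataclass
class IdentityMemory:
    """Open-set nearest-centroid memory: an unfamiliar face enrolls a new identity."""
    threshold: float  # hypothetical maximum distance for matching a known identity
    identities: list[Identity] = field(default_factory=list)

    def observe(self, feature: list[float]) -> int:
        """Return a person label for `feature`, creating a new label if no identity is close enough."""
        best, best_dist = None, math.inf
        for ident in self.identities:
            d = math.dist(ident.centroid, feature)  # Euclidean distance
            if d < best_dist:
                best, best_dist = ident, d
        if best is None or best_dist > self.threshold:
            best = Identity(label=len(self.identities), centroid=list(feature))
            self.identities.append(best)
            return best.label
        # Fold the new observation into the matched identity's running mean.
        best.count += 1
        best.centroid = [c + (x - c) / best.count for c, x in zip(best.centroid, feature)]
        return best.label


memory = IdentityMemory(threshold=1.0)
print(memory.observe([0.0, 0.0]))  # 0: first face seen, enrolled as a new person
print(memory.observe([0.2, 0.1]))  # 0: within the threshold of identity 0
print(memory.observe([5.0, 5.0]))  # 1: far from every known identity, enrolled
```

A single global distance threshold is the simplest possible open-set rule; in practice it would have to be tuned to the feature space, and appearance changes over time would make a fixed centroid drift out of date.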

Motivation

As we introduce robots into the home to assist and interact with people on a regular basis, robots must be able to recognize and understand the roles of the various people in the household, e.g., the elderly person, the young child, and the parents. An automatic and flexible person identification framework would allow the robot to learn to recognize new individuals incrementally through natural interaction, without an awkward and tedious manual introduction process.

Current Work

We plan to take advantage of the robot's robust platform and long-term continuous operation to acquire a large amount of multi-modal data from natural interaction with many people. In this setting, the person identification problem is quite different from that of typical biometric applications. Face and speech data are abundant during human-robot interaction sessions. Since facial images can be recorded at 30 frames per second, there is a high degree of continuity in the visual input, and a robust multi-person face tracker can therefore provide initial clusters of face data. We can also exploit the temporal correlation among these multi-modal data to combine face and speaker recognition. Moreover, our goal is not to maximize recognition accuracy per test image. Instead, we aim to consistently recognize and remember the relevant individuals who interact with MERTZ regularly, in the same way that people do not remember every single person they pass on the street. We are currently developing the subsystems needed toward this goal: multi-person tracking, audio-visual correlation, face recognition, and speaker recognition.
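
The tracking-then-association idea can be sketched in a few dozen lines. This is a hypothetical illustration, not MERTZ's actual pipeline: it assumes a face detector already returns bounding boxes per frame and a voice-activity detector returns speech segments as (start, end) times in seconds. Detections in consecutive frames are chained into tracks by greedy intersection-over-union (IoU) matching, which works because at 30 frames per second a face barely moves between frames; each resulting track is one initial cluster of face images. Speech segments are then attached to the track they overlap most in time, a deliberately crude stand-in for audio-visual correlation. The names `link_tracks` and `assign_speech` and the values of `min_iou` and `max_gap` are choices made for this sketch.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    boxes: list[tuple[float, Box]] = field(default_factory=list)  # (timestamp, box)

    @property
    def start(self) -> float:
        return self.boxes[0][0]

    @property
    def end(self) -> float:
        return self.boxes[-1][0]


def link_tracks(frames, min_iou=0.3, max_gap=0.1):
    """Greedily chain per-frame detections into tracks.

    `frames` is a time-ordered list of (timestamp_seconds, [boxes]).
    """
    tracks: list[Track] = []
    for t, boxes in frames:
        # Only tracks seen within the last `max_gap` seconds can be extended.
        active = [tr for tr in tracks if t - tr.end <= max_gap]
        used = set()  # each track takes at most one box per frame
        for box in boxes:
            best, best_iou = None, min_iou
            for tr in active:
                if tr.track_id in used:
                    continue
                score = iou(tr.boxes[-1][1], box)
                if score >= best_iou:
                    best, best_iou = tr, score
            if best is None:  # no sufficiently overlapping track: start a new one
                best = Track(track_id=len(tracks))
                tracks.append(best)
            best.boxes.append((t, box))
            used.add(best.track_id)
    return tracks


def assign_speech(tracks, segments):
    """Map each speech segment index to the track it overlaps most in time (None if none)."""
    result = {}
    for i, (s0, s1) in enumerate(segments):
        best_id, best_overlap = None, 0.0
        for tr in tracks:
            overlap = min(s1, tr.end) - max(s0, tr.start)
            if overlap > best_overlap:
                best_id, best_overlap = tr.track_id, overlap
        result[i] = best_id
    return result


dt = 1 / 30
frames = [(i * dt, [(10, 10, 50, 50)]) for i in range(30)]         # person A, first second
frames += [(1 + i * dt, [(200, 20, 240, 60)]) for i in range(30)]  # person B, next second
tracks = link_tracks(frames)
print(len(tracks))                                       # 2
print(assign_speech(tracks, [(0.1, 0.8), (1.2, 1.9)]))   # {0: 0, 1: 1}
```

Temporal overlap alone cannot tell which of several simultaneously visible faces is speaking; a fuller audio-visual correlation would need additional cues, such as sound-source direction or mouth motion, which this sketch omits.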

Fig. 1: (Left) MERTZ, an active-vision humanoid head robot with 13 degrees of freedom.
(Right) A sample experiment session in which the robot interacts with passersby in a public space.

References:

[1] L. Aryananda. Mertz: A quest for a robust and scalable active vision humanoid head robot. IEEE-RAS International Conference on Humanoid Robots, 2004.

[2] A. E. Rosenberg and F. K. Soong. Recent research in automatic speaker recognition. In S. Furui and M. M. Sondhi, editors, Advances in Speech Signal Processing, pp. 701-783. New York: Marcel Dekker, 1992.

[3] W. Zhao, R. Chellappa, A. Rosenfeld, and P. Phillips. Face recognition: A literature survey. ACM Computing Surveys, vol. 35, pp. 399-458, December 2003.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu