Learning Video Processing
Ali Rahimi, Ben Recht & Trevor Darrell

Many vision applications can be expressed as mapping one time series to another. For example, tracking can be expressed as learning a mapping from each frame of a video sequence to an intrinsic time-varying attribute of the scene, such as the position of the limbs of a person in the scene. We seek to learn such mappings from a few examples, each pairing a frame with the desired attribute that should be extracted from it. To improve the performance of the learned function, the algorithm uses information in both the labelled examples and all the unlabelled frames in the sequence.

Here is an example input video [mpeg] of Ben flailing his hands. For a few frames, we specified the position of his joints. From these frames and the rest of the video, the algorithm learned a function that takes an image as input and returns the position of Ben's shoulders, elbows, and hands.

Specified Examples
Applying the Function
Applying the Function (out of sample)
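To make the idea of learning from a few labelled frames plus many unlabelled ones concrete, here is a minimal sketch in Python with NumPy. It is not the method of [1]: it is a generic kernel regression with an added temporal smoothness penalty, chosen only for illustration. The function names (`rbf_kernel`, `fit_semi_supervised`, `predict`), the RBF kernel, the parameters `gamma`, `lam_ridge`, and `lam_smooth`, and the synthetic "frames" are all assumptions of this sketch. Setting `lam_smooth=0` ignores the unlabelled frames and reduces to ordinary kernel ridge regression on the labelled frames alone.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_semi_supervised(X, labelled_idx, Y_labelled,
                        gamma=1.0, lam_ridge=1e-3, lam_smooth=1.0):
    """Fit f(x) = sum_t alpha_t k(x, x_t) over ALL frames x_t by minimising
         sum_{labelled i} |f(x_i) - y_i|^2
         + lam_ridge  * (RKHS norm of f)^2
         + lam_smooth * sum_t |f(x_{t+1}) - f(x_t)|^2
    The last term is the only place the unlabelled frames enter (hypothetical
    penalty for illustration, not the regulariser used in [1])."""
    T = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    J = np.zeros((T, T))
    J[labelled_idx, labelled_idx] = 1.0        # diagonal indicator of labelled frames
    Y = np.zeros((T, Y_labelled.shape[1]))
    Y[labelled_idx] = Y_labelled               # zero-padded label matrix
    D = np.diff(np.eye(T), axis=0)             # (T-1) x T first-difference operator
    L = D.T @ D                                # penalises changes between consecutive frames
    A = J @ K + lam_ridge * np.eye(T) + lam_smooth * (L @ K)
    return np.linalg.solve(A, J @ Y)           # stationarity condition of the objective

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the learned function on new frames (out of sample)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic stand-in for a video: each "frame" is a 2-D point on a half circle,
# and the attribute to recover is the angle. Only 4 of the 40 frames are labelled.
theta = np.linspace(0.0, np.pi, 40)
frames = np.column_stack([np.cos(theta), np.sin(theta)])
labelled = np.array([0, 13, 26, 39])
labels = theta[labelled, None]

alpha_simple = fit_semi_supervised(frames, labelled, labels, lam_smooth=0.0)
alpha_semi = fit_semi_supervised(frames, labelled, labels, lam_smooth=1.0)
err_simple = np.mean(np.abs(predict(frames, alpha_simple, frames) - theta[:, None]))
err_semi = np.mean(np.abs(predict(frames, alpha_semi, frames) - theta[:, None]))
print("mean abs error, simple regression:         ", err_simple)
print("mean abs error, semi-supervised regression:", err_semi)
```

For real video, each row of `frames` would be a vectorised image and each row of `labels` the joint coordinates for that frame; the kernel width and the two penalty weights would need tuning for the data at hand.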
This video [avi] compares simple regression (black markers) with semi-supervised regression (white markers). The above figures show the training data and some snapshots from this video.

This technique is closely tied to manifold learning, system identification, and semi-supervised learning. For more details, please see [1].

References:
[1] A. Rahimi, B. Recht, T. Darrell. Learning Appearance Manifolds from Video. In The Proceedings of Computer Vision and Pattern Recognition, San Diego, CA, USA, July 2005. (pdf)