CSAIL Research Abstracts - 2005

Euclidean Camera Calibration Using GPS Side Information

Gerald Dalley


Consider the problem of wide-area surveillance, such as traffic monitoring and activity classification around critical assets (e.g., an embassy, a troop base, or critical infrastructure facilities such as oil depots, port facilities, and airfield tarmacs). We want to monitor the flow of movement in such a setting using a large number of cameras, typically with non-overlapping fields of view. To coordinate the observations from these distributed cameras, we need to know the relative locations of their fields of view (i.e., which portion of the Earth's surface each camera sees).

In some instances, one can carefully site and calibrate the cameras to manually obtain a mapping from camera pixel coordinates to latitude/longitude coordinates on the Earth's surface. In many cases, however, cameras must be deployed rapidly and may not remain in service for long. Even carefully calibrated cameras tend to move after being bumped, or even rattled by large passing vehicles; new cameras are added to systems over time, and older cameras fail. Outfitting each camera with a global positioning system (GPS) receiver is also a suboptimal solution. First, many cameras are mounted on the sides of large structures such as buildings, which block and/or distort GPS readings. Second, a GPS receiver located at the camera indicates where the camera is, not where the ground plane that it views is located.

GPS Side Information

For our project, we assume that we have an installed network of cameras and at least one object, instrumented with a GPS receiver, moving through the surveillance area. Note that we do not have correspondence between the instrumented objects and the camera observations: when we see an object pass through a camera, we do not know which instrumented object, if any, it corresponds to.

Under this setup, we know the latitude and longitude of each instrumented object at each point in time. We denote this data as the set {(x_{vi}, y_{vi}, t_{vi})}, where v indexes the vehicle and i indexes the time samples for a given vehicle. We may estimate

\hat{p}(x,y) \propto \sum_{v} \sum_{i} \delta_{x,x_{vi}} \, \delta_{y,y_{vi}}

where \delta_{a,b} is the Kronecker delta function. The image below shows \hat{p}(x,y) for the real traffic network we tested:

[Figure: Unconditional likelihood map]
Plot of all GPS coordinates recorded for all vehicles on the real traffic network (\hat{p}(x,y)).
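A minimal sketch of the \hat{p}(x,y) estimate, assuming the Kronecker delta is evaluated on a discrete latitude/longitude grid, so that \hat{p} reduces to a normalized 2D histogram of every GPS sample from every vehicle. The function name `estimate_p_hat`, the grid resolution `bins`, and the bounding `extent` are hypothetical choices for illustration, not details from the original work.

```python
import numpy as np


def estimate_p_hat(xs, ys, bins, extent):
    """Discretized estimate of p_hat(x, y) from all GPS samples of all vehicles.

    xs, ys: 1D arrays of every recorded (x_vi, y_vi), pooled over v and i.
    bins:   (nx, ny) grid resolution (hypothetical choice).
    extent: ((xmin, xmax), (ymin, ymax)) bounds of the surveillance area.
    """
    # Each sample adds a Kronecker delta (a count of 1) to the grid cell it falls in.
    counts, x_edges, y_edges = np.histogram2d(xs, ys, bins=bins, range=extent)
    total = counts.sum()
    # Normalize so the cells sum to 1 (skip if no sample falls inside the extent).
    p_hat = counts / total if total > 0 else counts
    return p_hat, x_edges, y_edges


# Usage: three GPS fixes on a 2 x 2 grid over the unit square.
p_hat, _, _ = estimate_p_hat(np.array([0.1, 0.1, 0.9]), np.array([0.1, 0.2, 0.9]),
                             bins=(2, 2), extent=((0.0, 1.0), (0.0, 1.0)))
print(p_hat)  # cell [0, 0] holds 2/3 of the mass, cell [1, 1] holds 1/3
```

The grid resolution trades spatial precision against noise: finer cells localize entry/exit points more sharply but need more GPS samples per cell.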

We separately have access to the in-camera tracking data, which reports when vehicles enter and exit each camera's field of view: {t_{cj}}, where c indexes the camera and j indexes the times at which reports occurred. We seek to correlate the spatio-temporal GPS data with these camera report times. To do so, we wish to estimate p(x,y|c), the probability that a vehicle is at location (x,y) given that it is seen at the edge of camera c.

Because we do not know which instrumented vehicle, if any, triggered a given camera report, we cannot estimate p(x,y|c) directly. Instead, we approximate it with the mixture distribution

Equation for \tilde{p}

where ε is a hidden factor that indicates how well \tilde{p}(x,y|c) approximates p(x,y|c). In our current implementation, we accept this approximation and do not attempt to remove the ε p term.
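The text does not spell out how \tilde{p}(x,y|c) is computed from the data. One plausible reading, offered here as an assumption rather than the authors' procedure, is to take every instrumented vehicle's GPS fix nearest to each report time t_{cj} of camera c, pool them, and histogram the result. Vehicles that happened to be elsewhere at that moment then contribute background mass, which is the kind of contamination the ε term describes. The function name `estimate_p_tilde`, the nearest-fix matching rule, and the tolerance `max_dt` are all hypothetical.

```python
import numpy as np


def estimate_p_tilde(gps_tracks, report_times, bins, extent, max_dt=1.0):
    """Normalized histogram of GPS positions observed at one camera's report times.

    gps_tracks:   dict mapping vehicle id v -> (t, x, y) arrays, t sorted ascending.
    report_times: 1D array of report times t_cj for a single camera c.
    max_dt:       hypothetical tolerance (same units as t) for matching a fix to a report.
    """
    xs, ys = [], []
    for t, x, y in gps_tracks.values():
        if len(t) == 0:
            continue
        for tc in report_times:
            # Nearest GPS fix in time: check the neighbors of the insertion point.
            k = np.searchsorted(t, tc)
            candidates = [i for i in (k - 1, k) if 0 <= i < len(t)]
            best = min(candidates, key=lambda i: abs(t[i] - tc))
            if abs(t[best] - tc) <= max_dt:
                # No correspondence is known, so every vehicle's position is pooled.
                xs.append(x[best])
                ys.append(y[best])
    if not xs:
        return np.zeros(bins)
    counts, _, _ = np.histogram2d(xs, ys, bins=bins, range=extent)
    total = counts.sum()
    return counts / total if total > 0 else counts


# Usage: vehicle "a" is at the camera at t = 1; vehicle "b" is elsewhere but is
# still counted, so half of the mass lands in cell [1, 1] as background.
tracks = {
    "a": (np.array([0.0, 1.0, 2.0]), np.array([0.1, 0.1, 0.9]), np.array([0.1, 0.1, 0.9])),
    "b": (np.array([0.0, 5.0]), np.array([0.9, 0.9]), np.array([0.9, 0.9])),
}
p_tilde = estimate_p_tilde(tracks, np.array([1.0]), bins=(2, 2),
                           extent=((0.0, 1.0), (0.0, 1.0)))
```

Tightening `max_dt` drops stale fixes but cannot separate the true vehicle from others observed at the same instant; that ambiguity is why the result only approximates p(x,y|c).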

To test this algorithm, we use a dataset consisting of five cameras, five instrumented vehicles following scripted behavior, and approximately 17 unplanned vehicles that passed through the cameras during the data collection period. In the figure below, we show \tilde{p}(x,y|c) as green and black (bold) dots, composited from all cameras. We threshold \tilde{p}(x,y|c) and treat high values as candidate entry/exit locations for the cameras; these are shown in the figure as large black dots. For each camera, we then find the fixed-size square that contains the largest number of high values, considering only the high values generated for that particular camera. These squares are our estimated camera locations and are shown in red in the figure. Also overlaid are dark red trapezoids indicating the manually drawn, approximate ground-truth fields of view of the cameras. With this dataset, the estimated squares correctly identify the camera locations.

[Figure: Composite camera location estimates]
Estimated camera locations, composited over all cameras ({\tilde{p}(x,y|c)}). Black points are GPS coordinates of vehicles judged to correspond to an entry into or exit from a camera; green points are those judged not to. Light red squares indicate the automatically estimated camera locations. Dark red trapezoids indicate the manually estimated camera locations.

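The thresholding and fixed-size square search described above can be sketched as follows, assuming \tilde{p}(x,y|c) for one camera is stored as a 2D grid. The function name `locate_camera` and the `threshold` and `window` values are hypothetical; the sketch uses an integral image (2D cumulative sum) so that the count of high cells in each candidate square is a constant-time lookup, which is one convenient way to implement the search, not necessarily the authors'.

```python
import numpy as np


def locate_camera(p_tilde, threshold, window):
    """Find the fixed-size square containing the most high values of p_tilde.

    p_tilde:   2D grid for a single camera c (only that camera's values are used).
    threshold: hypothetical cutoff above which a cell is a candidate entry/exit point.
    window:    side length of the square, in grid cells (hypothetical).
    Returns (row, col, count): top-left cell of the best square and its count.
    """
    high = (p_tilde > threshold).astype(np.int64)
    rows, cols = high.shape
    if not 1 <= window <= min(rows, cols):
        raise ValueError("window must be between 1 and the smaller grid dimension")
    # Integral image padded with a zero row and column: integral[i, j] = high[:i, :j].sum().
    integral = np.zeros((rows + 1, cols + 1), dtype=np.int64)
    integral[1:, 1:] = high.cumsum(axis=0).cumsum(axis=1)
    w = window
    # Number of high cells in every w x w square, indexed by its top-left corner.
    sums = (integral[w:, w:] - integral[:-w, w:]
            - integral[w:, :-w] + integral[:-w, :-w])
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return int(r), int(c), int(sums[r, c])


# Usage: three high cells cluster in the lower-right corner of a 5 x 5 grid.
grid = np.zeros((5, 5))
grid[3, 3] = grid[3, 4] = grid[4, 4] = grid[0, 0] = 0.2
print(locate_camera(grid, threshold=0.1, window=2))  # (3, 3, 3)
```

Because `np.argmax` returns the first maximum in row-major order, ties resolve to the top-most, then left-most square; the original does not say how ties were handled.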
Future Work

We are working to improve and extend these preliminary results in several areas:

  • Improved camera localization through more robust density estimation.
  • Estimation of traffic flow direction in a given camera.
  • Calibrating individual wide-angle camera views to determine the GPS location of each pixel when projected onto the ground plane.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)