We describe the design considerations underlying a system for scalable, automated capture of precisely controlled imagery in urban scenes. The system operates in architectural scenes in which at least two vanishing points are visible from every camera position. It has been used to capture thousands of controlled images in outdoor environments spanning hundreds of meters. The proposed system architecture forms the foundation for a future, fully robotic outdoor mapping capability for urban areas, analogous to existing satellite-based robotic mapping systems that acquire images and models of natural terrain. Four key ideas distinguish our approach from other methods. First, our sensor acquires georeferencing metadata with every image, enabling related images to be efficiently identified and registered. Second, the sensor acquires omni-directional images; we show strong experimental evidence that such images are fundamentally more powerful observations than conventional (narrow-FOV) images. Third, the system uses a probabilistic, projective error formulation to account for uncertainty. By treating measurement error in an appropriate depth-free framework, and by deferring decisions about camera calibration and scene structure until many noisy observations can be fused, the system achieves superior robustness and accuracy. Fourth, the system's computational requirements scale linearly in the input size, the area of the acquisition region, and the size of the output model. This is in contrast to most previous methods, which either assume constant-size inputs or exhibit quadratic (or worse) asymptotic running time. These attributes enable the system to operate in a regime of scale and physical extent that is unachievable by any other method, whether manual or automated. Consequently, it can acquire the most complex calibrated terrestrial image sets in existence, while operating faster than any existing manual or algorithmic method.