Finding Approximate POMDP Solutions Through Belief Compression

Nicholas Roy

What
We have developed a technique for planning in large, real-world problems in the presence of substantial uncertainty. Our algorithms find control policies that can avoid ambiguous situations by taking advantage of information implicit in the world, and can also resolve uncertainty by taking advantage of information-gathering actions.

Why
Recent research in the field of robotics has demonstrated the utility of probabilistic models for perception and state tracking on deployed robot systems. For example, Kalman filters and Markov localisation have been used successfully in many robot applications [3, 5]. There has also been considerable research into control and decision-making algorithms that are robust in the face of specific kinds of uncertainty [1]. Few control algorithms, however, make use of full probabilistic representations throughout planning. As a consequence, robot control can become increasingly brittle as the system's perceptual and state uncertainty increase.

The Partially Observable Markov Decision Process (POMDP) is a way to model uncertainty in a real-world system's actions and perceptions [4]. Instead of making decisions based on the current perceived state of the world, a POMDP planner maintains a belief, or probability distribution, over the possible states of the world, and makes decisions based on the current belief. Unfortunately, POMDPs are computationally intractable for most real-world problems involving planning in belief spaces.

How
The uncertainty that many real-world problems exhibit often has a regular structure that can be exploited. By making assumptions about the kinds of probability distributions that a system may experience, it may be possible to reduce the complexity of finding policies that approximate the optimal POMDP policy, without incurring the full cost of computing the exact optimal policy.
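The belief maintained by a POMDP can be updated with a standard Bayes filter: predict forward through the transition model, then weight by the observation likelihood and renormalise. Below is a minimal sketch for a discrete toy problem; the two-state world, action name, and all probability tables are illustrative inventions, not a model from this work.

```python
def belief_update(belief, action, observation, T, O):
    """Posterior belief after taking `action` and seeing `observation`.

    belief: dict state -> probability
    T: dict (state, action) -> dict of next_state -> probability
    O: dict (next_state, action) -> dict of observation -> probability
    """
    new_belief = {}
    for s_next in belief:
        # Prediction step: marginalise over the prior state.
        pred = sum(T[(s, action)][s_next] * belief[s] for s in belief)
        # Correction step: weight by the observation likelihood.
        new_belief[s_next] = O[(s_next, action)][observation] * pred
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Hypothetical two-state robot world: all numbers are illustrative.
T = {("left", "go"): {"left": 0.2, "right": 0.8},
     ("right", "go"): {"left": 0.1, "right": 0.9}}
O = {("left", "go"): {"wall": 0.9, "none": 0.1},
     ("right", "go"): {"wall": 0.3, "none": 0.7}}

b = belief_update({"left": 0.5, "right": 0.5}, "go", "wall", T, O)
```

Note that the planner's input is the whole distribution `b`, not a single most-likely state; this is exactly what makes the belief space large.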
In Figure 1 below we see two example beliefs, shown as particles drawn from the probability distribution over the possible locations of a mobile robot in a map. The denser the particles, the more likely the robot is to be at or near that location. On the left is the kind of distribution that could reasonably be expected to occur; the true location of the robot is not known, but is relatively constrained. The right-hand image, however, shows particles drawn from a very unlikely distribution. An algorithm that ignores such beliefs during planning will still perform well in the real world. By ignoring such unlikely beliefs, we are able to generate good policies (in terms of expected reward) in a computationally tractable way. This allows for more reliable control in a number of situations, from motor control to system diagnosis. We have demonstrated this approach on both mobile robot motion planning and spoken dialogue management.

Our ability to produce good policies for POMDP-style problems relies on using compact representations of the probabilistic state estimates. One algorithm uses a hand-chosen set of statistics: the mean and entropy of the belief. These statistics are jointly sufficient under certain restricted classes of distributions, for example, Gaussians with a single variance parameter. For other distributions, this representation can be viewed as lossy. A second algorithm automatically finds low-dimensional representations of a set of high-dimensional probability distributions, through a technique similar to Principal Components Analysis (PCA). PCA finds a low-dimensional, linear subspace that passes through, or close to, a set of data. The particular distance metric used by conventional PCA is not well-suited for representing probability distributions; however, PCA can easily be modified for other distance metrics.
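The hand-chosen statistics above can be computed directly from a particle set. The sketch below compresses a one-dimensional particle belief to its (mean, entropy) pair, estimating entropy from a histogram; the bin count and map bounds are illustrative parameters, not values from this work.

```python
import math

def belief_statistics(particles, bins=10, lo=0.0, hi=10.0):
    """Compress a 1-D particle belief to (mean, entropy).

    Entropy is estimated from a histogram of the particles over
    [lo, hi); `bins`, `lo`, `hi` are illustrative discretisation choices.
    """
    mean = sum(particles) / len(particles)
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in particles:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    probs = [c / len(particles) for c in counts]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return mean, entropy

# A tightly clustered belief should score lower entropy than a diffuse one.
tight = [4.9, 5.0, 5.0, 5.1, 5.0]
diffuse = [0.5, 2.5, 4.5, 6.5, 8.5]
m1, h1 = belief_statistics(tight)
m2, h2 = belief_statistics(diffuse)
```

Planning then happens in this two-dimensional (mean, entropy) space rather than over full distributions, which is what makes the approach tractable; the cost is that two different beliefs with the same mean and entropy become indistinguishable to the planner.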
We have demonstrated how to modify PCA appropriately for probability distributions, a technique called exponential-family PCA (E-PCA) [2], and how to plan using this representation.

Progress
Figure 2 compares the approximate POMDP solution with a conventional distance-optimal algorithm. The objective is for the robot to travel from the start to the goal location with limited sensing. Not shown is the full probability distribution at each point in time; however, the conventional, distance-optimal motion planner becomes very uncertain about its position and ultimately lost, because the distance-optimal trajectory leaves the robot without sensor information for an extended period of time. In contrast, the POMDP-style solution is able to model the limited sensing, and chooses a trajectory that compensates for the limited sensing range by following the walls. As a result, the robot reaches the goal robustly.
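The key idea in E-PCA is to replace PCA's squared-error loss with an exponential-family loss on the natural parameters, so that reconstructed beliefs are nonnegative by construction. The sketch below is a rank-1 version with a Poisson-style loss, sum of exp(u_i v_j) - X_ij * u_i v_j, fit by plain gradient descent; the toy data, learning rate, iteration count, and clipping guard are all illustrative choices, not details from [2], which uses more sophisticated optimisation.

```python
import math
import random

def clipped_exp(theta):
    # Clip the natural parameter before exponentiating (illustrative
    # numerical guard, keeps reconstructions finite).
    return math.exp(max(min(theta, 10.0), -10.0))

def epca_rank1(X, iters=5000, lr=0.005, seed=0):
    """Rank-1 E-PCA sketch with a Poisson-style loss.

    Minimises sum_ij [exp(u_i v_j) - X_ij * u_i v_j]; the gradient in the
    natural parameter u_i v_j is exp(u_i v_j) - X_ij.  Reconstructions
    exp(u_i v_j) are nonnegative by construction, unlike ordinary PCA.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    u = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    v = [rng.uniform(-0.1, 0.1) for _ in range(d)]
    for _ in range(iters):
        grad = [[clipped_exp(u[i] * v[j]) - X[i][j] for j in range(d)]
                for i in range(n)]
        du = [sum(grad[i][j] * v[j] for j in range(d)) for i in range(n)]
        dv = [sum(grad[i][j] * u[i] for i in range(n)) for j in range(d)]
        u = [u[i] - lr * du[i] for i in range(n)]
        v = [v[j] - lr * dv[j] for j in range(d)]
    return [[clipped_exp(u[i] * v[j]) for j in range(d)] for i in range(n)]

# Toy "beliefs": unnormalised histograms over four states (made-up data).
X = [[8.0, 1.0, 0.5, 0.5],
     [7.0, 2.0, 0.5, 0.5],
     [0.5, 0.5, 1.0, 8.0]]
R = epca_rank1(X)
```

Each high-dimensional belief is then summarised by its low-dimensional coefficient (here the scalar u_i), and planning proceeds in that compressed space.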
Research Support
This research was supported in part by the National Science Foundation under ITR grant #IIS-0121426.

Acknowledgements
This is joint work with Sebastian Thrun and Geoffrey Gordon.

References
[1] J. Andrew Bagnell and Jeff Schneider. Autonomous helicopter control using reinforcement learning policy search methods. Pages 1615-1620, Seoul, South Korea, 2001. IEEE Press.
[2] Michael Collins, Sanjoy Dasgupta, and Robert Schapire. A generalization of principal components analysis to the exponential family. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14 (NIPS), Cambridge, MA, 2002. MIT Press.
[3] John Leonard and Hugh Durrant-Whyte. Mobile robot localization by tracking geometric beacons. IEEE Transactions on Robotics and Automation, 7(3):376-382, June 1991.
[4] Edward Sondik. The Optimal Control of Partially Observable Markov Decision Processes. PhD thesis, Stanford University, Stanford, California, 1971.
[5] Sebastian Thrun, Michael Beetz, Maren Bennewitz, Wolfram Burgard, Armin B. Cremers, Frank Dellaert, Dieter Fox, Dirk Haehnel, Charles Rosenberg, Nicholas Roy, Jamieson Schulte, and Dirk Schulz. Probabilistic algorithms and the interactive museum tour-guide robot Minerva. International Journal of Robotics Research, 19(11):972-999, November 2000.

