Approximate Dynamic Programming Approaches to Sensor Management in Sensor NetworksJason L. Williams, John W. Fisher III & Alan S. WillskyThe ProblemSensor resource management issues are pervasive in the design of sensor networks: choosing where the posterior distribution should be maintained, which sensors should be activated for sensing, and which sensors should communicate and what should be communicated are issues which are crucial to both the performance and the lifetime of the network. While many aspects of the problem can be posed in the framework of Markov Decision Processes (MDPs), the resulting algorithms are NPhard, hence it is necessary to consider efficient approximations. There is also an inevitable tradeoff between system performance and network lifetime, which necessitates methods that are able to simultaneously consider multiple objectives. MotivationNetworks of intelligent sensors have the potential to provide unique capabilities for monitoring wide geographic areas through the intelligent exploitation of local computation (so called innetwork computing) and the judicious use of intersensor communication. Tracking moving objects is a common application in which the quantities of interest (i.e. kinematic state) are inferred largely from sensor measurements which are in proximity to the object. Consequently, local fusion of sensor data is sufficient for computing an accurate model of the object state and associated uncertainty, as captured by the posterior distribution. In many sensor networks energy is a dear resource to be conserved so as to prolong the network's operational lifetime. Additionally, it is typically the case that the energy cost of communications is orders of magnitude greater than the energy cost of local computation. Accordingly, it is desirable to incorporate intelligent processing which exploits local computation to maximize the system performance while minimizing communication, and hence energy spent. ApproachConceptually, many of these problems can be formulated and optimally solved as a MDP. However, the state of the MDP is generally a posterior distribution (belief state); for example, in the object tracking example the MDP state could be the posterior distribution of object position and velocity. Since probably distributions are continuous functions, the state space of the MDP is uncountably infinite, and the problem cannot be solved exactly. We are considering a range of approximate dynamic programming methods, including various combinations of traditional approximations such as rolling horizon and rollout methods, open loop feedback controllers and partially stochastic certainty equivalent controllers, as well as learningbased approaches such as value function approximation and policygradient methods. We combine these approximations with Lagrangian relaxation methods to support multiple objective optimization. Our existing work has addressed two subproblems, using the popular paradigm in which a single sensor (the "leader node") is responsible for collecting measurements and incorporating them into the belief state at each time. The first subproblem [1] decouples sensor management (selecting sensors for activation and communication) from leader node selection (determining where the probabilistic representation should be maintained at each time step), and optimizes the latter using a rollout method, conditioned on a fixed strategy for the former. The second subproblem [2] constrains the system to utilize a single sensor for sensing at each time, and optimizes the estimation performance over a rolling horizon subject to a constraint on the expected future communication cost, incorporated using a Lagrangian relaxation method. Our formulations make extensive use of information theoretic objectives to measure uncertainty. Figure 1. A snapshot of the problem of interest. A region is populated with 20 randomly placed sensors, marked with black dots numbered from 1 to 20, which provide location information when the object is close to them. An object, denoted by the black cross, moves through the region and is tracked using a particle filter (particles shown in green). The sensor management algorithm dynamically selects which sensor should sense, and store and update the posterior distribution to maximize estimation performance subject to a constraint on communication cost. The sensor chosen for this time step is highlighted in red. Figure 2. Comparison of the communication cost accrued over time. The red line shows the communication cost accrued in a single Monte Carlo run by an algorithm which selects as leader node the sensor whose measurement has the largest mutual information with the object state. The blue line shows the communication cost accrued by an approximate dynamic programming method incorporates communication cost into its planning. Figure 3. Estimation performance versus communication cost results for various algorithms, averaged over 100 Monte Carlo simulations. The approximate DP algorithm is compared with the method which selects as leader node the sensor whose measurement has the largest mutual information with the object state in the higher diagram, and with the method which selects the sensor with the smallest expected square distance to the object in the lower diagram. The parameter lambda (the Lagrange multiplier) trades off the importance of estimation quality and communication cost. For more information see [2]. Future work will include application of learning methods such as value function approximation and policy gradient. The crucial element in applying these methods is determining the right parametric form, a difficult task when the state space is the space of probability distributions. ApplicationsOur methods are applicable to dynamic sensor network problems such as object tracking. We are also applying our methods to centralized sensor management problems where constraints model sensor utilization. References:[1] Jason L. Williams, John W. Fisher III and Alan S. Willsky. Optimization Approaches to Dynamic Routing of Measurements and Models in a Sensor Network Object Tracking Problem. In The Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 10611064, Philadelphia, PA, April 2005. [2] Jason L. Williams, John W. Fisher III and Alan S. Willsky. An Approximate Dynamic Programming Approach to a Communication Constrained Sensor Management Problem. Submitted to Fusion 2005. 

