CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Research Abstracts bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line


Research Abstracts - 2006
horizontal line

horizontal line

vertical line
vertical line

Monitoring and Diagnosis of Complex Software and Hardware Systems for Earth Observing Missions

Tsoline Mikaelian & Brian C. Williams


Model-based diagnosis of devices has traditionally operated on hardware systems [1][2]. For instance, given an observation sequence, the Livingstone [3] diagnostic engine estimates the state of hardware components based on hidden Markov models that describe each component's behavior in terms of nominal and faulty modes. At the other end of the spectrum, researchers have applied model-based diagnosis to software debugging [4]. This work explores the middle ground between the two, in particular the monitoring and diagnosis of systems with combined hardware and software behavior. It is currently being tested on software failure scenarios that occurred onboard the NASA New Millennium Earth Observing I (EO-1) spacecraft (Fig. 1).


Many complex systems today, such as spacecraft, robotic networks, automobiles and medical devices consist of hardware components whose functionality is extended or controlled by embedded software. The embedded software in each of these systems interacts with the hardware components and influences their behavior. In order to correctly estimate the state of these devices, it is essential to consider their software-extended behavior. We introduce a monitoring and fault diagnosis capability based on software-extended behavioral models, motivated by the need for robustness and safety of increasingly ambitious applications.

As an example of a complex system, consider vision-based navigation for an autonomous rover. The camera in the navigation system is an instance of a device that has software-extended behavior: the image processing software embedded within the camera module augments the functionality of the camera by processing each image and determining whether it is corrupt. A sensor measuring the camera voltage may be used for estimating the physical state of the camera. A hardware model of the camera describes its physical behavior in terms of inputs, outputs and available sensor measurements. A diagnosis engine that uses only hardware models will not be able to reason about a corrupt image. Consider, for instance, a scenario in which the camera sensor measures a zero voltage. Based solely on hardware models of the camera, the measurement sensor and the battery, the most likely diagnoses will include camera failure, low battery voltage and sensor failure. However, given a software-extended model of the camera that incorporates the behavior of the image processing software, the quality of the image may be used to correctly diagnose the navigation system. Given that the processed image is not corrupt, the most likely diagnosis, that the measurement sensor is broken, may be deduced (Fig. 2).

Fig. 1:
Earth Observing I

This scenario demonstrates that a diagnostic engine for systems with software-extended behavior must: 

  1. Monitor the behavior of both the hardware and its embedded software, so that the software state can be used for diagnosing the hardware.
  2. Reason about the system state, given delayed symptoms [5] obtained from the software (an instance of a delayed symptom is the quality of the processed image).

We introduce a novel monitoring and diagnostic system that operates on software-extended behavior models [6], to meet the above requirements. In contrast to previous work on model-based software debugging, the purpose of this work is to leverage information within the embedded software to refine the diagnoses of physical systems. Capturing the behavior of software is more complex than that of hardware, due to the hierarchical structure of a program and the use of complex constructs. We address this complexity by using probabilistic, hierarchical, constraint-based automata (PHCA) [7] which can compactly represent complex behaviors. Building upon this previous work, we introduce a PHCA-based monitoring and diagnostic engine that can handle delayed symptoms. Given a history of observations and commands within a receding N-Stage time horizon, we enumerate and track the most likely trajectories of system state that are consistent with the observations, commands and the PHCA models. Our approach consists of two phases: an offline compilation phase and an online solution phase, as shown in Fig. 3.

The following are highlights of each phase.

Offline Compilation:

  • The PHCA trajectory estimation task is framed as a probabilistic constraint optimization problem (COP) [8] within the N-Stage horizon, in order to account for delayed symptoms.
  • The COP encodes the PHCA models and their execution semantics as probabilistic constraints, such that the optimal solution corresponds to the most likely PHCA state trajectory.
  • Encoding the PHCA execution semantics as constraints eliminates the need for a separate model simulation step during the online phase.
  • Probabilistic (valued) constraints provide convenient expressivity for encoding the models, by not limiting uncertainty specification to decision variables.
  • Tree decomposition [9] is applied to the constraint network to exploit the independence of subproblems through local consistency and dynamic programming, thus enabling the use of efficient optimal constraint solvers during the online phase.

Online Trajectory Tracking:

  • The COP is updated by shifting the time horizon, incorporating new observations and commands, and inserting constraints for tracking the trajectories found within the previous horizon.
  • The solutions to the COP are enumerated in best-first order, thus efficiently focusing on the most likely trajectories while allowing anytime behavior.
  • Within each time horizon, the COP is solved using an efficient, decomposition-based optimal constraint solver [10].

We are in the process of evaluating the system described above on representative scenarios from the Earth Observing One spacecraft. Future work includes optimizing the COP formulation by minimizing the number of variables and constraints generated, investigating the optimal size of the diagnosis horizon and its relationship to the number of trajectories tracked.


This research is sponsored in part by the NASA ROSES program and the DARPA SRS program.


[1] de Kleer, J., and Williams, B.C. Diagnosing multiple faults. In Artificial Intelligence 32(1), pp. 97-130, 1987.

[2] Dressler, O., and Struss, P. The consistency-based approach to automated diagnosis of devices. In Proc. KR-96, pp. 267-311, 1996.

[3] Williams, B.C., and Nayak, P. A model-based approach to reactive self-configuring systems. In Proc. AAAI-96, pp. 971-978, 1996.

[4] Mayer, W., and Stumptner, M. Approximate modeling for debugging of program loops. In Proc. DX-04, 2004.

[5] Kurien, J., and Nayak, P. Back to the future for consistency-based trajectory tracking. In Proc. AAAI-00, 2000.

[6] Mikaelian, T., Model-based monitoring and diagnosis of systems with software-extended behavior, SM Thesis, MIT, 2005. 

[7] Williams, B.C., Chung, S. and Gupta, V. Mode estimation of model-based programs: monitoring systems with complex behavior. In Proc. IJCAI-01, 2001.

[8] Schiex, T., Fargier, H., and Verfaillie, G. Valued constraint satisfaction problems: hard and easy problems. In Proc. IJCAI-95, 1995.

[9] Gottlob, G., Leone, N., and Scarcello, F. A comparison of structural CSP decomposition methods. Artificial Intelligence, 124(2), pp. 243-282, 2000.

[10] Sachenbacher, M., and Williams, B.C. Diagnosis as semiring-based constraint optimization. In Proc. ECAI-04, 2004.

vertical line
vertical line
horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu