MIT CSAIL Research Abstracts

CSAIL Publications and Digital Archive header

Technical Reports

Work Products

Research Abstracts

Historical Collections

horizontal line

Research Abstracts - 2006
horizontal line

horizontal line

The Daikon System for Dynamic Detection of Likely Invariants

Jeff H. Perkins & Michael D. Ernst

Introduction

Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; these are often seen in assert statements, documentation, and formal specifications. Invariants can be useful in program understanding and a host of other applications. Examples include being constant (x = a), non-zero (x ≠ 0), being in a range (a ≤ x ≤ b), linear relationships (y = ax + b), ordering (x ≤ y), functions from a library (x = fn(y)), containment (x ∈ y), sortedness (x is sorted), and many more. You can extend Daikon to add new properties.

Daikon's output has been used for generating test cases, predicting incompatibilities in component integration, automating theorem-proving, repairing inconsistent data structures, and checking for the validity of data streams, among other tasks. Daikon has been used in the research methodology (as at least one step in the technique) of over 35 publications.

Despite their advantages, invariants are usually missing from programs. An alternative to expecting programmer to fully annotate their code with invariants is to automatically infer likely invariants from program executions. Daikon runs a program, observes the values that the program computes (data trace) and then reports the properties that were true over the observed executions.

Our long term goal is to be able to generate likely invariants on realistic programs across a variety of platforms and languages so that the use of invariants can become a standard part of software development. To accomplish this, Daikon must be able to handle realistic programs on a variety of platforms and languages.

Architecture

Daikon discovers likely invariants from program executions by instrumenting the target program to trace certain variables, running the instrumented program, and inferring invariants over both the instrumented variables and derived variables not manifest in the program. All of the steps are fully automatic (except for requirements of the target program, such as its inputs), and in fact the user issues only a single command that is much like running the original program.

Instrumenters.

The basic task of the instrumenter is to add instructions to the target program so that it will output the values of variables. The traces are piped directly to the inference engine, avoiding the necessity of disk storage. Alternately, the traces can be output to a file, permitting repeated processing or combined processing of multiple runs. In order to obtain good coverage, a target program sometimes needs to be run multiple times over separate inputs.

Instrumenters are language specific. Currently Daikon has instrumenters for C/C++, Java, Perl, and IOA programs. Some instrumenters translate the source to create a new version of the source that is then compiled, linked, and run. Binary instrumenters operate directly upon the executable and are often preferable as they are easier to use. However, they are more likely (at least in the case of compiled languages) to be platform specific.

In languages such as C/C++, it is not possible to determine if a pointer or array index is valid simply by examining the pointer. In this case, the instrumenter must augment the target program to keep track of additional information to determine whether or not each pointer refers to valid memory and the extent of each array.

Inference Engine.

The inference engine reads the trace data produced by an instrumenter and produces likely invariants.

The inference step tests possible invariants against the values captured from the instrumented variables. Properties that are satisfied over all of the values and that also satisfy other tests, such as being statistically justified, not being over unrelated variables and not being implied by other reported invariants are reported as likely invariants.

The essential idea is to use a generate-and-check algorithm to test a a set of potential invariants against the traces. Daikon reports those properties that are tested to a sufficient degree without falsification.

Daikon is enhanced with a number of optimizations that allow it to scale to both non-trivial numbers of invariants and programs of non-trivial size. These optimizations are based on the fact that the basic generate-and-check algorithm checks and reports more invariants that necessary. Many invariants are redundant and do not need to be checked. Together the optimizations reduce the number of invariants that need to be checked by 99% [1].

Progress

In 2005, significant progress has been made in achieving our goal of generating invariants for a variety realistically sized programs. Daikon's support for C and C++ on a variety of platforms has significantly improved. Java support is now 100 percent Java which allows Daikon to work on any Java platform with a version 5.0 JVM.

Kvasir. Daikon's binary front end for C and C++ on Linux, named Kvasir, executes C and C++ programs and creates data trace files by examining the operation of the binary at runtime. It is built upon the Fjalar dynamic analysis framework for C and C++ programs. Fjalar is based on the Valgrind dynamic program supervision framework. By using Valgrind's memcheck support, Kvasir can keep track of the validity of each pointer in the program and can also tell whether or not valid memory has been initialized. Kvasir has been enhanced to support C++ and to handle programs of significant size (up to 110,000 non-comment, non-blank lines of code).

Mangel-Wurzel. Daikon has a new source front end for C and C++, named Mangel-Wurzel. Mangel-Wurzel can run on nearly any Unix platform as well as on Windows (with the Microsoft Visual C/C++ compiler). Mangel-Wurzel uses the Purify program analysis tool from Rational Software to determine the validity of pointers.

Chicory. Daikon's binary front end for Java dynamically instruments Java class files as they are loaded. Because it is implemented in Java, Chicory will run on any platform with a version 5.0 JVM. Chicory has been enhanced to work with larger programs of up to 40,000 non-comment non-blank lines of code.

Online processing. Both Chicory and Kvasir support online processing (via a direct connection from the target program to the inference engine) so that a trace file need not be created. This allows traces of indefinite size to be supported.

Comparability. Invariants over two or more variables are only interesting if the variables are related to each other in some manner. For example, consider a variable temp that contains the temperature and a variable time that contains the number of seconds since 1970. The invariant (time > temp) is true but uninteresting. Computing invariants between unrelated variables not only clutters the output but also consumes significant computing resources.

Our DynComp tool calculates comparability using a dynamic unification-based analysis. Initially each run-time value gets a unique type. A run-time interaction among values indicates that they are related (their types are unified). Also at run time, type information for variables are accumulated from the abstract types for values.

Use of DynComp significantly improved both the run time and precision of Daikon. It also allows more heuristic optimizations (such as not comparing pointer values of different types) to be removed without a time/space penalty, which furthers improves the results.

Future

We will continue to work towards improving Daikon so that it can produce reasonable results on wider variety of realistic programs and to research how software engineers can best use the likely invariants that it uncovers.

Research Support:

This research is supported by gifts from IBM, NTT, and Toshiba, a grant from the MIT Deshpande Center, NSF grant CCR-0234651, and DARPA contract FA8750-04-2-0254.

References:

[1] Jeff H. Perkins and Michael D. Ernst. Efficient incremental algorithms for dynamic detection of likely invariants. In Proceedings of the ACM Sigsoft 12th Symposium on the Foundations of Software Engineering SIGSOFT (FSE 2004), pp 23-32, Newport Beach, California, USA November 2004

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu