Abstracts - 2007
The Daikon System for Dynamic Detection of Likely Invariants
Jeff H. Perkins, Weijie Yuan & Michael D. Ernst
Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; these are often seen in assert statements, documentation, and formal specifications. Invariants can be useful in program understanding and a host of other applications. Examples include being constant (x = a), non-zero (x ≠ 0), being in a range (a ≤ x ≤ b), linear relationships (y = ax + b), ordering (x ≤ y), functions from a library (x = fn(y)), containment (x ∈ y), sortedness (x is sorted), and many more. You can extend Daikon to add new properties.
Daikon's output has been used for generating test cases, predicting incompatibilities in component integration, automating theorem-proving, repairing inconsistent data structures, and checking for the validity of data streams, among other tasks. Daikon has been used in the research methodology (as at least one step in the technique) of over 35 publications.
Despite their advantages, invariants are usually missing from programs. An alternative to expecting programmer to fully annotate their code with invariants is to automatically infer likely invariants from program executions. Daikon runs a program, observes the values that the program computes (data trace) and then reports the properties that were true over the observed executions.
Our long term goal is to be able to generate likely invariants on realistic programs across a variety of platforms and languages so that the use of invariants can become a standard part of software development. To accomplish this, Daikon must be able to handle realistic programs on a variety of platforms and languages.
Daikon discovers likely invariants from program executions by instrumenting the target program to trace certain variables, running the instrumented program, and inferring invariants over both the instrumented variables and derived variables not manifest in the program. All of the steps are fully automatic (except for requirements of the target program, such as its inputs), and in fact the user issues only a single command that is much like running the original program.
The basic task of the instrumenter is to add instructions to the target program so that it will output the values of variables. The traces are piped directly to the inference engine, avoiding the necessity of disk storage. Alternately, the traces can be output to a file, permitting repeated processing or combined processing of multiple runs. In order to obtain good coverage, a target program sometimes needs to be run multiple times over separate inputs.
Instrumenters are language specific. Currently Daikon has instrumenters for C/C++, Java, Perl, and IOA programs. Some instrumenters translate the source to create a new version of the source that is then compiled, linked, and run. Binary instrumenters operate directly upon the executable and are often preferable as they are easier to use. However, they are more likely (at least in the case of compiled languages) to be platform specific.
In languages such as C/C++, it is not possible to determine if a pointer or array index is valid simply by examining the pointer. In this case, the instrumenter must augment the target program to keep track of additional information to determine whether or not each pointer refers to valid memory and the extent of each array.
The inference engine reads the trace data produced by an instrumenter and produces likely invariants.
The inference step tests possible invariants against the values captured from the instrumented variables. Properties that are satisfied over all of the values and that also satisfy other tests, such as being statistically justified, not being over unrelated variables and not being implied by other reported invariants are reported as likely invariants.
The essential idea is to use a generate-and-check algorithm to test a a set of potential invariants against the traces. Daikon reports those properties that are tested to a sufficient degree without falsification.
Daikon is enhanced with a number of optimizations that allow it to scale to both non-trivial numbers of invariants and programs of non-trivial size. These optimizations are based on the fact that the basic generate-and-check algorithm checks and reports more invariants that necessary. Many invariants are redundant and do not need to be checked. Together the optimizations reduce the number of invariants that need to be checked by 99% .
In 2006, significant progress has been made in achieving our goal of generating invariants for a variety realistically sized programs. Daikon's non-instantiating suppression optimization was enhanced so that it runs more efficiently on program points with a large number of variables. Declared constants can be processed more efficiently, invariants over declared constants are output more clearly and many redundant invariants over declared constants have been removed. New security specific invariants have been added that enhance Daikon's ability to find anomolous behavior.
We will continue to work towards improving Daikon so that it can produce reasonable results on wider variety of realistic programs and to research how software engineers can best use the likely invariants that it uncovers.
This research is supported by gifts from IBM, NTT, and Toshiba, a grant from the MIT Deshpande Center, NSF grant CCR-0234651, and DARPA contract FA8750-04-2-0254.
 Jeff H. Perkins and Michael D. Ernst. Efficient incremental algorithms for dynamic detection of likely invariants. In Proceedings of the ACM Sigsoft 12th Symposium on the Foundations of Software Engineering SIGSOFT (FSE 2004), pp 23-32, Newport Beach, California, USA November 2004