Abstracts - 2006
Palulu: Automatic Generation of Unit Regression Tests
Shay Artzi, Michael Ernst, David Glasser, Adam Kiezun, Carlos Pacheco & Jeff Perkins
Programmers spend a significant amount of time modifying existing code: for example, fixing bugs, refactoring, or adding new features. Regression testing---comparing the behavior of the changed program with the program's original behavior---is a widely-used practice to ensure behavior preservation across code changes. We have developed a technique that automates the generation of regression tests at the unit level.
Figure 1 below shows a unit regression test for a class from Java's JDK.
Our technique takes a program and outputs many regression tests like the one in Figure 1. An automated tool like Palulu can help developers create regression tests that complement manually-created tests. Palulu has found a number of regression errors in well-tested libraries like IBM's implementation of the JDK and Sun's JDK 1.6 Beta.
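To make the idea concrete, here is a hedged sketch of what such a generated unit regression test might look like (illustrative only, not Palulu's actual output format): a random sequence of method calls builds a JDK object, and assertions on the values returned by observer methods act as the regression oracle.

```java
import java.util.LinkedList;

// Sketch of a generated unit regression test (not Palulu's actual output).
// A random call sequence builds an object; the oracle asserts the values
// that observer methods (size, isEmpty, toString) returned when the test
// was generated against the original version of the library.
public class RegressionTestSketch {
    static void check(boolean condition) {
        if (!condition) throw new AssertionError("regression detected");
    }

    public static void run() {
        LinkedList<Integer> list = new LinkedList<Integer>();
        list.add(7);
        list.addFirst(3);
        list.add(7);
        // Regression oracle: observer values recorded on the original version.
        check(list.size() == 3);
        check(!list.isEmpty());
        check(list.toString().equals("[3, 7, 7]"));
    }

    public static void main(String[] args) {
        run();
        System.out.println("test passed");
    }
}
```

If a newer library version changed any of these observer values, the test would fail and flag a behavioral difference.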
Creating regression tests
Creating a regression test consists of (1) creating test inputs that
exercise the program under test, and (2) creating an oracle that checks
that the behavior of the program is preserved. Our technique creates test
inputs by constructing random sequences of method calls that exercise
the methods of the program under test, using feedback-directed generation
[2]. For each input, the technique creates a regression oracle by running
the program on the input and recording the (primitive) values returned by
a set of observer methods for the class under test. Observer methods are
obtained via an automatic purity analysis [1,3,4]. After executing a test
input and recording its return values, the technique re-executes the input
and ensures that the recorded behavior is reproduced; this second run
filters out recorded behaviors that are not reproducible (consider, for
example, a test that records a nondeterministic value such as the current
time: the value differs across runs, so it cannot serve as an oracle).
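The record-and-replay filtering step can be sketched as follows; this is a minimal illustration with names of our own choosing, not Palulu's API. Each observer is executed on two runs, and only observations whose values reproduce are kept as oracle assertions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Sketch of oracle creation: run each observer twice and keep only the
// observations whose values reproduce. Nondeterministic values (timestamps,
// default hash codes) differ between runs, so they never become assertions.
public class OracleSketch {
    public static Map<String, Object> reproducibleObservations(
            Map<String, Supplier<Object>> observers) {
        // First run: record every observer's return value.
        Map<String, Object> firstRun = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Object>> e : observers.entrySet()) {
            firstRun.put(e.getKey(), e.getValue().get());
        }
        // Second run: keep only observations that reproduce.
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Object>> e : observers.entrySet()) {
            Object secondRun = e.getValue().get();
            if (Objects.equals(firstRun.get(e.getKey()), secondRun)) {
                kept.put(e.getKey(), secondRun); // reproducible: use as oracle
            }
        }
        return kept;
    }
}
```

For instance, an observer returning `list.size()` would survive both runs, while one returning `System.nanoTime()` would be discarded.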
Creating complex regression tests
Random test generation is effective for testing programs with relatively unconstrained interfaces, i.e., programs (like the JDK) that for the most part allow arbitrary sequencing of method calls. Random testing is less effective for programs with more constrained interfaces, which require specific sequences of calls to behave properly. For example, a database server may require calls that initialize the database and establish a connection with a client before it can service requests. Purely random test generation is blind to these constraints and is unlikely to create legal test inputs that require complex setup code.
We have developed a second technique (also implemented in Palulu) that guides the generation process, based on a model of legal method sequences derived from a method trace of an example execution of the program. The technique creates a directed graph that summarizes the sequences of method calls observed during the program's run: edges are method calls and paths represent call sequences observed during execution. The model also specifies the primitive values (or strings) that were given as input arguments during the example run. The technique creates one model for each class under test.
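One way to represent such a model is as a small directed graph keyed by state. The following sketch is our own simplified data structure, assumed for illustration rather than taken from Palulu's internals:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified call-sequence model: a directed graph in which edges are method
// names, and any path from the start state is a call sequence that was
// observed during the example execution.
public class CallSequenceModel {
    public static final int START = 0;
    private final Map<Integer, Map<String, Integer>> edges = new HashMap<>();
    private int nextState = 1;

    // Record an observed call: method `name` taken from state `from` leads
    // to a fresh state, whose id is returned.
    public int addObservedCall(int from, String name) {
        int to = nextState++;
        edges.computeIfAbsent(from, k -> new HashMap<String, Integer>()).put(name, to);
        return to;
    }

    // Methods that are legal in the given state, according to the trace.
    public Set<String> legalCalls(int state) {
        Map<String, Integer> out = edges.get(state);
        return out == null ? Collections.<String>emptySet() : out.keySet();
    }

    // Follow an edge; throws if the call was never observed in this state.
    public int follow(int state, String name) {
        Map<String, Integer> out = edges.get(state);
        if (out == null || !out.containsKey(name)) {
            throw new IllegalArgumentException("call not in model: " + name);
        }
        return out.get(name);
    }
}
```

For the database-server example, a trace of initDatabase, then connect, then serveRequest would yield a three-edge chain starting at the start state.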
Figure 2 shows an example model for a class implementing a database server. In addition to recording method calls, our instrumentation framework emits all field writes that happen within a method call. We use this additional information to dynamically determine pure (observer) methods from the trace, which the generator uses to create a regression oracle. Thus, the technique performs two dynamic analyses based on the example execution: model creation, used to guide test input generation, and purity analysis, used to create regression oracles.
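The trace-based purity step might be sketched like this; it is our own simplification, which assumes the trace records a field-write count for each invocation:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of dynamic purity inference: a method observed in the trace is
// classified as an observer (pure) if no invocation of it ever wrote a field.
public class PuritySketch {
    // Each trace entry pairs a method name with the number of field writes
    // recorded during that invocation.
    public static Set<String> inferObservers(List<Map.Entry<String, Integer>> trace) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> impure = new HashSet<>();
        for (Map.Entry<String, Integer> call : trace) {
            seen.add(call.getKey());
            if (call.getValue() > 0) {
                impure.add(call.getKey()); // observed writing a field: impure
            }
        }
        seen.removeAll(impure);
        return seen; // methods never observed writing a field
    }
}
```

Being dynamic, this classification is only as good as the trace: a method that happens never to write a field during the example run would be treated as an observer.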
Given a set of models (one per class under test), the regression test generator creates inputs randomly, except that objects for modeled classes are constrained to follow the model. For the database example, a database object would be created by following the model from Figure 2, and thus it would be properly initialized.
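A model-guided generator can then restrict its random choices to edges of the model. The following sketch, with a hypothetical hard-coded model for the database example, illustrates the idea:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of model-guided generation: at each step the generator picks a
// random call, but only among the calls the model marks legal in the current
// state, so a constrained object (e.g., a database server) is always set up
// in a legal order. The model and method names here are hypothetical.
public class GuidedGenerator {
    // Hypothetical model derived from an example run: last call -> legal next calls.
    static final Map<String, List<String>> MODEL = Map.of(
            "start", List.of("initDatabase"),
            "initDatabase", List.of("connect"),
            "connect", List.of("serveRequest"),
            "serveRequest", List.of("serveRequest", "shutdown"));

    public static List<String> generateSequence(Random rnd, int maxCalls) {
        List<String> sequence = new ArrayList<>();
        String state = "start";
        for (int i = 0; i < maxCalls; i++) {
            List<String> legal = MODEL.get(state);
            if (legal == null) break; // no observed successors: stop
            String call = legal.get(rnd.nextInt(legal.size()));
            sequence.add(call);
            state = call;
        }
        return sequence;
    }
}
```

Every generated sequence necessarily begins with the initialization calls, which purely random generation would almost never produce in the right order.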
We have used our tool to create regression tests for classes from Sun's JDK 1.5 and ran the tests on a newer version of the JDK, i.e., Sun's JDK 1.6 Beta. In a second experiment, we created regression tests for classes from Sun's JDK 1.5 (the same classes as in the first experiment) and ran the tests on IBM's JDK 1.5, a different, independent implementation of the same API. In both experiments, our tool was able to find bugs (like the one illustrated in Figure 1) in a matter of seconds.
In an experiment to evaluate model-based generation of complex inputs, we used Palulu to generate a test for a complex class in Daikon, a large program developed in our group. Based on a model generated from an example execution, Palulu created a test consisting of 20 method calls and involving 10 interrelated classes. We also asked a developer familiar with the code to manually create a test for the class; the task took him 30 minutes, and the manually-generated test was structurally very similar to the one generated by Palulu. Furthermore, random generation without a model was not able to create a sequence of method calls that created a legal object for the class.
References
[1] S. Artzi, A. Kiezun, and M. D. Ernst. Exploring dynamic purity analysis. CSAIL Research Abstract, 2006.
[2] C. Pacheco, S. K. Lahiri, S. Pal, T. Ball, and M. D. Ernst. Feedback-directed random test generation. Under submission.
[3] A. Rountev and B. G. Ryder. Points-to and side-effect analyses for programs built with precompiled libraries. LNCS 2027, 2001.
[4] A. Salcianu and M. C. Rinard. Purity and side effect analysis for Java programs. In VMCAI, pages 199-215, 2005.