Feedback-directed Random Test Generation

Carlos Pacheco, Shuvendu K. Lahiri (Microsoft), Michael D. Ernst & Thomas Ball (Microsoft)

We present a new approach to randomized test generation that uses feedback to prune undesirable states and method sequences from the search [2]. Our work addresses random generation of unit tests for object-oriented programs. Test case generation for object-oriented programs can be viewed as a search problem over data values and method sequences, where the public methods provide the ways that state can be mutated. Existing approaches to method sequence generation (for example, [1,3]) typically operate offline: the method sequences are derived statically, compiled, and finally executed. In contrast, we take a feedback-directed, execution-driven approach: our technique inspects the actual execution of each sequence as it is being constructed, in order to guide the search toward sequences that yield new object states. The technique is fully automatic, requires no input from the user, and scales to realistic applications composed of thousands of classes.

The technique builds inputs incrementally by randomly selecting a method call to apply and selecting its arguments from among previously constructed inputs. As soon as an input is built, it is executed and checked against a set of contracts; the technique then uses the result of the execution to classify the input as redundant, illegal, or error-revealing:

  • Redundant: the objects that the sequence constructs are equal to objects constructed by a previously created input.
  • Illegal: execution of the sequence leads to an exception suggesting that the input is illegal, for example, a null-pointer exception thrown when a method is given null as an input parameter.
  • Error-revealing: the execution violates a contract, e.g., the reflexivity of equals as specified in Sun's API documentation.

Finally, any test input that exhibits violating behavior is minimized and output as a small, failing test case like those in Figure 1.
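The loop below sketches this process in Java, in the style of the examples in Figure 1. It is an illustration, not Randoop's implementation: Sequence, extendRandomly, and violatesContract are hypothetical stand-ins, and all exceptions are treated as illegal for brevity.

import java.util.*;

// Hypothetical sketch of the feedback-directed generation loop.
public class GenerationLoopSketch {

  // A method sequence that can be executed to produce an object.
  interface Sequence {
    Object execute() throws Throwable;
  }

  private final List<Sequence> pool = new ArrayList<Sequence>(); // reusable inputs
  private final Set<Object> observed = new HashSet<Object>();    // states seen so far

  public List<Sequence> generate(int budget) {
    List<Sequence> errorRevealing = new ArrayList<Sequence>();
    for (int i = 0; i < budget; i++) {
      // Build a new input: a random method call whose arguments are
      // drawn from previously constructed sequences in the pool.
      Sequence candidate = extendRandomly(pool);

      // Execute it immediately; the outcome is the feedback.
      Object result;
      try {
        result = candidate.execute();
      } catch (Throwable t) {
        continue; // illegal: the exception suggests a violated precondition
      }

      // Redundant: the constructed object equals one already observed
      // (this relies on the tested classes' own equals/hashCode).
      if (!observed.add(result)) {
        continue;
      }

      if (violatesContract(result)) {
        errorRevealing.add(candidate); // error-revealing: keep for output
      } else {
        pool.add(candidate);           // useful: available for future extension
      }
    }
    return errorRevealing;
  }

  // Stubs; a real implementation would use reflection and contract checks.
  private Sequence extendRandomly(List<Sequence> pool) {
    throw new UnsupportedOperationException();
  }

  private boolean violatesContract(Object o) {
    return false;
  }
}

The key design point is that feedback arrives while inputs are still being built, so redundant and illegal sequences are discarded before they can be extended into longer ones.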

We also contribute a set of auxiliary techniques: heuristics for guiding the search, a contract-checking framework that acts as an oracle, and test case minimization, which reduces a very long method sequence to a small test sequence that exhibits the same behavior. We have implemented the technique in Randoop [2], short for Random tester for object-oriented programs. Randoop is fully automatic, requires no input from the user (other than the name of a binary for .NET or a class directory for Java), and scales to realistic applications with hundreds of classes. Randoop has found serious errors in widely deployed commercial and open-source software.
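Two of these auxiliary pieces admit short sketches in the same style. Both are hypothetical illustrations rather than Randoop's source: the contract oracle covers only a few default contracts, and the minimizer represents calls as strings with a stand-in failure predicate.

import java.util.*;

// Hypothetical sketches of a contract oracle and a greedy minimizer.
public class AuxiliarySketches {

  // Oracle: default contracts that every legal object should satisfy.
  public static boolean satisfiesDefaultContracts(Object o) {
    if (o == null) return true;
    try {
      if (!o.equals(o)) return false; // reflexivity of equals
      o.hashCode();                   // hashCode must not throw
      o.toString();                   // toString must not throw
    } catch (Throwable t) {
      return false;                   // any exception is a contract violation
    }
    return true;
  }

  // Stand-in for "re-run this sequence and check whether it still fails".
  interface FailurePredicate {
    boolean stillFails(List<String> calls);
  }

  // Greedy minimization: repeatedly drop single calls from a failing
  // sequence, keeping a deletion only if the failure still reproduces.
  public static List<String> minimize(List<String> calls, FailurePredicate p) {
    List<String> current = new ArrayList<String>(calls);
    boolean shrunk = true;
    while (shrunk) {
      shrunk = false;
      for (int i = 0; i < current.size(); i++) {
        List<String> smaller = new ArrayList<String>(current);
        smaller.remove(i); // try the sequence without call i
        if (p.stillFails(smaller)) {
          current = smaller; // the failure survived; keep the smaller sequence
          shrunk = true;
          break;
        }
      }
    }
    return current;
  }
}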

Example

Consider Figure 1 below, which shows two of the test cases generated by Randoop when run on Sun's JDK (version 1.5). The first test case shows a violation of the equals contract: for the set s1 returned by Collections.unmodifiableSet(Set), s1.equals(s1) returns false. This violates the reflexivity of equals as specified in Sun's API documentation. The test case actually reveals two errors: one in equals, and one in the TreeSet(Collection) constructor, which fails to throw ClassCastException as required by its specification.

import java.util.*;
import junit.framework.Assert; // JUnit 3-style assertions

public static void test1() {
  LinkedList l1 = new LinkedList();
  HashMap h1 = new HashMap(0);
  Collection c1 = h1.values();
  java.lang.Object[] a1 = c1.toArray();
  l1.addFirst(a1);
  // This constructor should throw ClassCastException (the array
  // element is not Comparable), but it does not.
  TreeSet t1 = new TreeSet(l1);
  Set s1 = Collections.unmodifiableSet(t1);

  // This assertion fails: s1.equals(s1) returns false.
  Assert.assertTrue(s1.equals(s1));
}

import javax.xml.datatype.*;
import junit.framework.Assert;

public static void test2() throws DatatypeConfigurationException {
  DatatypeFactory d1 = DatatypeFactory.newInstance();
  XMLGregorianCalendar x1 =
    d1.newXMLGregorianCalendarTime(10, 1, 1, 100);

  // x1.hashCode() throws no exception at this point.
  x1.setYear(0); // 0 is a legal argument per the API specification

  try {
    x1.hashCode(); // hashCode now throws an exception
  } catch (IllegalArgumentException e) {
    Assert.fail(e.getMessage());
  }
}

Figure 1. Two test cases generated by Randoop, a tool that implements our technique for Java. The first test shows a java.util.Set such that s1.equals(s1) == false, a violation of the contract for equals. The second test shows a legal object for which hashCode throws an exception.

The second test case creates a legal instance x1 of XMLGregorianCalendar such that x1.hashCode() throws an exception. (The exception is thrown in a constructor that hashCode calls, and is not caught by hashCode.) The API specification explicitly states that 0 is a legal argument to setYear, yet the call causes hashCode to throw an exception, which should never happen for a legal object. Errors like those in TreeSet(Collection) and XMLGregorianCalendar.setYear(int) can be very difficult to debug: because the method that introduces the invalid state does not itself throw an exception, the error may not surface for a long time, perhaps not until an application is deployed. Our technique aims to expose such problematic behavior before it surfaces as a fault in an application that uses the library.

Evaluation

Our experimental results indicate that feedback-directed random testing can outperform systematic testing in terms of coverage and error detection. On four container data structures previously used to evaluate five different systematic input generation techniques [4], inputs created with feedback-directed random generation achieve coverage equal to or greater than that of all the systematic techniques. In terms of error detection, feedback-directed random testing revealed many errors across 14 widely deployed, well-tested Java and .NET libraries totaling 780 KLOC. For example:

  • Three other methods in Sun's JDK create objects that violate the reflexivity of equals, and other methods in javax.xml create malformed objects that crash when hashCode or toString is invoked.
  • In the Apache Commons libraries, constructors incorrectly initialize new objects, and methods fail to handle corner cases, leading to runtime exceptions.
  • Dozens of methods in a set of Microsoft libraries throw exceptions not allowed by the specification.
  • A Microsoft library exhibits non-terminating behavior on a legal input.

By contrast, testing the libraries via model checking with the JPF model checker did not create any error-revealing test inputs: the state space for these libraries is enormous, and the model checker ran out of resources after exploring a tiny, localized portion of the state space. Our results suggest that for large libraries, the sparse, global sampling that Randoop performs can reveal errors more efficiently than the dense, local sampling that model checkers perform. Unlike systematic techniques, feedback-directed random testing requires no specialized virtual machine, code instrumentation, constraint solvers, or theorem provers. This makes the technique highly scalable: we were able to run Randoop on the .NET Framework libraries and three industrial implementations of the JDK, and found previously unknown errors. In summary, our experiments indicate that feedback-directed random generation retains the benefits of random testing (scalability, simplicity of implementation), avoids random testing's pitfalls (generation of redundant or meaningless inputs), and is competitive with systematic techniques.

References:

[1] C. Csallner and Y. Smaragdakis. Check 'n' Crash: Combining static checking and testing. In ICSE '05: Proceedings of the 27th International Conference on Software Engineering, St. Louis, MO, USA, May 2005.

[2] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball. Feedback-directed random test generation. In ICSE '07: Proceedings of the 29th International Conference on Software Engineering, Minneapolis, MN, USA, 2007.

[3] T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2005), pages 365-381, April 2005.

[4] W. Visser, C. S. Pasareanu, and R. Pelanek. Test input generation for Java containers using state matching. In Proceedings of the 2006 International Symposium on Software Testing and Analysis (ISSTA 2006), Portland, Maine, USA, 2006.

 
