Research Abstracts - 2006

Feedback-directed Random Test Generation

Carlos Pacheco¹, Shuvendu K. Lahiri*, Sanjukta Pal¹, Thomas Ball* & Michael D. Ernst¹

¹ MIT CSAIL
* Microsoft Research

Introduction

Random testing [5] is an attractive technique for detecting software errors. It can very cheaply generate vast numbers of tests, and those tests may expose corner cases that would have been overlooked by a human or by ordinary usage. Some studies suggest that random testing is as effective as systematic techniques [4,6].

A problem with random testing is that it can waste time exploring uninteresting parts of the search space. Not every input to a program is meaningful, and in fact most inputs are probably illegal. Consider a binary search routine that requires a sorted array as input. Since most arrays violate that precondition, random generation would spend most of its time creating such illegal inputs.
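To make the cost concrete, the small stand-alone program below (our illustration, not part of any tool described here; the class name, array length, and value range are arbitrary) generates random integer arrays and counts how many happen to satisfy the sortedness precondition. Only a tiny fraction do, so almost all of the generation effort is wasted on illegal inputs.

import java.util.Random;

// Illustration only: estimates how rarely an unguided random generator
// produces an array that satisfies the sortedness precondition of binary search.
public class SortedPreconditionDemo {
  public static void main(String[] args) {
    Random random = new Random(0);
    int trials = 100000;
    int sorted = 0;
    for (int t = 0; t < trials; t++) {
      int[] a = new int[8];
      for (int i = 0; i < a.length; i++) {
        a[i] = random.nextInt(100);
      }
      boolean isSorted = true;
      for (int i = 1; i < a.length; i++) {
        if (a[i - 1] > a[i]) { isSorted = false; break; }
      }
      if (isSorted) {
        sorted++; // a legal input for binary search
      }
    }
    System.out.println(sorted + " of " + trials + " random arrays were sorted");
  }
}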

We present a new approach to random test generation that uses feedback from the execution of candidate inputs to prune undesirable states and method sequences from the search. We also contribute a set of auxiliary techniques: heuristics for guiding the search, a contract-checking framework that acts as an oracle, and test case minimization, which reduces a very long method sequence to a small test sequence that exhibits the same behavior. We have implemented the technique in two tools: Randoop targets Microsoft's .NET Framework, and Joe targets the Java virtual machine. Both tools scale to large programs, and they have found serious errors in widely-deployed commercial and open source software.
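As a rough sketch of what a contract-checking oracle can look like (the interface and class names below, ObjectContract, EqualsReflexive, and HashCodeDoesNotThrow, are ours, not the tools' actual API), every object created by a candidate sequence can be passed through a set of generic checks. The two checks shown correspond to the violations in Figure 1: equals must be reflexive, and hashCode must not throw on a legal object.

// Sketch of contract checking as an oracle; these names are illustrative
// and do not reflect the actual implementation of Randoop or Joe.
interface ObjectContract {
  // Returns an error message if o violates the contract, or null if it passes.
  String check(Object o);
}

class EqualsReflexive implements ObjectContract {
  public String check(Object o) {
    return o.equals(o) ? null : "equals is not reflexive for " + o.getClass();
  }
}

class HashCodeDoesNotThrow implements ObjectContract {
  public String check(Object o) {
    try {
      o.hashCode();
      return null;
    } catch (RuntimeException e) {
      return "hashCode threw " + e + " for " + o.getClass();
    }
  }
}

Any non-null result marks the sequence that created the object as error-revealing.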

Example

Consider Figure 1 below, which shows two of the test cases generated by Joe when run on Sun's JDK (version 1.5). The first test case shows a violation of the equals contract: a set s1 returned by unmodifiableSet(Set) can return false for s1.equals(s1). This violates the reflexivity of equals as specified in Sun's API documentation. The test case actually reveals two errors: one in equals, and one in the TreeSet(Collection) constructor, which fails to throw ClassCastException as required by its specification.

public static void test1() {
  LinkedList l1 = new LinkedList();
  HashMap h1 = new HashMap(0);
  Collection c1 = h1.values();
  java.lang.Object[] a1 = c1.toArray();
  l1.addFirst(a1);
  TreeSet t1 = new TreeSet(l1);
  Set s1 = Collections.unmodifiableSet(t1);
  
  // This assertion fails
  Assert.assertTrue(s1.equals(s1));
}

import javax.xml.datatype.*;

public static void test2() throws Exception {
  DatatypeFactory d1 = DatatypeFactory.newInstance();
  XMLGregorianCalendar x1 =
    d1.newXMLGregorianCalendarTime(10, 1, 1, 100);
  
  // x1.hashCode() throws no exception at this point.
  x1.setYear(0);
  
  try {
    x1.hashCode(); // hashCode throws an exception
  } catch (IllegalArgumentException e) {
    Assert.fail(e.getMessage());
  }
}

Figure 1. Two test cases generated by Joe, a tool that implements our technique for Java. The first test shows a java.util.Set such that s1.equals(s1) == false, a violation of the contract for equals. The second test shows a legal object for which hashCode throws an exception.

The second test case creates a legal instance x1 of XMLGregorianCalendar such that x1.hashCode() throws an exception. (The exception is thrown in a constructor that hashCode calls, and is not caught by hashCode.) The API specification explicitly states that 0 is a legal argument to setYear, but such a call causes hashCode to throw an exception, which should never happen for any legal object. Errors like those in TreeSet(Collection) and XMLGregorianCalendar.setYear(int) can be very difficult to debug: because the violating method itself does not throw an exception, the error may not be exposed for a long time, perhaps not until an application is deployed. Our technique aims to expose such problematic behavior before it surfaces as a fault in an application that uses the library.

Feedback-directed test generation

Test case generation for object-oriented programs can be viewed as a search problem over data values and method sequences, where the public methods provide the ways that state can be mutated. Existing approaches to method sequence generation (for example, [1,2,3,7]) typically operate offline: the method sequences are derived statically, compiled, and finally executed. In contrast, we take a feedback-directed, execution-driven approach: our technique inspects the actual execution of the sequence as it is being constructed, in order to guide the search toward sequences that yield new object states. Our technique is fully automatic, requires no input from the user, and scales to realistic applications composed of thousands of classes.

The technique builds inputs incrementally by randomly selecting a method call to apply and selecting arguments from among previously-constructed inputs. As soon as an input is built, it is executed and checked against a set of contracts; the technique then uses the result of the execution to determine whether the input is redundant, illegal, or error-revealing:

  • Redundant: the objects that the sequence constructs are equal to objects constructed by a previously-created input.
  • Illegal: execution of the sequence leads to an exception suggesting that the input is illegal, for example a null-pointer exception thrown when a method is given null as an input parameter.
  • Error-revealing: the execution violates a contract, e.g. the reflexivity of equals as specified in Sun's API documentation.

Finally, any test input that exhibits violating behavior is minimized and output as a small, failing test case like those in Figure 1.
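The self-contained program below sketches this generation loop. It is our simplification, specialized to java.util.LinkedList and a fixed set of three calls so that it fits in a few lines, whereas the real tools select among all public methods of the classes under test. It shows the three classifications above: a call that throws is discarded as illegal, a call that recreates a previously-seen state is discarded as redundant, and a call that produces a new object state is kept and fed back into the search.

import java.util.*;

// Simplified illustration of feedback-directed generation, specialized to
// LinkedList. The pool holds object states reachable by previously-kept
// sequences; each iteration extends a random state with a random call.
public class FeedbackDirectedDemo {
  public static void main(String[] args) {
    Random random = new Random(0);
    List<LinkedList<Object>> pool = new ArrayList<LinkedList<Object>>();
    Set<LinkedList<Object>> seen = new HashSet<LinkedList<Object>>();
    pool.add(new LinkedList<Object>());  // the one-call sequence "new LinkedList()"
    seen.add(new LinkedList<Object>());

    String[] calls = { "addFirst", "removeFirst", "size" };
    for (int i = 0; i < 1000; i++) {
      // Extend a randomly chosen previous state with a randomly chosen call.
      LinkedList<Object> state =
          new LinkedList<Object>(pool.get(random.nextInt(pool.size())));
      String call = calls[random.nextInt(calls.length)];
      try {
        if (call.equals("addFirst")) state.addFirst("x" + random.nextInt(3));
        else if (call.equals("removeFirst")) state.removeFirst();
        else state.size();
      } catch (RuntimeException e) {
        continue;  // illegal: e.g., removeFirst() on an empty list
      }
      // Contract checks such as those sketched above would run here.
      if (seen.contains(state)) {
        continue;  // redundant: equal to an object created by an earlier sequence
      }
      seen.add(state);  // useful: a new object state, feed it back into the search
      pool.add(state);
    }
    System.out.println("distinct list states discovered: " + pool.size());
  }
}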

Why it works

To understand why feedback-directed generation outperforms undirected generation, consider the two test cases from Figure 1. When creating test1(), the Joe tool constructed a new, empty linked list via a method sequence of length one and later extended the sequence with a call to addFirst. Runtime redundancy checks detected that this sequence produced an object different from the empty list, so the extended sequence was deemed "useful" by the tool. Attempts to extend the set of LinkedList sequences with a call to isEmpty, size, remove(int), or most of the 28 methods of LinkedList would have failed to produce a useful sequence, either because execution of the extended sequence did not produce a new (i.e., nonempty) list, or because the execution threw an exception.

Offline exploration, on the other hand, may obliviously extend method sequences that throw exceptions, or might create and extend distinct sequences that yield equal objects. In our experiments, offline exploration did not discover the above violation of the equals contract.

In test2(), the factory method newXMLGregorianCalendarTime requires four integer arguments representing the hour, minute, second, and timezone. Without pruning based on execution results, random search can generate and extend method sequences that construct illegal times (e.g., one with a negative hour), not knowing that execution of the sequence will throw an exception. Runtime pruning reduces the space of invalid sequences explored and thus increases the chance that violation-inducing test cases are created.
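As a small stand-alone illustration of that pruning (the class name is ours, and the exact exception raised is observed at run time rather than assumed), a call with an out-of-range hour is executed as soon as it is constructed, and the sequence is discarded rather than extended:

import javax.xml.datatype.*;

// Illustration of runtime pruning: execute the candidate call immediately and
// discard the sequence if construction itself fails.
public class PruneIllegalCalls {
  public static void main(String[] args) throws Exception {
    DatatypeFactory d = DatatypeFactory.newInstance();
    try {
      // Hour 99 is outside the legal range for an XML Schema time.
      XMLGregorianCalendar x = d.newXMLGregorianCalendarTime(99, 1, 1, 100);
      System.out.println("call accepted; object would still be contract-checked: " + x);
    } catch (RuntimeException e) {
      System.out.println("sequence pruned as illegal: " + e);
    }
  }
}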

Evaluation

We have implemented the technique in two tools, Randoop (for Microsoft's .NET Framework) and Joe (for Java). To evaluate the technique's effectiveness, we have applied the tools to a set of 14 widely-used libraries comprising 780 KLOC, and found many serious errors, including:

  • Three other methods in Sun's JDK create objects that violate the reflexivity of equals. Other methods in javax.xml create malformed objects that crash when hashCode or toString is invoked.
  • In the Apache Commons libraries, constructors incorrectly initialize new objects, and methods fail to handle corner cases, leading to runtime exceptions.
  • 192 methods in a set of Microsoft libraries throw exceptions not allowed by the specification.
  • A Microsoft library exhibits non-terminating behavior on a legal input.

In a second experiment, we compared feedback-directed generation with traditional unguided random generation, as implemented both by us and by an independent tool [2]. Feedback-directed test generation finds more bugs, finds more interesting and severe bugs, and produces fewer redundant tests.

References:

[1] H. Y. Chen, T. H. Tse, and T. Y. Chen. TACCLE: a methodology for object-oriented software testing at the class and cluster levels. ACM Trans. Softw. Eng. Methodol., 10(1):56-109, 2001.

[2] C. Csallner and Y. Smaragdakis. JCrasher: an automatic robustness tester for Java. Software: Practice and Experience, 34:1025-1050, 2004.

[3] C. Csallner and Y. Smaragdakis. Check 'n' Crash: Combining static checking and testing. In ICSE'05, Proceedings of the 27th International Conference on Software Engineering, St. Louis, MO, USA, May 18-20, 2005.

[4] J. W. Duran and S. C. Ntafos. An evaluation of random testing. IEEE Transactions on Software Engineering, 10(4):438-444, July 1984.

[5] D. Hamlet. Random testing. In Encyclopedia of Software Engineering. John Wiley and Sons, 1994.

[6] D. Hamlet and R. Taylor. Partition testing does not inspire confidence. IEEE Transactions on Software Engineering, 16(12):1402-1411, Dec. 1990.

[7] T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 05), pages 365-381, April 2005.
