Eclat: Automatic Generation and Classification of Test Inputs

Carlos Pacheco and Michael Ernst

The Problem

Suppose we have a working program and a test suite for it. How can we augment the test suite with new, valuable tests? Typically, this is done manually: the programmer thinks hard for a while and comes up with new inputs that cover untested behavior. Our goal is to provide automated support to a programmer faced with this task.

The Eclat Approach

To help a programmer write new tests, we have developed a technique [3] that creates new test inputs that are likely to expose problematic behavior and that differ from the behavior covered by the existing test suite.
We have implemented our technique in Eclat, a tool that creates new unit tests for Java programs. The tool is publicly available.
As an example of the technique and the Eclat tool, suppose the class MyStack.java implements a stack data structure (Figure 1). The class has a small test suite, MyStackTest.java, that we wish to improve. Eclat analyzes the class and the test suite, and it outputs a set of test inputs (uses of the stack) deemed potentially problematic because they differ from the behavior covered by the original test suite. For this example, Eclat creates an input that pops an empty stack, making the stack pointer negative and putting the stack in an illegal state. Eclat also creates an input that calls the equals method with null as its argument, resulting in an exception. These are problematic behaviors not explored by the original test suite.
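To make this concrete, the two problematic inputs Eclat generates for this example might look roughly like the sketch below (the no-argument constructor and the pop() method are assumptions based on the description above; the code Eclat actually emits may differ):

    // Hypothetical reconstructions of the two problematic inputs.
    public class MyStackTestNew {
      public void test_pop_empty() {
        MyStack s = new MyStack();
        s.pop();          // pop on an empty stack: the stack pointer
                          // goes negative, an illegal object state
      }
      public void test_equals_null() {
        MyStack s = new MyStack();
        s.equals(null);   // calling equals with null throws an exception
      }
    }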
Technique

The technique (Figure 2) requires a concrete execution of the program (e.g., an existing test suite). It works as follows. From the provided execution, which is assumed to be correct, the technique infers an operational model of the program's behavior. It then generates many candidate inputs, executes each one, and classifies it by which parts of the model its execution violates: inputs that violate no properties behave like the original suite, while inputs that violate properties either misuse the program (illegal inputs) or expose new, potentially faulty behavior. The potentially fault-revealing inputs are reported to the user.
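A minimal sketch of the classification step follows. This is an illustration of the idea, not Eclat's actual implementation; Execution and OperationalModel are assumed interfaces standing in for Eclat's internals.

    interface Execution { }
    interface OperationalModel {
      boolean entryPropertiesViolated(Execution e);
      boolean exitPropertiesViolated(Execution e);
    }
    enum Label { NORMAL, ILLEGAL, FAULT_REVEALING }

    class Classifier {
      // Classify one candidate execution by which inferred model
      // properties it violates.
      Label classify(Execution e, OperationalModel model) {
        if (model.entryPropertiesViolated(e))
          return Label.ILLEGAL;         // the input misuses the program
        if (model.exitPropertiesViolated(e))
          return Label.FAULT_REVEALING; // legal input, new suspicious behavior
        return Label.NORMAL;            // behaves like the original suite
      }
    }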
Evaluation

To evaluate the effectiveness of our technique in creating inputs that reveal errors in programs, we used Eclat on 47 classes, several of which had multiple versions (totaling 568 implementations of the classes). All errors discovered by Eclat were real errors inadvertently introduced by the authors of the programs. For this set of programs, Eclat generated on average 1415 inputs per class, and it reported on average 1.5 inputs as fault-revealing. Of these, 0.55 were indeed fault-revealing, i.e., they caused program behavior that pointed to an error in the implementation. The remaining 0.95 inputs were false positives. In other words, approximately 1 in 3 inputs shown to the user was truly fault-revealing.

Example Fault-revealing Inputs

To illustrate the inputs that Eclat reports, and the errors it uncovers, below are two reported inputs. The first input tests a class that implements polynomials (in Java).
The first two lines in method test_1_integrate() create a polynomial and query part of its state. The next method call causes problematic behavior, which Eclat outputs as a comment. For this input, the call to method integrate results in the polynomial having a term with a zero coefficient, something that never occurred in the original test suite. Further examination reveals that method integrate does in fact erroneously add an unnecessary zero-coefficient term in some cases. The second example is an erroneous implementation of a method in a utility library. The method create_combinations(int sz, List list) creates all combinations of size sz of the elements in list. The test suite for this class does not test combinations of size 0. Eclat reports an input that throws an exception in this case, revealing that the method incorrectly handles combinations of size 0.
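The original listings are not reproduced here, but the two reported inputs might look roughly like the following sketch (the class names, constructor arguments, and enclosing test class are assumptions; only integrate and create_combinations(int sz, List list) are named in the text):

    // Hypothetical reconstruction of the two reported inputs.
    public class ExampleInputs {
      public void test_1_integrate() {
        Poly p = new Poly(3, 2);  // assumed: create the polynomial 3*x^2
        int d = p.degree();       // query part of the polynomial's state
        // Eclat comment: the result contains a term with a zero
        // coefficient, which never occurred in the original test suite
        Poly q = p.integrate();
      }
      public void test_2_create_combinations() {
        java.util.List list = new java.util.ArrayList();
        list.add("a");
        // Eclat comment: throws an exception for combinations of size 0
        Combinations.create_combinations(0, list);
      }
    }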
Future Work

Future work on this research centers on two themes.
References

[1] J. F. Bowring, J. M. Rehg, and M. J. Harrold. Active learning for automatic classification of software behavior. In ISSTA, pages 195-205, July 2004.
[2] Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In ICSE, pages 480-490, May 2004.
[3] C. Pacheco and M. D. Ernst. Eclat: Automatic generation and classification of test inputs. In ECOOP 2005: Object-Oriented Programming, 19th European Conference, Glasgow, Scotland, July 25-29, 2005.
[4] A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In ICSE, pages 465-475, May 2003.