Abstracts - 2006
Comparing Test Generation and Classification Techniques
Marcelo d'Amorim 1, Carlos Pacheco2, Darko Marinov1, Tao Xie3, & Michael D. Ernst21 University of Illinois, Urbana-Champaign
2 CSAIL, MIT
3 North Carolina State University
Unit testing checks the correctness of program units (components) in isolation. It is an important part of software development: if the units are incorrect, it is hard to build a correct system from them. Unit testing is becoming a common and substantial part of the software development practice.
Creation of a test suite requires test generation, which generates unit tests, and test classification, which determines whether a test passed or failed. Programmers often perform some test generation and test classification by hand, but it is tedious and error-prone to systematically perform testing manually.
Automatic techniques for test generation and classification
Researchers have proposed techniques that automate test generation and classification for unit testing of object-oriented programs. Techniques for automated test generation include random generation [1,4] and symbolic execution [3,5,6,7,8]. Random generation creates random sequences of method calls with random values for method parameters. Symbolic execution executes method sequences with symbolic parameters, builds constraints on these parameters, and solves these constraints to produce actual tests with concrete values for method parameters.
Techniques for automated test classification include those based on uncaught exceptions [1,6,8] and operational models [4,9]. Exception-based classification labels a test as potentially faulty if it throws an uncaught exception. Model-based classification first infers an operational model  that includes properties such as class invariants and method pre- and post-conditions. Executions that violate the properties are classified as potentially faulty.
Previous research proposed three combinations of random/symbolic generation techniques and exception-based/model-based classification techniques for object-oriented programs: exception-based random testing , model-based random testing , and exception-based symbolic testing [6,7,8].
Model-based symbolic testing
We propose a novel combination of generation-classification, model-based symbolic testing. The approach uses symbolic execution to automate both generation and classification of object-oriented unit tests. As in [4,8], our approach first infers an operational model for a set of classes under test. It then symbolically executes both the methods from these classes and the operational model to generate method sequences with symbolic parameters and to classify the sequences that violate the model for some constraints on the method parameters. It finally solves the constraints on these symbolic parameters and outputs a small number of (concrete) test inputs that are likely to reveal faults. In principle, our new technique may allow operational models to more effectively guide test generation for violating the models than the previous techniques [4,8].
We have implemented our approach in a tool called Symclat. Symclat provides automatic symbolic execution for Java, whereas several previous studies [6,7,8] required manual instrumentation for symbolic execution. Symclat builds on our previous work on symbolic exploration of Java programs . The key conceptual extension that Symclat provides is symbolic execution of model invariants.
Symclat uses Daikon  to generate models for the classes under test. Symclat instruments these classes to check model properties at method entry and exit points. Code execution and model evaluation are seamlessly integrated because the model is translated into executable code, which is handled by the symbolic engine as additional code to execute, together with the tested code.
We have performed a study that empirically compares the four pairs of random/symbolic generation and exception-based/model-based classification techniques. More specifically, we compare two tools that implement automated test generation---Eclat  based on random generation, and our tool Symclat based on symbolic execution (both tools support both exception-based and model-based test classification). Our study investigates the following two questions: (2) Is random generation or symbolic execution more effective in revealing faults? (2) Is exception-based or model-based classification more effective in revealing faults? Answering these questions not only provides insights for tool developers to improve the existing tools but also guidelines for tool users to understand how to use existing tools. Our empirical study uses 61 subjects originally taken from a variety of sources and used in a previous study on Eclat .
Random generation and and symbolic execution detect different faults. When symbolic execution can explore some code (i.e., the code has no programming constructs that the symbolic engine cannot explore), it can generate tests that reveal faults, and it can reveal all the faults than random generation can. However, the symbolic engine often cannot explore code; we were able to run Symclat on only 61 out of 631 subjects from the Eclat study , and Symclat cannot even explore all the code from these 10% of subjects. A better symbolic engine could explore some more subjects, but in general, symbolic execution cannot execute arbitrary code because of the theoretical limitations of the underlying technology (incompleteness of theorem provers and constraint solvers due to undecidability). Random generation can, in contrast, explore all code and thus reveals faults that symbolic execution misses. However, random exploration itself can miss some faults in code that it explores. For instance, it can miss a fault because the random selection does not generate a specific fault-revealing method sequence or a specific input value for a method.
Exception-based and model-based classification are also complementary. Neither appears better than the other: in our experiments, exception-based classification revealed more faults, but each technique revealed some faults that the other missed. Model-based classification can label as illegal a test that violates an incorrectly inferred pre-condition, and if the test is actually legal and fault-revealing, model-based classification will miss it. On the other hand, exception-based classification is unable to detect some errors that manifest themselves by the violation of a model property but throw no exceptions.
Since the techniques are complementary in detecting faults, our study suggests that a developer can benefit from using all techniques, instead of selecting only one.
 C. Csallner and Y. Smaragdakis. JCrasher: an automatic robustness ftester for Java. Software: Practice and Experience, 34:1025-1050, 2004.
 M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Trans. Softw. Eng., 27(2):99-123, 2001.
 J. C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385-394, 1976.
 C. Pacheco and M. D. Ernst. Eclat: Automatic generation and classification of test inputs. In Proc. 19th European Conference on Object-Oriented Programming, pages 504-527, Glasgow, Scotland, July 2005.
 S. Visvanathan and N. Gupta. Generating test data for functions with pointer inputs. In Proc. 17th IEEE International Conference on Automated Software Engineering, pages 149-160, September 2002.
 W. Visser, C. S. Pasareanu, and S. Khurshid. Test input generation with Java PathFinder. In Proc. 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 97-107, 2004.
 W. Visser, C. S. Pasareanu, and R. Pelaaaanek. Test input generation for red-black trees using abstraction. In Proc. 20th IEEE/ACM International Conference on Automated Software Engineering, pages 414-417, November 2005.
 T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proc. 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 365-381, April 2005.
 T. Xie and D. Notkin. Tool-assisted unit test selection based on operational violations. In Proc. 18th IEEE International Conference on Automated Software Engineering, pages 40-48, 2003.