Abstracts - 2007
amock: Automatic Generation of Readable Software Unit Tests from System Tests
David S. Glasser
Automated testing is essential for the construction of reliable software. Computer systems are complex enough that even small changes can have far-reaching consequences. Macroscopic system tests and microscopic unit tests are both required: the former ensure that changes do not affect the overall operation of the system, and the latter ensure that internal interfaces continue to function as expected. However, writing tests can be one of the most tedious parts of the software development process. While most software engineers recognize the importance of automated testing, many projects fail to achieve a desirable level of testing.
Unit tests can be more useful than system tests for two reasons. First, if a developer changes one small part of a program, running the unit tests that exercise just that part can be more efficient than running an entire system test. Second, a system test often reveals only whether it passed or failed; when a system test starts to fail, it is often difficult to find which subsystem the error originated in. Unit tests are more focused, so when a unit test fails, it is generally easier to tell which module or even method is responsible for the failure.
Our technique, amock, implements test slicing for Java. It takes as input an execution of a passing system test, and creates small unit tests which exercise individual objects in the same way that the system test did. These unit tests are human-readable and rely only on the standard Java testing frameworks JUnit and jMock. Developers can incorporate these generated tests directly into their test suites as regression tests.
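To make the idea concrete, the following is a minimal hand-written sketch of the kind of test this describes: one object is exercised in isolation while its collaborator is replaced by a mock that returns canned answers and records the calls it receives. It is not amock output. The names `PriceSource`, `Cart`, and `RecordingPriceSource` are hypothetical, and a hand-rolled mock stands in for jMock so that the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical collaborator of the object under test. */
interface PriceSource {
    int priceInCents(String item);
}

/** Hypothetical object under test: totals prices looked up from a PriceSource. */
class Cart {
    private final PriceSource prices;
    private int totalCents = 0;

    Cart(PriceSource prices) { this.prices = prices; }

    void add(String item) { totalCents += prices.priceInCents(item); }

    int totalCents() { return totalCents; }
}

/** Hand-written mock: returns canned answers and records every call it receives. */
class RecordingPriceSource implements PriceSource {
    private final Map<String, Integer> cannedPrices;
    final List<String> calls = new ArrayList<>();

    RecordingPriceSource(Map<String, Integer> cannedPrices) {
        this.cannedPrices = cannedPrices;
    }

    @Override
    public int priceInCents(String item) {
        calls.add(item);
        Integer price = cannedPrices.get(item);
        if (price == null) {
            throw new AssertionError("unexpected call: priceInCents(" + item + ")");
        }
        return price;
    }
}

class CartUnitTestSketch {
    /** Exercises one Cart the way an imagined system test did, with its collaborator mocked. */
    static void testCartTotalsLookedUpPrices() {
        RecordingPriceSource prices =
                new RecordingPriceSource(Map.of("tea", 350, "milk", 120));
        Cart cart = new Cart(prices);

        cart.add("tea");
        cart.add("milk");

        // Check the visible result and the interaction with the collaborator.
        if (cart.totalCents() != 470) {
            throw new AssertionError("total was " + cart.totalCents());
        }
        if (!prices.calls.equals(List.of("tea", "milk"))) {
            throw new AssertionError("calls were " + prices.calls);
        }
    }

    public static void main(String[] args) {
        testCartTotalsLookedUpPrices();
        System.out.println("ok");
    }
}
```

A mocking framework such as jMock would replace `RecordingPriceSource` with declared expectations, which is what keeps tests of this shape short enough to read and edit by hand.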
Figure 1 shows a subject program and a corresponding generated unit test. The generated test uses the jMock framework to describe the expected interactions between objects.
The generated test is testing the
Many different approaches to automatic unit test generation have been studied. The concept of using a dynamic analysis to automatically convert large tests to small tests has been implemented as
The main goal of test factoring [1] is to allow developers to run a slow system test much more efficiently when only a small part of the system has changed. Test factoring creates a transcript of a long system test or other program execution, and then plays it back in a special instrumented Java run-time environment where all objects other than those of the class under test are mocks following the transcript. Method calls that involve only classes not under test are essentially skipped, leading to a much faster execution of the original test suite that exercises code from only one class. If the class under test attempts to make different method calls to the rest of the system than it did during the original execution, the replay stops and tells the user that the full system test should be run instead; otherwise, the factored test succeeds or fails according to whether the program that it is replaying succeeds or fails.
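The record-and-replay idea can be sketched with the JDK's dynamic proxies. This is an illustration under simplifying assumptions, not the test factoring implementation: it mocks a single interface rather than instrumenting the run-time environment, and `Transcript`, `RecordedCall`, `ReplayDivergedException`, `Inventory`, and `Reorderer` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Thrown when the class under test calls its environment differently than recorded. */
class ReplayDivergedException extends RuntimeException {
    ReplayDivergedException(String message) { super(message); }
}

/** One recorded call from the class under test to the rest of the system. */
final class RecordedCall {
    final String method;
    final List<Object> args;
    final Object result;

    RecordedCall(String method, Object[] args, Object result) {
        this.method = method;
        this.args = Transcript.asList(args);
        this.result = result;
    }
}

final class Transcript {
    private final List<RecordedCall> calls = new ArrayList<>();

    static List<Object> asList(Object[] args) {
        // Proxies pass null instead of an empty array for zero-argument methods.
        List<Object> list = new ArrayList<>();
        if (args != null) list.addAll(Arrays.asList(args));
        return list;
    }

    /** Wraps a real collaborator so that every call on it is appended to the transcript. */
    <T> T record(Class<T> iface, T real) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(real, args);
            calls.add(new RecordedCall(method.getName(), args, result));
            return result;
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    /** Returns a mock that answers from the transcript and stops if the calls diverge. */
    <T> T replay(Class<T> iface) {
        int[] next = {0};
        InvocationHandler handler = (proxy, method, args) -> {
            if (next[0] >= calls.size()) {
                throw new ReplayDivergedException("extra call: " + method.getName());
            }
            RecordedCall expected = calls.get(next[0]++);
            if (!expected.method.equals(method.getName())
                    || !expected.args.equals(asList(args))) {
                throw new ReplayDivergedException(
                        "expected " + expected.method + expected.args
                        + " but got " + method.getName() + asList(args));
            }
            return expected.result;
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}

/** Hypothetical environment of the class under test. */
interface Inventory {
    int stock(String item);
}

/** Hypothetical class under test. */
class Reorderer {
    private final Inventory inventory;
    Reorderer(Inventory inventory) { this.inventory = inventory; }
    boolean shouldReorder(String item) { return inventory.stock(item) < 10; }
}

class TranscriptReplayDemo {
    public static void main(String[] unused) {
        Transcript transcript = new Transcript();
        Inventory real = item -> item.equals("tea") ? 3 : 50; // stands in for the slow system
        new Reorderer(transcript.record(Inventory.class, real)).shouldReorder("tea");

        // Same call as recorded: answered from the transcript without the real Inventory.
        System.out.println(new Reorderer(transcript.replay(Inventory.class)).shouldReorder("tea"));
        try {
            new Reorderer(transcript.replay(Inventory.class)).shouldReorder("milk");
        } catch (ReplayDivergedException e) {
            System.out.println("replay stopped: " + e.getMessage());
        }
    }
}
```

The strict comparison in `replay` is deliberate: it mirrors the behavior described above, where any difference in the calls made to the rest of the system ends the replay.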
Test factoring makes the strong claim that if a factored test for the only class whose code has changed passes, then the system test would have passed as well. However, many benign changes to the class can cause test factoring to fail to replay the test. For example, the method under test could call external methods in a slightly different order or with slightly different parameters; test factoring will consider this to be too different to continue the replay, but a human could determine that both versions are acceptable. A developer cannot easily take a transcript made by test factoring and make it less brittle by relaxing these constraints; tests are recorded in a transcript which is not meant for human consumption. Additionally, test factoring only slices up the program state based on class or package names, not based on time or individual object lifetimes: the generated tests consist of replaying an entire system test on the target class. Thus, even if the tests were human-readable, they would be very long; even if a typical use of an instance of the class under test only involves a few method calls, each test includes all of the method calls ever made on any object of that class. Finally, test factoring relies on the ability to instrument all classes (including the JDK system libraries) even just to replay the tests, which makes it non-trivial to integrate into a pre-existing unit testing process. amock avoids these difficulties by generating tests that use only standard test frameworks such as JUnit and jMock.
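The brittleness described here comes down to how recorded and observed call sequences are compared. The sketch below contrasts a strict in-order comparison with a relaxed one that accepts the same calls in any order. Calls are reduced to strings for brevity, the class and method names are hypothetical, and neither method is taken from test factoring or amock.

```java
import java.util.ArrayList;
import java.util.List;

final class CallMatching {
    /** Strict replay: the same calls with the same arguments in the same order. */
    static boolean matchesInOrder(List<String> recorded, List<String> actual) {
        return recorded.equals(actual);
    }

    /** Relaxed check: the same calls in any order, each recorded call used once. */
    static boolean matchesInAnyOrder(List<String> recorded, List<String> actual) {
        List<String> remaining = new ArrayList<>(recorded);
        for (String call : actual) {
            if (!remaining.remove(call)) {
                return false; // a call that was never recorded, or made too often
            }
        }
        return remaining.isEmpty(); // every recorded call must have happened
    }

    public static void main(String[] args) {
        List<String> recorded = List.of("fetch(a)", "fetch(b)");
        List<String> reordered = List.of("fetch(b)", "fetch(a)");
        System.out.println(matchesInOrder(recorded, reordered));
        System.out.println(matchesInAnyOrder(recorded, reordered));
    }
}
```

Choosing between the two is a judgment about the code under test, which is why it matters that a developer can read and edit the generated expectations.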
In test carving [2], during the execution of a long system test, all reachable objects are serialized to disk frequently. Pieces of the long test can then be independently "played back" by loading the state before an action, executing that action, and comparing the actual post-state to the serialized post-state. This method is fundamentally state-based, and relies strongly on the internal data structures of the objects not changing significantly. Test carving produces tests that work only in the context of their custom serialization framework: carved tests look nothing like tests that a programmer would have designed by hand, and it is unclear how much information a developer can get about why a carved test failed.
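The state-based playback can be sketched with standard Java serialization. This is only an illustration of the load, execute, and compare cycle, not the carving tool's custom framework: it snapshots a single hypothetical `Counter` object in memory rather than all reachable objects on disk, and it compares raw serialized bytes as a simple stand-in for a post-state comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical object whose state is captured before and after an action. */
class Counter implements Serializable {
    private static final long serialVersionUID = 1L;
    private int value;
    void increment() { value++; }
    int value() { return value; }
}

final class CarvingSketch {
    /** Serializes the given state to a byte array (a stand-in for writing it to disk). */
    static byte[] snapshot(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object from a snapshot produced by snapshot(). */
    static Object restore(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Plays back one action: load the pre-state, run the action, compare post-states. */
    static boolean replay(byte[] preState, Consumer<Counter> action, byte[] expectedPostState) {
        Counter counter = (Counter) restore(preState);
        action.accept(counter);
        return Arrays.equals(snapshot(counter), expectedPostState);
    }

    public static void main(String[] args) {
        // "System test" run: capture the state on both sides of one action.
        Counter counter = new Counter();
        byte[] pre = snapshot(counter);
        counter.increment();
        byte[] post = snapshot(counter);

        // Later playback of just that action, independent of the rest of the run.
        System.out.println(replay(pre, Counter::increment, post));
    }
}
```

Because the comparison is over serialized state, adding or renaming a field in `Counter` would change the bytes and fail the playback even if the observable behavior were unchanged, which is the dependence on internal data structures noted above.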
amock currently exists in a
[1] D. Saff, S. Artzi, J. H. Perkins, and M. D. Ernst. Automatic test factoring for Java. In ASE 2005: Proceedings of the 20th Annual International Conference on Automated Software Engineering, pages 114–123, Long Beach, CA, USA, Nov. 9-11, 2005.
[2] S. Elbaum, H. N. Chin, M. Dwyer, and J. Dokulil. Carving differential unit test cases from system test cases. In Proceedings of the International Symposium on Foundations of Software Engineering. ACM, Nov. 2006.