CSAIL Publications and Digital Archive

Research Abstracts - 2006

Automatic Test Factoring for Java

David Saff, Shay Artzi, Jeff H. Perkins & Michael D. Ernst

Problem: Slow, Unfocused Tests

Frequent execution of a test suite during software maintenance can catch regression errors early and bolster the developer's confidence that steady progress is being made. However, a test suite that takes a long time to produce feedback slows down the developer and reduces the benefit of frequent testing, whether the testing is initiated manually (as in agile methodologies such as Extreme Programming) or automatically (as in continuous testing).

One way to speed test feedback is to avoid testing unchanged parts of the code. In the ideal case, testing a changed code base would exercise only the changed code and its direct interactions with the rest of the system. Developers can tune their test suites to approach this ideal by writing small, focused test cases and structuring their code to avoid tangled dependencies. Can we automate this effort?

Solution: Test Factoring

We propose test factoring [1], an automatic method for generating focused, quick unit tests for any code base from any repeatable program execution, including general, slow system tests. Each factored test runs more quickly than the original while testing less functionality, perhaps exercising only a single component of the code. Test selection techniques can respond to each code change by selecting and running just the factored tests that are affected, while guaranteeing the same defect-discovery power as the original test suite.
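As a sketch of how such selection might work (the TestSelector class and its methods here are hypothetical, invented for illustration, not part of the tool described in this abstract), a selector can map each factored test to the classes in its code under test and pick only the tests whose classes overlap a change:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of test selection over factored tests: each factored
// test is keyed by the classes in its code under test (T), and a code change
// selects only the tests whose classes overlap the changed classes.
class TestSelector {
    private final Map<String, Set<String>> testToClasses = new HashMap<>();

    void register(String testName, Set<String> classesUnderTest) {
        testToClasses.put(testName, classesUnderTest);
    }

    List<String> affectedBy(Set<String> changedClasses) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : testToClasses.entrySet()) {
            for (String cls : e.getValue()) {
                if (changedClasses.contains(cls)) {
                    selected.add(e.getKey());
                    break;
                }
            }
        }
        Collections.sort(selected);  // deterministic order for reporting
        return selected;
    }
}
```

A change that touches only environment code selects no factored tests at all, which is where the time savings come from.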

One way to factor a test is to introduce mock objects. If a test exercises a component T, which interacts with another component E (the "environment"), the implementation of E can be replaced by a mock. The mock checks that T's calls to E are as expected, and it simulates E's behavior in response. Given a system test for T and E, and a specification for how T and E interact when the system test is run, test factoring generates unit tests for T in which E is mocked. (This specification could be provided by a developer, but we capture it precisely from the running system by monitoring interaction at runtime.) The factored tests can isolate bugs in T from bugs in E and, if E is slow or expensive, improve test performance or cost. Mock objects can replace many kinds of expensive resources, including databases, data structures, files on disk, network communication, and human attention.
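As a minimal, hand-written illustration of this idea (the Database, MockDatabase, and PayrollCalculator names are invented for this example), the mock below plays the role of E': it checks T's calls and simulates E's responses:

```java
import java.util.List;

// Hypothetical environment interface E: the code under test sees only this.
interface Database {
    int fetchSalary(String employeeId);
}

// A hand-written mock for the Database: it checks that the code under test
// issues exactly the expected queries, and simulates recorded responses.
class MockDatabase implements Database {
    private final List<String> expectedIds;
    private final List<Integer> cannedSalaries;
    private int next = 0;

    MockDatabase(List<String> expectedIds, List<Integer> cannedSalaries) {
        this.expectedIds = expectedIds;
        this.cannedSalaries = cannedSalaries;
    }

    @Override
    public int fetchSalary(String employeeId) {
        // Verify the call is the one the factored test expects...
        if (next >= expectedIds.size() || !expectedIds.get(next).equals(employeeId)) {
            throw new AssertionError("unexpected query: " + employeeId);
        }
        // ...then simulate the environment's behavior in response.
        return cannedSalaries.get(next++);
    }
}

// Code under test T: depends only on the Database interface, so the real
// database and the mock are interchangeable.
class PayrollCalculator {
    private final Database db;

    PayrollCalculator(Database db) { this.db = db; }

    int totalPayroll(List<String> employeeIds) {
        int total = 0;
        for (String id : employeeIds) {
            total += db.fetchSalary(id);
        }
        return total;
    }
}
```

Because PayrollCalculator is written against the Database interface, the factored test can substitute MockDatabase without changing the code under test.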

object diagram

Figure 1: A small three-object software system for calculating payroll. The PayrollCalculator and ResultSet objects are in the code under test. The Database is in the environment. Boxes represent individual objects. Arrows represent references between objects, and are labeled with the method calls made through each reference. All method calls across the central dotted line, and only those calls, must be captured and replayed correctly in a factored test.

As an example, consider the accounting application in Figure 1, which performs financial calculations based on records retrieved from a third-party database server. This application could be tested by inserting test records into the database, running the application, verifying that the correct result is returned, and restoring the database to its initial state. Querying a database is often computationally expensive, yet in many maintenance tasks only the financial algorithms change: the database interaction is held constant, and the developer trusts the database to return deterministic results. Thus, most of the time spent running such a test suite is wasted on communication with the database server.

To apply test factoring to this system, developers would likely choose the payroll calculator as T, and the database system as E. The expected queries and results are the only aspect of the database system that the regression tests for the payroll calculator depend on. Thus, we can replace the database system with a mock object E'.

This mock object could be manually constructed, and packages such as JMock provide an easy syntax for doing so, but the developer must still manually enter the correct response for every expected method call, which can consume hours. Instead, we have developed an automatic technique [2] that creates mock objects via a dynamic capture/replay approach. The three inputs to test factoring are a program, a system test, and a partition of the program into the code under test and the environment.

  1. The capture step occurs ahead of time, not at test time. It executes the system tests (we assume they pass) in the context of the original system TE and records all interactions between T and E by dynamically inserting recording decorator objects (Figure 2). The resulting transcript indicates, for each call, the procedure name, the arguments, and the return value.
  2. The replay step occurs during execution of the factored tests. The system is run as before, but with the real environment objects E replaced by mock objects E'; the original environment is never executed during a factored test. E' uses the recorded behavior to simulate the environment.
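The two steps above can be sketched with Java dynamic proxies. This is only an illustrative approximation of the capture/replay idea, not the tool's actual implementation; the Call, TestFactoring, Env, and RealEnv names are invented here:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

// One transcript entry: procedure name, arguments, and return value.
class Call {
    final String method;
    final Object[] args;
    final Object result;

    Call(String method, Object[] args, Object result) {
        this.method = method;
        this.args = args;
        this.result = result;
    }
}

class TestFactoring {
    // Capture: wrap a real environment object in a recording decorator.
    // Each call crossing the partition is forwarded to the real object
    // and appended to the transcript.
    static <E> E record(Class<E> iface, E real, List<Call> transcript) {
        InvocationHandler h = (proxy, method, args) -> {
            Object result = method.invoke(real, args);
            transcript.add(new Call(method.getName(), args, result));
            return result;
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, h));
    }

    // Replay: build a mock E' that answers each call from the transcript;
    // the real environment is never executed.
    static <E> E replay(Class<E> iface, List<Call> transcript) {
        int[] next = {0};
        InvocationHandler h = (proxy, method, args) -> {
            Call expected = transcript.get(next[0]++);
            if (!expected.method.equals(method.getName())) {
                throw new AssertionError("unexpected call: " + method.getName());
            }
            return expected.result;
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, h));
    }
}

// A toy environment for demonstration.
interface Env {
    int query(int x);
}

class RealEnv implements Env {
    public int query(int x) { return 2 * x; }  // stands in for a slow resource
}
```

In this sketch the recorded proxy and the replay mock both implement the same interface as the real environment, so the code under test cannot tell which one it is talking to.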

object diagram, with capturing decorators

Figure 2: The payroll system, with references across the partition decorated for behavior capture.

Experimental results

Our experimental methodology uses real code, real errors, and a realistic testing scenario. The program we studied, Daikon, consists of 347,000 lines of Java code (including third-party libraries), implements sophisticated algorithms, and makes use of Java features such as reflection, native methods, callbacks from the JDK, and communication via side effects. The code was under active development, and all errors were real errors made by developers. We measured the effect of test factoring on the running time of tests run continuously by two developers, and on every CVS check-in:

         Test time             Time to failure    Time to success
Dev. 1   .79 (7.4/9.4 min)     1.56 (14/9 sec)    .59 (5.5/9.4 min)
Dev. 2   .99 (14.1/14.3 min)   1.28 (64/50 sec)   .77 (11.0/14.3 min)
CVS      .09 (0.8/8.8 min)     n/a                .09 (0.8/8.8 min)

As this table shows, test factoring generally reduces running time. Test time was reduced significantly during successful runs, by as much as 90% in the case of CVS check-ins. However, test factoring actually increased the time to failure by several seconds. If, while running a factored test, the code under test makes an unexpected call to the environment, the mock object cannot determine whether this is correct behavior or what the appropriate response should be; in this case, only running the original system test can soundly detect a regression error.
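One way such a fallback policy might be structured is sketched below; this is a hypothetical illustration, not the paper's implementation, and the UnexpectedCallException and TestRunner names are invented:

```java
// Hypothetical sketch of a fallback policy for factored tests.
class UnexpectedCallException extends RuntimeException {
    UnexpectedCallException(String call) {
        super("unexpected call to the environment: " + call);
    }
}

class TestRunner {
    // Run the fast factored test first. If the mock sees an unexpected
    // environment call, only the slow original system test can give a
    // sound verdict, so fall back to it; this rerun is what lengthens
    // the time to failure.
    static boolean runWithFallback(Runnable factoredTest, Runnable originalTest) {
        try {
            factoredTest.run();
            return true;                      // factored test passed
        } catch (AssertionError regression) {
            return false;                     // factored test found a real failure
        } catch (UnexpectedCallException e) {
            try {
                originalTest.run();
                return true;                  // behavior changed, but no regression
            } catch (AssertionError regression) {
                return false;                 // original test confirms the failure
            }
        }
    }
}
```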

Future challenges

Factored tests can be made more effective for failing tests, and even more so for passing tests. We are investigating automated techniques to:

  1. recommend natural divisions of code into loosely coupled and easily testable components, allowing more factored tests to complete.
  2. further constrain the allowable behavior for the code under test, improving the ability of factored tests to detect failures.
  3. provide the factored tests to the user as readable source code, and reduce the entire captured behavior of the tested component down to a simpler set of essential method calls to test, making test failures easier to understand.

[1] David Saff and Michael D. Ernst. Automatic mock object creation for test factoring. In ACM SIGPLAN/SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2004), Washington, DC, USA, June 2004.

[2] David Saff, Shay Artzi, Jeff H. Perkins, and Michael D. Ernst. Automatic test factoring for Java. In ASE 2005: Proceedings of the 20th Annual International Conference on Automated Software Engineering, Long Beach, CA, USA, November 2005.


MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu