MIT CSAIL Research Abstracts

Research Abstracts - 2007

amock: Automatic Generation of Readable Software Unit Tests from System Tests

David S. Glasser

Introduction

Automated testing is essential for building reliable software. Computer systems are complex enough that even small changes can have far-reaching consequences. Macroscopic system tests are needed to ensure that changes do not affect the overall operation of the system, and microscopic unit tests are needed to ensure that internal interfaces continue to function as expected. However, writing tests can be one of the most tedious parts of the software development process. While most software engineers recognize the importance of automated testing, many projects fail to achieve a desirable level of testing.

Unit tests can be more useful than system tests for two reasons. First, if a developer changes one small part of a program, running the unit tests that exercise just that part can be more efficient than running an entire system test. Second, a system test failure often reveals only that the test failed; when a system test starts to fail, it is often difficult to find the subsystem in which the error originated. Unit tests are more focused, so when a unit test fails, it is generally easier to tell which module or even which method is responsible for the failure.

Test slicing transforms a system test into a suite of unit tests. It is relatively easy to write a system test for a working program: simply run the program and check that the output is as expected. Writing unit tests is much more time-consuming, since a program typically has a very large number of individual units (classes, methods, etc.) that would profit from unit testing. The key insight of test slicing is that a system test (which is easy to create) exercises the units of a program in a typical way. A program that observes the effects of a system test on just one submodule can generate a test that repeats those effects without needing to run the rest of the system test.
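The observation step can be illustrated with a JDK dynamic proxy that logs every call crossing the boundary between one unit and the rest of the program. This is only a minimal sketch under stated assumptions, not a description of how amock instruments programs: it assumes the collaborator is reached through an interface, and the names RecordingSketch, Jar, record, drain, and runAndRecord are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordingSketch {
    /** Hypothetical boundary between the unit of interest and the rest of the system. */
    public interface Jar {
        String take();
    }

    /** Wraps a real object so that every call made through it is appended to the log. */
    @SuppressWarnings("unchecked")
    public static <T> T record(Class<T> iface, T real, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(real, args);
            log.add(method.getName() + " -> " + result);
            return result;
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    /** Stand-in for the unit whose interactions we want to capture. */
    public static int drain(Jar jar) {
        int count = 0;
        while (jar.take() != null) {
            count++;
        }
        return count;
    }

    /** Runs a tiny "system test" and returns the calls observed at the boundary. */
    public static List<String> runAndRecord() {
        List<String> contents = new ArrayList<>(Arrays.asList("oatmeal", "chocolate"));
        Jar realJar = () -> contents.isEmpty() ? null : contents.remove(0);
        List<String> log = new ArrayList<>();
        drain(record(Jar.class, realJar, log));
        return log;
    }

    public static void main(String[] args) {
        runAndRecord().forEach(System.out::println);
    }
}
```

The recorded log holds three entries (two calls that return a cookie name and a final call that returns null), which is the kind of information a generator would need in order to emit mock expectations for the unit.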

Our technique, amock, implements test slicing for Java. It takes as input an execution of a passing system test, and creates small unit tests which exercise individual objects in the same way that the system test did. These unit tests are human-readable and rely only on the standard Java testing frameworks JUnit and jMock. Developers can incorporate these generated tests directly into their test suites as regression tests.

Example

Figure 1 shows a subject program and a corresponding generated unit test. The generated test uses the jMock framework to describe the expected interactions between objects.

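public class CookieMonster {
    // Eats every cookie in the jar and returns how many were eaten.
    public int eatAllCookies(CookieJar jar) {
        int cookiesEaten = 0;
        Cookie k;
        for (k = jar.getACookie();
             k != null;
             k = jar.getACookie()) {
            k.eat();
            cookiesEaten++;
        }
        return cookiesEaten;
    }
}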

public class CookieJar {
    private List<Cookie> myCookies;
    // ...
    public Cookie getACookie() {
        if (myCookies.isEmpty()) {
            return null;
        } else {
            return myCookies.remove(0);
        }
    }
}

public class Bakery {
    public static void main(String[] args) {
        // ...
        CookieJar j = new CookieJar();
        Cookie oatmeal = new OatmealCookie();
        j.add(oatmeal);
        loadMoreCookies(j);
        new CookieMonster().eatAllCookies(j);
    }
    private static void loadMoreCookies(CookieJar j) {
        j.add(new ChocolateCookie());
    }
}

public class AutoCookieMonsterTest extends MockObjectTestCase {
  public void testCookieEating() {
    // Create mocks.
    final CookieJar mockCookieJar = mock(CookieJar.class);
    final OatmealCookie mockOatmealCookie = mock(OatmealCookie.class);
    final ChocolateCookie mockChocolateCookie = mock(ChocolateCookie.class);
    
    // Set up primary object.
    CookieMonster testedCookieMonster = new CookieMonster();
    
    // Set up expectations.
    expects(new InThisOrder() {{
      one (mockCookieJar).getACookie();
      will(returnValue(mockOatmealCookie));
      
      one (mockOatmealCookie).eat();
      
      one (mockCookieJar).getACookie();
      will(returnValue(mockChocolateCookie));
      
      one (mockChocolateCookie).eat();
      
      one (mockCookieJar).getACookie();
      will(returnValue(null));
    }});
    
    // Run the code under test.
    assertThat(testedCookieMonster.eatAllCookies(mockCookieJar),
      is(2)
    );
  }
}

Figure 1. A sample subject program for amock, and a unit test generated by amock for this subject which exercises the CookieMonster class.

The generated test exercises the eatAllCookies method of CookieMonster. It starts by creating mock objects to represent the CookieJar and the two Cookie objects inside it. It then constructs a CookieMonster and sets up expectations for the interactions between the objects (using jMock's expectation builders). We expect that getACookie will be called three times on the cookie jar. The first two times it will return cookies, which will be eaten; the third time it returns null. Finally, the test executes eatAllCookies and asserts that its return value is as expected.

Related work

Many different approaches to automatic unit test generation have been studied. The concept of using a dynamic analysis to automatically convert large tests to small tests has been implemented as test factoring and test carving.

The main goal of test factoring [1] is to allow developers to run a slow system test much more efficiently when only a small part of the system has changed. Test factoring creates a transcript of a long system test or other program execution, and then plays it back in a special instrumented Java run-time environment where all objects other than those of the class under test are mocks following the transcript. Method calls that only involve classes not under test are essentially skipped, leading to a much faster execution of the original test suite which only exercises code from one class. If the class under test attempts to make different method calls to the rest of the system than it did during the original execution, the replay stops and tells the user that the full system test should be run instead; otherwise, the factored test succeeds or fails according to whether the program that it is replaying succeeds or fails.
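The replay behavior described above can be sketched in plain Java. This is an illustrative approximation, not the test factoring implementation from [1], which works by instrumenting the Java run-time environment: here a dynamic proxy answers calls from a recorded transcript and stops as soon as the code under test departs from it. All names (ReplaySketch, Jar, Step, DivergenceException, drain, drainBySize) are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ReplaySketch {
    /** Hypothetical interface to the part of the system that is not under test. */
    public interface Jar {
        String take();
        int size();
    }

    /** One transcript entry: a method name and the value it returned originally. */
    public static final class Step {
        final String method;
        final Object result;

        public Step(String method, Object result) {
            this.method = method;
            this.result = result;
        }
    }

    /** Signals that the code under test no longer follows the transcript. */
    public static final class DivergenceException extends RuntimeException {
        public DivergenceException(String message) {
            super(message);
        }
    }

    /** Builds a mock Jar that answers each call from the transcript, in order. */
    public static Jar replay(Step... transcript) {
        Deque<Step> remaining = new ArrayDeque<>(Arrays.asList(transcript));
        return (Jar) Proxy.newProxyInstance(
                Jar.class.getClassLoader(),
                new Class<?>[] { Jar.class },
                (proxy, method, args) -> {
                    Step next = remaining.poll();
                    if (next == null || !next.method.equals(method.getName())) {
                        throw new DivergenceException("unexpected call: " + method.getName());
                    }
                    return next.result;
                });
    }

    /** Original code under test: takes items until the jar returns null. */
    public static int drain(Jar jar) {
        int count = 0;
        while (jar.take() != null) {
            count++;
        }
        return count;
    }

    /** A rewrite that asks for the size first, so its calls differ from the transcript. */
    public static int drainBySize(Jar jar) {
        int count = jar.size();
        for (int i = 0; i < count; i++) {
            jar.take();
        }
        return count;
    }

    public static void main(String[] args) {
        Step[] transcript = { new Step("take", "oatmeal"), new Step("take", null) };
        System.out.println(drain(replay(transcript)));
        try {
            drainBySize(replay(transcript));
        } catch (DivergenceException e) {
            System.out.println("replay stopped: " + e.getMessage());
        }
    }
}
```

In this sketch the original drain method replays cleanly and returns 1, while drainBySize is rejected on its first call because size() does not appear in the transcript; in a real setting that would be the signal to fall back to the full system test.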

Test factoring makes the strong claim that if a factored test for the only class whose code has changed passes, then the system test would have passed as well. However, many benign changes to the class can cause test factoring to fail to replay the test. For example, the method under test could call external methods in a slightly different order or with slightly different parameters; test factoring will consider this to be too different to continue the replay, but a human could determine that both versions are acceptable. A developer cannot easily take a transcript made by test factoring and make it less brittle by relaxing these constraints; tests are recorded in a transcript which is not meant for human consumption. Additionally, test factoring only slices up the program state based on class or package names, not based on time or individual object lifetimes: the generated tests consist of replaying an entire system test on the target class. Thus, even if the tests were human-readable, they would be very long; even if a typical use of an instance of the class under test only involves a few method calls, each test includes all of the method calls ever made on any object of that class. Finally, test factoring relies on the ability to instrument all classes (including the JDK system libraries) even just to replay the tests, which makes it non-trivial to integrate into a pre-existing unit testing process. amock avoids these difficulties by generating tests that use only standard test frameworks such as JUnit and jMock.

In test carving [2], all reachable objects are frequently serialized to disk during the execution of a long system test. Pieces of the long test can then be independently "played back" by loading the state before an action, executing that action, and comparing the actual post-state to the serialized post-state. This method is fundamentally state-based, and relies strongly on the internal data structures of the objects not changing significantly. Test carving produces tests that only work in the context of their custom serialization framework: carved tests look nothing like tests that a programmer would have designed by hand, and it is unclear how much information a developer can get about why a carved test failed.
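The state-based idea can be illustrated with standard Java serialization. This is a rough sketch under simplifying assumptions, not the framework from [2]: it captures a single object rather than all reachable objects, keeps the snapshots in memory instead of on disk, and compares post-states with equals. The names CarvingSketch, Jar, snapshot, restore, replayTake, and carveAndReplay are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class CarvingSketch {
    /** Hypothetical unit whose state is captured; it must be Serializable here. */
    public static final class Jar implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> cookies = new ArrayList<>();

        String take() {
            return cookies.isEmpty() ? null : cookies.remove(0);
        }
    }

    /** Serializes the jar to bytes, standing in for writing its state to disk. */
    static byte[] snapshot(Jar jar) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(jar);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rebuilds a jar from a previously captured snapshot. */
    static Jar restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Jar) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Replays one action: load the pre-state, run take(), compare with the post-state. */
    public static boolean replayTake(byte[] preState, byte[] postState) {
        Jar jar = restore(preState);
        jar.take();
        return jar.cookies.equals(restore(postState).cookies);
    }

    /** Captures pre- and post-state around one call in a "system test", then replays it. */
    public static boolean carveAndReplay() {
        Jar jar = new Jar();
        jar.cookies.add("oatmeal");
        jar.cookies.add("chocolate");
        byte[] pre = snapshot(jar);
        jar.take();
        byte[] post = snapshot(jar);
        return replayTake(pre, post);
    }

    public static void main(String[] args) {
        System.out.println(carveAndReplay());
    }
}
```

Because the check compares the cookies field directly, renaming or restructuring that field would invalidate the stored snapshots even if the observable behavior of take() stayed the same, which is the dependence on internal data structures noted above.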

Progress

amock is currently a proof of concept; it supports system tests of the complexity of Figure 1. We expect to expand amock to cover most of Java by the end of Spring 2007.

References:

[1] D. Saff, S. Artzi, J. H. Perkins, and M. D. Ernst. Automatic test factoring for Java. In ASE 2005: Proceedings of the 20th Annual International Conference on Automated Software Engineering, pages 114–123, Long Beach, CA, USA, November 9–11, 2005.

[2] S. Elbaum, H. N. Chin, M. Dwyer, and J. Dokulil. Carving differential unit test cases from system test cases. In Proceedings of the International Symposium on Foundations of Software Engineering. ACM, November 2006.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu