
Continuous Testing of Software During Development

David Saff, Kevin Chevalier, Michael Bridge & Michael D. Ernst

The Problem: Wasted Time

If a developer has a regression test suite available during development or maintenance, but does not run it often, an opportunity is lost to catch regression errors early. The longer a regression error persists without being caught, the longer it may take to find the source of the error and correct the faulty code and any dependent code. However, running the suite has a cost: remembering to run the tests, waiting for them to complete, and returning to the task at hand distract from development.

The Solution: Continuous Testing

Continuous testing uses excess cycles on a developer's workstation or nearby computers to continuously run regression tests in the background as the developer edits code. It provides developers with rapid feedback about errors that they have inadvertently introduced, with the goal of improving both developer productivity and software quality. The developer no longer has to decide when to run the tests, and errors are caught more quickly, especially those that the developer had no cause to suspect. Continuous testing is inspired by continuous compilation, a feature of many modern development environments that gives rapid feedback about compilation errors. We have evaluated the productivity impact of continuous testing and developed a production-quality implementation for use in Java development. We are now examining ways to improve the performance of test execution and to limit any negative impacts on developer productivity.
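
In outline, the core loop is simple: notice that the code changed, re-run the suite in the background, and surface any failures. The following is a minimal, self-contained sketch using the JUnit 3-style API; the class names are ours, and a real implementation hooks the editor's save and build events rather than polling the file system.

```java
import java.io.File;

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestResult;
import junit.framework.TestSuite;
import junit.textui.TestRunner;

// Minimal sketch of a continuous test loop (hypothetical class names).
// Recompilation is omitted; an IDE-based tool would get fresh classes
// from the IDE's own incremental builder.
public class ContinuousTester {

    // A trivial stand-in suite so the sketch is self-contained.
    public static class ExampleTest extends TestCase {
        public void testAddition() { assertEquals(4, 2 + 2); }
    }

    public static void main(String[] args) throws InterruptedException {
        File watched = new File(args[0]);              // a source file to watch
        Test suite = new TestSuite(ExampleTest.class);
        long lastRun = 0;
        while (true) {
            if (watched.lastModified() > lastRun) {    // source changed since last run?
                lastRun = System.currentTimeMillis();
                TestResult result = TestRunner.run(suite);
                if (!result.wasSuccessful()) {
                    System.err.println(result.failureCount() + " test(s) failing");
                }
            }
            Thread.sleep(1000);                        // poll once per second
        }
    }
}
```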

Productivity Impact

Testing techniques are often evaluated against released versions of software, or against daily checkpoints in a source repository such as CVS. Continuous testing, which operates throughout a developer's work day, required new methods for evaluating its effectiveness. We first implemented our own monitoring framework to capture the behavior of developers working without continuous testing, and then simulated the effect of "replaying" the captured history with continuous testing in effect.
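
The replay step can be pictured as follows. This is a sketch under our own simplified assumptions (the full simulation in [1] models the developer's testing and fixing behavior, not just save events): given the captured history, compute when a continuous tester would first have reported each error.

```java
import java.util.List;

// Hedged sketch of replaying a captured edit history to estimate when a
// continuous tester would have first reported a regression error. Event
// times and test outcomes are assumed to come from the monitoring log.
public class ReplaySimulator {

    /** One captured save event: when it happened, and whether the suite passes on that snapshot. */
    public static class SaveEvent {
        final long timeMillis;
        final boolean suitePasses;   // outcome of running the tests on this snapshot
        SaveEvent(long timeMillis, boolean suitePasses) {
            this.timeMillis = timeMillis;
            this.suitePasses = suitePasses;
        }
    }

    /** Simulated time at which continuous testing first reports an error, or -1 if none occurs. */
    public static long firstErrorReport(List<SaveEvent> history, long testRunMillis) {
        for (SaveEvent e : history) {
            if (!e.suitePasses) {
                // The error surfaces one test-run duration after the bad save.
                return e.timeMillis + testRunMillis;
            }
        }
        return -1;
    }
}
```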

We measured [1] real development projects to estimate wasted time: preventable extra fix time plus the time spent running tests and waiting for them to complete. In the monitored projects, wasted time accounted for 10-15% of total development time (programming and debugging). We also developed a model of developer behavior, from which we inferred each error's ignorance time (from introduction to discovery) and fix time (from discovery to fix). We found that ignorance time and fix time are correlated, and concluded that reducing ignorance time should reduce fix time. For the monitored projects, a continuous testing tool could reduce wasted time by 92-98%, making it more effective than test prioritization or more frequent manual testing.
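
In notation reconstructed from the definitions above (the symbols are ours; [1] gives the precise model):

```latex
t_{\mathrm{ignorance}} = t_{\mathrm{discovered}} - t_{\mathrm{introduced}},
\qquad
t_{\mathrm{fix}} = t_{\mathrm{fixed}} - t_{\mathrm{discovered}},
\qquad
T_{\mathrm{wasted}} = T_{\mathrm{extra\;fix}} + T_{\mathrm{test\;wait}}
```

Continuous testing attacks both terms: it shrinks ignorance time toward the length of a single test run, and it removes the waiting term by taking test execution off the developer's critical path.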

We then built a usable implementation of continuous testing and used it in a controlled experiment [2] comparing three groups of student developers: one provided with continuous testing, one provided only with continuous compilation, and one given no asynchronous notification of development errors of any kind. All participants used Emacs as their Java development environment. Student developers using continuous testing were three times more likely (78%) than the control group (27%) to complete two different one-week programming assignments (which were part of their normal coursework). These statistically significant effects are due to continuous testing: they cannot be explained by other incidental features of the experimental setup, such as time worked, regular testing, or differences in experience or tool preference. Students with continuous compilation were twice as likely (50%) to complete the task as those without; to our knowledge, this is the first empirical evidence of continuous compilation's effectiveness.

A majority of continuous testing users had positive impressions, saying that it directed their attention to problems they would otherwise have overlooked, helped them produce correct answers faster, and helped them write better code. Course staff said that students quickly developed an intuitive approach to using the additional features. 94% of users said that they intended to use the tool on coursework after the study, and 90% would recommend the tool to others. Few users found the feedback distracting, and no negative effects on productivity were observed.

Implementation and End-user Adoption

Encouraged by these results, we built and released a full-featured implementation of continuous testing as a plug-in [3] for the Eclipse integrated development environment; the plug-in is actively maintained and has an active user community. Our hope is that some of its features can be migrated into the base Eclipse distribution, making them available to many more developers.

The concern developers cite most often is that the processing load of continuously running tests will bog down the workstation on which they are simultaneously trying to work. We are extending the test execution framework to address this concern in several ways: running the tests as a low-priority process, running them on a remote machine, or reducing classloading latency by using Java's "hotswapping" feature to replace changed classes in an already-running virtual machine. The first approach is sketched below.
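
The sketch below is illustrative only: the text above describes a low-priority process, and the closest single-JVM analogue is a minimum-priority daemon thread (Java thread priorities are merely a hint to the scheduler). Our actual implementation lives inside Eclipse.

```java
import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;
import junit.textui.TestRunner;

// Sketch of one mitigation: run the suite in a minimum-priority background
// thread so the developer's foreground work is scheduled first.
public class LowPriorityRunner {

    public static void runInBackground(final Test suite) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                TestRunner.run(suite);            // results go to the console in this sketch
            }
        }, "continuous-testing");
        worker.setPriority(Thread.MIN_PRIORITY);  // yield CPU to interactive work
        worker.setDaemon(true);                   // don't keep the JVM alive just for tests
        worker.start();
    }

    // A trivial stand-in suite so the sketch is self-contained.
    public static class ExampleTest extends TestCase {
        public void testAddition() { assertEquals(4, 2 + 2); }
    }

    public static void main(String[] args) {
        runInBackground(new TestSuite(ExampleTest.class));
    }
}
```

A stronger variant launches the suite in a separate JVM (for example via java.lang.ProcessBuilder) so that operating-system facilities such as nice can lower its priority.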

Future Challenges

First, continuous testing has seen the most use among developers who are already committed to small, fast-running tests that are executed frequently. We believe that transforming a test suite through test factoring will extend the reach of continuous testing, making it applicable to regression tests that are long-running, or that require expensive resources or human intervention.

Second, our Eclipse plug-in has a flexible, pluggable selection and prioritization framework, sketched below. We are encouraging researchers in test selection and prioritization to adapt their algorithms to integrate with continuous testing.
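
As a rough, hypothetical picture of the shape such a pluggable strategy takes (the plug-in's real extension point has its own names and signatures):

```java
import java.util.List;

// Hypothetical shape of a pluggable test-prioritization strategy; the
// actual plug-in's interface names and signatures may differ.
public interface TestPrioritizer {
    /**
     * Reorders (and possibly prunes) the tests to run next, given which
     * source files changed since the last run. Highest-value tests first,
     * so that failures surface as early as possible.
     */
    List<String> prioritize(List<String> allTestIds, List<String> changedFiles);
}
```

A plugged-in strategy might, for example, order tests by most-recent failure or by coverage of the changed files, two common heuristics in the test-prioritization literature.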

References

[1] David Saff and Michael D. Ernst. Reducing wasted development time via continuous testing. In Fourteenth International Symposium on Software Reliability Engineering (ISSRE 2003), pp. 281-292, Denver, CO. November 2003.

[2] David Saff and Michael D. Ernst. An experimental evaluation of continuous testing during development. In International Symposium on Software Testing and Analysis (ISSTA 2004), pp. 76-85, Boston, MA. July 2004.

[3] David Saff and Michael D. Ernst. Continuous testing in Eclipse. In 2nd Eclipse Technology Exchange Workshop (eTX), Barcelona, Spain. March 2004.
