MIT CSAIL Research Abstracts

CSAIL Publications and Digital Archive header

Research Abstracts Home

CSAIL Digital Archive

Research Activities

CSAIL Home

horizontal line

Research Abstracts - 2007
horizontal line

horizontal line

Combined Static and Dynamic Mutability Analysis

Shay Artzi, Adam Kiezun, David Glasser & Michael D. Ernst

Introduction

Knowing which method parameters may be mutated during a method's execution is useful for many software engineering tasks (e.g., specification [1], verification [2], compiler optimizations [3], refactoring [4], test generation [5], invariant detection [6], program comprehension [7]). We present an approach to discovering parameter immutability, in which several lightweight, scalable analyses are combined in stages, with each stage refining the overall result. The resulting analysis is scalable and combines the strengths of its component analyses. As one of the component analyses, we present a novel, dynamic mutability analysis and show how its results can be improved by random input generation. Experimental results on programs of up to 185 kLOC demonstrate that, compared to previous approaches, our approach increases both scalability and overall accuracy.

The contributions of this research include:

Applying a staged analysis approach to discovering parameter mutability. Our staged approach is unusual in that it combines static and dynamic stages and it explicitly represents analysis imprecision.
A novel, dynamic mutability analysis that scales well, yields accurate results, and complements existing analyses. We extend the dynamic analysis with random input generation, which improves the analysis results by increasing code coverage.
An extension to a state-of-the-art static analysis [8] that improves both accuracy and scalability, and a simple static analysis that helps to reveal the costs and benefits of more sophisticated algorithms.
An extensive evaluation. We have implemented our framework and analyses for Java, and we investigate the costs and benefits of various techniques, including both our own and that of Salcianu and Rinard [8]. The results yield insight into the design and implementation of program analysis tools, with a particular emphasis on mutability analysis. We find that a well-designed collection of fast, simple analyses can outperform a sophisticated analysis in both scalability and accuracy.

Technique: Analysis Pipeline

Different approaches to the mutability problem have different strengths. For example, a static analysis uses approximations regarding points-to information, whereas a dynamic analysis can observe the actual run-time behavior and points-to relationships. A dynamic analysis, however, is limited to the specific observed execution, which may not cover all of the code or exercise it in all possible ways.

Our insight is that combining different mutability analyses can yield an analysis that has better accuracy than any of its components. In our approach, mutability analyses are combined in stages, forming a "pipeline" (see Figure 1). Each pipeline stage refines the results computed by the previous analysis. An analysis can ignore code that has already been adequately analyzed; this can permit the use of techniques that would be too computationally expensive if applied to an entire program.


Figure 1. Example analysis pipeline

We use the following nomenclature. A formal method parameter is mutable is there is an execution in which the parameter reference is used to modify the object pointed to by the reference. Otherwise, the parameter is immutable. A mutability analysis is i-sound if it never classifies a mutable parameter as immutable. An analysis is m-sound if it never classifies an immutable parameter as mutable. An analysis is complete if it classifies every parameter as either mutable or immutable. The problem of mutability inference is undecidable, so no analysis can be i-sound, m-sound and complete. Our approach permits an analysis to explicitly represent its incompleteness using the 'unknown' classification. Thus, an analysis result classifies parameters into three groups: mutable, immutable, and unknown. In previous work, analyses use only two classifications [11, 12], which causes loss of information by conflating parameters/methods that are known to be mutable with those where the analysis approximations prevent definitive classification.

To serve the different application of mutability analysis, the component analyses can be combined in pipelines with varying degrees of precision and recall. The client of the analysis can create an i-sound analysis by combining only i-sound components (all of our analyses have i-sound variations) in the pipeline. It is also possible to combine i-unsound components in the pipeline to trade precision for recall. In an analysis pipeline, the input to the first stage in the staged analysis is an initial classification of all parameters (e.g., all unknown, with the possible exception of pre-analyzed standard libraries). The output of the last stage is the final classification.

Dynamic Analysis

Our dynamic mutability analysis observes the program's execution and classifies as mutable those method parameters that correspond to actually mutated objects. Conceptually, the dynamic analysis tags each reference (not each object—more than one reference can point to the same object) in the running program with the set of all formal parameters (from any method invocation on the call stack) whose fields were directly or indirectly accessed to obtain the reference. Primitives need not be tagged, as they are all immutable. When a reference is used for modification, all formal parameters from the reference's tag are marked classified as mutable.

The dynamic analysis is m-sound, classifying parameters as mutable only when they are mutated. It is also i-sound by classifying all other parameters as unknown. To improve its recall, the dynamic analysis can use several heuristics that allow it to classify certain parameters as immutable or mutable and to avoid the need to maintain reference tags. Each heuristic carries a different risk of unsoundness but, our experiments show that, in practice, the heuristics do not degrade the quality of results. We have implemented the following heuristics:

Classifying parameters as immutable at the end of the analysis. This heuristic classifies as immutable all (unknown) parameters of methods that were executed during the dynamic analysis and achieved a user-settable line coverage.
Classifying parameters as immutable during the analysis after a pre-specified number of calls.
Using the current mutability classification. When an object is passed to a formal parameter that is already classified as mutable, the object is treated as if it were mutated immediately.
Classifying aliased mutated parameters. This heuristic classifies a parameter p as mutable if the object that p points to changes state, regardless of whether the modification happened through an alias to p or through the reference p itself.

Our dynamic analysis uses an example execution. The execution can be provided by the user or it can be generated. In our implementation we use a generator for random method call sequences [5]. The execution provided by the user may exercise the program in ways in which generated code may not be able to, such as by creating and mutating complex objects. However, dynamic mutability analysis gives results only for the part of the program that is covered by the example execution. By using generated sequences, the analysis can analyze larger parts of the program, or even eliminate the need for a user-provided execution. The sequence generation is performed on-line, together with the dynamic analysis, and the generator is steered towards methods with unknown parameters and methods that have not yet been executed by other dynamic analyses in the pipeline.

Static Analysis

Our approach uses two simple and scalable static analyses: an intra-procedural analysis that classifies as (im)mutable parameters (never) affected by field writes within the procedure itself, and an inter-procedural analysis that propagates mutability information along a graph of dependencies between method parameters. The inter-procedural analysis may be executed at any point in an analysis pipeline and may be run multiple times (interleaving with other analyses). Both analyses rely on an intra-procedural, flow-insensitive pointer analysis that calculates the parameters pointed to by each local variable.

Evaluation

We experimentally evaluated 168 combinations of mutability analyses, comparing the results with each other and with a manually computed (and inspected) optimal classification of parameters. Our results indicate that staged mutability analysis can be accurate, scalable, and useful. We performed our experiments on 6 open-source subject programs (up to 185kLOC). For 2 programs, we manually determined the correct classification (mutable or immutable) for all parameters. For each other program, we randomly chose 5 classes, then manually determined the correct classification for each parameter in those classes. In all, we manually classified over 8600 parameters. Knowing the correct classification allowed us to measure the accuracy of each mutability analysis.

Our results indicate that a staged mutability analysis, combined of static and dynamic phases, can achieve better accuracy than a complex static analysis. The results achieved by our best pipeline are better than those achieved by a state-of-the-art static analysis, JPPA [8] (90.2% vs 27.5% correctly classified parameters on a subject program of 110kLOC). In addition, although 45.9% parameters in the subject program are immutable, JPPA detects only 27.5% (which gives recall 0.6), while our analysis detects 41.4% (which gives recall 0.9). Our best pipeline analysis always finds at least as many immutable parameters as JPPA. Our analysis scales well to large code-bases and runs in about a quarter the time of JPPA.

References:

[1] L. Burdy, Y. Cheon, D. Cok, M. D. Ernst, J. Kiniry, G. T. Leavens, K. R. M. Leino, and E. Poll. An overview of JML tools and applications. STTT, 7(3):212-232, Jun 2005

[2] O. Tkachuk and M. B. Dwyer. Adapting side effects analysis for modular program model checking. In ESEC/FSE, pp. 188-197 Sep 2003.

[3] L. R. Clausen. A Java bytecode optimizer using side-effect analysis. Concurrency: Practice and Experience, 9(11):1031-1045, 1997.

[4] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 2000.

[5] S. Artzi, M. D. Ernst, A. Kiezun, C. Pacheco, and J. H. Perkins. Finding the needles in the haystack: Generating legal test inputs for object-oriented programs. MIT-CSAIL-TR-2006-056, MIT CSAIL, Sep 2006.

[6] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE TSE, 27(2):99-123, Feb. 2001

[7] J. J. Dolado, M. Harman, M. C. Otero, and L. Hu. An empirical investigation of the influence of a type of side effects on program comprehension. IEEE TSE, 29(7):665-670, July 2003

[8] A. Salcianu and M. C. Rinard. Purity and side-effect analysis for Java programs. In VMCAI, Jan 2005.

[9] A. Rountev. Precise identification of side-effect-free methods in Java. In ICSM, pages 82-91, Sep 2004.

[10] A. Rountev and B. G. Ryder. Points-to and side-effect analyses for programs built with precompiled libraries. In CC, pages 20-36, Apr 2001.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu