CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Dynamic Variable Comparability Analysis

Jeff H. Perkins, Philip J. Guo, Stephen McCamant & Michael D. Ernst

What

Variable comparability analysis determines whether two variables can sensibly be compared to one another, because they have the same abstract type: they hold the same kind of data.

Languages like C, C++, and Java provide programmers with only a few basic declared types (int, float, etc.).

Thus, programmers often use a single declared type (such as int) to hold data of multiple distinct abstract types. Consider the following simple example:

int main() {
  int dist = 3000;
  int itemPrice = 50;
  int taxPrice = 3;
  totalCost (dist, itemPrice, taxPrice);
}

int totalCost (int d, int base, int tax) {
  int year = 2005;
  if (d > 1000) {
    int shippingFee = 10;
    return base + tax + shippingFee;
  }
  else {
    return base + tax;
  }
}

int main() {
  Distance dist = 3000;
  Money itemPrice = 50;
  Money taxPrice = 3;
  totalCost (dist, itemPrice, taxPrice);
}

Money totalCost (Distance d, Money base, Money tax) {
  Year year = 2005;
  if (d > 1000) {
    Money shippingFee = 10;
    return base + tax + shippingFee;
  }
  else {
    return base + tax;
  }
}

The three variables in main() all have the same declared type, int, but two of them hold related quantities (amounts of money) as can be determined by the fact that the program adds them together within totalCost(), whereas the other holds a conceptually distinct quantity (a distance).

Thus itemPrice and taxPrice have the same abstract type (and are comparable) while dist has a separate abstract type.

A variable comparability analysis automatically determines when two variables hold values that are of the same abstract type without the need for manual annotations by the programmer. Given a program like the example on the left, it infers abstract types to denote sets of comparable variables. The example on the right shows the more precise types that the analysis infers, in the form of type annotations in a hypothetical language.

Why

Variable comparability information is valuable for both improving human understanding and assisting other program analysis tools. It can be useful to a programmer in understanding the abstract types in a program and locating sites of possible references to values. It can help detect violations of the programmer's intent (when unexpected variables are found to have the same abstract type). It can also aid other program analysis tools that look for relationships between variables. One specific application of this work is to improve the performance and results of the Daikon dynamic invariant detection system [2]. Daikon analyzes program value traces to infer properties that hold over all observed executions. Without comparability information, Daikon attempts to infer invariants over all sets of variables with the same declared type (e.g. int), which is expensive and produces invariants that are not meaningful. In the above example, Daikon may infer that d > tax; while this property is true, it is not likely to be interesting because it makes no sense to compare a distance and an amount of money. Comparability information indicates which pairs of variables should be analyzed for potential invariants, which both improves Daikon's performance and helps it produce more relevant output.

How

A previous approach [1] performed a static analysis to determine when two variables hold the same abstract type. Unfortunately, static analysis has several drawbacks. It does not scale easily to large or complex programs. It is also inherently imprecise; for instance, static analyses are generally unable to distinguish values that have been stored in and then retrieved from a single array.

Our approach is to use dynamic analysis to track values at runtime. Dynamic analysis should scale well, because its running time is closely related to the program's running time, and it can avoid many language complexities that exist only at the source-code level. It also has the potential to be more precise because it need not apply approximations of run-time behavior, but can observe actual behavior.

Our analysis gives two values the same abstract type if they interact via program operations such as arithmetic and assignment. The analysis associates a fresh abstract type with each new value; this is implemented as a unique identifier that tags the value. (For a primitive representation type such as int, new values are dynamic instances of literals, values read from a file, etc. Use of a fresh abstract type for each dynamic instance of a literal, and propagation of abstract types along with values through procedure calls, provide perfect context-sensitivity.) Each operation on two values unifies their abstract types using an efficient union-find data structure. Two variables have the same abstract types if their values do; whether two variables have the same abstract type can change from moment to moment in the program as assignments and other operations are performed.

The analysis is intended to report, for any pair of variables at a given program point (typically procedure entries and exits), whether those variables ever held values of the same abstract type at that program point. The abstract type information that is maintained for values must be integrated with variable information each time a program point is executed. In order to accommodate this, the analysis keeps a second variable-based set of abstract type information (independently for each program point) and merges the value-based information into that data structure at each execution of the program point.

Progress

We have completed a design and instrumentation infrastructure for both Java and C/C++ programs. We are currently in the process of implementing tools to perform this analysis.

Research Support

This research is supported by DARPA contract FA8750-04-2-0254, NSF grants CCR-0133580 and CCR-0234651, the Deshpande Center, NTT, IBM, and the Oxygen project.

References:

[1] R. O Callahan and D. Jackson. Lackwit: A program understanding tool based on type inference. In Proceedings of the 19th International Conference on Software Engineering, pages 338-348, Boston, Massachusetts, May 1997.

[2] M. D. Ernst, A. Czeisler, W. G. Griswold, and D. Notkin. Quickly detecting relevant program invariants. In ICSE 2000, Proceedings of the 22nd International Conference on Software Engineering, pages 449-458, Limerick, Ireland, June 2000.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)