CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Scope-Complete Code Analysis

Greg Dennis & Daniel Jackson

Introduction

We are developing analysis methods that allow a procedure in a conventional object-oriented language to be checked against a rich specification. In common with all automatic analyses of code, we expect to make some compromises. Rather than the traditional compromise of abstraction, which tends to result in large numbers of false alarms, we explore all executions up to a certain size. Like bounded model checking, we restrict the length of execution traces; in addition, we bound the number of heap objects allocated for each type. This approach, being fully concrete, never generates false alarms, but it can fail to discover bugs that require a larger bound for their detection. The bounds on the heap allocations and the length of execution traces are known collectively as the scope.

Approach

To make a procedure amenable to analysis, we first translate it into a formal model. In our approach, we build the models in Alloy [1], a language for writing first-order logical constraints on sets and relations. To solve these constraints, we use an automated tool known as the Alloy Analyzer [4]. To use Alloy, one specifies a system $S$ and a property $P$ of the system and asks the Analyzer to check the validity of $S ⇒ P$ . In response, the Analyzer performs an exhaustive search within the user-provided bounds—a scope-complete analysis—for a counterexample to the implication, i.e. a solution to $S ∧ ¬P$ .

The Alloy models we generate encode the procedure as a transition predicate $τ$ that accurately simulates the procedure's behavior. The formula $τ(s, s′)$ is true when the procedure relates a pre-state $s$ to a post-state $s′$ . The user separately encodes a predicate $ψ$ such that $ψ(s,
s′)$ is true when the pre-state $s$ and post-state $s′$ conform to the procedure's specification. We analyze the transition function $τ$ against the specification $ψ$ by automatically checking the following formula for validity with the Alloy Analyzer:

∀ s, s′ . τ(s, s′) ⇒ 
        ψ(s, s′)

To construct $τ$ , we translate a control flow graph (CFG) of the procedure directly into first-order logical constraints in Alloy. The control flow graph contains a node for each atomic statement or conditional test in the code, and the branches represent possible transitions between these statements. The set of states in an execution trace are partitioned into disjoint subsets, one subset for each node in the CFG. We formulate a transition from the pre-state $s$ to the post-state $s′$ as a conjunction of implications—like a case statement—with one implication per node.

For a simple example, consider a procedure that accesses variables $x$ , $y$ , and $z$ and consider a node $N$ in the CFG at which $x$ is assigned $e$ . Let $N′$ be the successor node of $N$ in the CFG. Our encoding would generated the following implication for node $N$ .

s ∈ N ⇒ (s′.x = e ∧ s′.y 
        = s.y ∧ s′.z = s.z ∧ s′ ∈ N′)

This constraint says that if the pre-state $s$ is at node $N$ , then the value of $x$ in the post-state is $e$ , the values of $y$ and $z$ are left unchanged, and the post-state $s′$ is at node $N′$ . There is a separate encoding for branching nodes and some optimizations that collapse and contract the CFG that make this more tractable in practice.

Case Study: Voting Software

Our group is investigating how to develop reliable electronic voting software. To this end, we applied our code analysis technique described above to a simple but realistic tool for tallying votes. The tool, written entirely in Java, accepts as inputs a set of ballots and a set of candidates and returns the number of votes each candidate received along with whether that candidate won, lost, or tied the electoral race in question.

Using our encoding, we produced a model of the system's core vote-counting procedure and checked it against a complete specification. In our first analysis, we found a bug. As it turns out, the procedure did not always work properly in the case where some candidates received the same number of votes. Although the system had been tested on real election data, none of the real data included this rare tie scenario, so the bug went undetected.

Having corrected the code, we performed the analysis again and found no counterexamples to our assertion that the procedure met the specification. As discussed in "Specification Coverage by Automatic Mutant Generation," we conducted a separate analysis to evaluate the adequacy of the specification in exposing bugs.

Future

Although we have made much progress towards making this analysis tractable, it still does not scale to large call trees. To address this, we are exploring new ways to infer specifications of loops. We are also considering a combination of widening techniques common to abstract interpretation and methods for specification refinement [3] developed in our own group.

Research Support

This work is supported by the NASA High Dependability Computing Program under cooperative agreement NCC-2-1298 and by grant 0086154 (Design Conformant Software) from the ITR program of the National Science Foundation.

References:

[1] Daniel Jackson. Automating First-Order Relational Logic. In Proc. ACM SIGSOFT Conf. Foundations of Software Engineering (FSE), November, 2000.

[2] Daniel Jackson and Mandana Vaziri. Finding Bugs with a Constraint Solver. ISSTA'00, Portland, OR, 2000.

[3] Mana Taghdiri. Inferring Specifications to Detect Errors in Code. In Proc of the 19th International Conference on Automated Software Engineering (ASE), September, 2004.

[4] The Alloy Analyzer. http://alloy.mit.edu/

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)