CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Inferring Specifications to Detect Errors in Code

Mana Taghdiri & Daniel Jackson

Introduction

In recent years, finite state verification techniques have been shown to be effective in finding bugs in programs. Since they do not rely on small test suites, they can be used to uncover unexpected or subtle errors. Software model checkers, for example, typically work by extracting a state machine from the code and exploring the state space. They can automatically find bugs in large programs without requiring the user to provide any annotations. They do not, however, exploit the abstraction boundaries defined by procedures. This is odd, since the modularization of the code into procedures was presumably chosen in order to make reasoning easier.

In contrast, more traditional program verification approaches made extensive use of the program structure. Each procedure would be checked against its specification, using specifications of the called procedures as surrogates for their code. Automating these approaches has motivated several tools. However, they require the user to provide specifications for the called procedures or the procedures are inlined. Providing specifications is a significant burden on users which limits the applicability of a tool. On the other hand, inlining all procedure calls does not scale to large programs.

The aim of our project is to provide a scalable method to find errors in programs by exploiting the procedural structure of the code. We use procedure specifications at call sites. The specifications however, are not provided by the user; they are inferred from the code automatically as needed.

Approach

We check Java programs against user-provided properties expressed in Alloy, a relational first order logic. We particularly target structural properties, i.e., properties that constrain the configuration of the heap. The code is analyzed in a bounded heap. That is, the user provides the number of objects to be considered for each type in the heap. Therefore, although a counterexample returned by our tool is a valid counterexample, the absence of a counterexample does not constitute proof.

We provide an analysis framework that is based on a technique known as "counterexample guided abstraction refinement". The steps of the analysis are as follows:

(1) Abstraction: The procedure being checked is translated to a set of logical constraints that preserve the semantics of the procedure except at its call sites. All procedure calls are replaced with some initial approximate specifications so that the abstract program is an over-approximation of the original code.

(2) Solving: The generated abstraction along with the negation of the property are handed to a constraint solver. If no solution is found, the analysis terminates with no counterexamples.

(3) Validity Check: If a counterexample is found, it must be checked for validity. Each procedure call is checked for consistency with the found counterexample again using a constraint solver. If all procedure calls are consistent, the counterexample denotes a fault and the analysis terminates.

(4) Refinement: If the counterexample is invalid, i.e., a procedure call is inconsistent with the found counterexample, a new partial specification is inferred for the inconsistent call. The specification is extracted from a proof of invalidity generated by the constraint solver and rules out the invalid counterexample. This specification is then conjoined with the old specification of the procedure to form a more precise specification. The process then starts over at the solving phase.

Our implementation of this framework translates Java programs to boolean constraints using a technique previously used in Jalloy[2]. That is, the code is first translated to Alloy and then to boolean constraints represented in conjunctive normal form. The ZChaff SAT solver is then used to analyze the generated boolean formula. New specifications are then inferred from a proof of invalidity that ZChaff generates, using a facility called unsatisfiable core.

Since the refinement phase is performed on demand, the specifications we infer are context-dependent. That is, they depend on both the context in which a procedure is called, and the property being checked. This ensures that only as much information about a procedure is used as is necessary.

Progress

We have implemented a prototype tool to run some experiments. The experiments include checking some programs that manipulate the objects in the heap against some partial specifications. We compared our analysis results with a method that uses the same backend and the same translation except that it inlines all procedure calls. Current experiments show that the procedure abstraction technique considerably improves the analysis time by (1) reducing the translation time, and (2) generating smaller boolean formulas that can be checked faster. Detailed description of this work can be found elsewhere[1].

Future

In our current implementation, the initial specifications only express the frame conditions of procedures. A more precise initial specification might reduce the number of refinements needed and thus improve the scalability of the method. Techniques to obtain such specifications are under investigation.

Furthermore, the current prototype tool will be improved so that it can handle a larger subset of Java. We will perform more experiments to evaluate our method when checking larger programs with deeply nested procedure calls.

Research Support

This project is supported by the grant No. 0086154 from the ITR program of the National Science Foundation.

References:

[1] Mana Taghdiri. Inferring specifications to detect errors in code. In International Conference on Automated Software Engineering (ASE), Linz, Austria, 2004.

[2] Daniel Jackson, Mandana Vaziri. Finding bugs with a constraint solver. In International Symposium on Software Testing and Analysis (ISSTA), Portland, OR, 2000.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)