Scope-Complete Code AnalysisGreg Dennis & Daniel JacksonIntroductionWe are developing analysis methods that allow a procedure in a conventional object-oriented language to be checked against a rich specification. In common with all automatic analyses of code, we expect to make some compromises. Rather than the traditional compromise of abstraction, which tends to result in large numbers of false alarms, we explore all executions up to a certain size. Like bounded model checking, we restrict the length of execution traces; in addition, we bound the number of heap objects allocated for each type. This approach, being fully concrete, never generates false alarms, but it can fail to discover bugs that require a larger bound for their detection. The bounds on the heap allocations and the length of execution traces are known collectively as the scope. ApproachTo make a procedure amenable to analysis, we first translate it into a formal model. In our approach, we build the models in Alloy [1], a language for writing first-order logical constraints on sets and relations. To solve these constraints, we use an automated tool known as the Alloy Analyzer [4]. To use Alloy, one specifies a system and a property of the system and asks the Analyzer to check the validity of . In response, the Analyzer performs an exhaustive search within the user-provided bounds—a scope-complete analysis—for a counterexample to the implication, i.e. a solution to . The Alloy models we generate encode the procedure as a transition predicate that accurately simulates the procedure's behavior. The formula is true when the procedure relates a pre-state to a post-state . The user separately encodes a predicate such that is true when the pre-state and post-state conform to the procedure's specification. We analyze the transition function against the specification by automatically checking the following formula for validity with the Alloy Analyzer: To construct , we translate a control flow graph (CFG) of the procedure directly into first-order logical constraints in Alloy. The control flow graph contains a node for each atomic statement or conditional test in the code, and the branches represent possible transitions between these statements. The set of states in an execution trace are partitioned into disjoint subsets, one subset for each node in the CFG. We formulate a transition from the pre-state to the post-state as a conjunction of implications—like a case statement—with one implication per node. For a simple example, consider a procedure that accesses variables , , and and consider a node in the CFG at which is assigned . Let be the successor node of in the CFG. Our encoding would generated the following implication for node . This constraint says that if the pre-state is at node , then the value of in the post-state is , the values of and are left unchanged, and the post-state is at node . There is a separate encoding for branching nodes and some optimizations that collapse and contract the CFG that make this more tractable in practice. Case Study: Voting SoftwareOur group is investigating how to develop reliable electronic voting software. To this end, we applied our code analysis technique described above to a simple but realistic tool for tallying votes. The tool, written entirely in Java, accepts as inputs a set of ballots and a set of candidates and returns the number of votes each candidate received along with whether that candidate won, lost, or tied the electoral race in question. Using our encoding, we produced a model of the system's core vote-counting procedure and checked it against a complete specification. In our first analysis, we found a bug. As it turns out, the procedure did not always work properly in the case where some candidates received the same number of votes. Although the system had been tested on real election data, none of the real data included this rare tie scenario, so the bug went undetected. Having corrected the code, we performed the analysis again and found no counterexamples to our assertion that the procedure met the specification. As discussed in "Specification Coverage by Automatic Mutant Generation," we conducted a separate analysis to evaluate the adequacy of the specification in exposing bugs. FutureAlthough we have made much progress towards making this analysis tractable, it still does not scale to large call trees. To address this, we are exploring new ways to infer specifications of loops. We are also considering a combination of widening techniques common to abstract interpretation and methods for specification refinement [3] developed in our own group. Research SupportThis work is supported by the NASA High Dependability Computing Program under cooperative agreement NCC-2-1298 and by grant 0086154 (Design Conformant Software) from the ITR program of the National Science Foundation. References:[1] Daniel Jackson. Automating First-Order Relational Logic. In Proc. ACM SIGSOFT Conf. Foundations of Software Engineering (FSE), November, 2000. [2] Daniel Jackson and Mandana Vaziri. Finding Bugs with a Constraint Solver. ISSTA'00, Portland, OR, 2000. [3] Mana Taghdiri. Inferring Specifications to Detect Errors in Code. In Proc of the 19th International Conference on Automated Software Engineering (ASE), September, 2004. [4] The Alloy Analyzer. http://alloy.mit.edu/ |
||
|