CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Research Abstracts bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

 

Research Abstracts - 2006
horizontal line

horizontal line

vertical line
vertical line

Annotation-less Unit Type Inference for C

Philip J. Guo & Stephen McCamant

Introduction

For programs that operate on physical quantities, an important check of correctness is that results should have the correct units: a value whose unit is incorrect may be either a cause or a sign of a serious program bug. Despite their widely-recognized value, systems for annotating programs with unit information (also called unit types) are cumbersome to use, especially because many annotations are required. To make it more practical to apply unit types to programs that do not already have them, we have developed a new algorithm that can infer a complete set of unit types for a program without the aid of programmer annotations. The unit types inferred in this way are in a sense maximally general, subsuming any more specific units the programmer may have intended. A developer can give the inferred names to match their intended use with much less effort than would be needed to annotate a program from scratch.

Technical Approach

Like several other systems for unit type inference, of which the concurrently-developed Osprey [1] is especially similar, our tool uses a constraint-based algorithm. Relationships that must hold between the units of related variables (for instance, two values that are added together must have the same units) are represented as linear constraints (with multiplication and division of units represented by addition and subtraction). The tool then uses linear algebra to find a solution to the constraints, which represents a consistent set of unit types for the program. In previous systems, the set of basic dimensions (such as length, mass, and time) was either fixed, or specified using program annotations. In our approach, by contrast, the programmer-intended set of basic dimensions is unknown; instead the tool infers a most-general set of basic dimensions in which every program unit can be expressed. Rather than solving for variables that represent the coefficient of each basic dimension in each unit type, our tool simplifies the constraints to express as many variables as possible as linear combinations of a variables from a linearly independent basis set, which are taken as the basic dimensions.

Once the tool has determined a most-general system of basic dimensions, a human-readable unit type for all of the variables in the program can be constructed by giving the name for each most-general unit in the programmer-desired unit system. Our tool provides an interactive interface for this process, in which units the programmer supplies for one variable are automatically propagated to other related variables. The number of programmer inputs required is at most the number of basic dimensions in the most-general set of dimensions, which is generally much smaller than the total number of variables in a program.

Experience
Program Description LOC # variables # basic dimensions
quasi Draw Penrose tiling 491 219 3
schd Animate Schrödinger equation 633 452 34
bike-power Compute cycling caloric expenditure 680 273 29
demunck Simulate EEG propagation 1005 923 22
starexp Calculate stellar evolution parameters 4974 3517 197
oggenc Perceptually encode audio in frequency domain 58413 11045 163

To evaluate our tool, we applied it to a number of small to moderate-sized C programs that operate on physical quantities, shown in the table above. Most of the programs were found by searching the web for pages which contained the word printf and the name of an SI unit such as "watt"; oggenc is an open-source audio compression tool we have used in previous research, which uses a number of units of time, amplitude, and frequency. In each case, we observed that the number of basic dimensions found by our tool was much smaller (by a factor of 10-100) than the number of program variables; a similar improvement factor would relate the effort required to provide unit annotations with our tool compared to annotating each program variable one by one.

Research Support

This research is supported by DARPA and a National Defense Science and Engineering Graduate Fellowship.

References:

[1] Lingxiao Jiang and Zhendong Su. Opsrey: A practical type system or validating dimensional unit correctness in C programs. In The Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), Shanghai, China, May 2006.

[2] Philip Guo and Stephen McCamant. Annotation-less unit type inference for C. December 2005.

vertical line
vertical line
 
horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu