Abstracts - 2006
Annotation-less Unit Type Inference for C
Philip J. Guo & Stephen McCamant
For programs that operate on physical quantities, an important check of correctness is that results should have the correct units: a value whose unit is incorrect may be either a cause or a sign of a serious program bug. Despite their widely-recognized value, systems for annotating programs with unit information (also called unit types) are cumbersome to use, especially because many annotations are required. To make it more practical to apply unit types to programs that do not already have them, we have developed a new algorithm that can infer a complete set of unit types for a program without the aid of programmer annotations. The unit types inferred in this way are in a sense maximally general, subsuming any more specific units the programmer may have intended. A developer can give the inferred names to match their intended use with much less effort than would be needed to annotate a program from scratch.
Like several other systems for unit type inference, of which the concurrently-developed Osprey  is especially similar, our tool uses a constraint-based algorithm. Relationships that must hold between the units of related variables (for instance, two values that are added together must have the same units) are represented as linear constraints (with multiplication and division of units represented by addition and subtraction). The tool then uses linear algebra to find a solution to the constraints, which represents a consistent set of unit types for the program. In previous systems, the set of basic dimensions (such as length, mass, and time) was either fixed, or specified using program annotations. In our approach, by contrast, the programmer-intended set of basic dimensions is unknown; instead the tool infers a most-general set of basic dimensions in which every program unit can be expressed. Rather than solving for variables that represent the coefficient of each basic dimension in each unit type, our tool simplifies the constraints to express as many variables as possible as linear combinations of a variables from a linearly independent basis set, which are taken as the basic dimensions.
Once the tool has determined a most-general system of basic dimensions, a human-readable unit type for all of the variables in the program can be constructed by giving the name for each most-general unit in the programmer-desired unit system. Our tool provides an interactive interface for this process, in which units the programmer supplies for one variable are automatically propagated to other related variables. The number of programmer inputs required is at most the number of basic dimensions in the most-general set of dimensions, which is generally much smaller than the total number of variables in a program.
To evaluate our tool, we applied it to a number of small to moderate-sized C programs that operate on physical quantities, shown in the table above. Most of the programs were found by searching the web for pages which contained the word printf and the name of an SI unit such as "watt"; oggenc is an open-source audio compression tool we have used in previous research, which uses a number of units of time, amplitude, and frequency. In each case, we observed that the number of basic dimensions found by our tool was much smaller (by a factor of 10-100) than the number of program variables; a similar improvement factor would relate the effort required to provide unit annotations with our tool compared to annotating each program variable one by one.
This research is supported by DARPA and a National Defense Science and Engineering Graduate Fellowship.
 Lingxiao Jiang and Zhendong Su. Opsrey: A practical type system or validating dimensional unit correctness in C programs. In The Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), Shanghai, China, May 2006.
 Philip Guo and Stephen McCamant. Annotation-less unit type inference for C. December 2005.