Abstracts - 2007
Quantitative Information-Flow Tracking for Type-Unsafe Languages
Stephen McCamant & Michael D. Ernst
The goal of information-flow security is to enforce limits on the dissemination of information. For instance, a confidentiality property requires that a program entrusted with secrets should not "leak" those secrets into public outputs. Absolute information-flow properties, such as non-interference (a guarantee that a system's public behavior is the same no matter what its secret inputs are), are rarely satisfied by real programs: generally some information flow exists, and system designers, implementers, and testers wish to distinguish acceptable from unacceptable flows. Our goal is to apply quantitative information-flow tracking to preexisting real systems, by dynamically measuring the number of bits of information that may be leaked by a specific execution of a program.
The key technical challenge in performing such tracking is to follow information not just as it is stored and transferred, but as it is used and transformed by potentially complex calculations. For instance, a program might use secret information in a branch condition or a pointer, indirectly revealing information via an implicit flow. One way to ensure confidentiality in programs is to use an information-flow type system, which annotates each variable with the owner of the information it might hold [Myers 1999]. However, such type systems have so far been too cumbersome to use for large applications, and they are not applicable at all to programs written in type-unsafe languages such as C and C++. By contrast, we use a dynamic analysis that tracks information flows in a particular program execution. With a dynamic analysis, our tool can also provide more than a yes/no assessment of whether information might have been revealed: it gives a numeric upper bound on the number of bits of secret information revealed.
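The implicit-flow problem can be seen in a minimal hypothetical sketch (not code from our tool or its subject programs): the secret value is never copied or computed on directly, yet the result reveals it, because the branch condition depends on the secret.

```c
/* Hypothetical sketch of an implicit flow.  The secret never moves
 * through an assignment or arithmetic operation, yet the return value
 * equals it, because control flow depends on the secret.  An analysis
 * that tracked only direct data movement would miss this leak. */
int leak_via_branch(int secret_bit) {
    int out;
    if (secret_bit)      /* the branch condition depends on the secret ... */
        out = 1;         /* ... so although no tainted data is copied, */
    else
        out = 0;         /* 'out' now reveals the secret bit */
    return out;
}
```

An analysis must account for such flows, since an adversary observing `out` learns `secret_bit` exactly.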
Our technique is built upon tainting: it traces the flow of data through a program execution at single-bit granularity, tracking which bits are derived directly or indirectly from secret information. Under a tainting rule, an output bit of a basic operation is secret if it could have been affected by a secret input bit. To account for implicit flows, our technique maintains a count that bounds the amount of information that may have been leaked indirectly: whenever a branch depends on secret data, the global counter is incremented, and similarly for writes through secret pointers. Our tool reports, as a total bound on leakage, the sum of the global counter plus the number of tainted bits that are output.
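The bit-granularity tainting rule and the implicit-flow counter might be sketched as follows. This is a hypothetical simplification for exposition, not the Valgrind-based implementation: a shadow value pairs each word with a taint mask, bitwise operations propagate taint per bit, arithmetic spreads taint conservatively through carries, and branching on tainted data charges the global counter.

```c
#include <stdint.h>

/* Shadow value: bit i of 'taint' is set if bit i of 'val' may depend
 * on secret input.  Hypothetical simplification, not the tool itself. */
typedef struct { uint32_t val; uint32_t taint; } shadow;

/* Global counter bounding bits leaked through implicit flows. */
static unsigned implicit_leak_bits = 0;

/* XOR: each output bit depends only on the corresponding input bits,
 * so taint propagates bitwise. */
shadow shadow_xor(shadow a, shadow b) {
    shadow r = { a.val ^ b.val, a.taint | b.taint };
    return r;
}

/* ADD: a carry can spread secret influence leftward, so conservatively
 * every bit at or above the lowest tainted operand bit is tainted. */
shadow shadow_add(shadow a, shadow b) {
    shadow r;
    uint32_t t = a.taint | b.taint;
    r.val = a.val + b.val;
    r.taint = t ? ~((t & -t) - 1u) : 0u;  /* lowest tainted bit and up */
    return r;
}

/* Branching on tainted data: charge one bit to the implicit-flow
 * counter for the branch outcome, then take the branch normally. */
int shadow_branch(shadow cond) {
    if (cond.taint)
        implicit_leak_bits += 1;
    return cond.val != 0;
}
```

The reported leakage bound is then `implicit_leak_bits` plus the number of set taint bits among the output values.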
Two types of program annotation can make the computed bound more precise (that is, smaller). Both types of annotation are untrusted: they improve the tool's results if used well, but they cannot cause it to underestimate the amount of information leaked.
Figure 1. The propagation of secrets in a program represented as a data-flow network. Certain inputs to a program (top row boxes) are secret (shaded). Via the results of intermediate computations (ovals), the inputs affect the values of some outputs (bottom row boxes). The amount of information about the inputs that can be recovered from the outputs is bounded by the number of affected outputs. However, it can be much less if, as shown on the left, there is a flow bottleneck in the intermediate processing; this is characteristic of a secure program. To more accurately estimate the amount of information transmitted, count the information as if it were revealed at the point where it has the most compact representation (shown on the right).
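The bottleneck effect in Figure 1 can be illustrated with a hypothetical fragment (not from any of the programs we studied): all 32 output bits are tainted, but they all derive from a single comparison, so counting at the narrowest point of the flow bounds the leakage at 1 bit rather than 32.

```c
#include <stdint.h>

/* Hypothetical flow bottleneck: every output bit depends on the
 * secret, so naive tainted-bit counting reports 32 leaked bits.  But
 * all 32 bits are copies of one comparison result, so measuring at
 * the bottleneck bounds the leakage at 1 bit. */
uint32_t widen(uint32_t secret, uint32_t threshold) {
    int above = secret > threshold;   /* bottleneck: 1 bit of secret info */
    return above ? 0xFFFFFFFFu : 0u;  /* 32 tainted bits, 1 bit of content */
}
```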
As a simple example of how such information-flow tracking might be used, consider a client application for a networked version of the children's game Battleship, shown in Figure 2. Each player secretly chooses locations for four rectangular ships on a grid representing the ocean, and then they take turns firing shots at locations on the other player's board. The player is notified whether each shot is a hit or a miss, and a player wins when he or she has shot all of the squares of all of the opponent's ships. When playing the game, one would like to know how much information about the layout of one's board is revealed in the network messages to the other player. For instance, in response to each shot fired by the opponent, the program must disclose whether it is a hit or a miss, but that should reveal only one bit.
Figure 2. kbattleship, a program from the KDE desktop environment for playing a game of Battleship with an opponent over the Internet. Our technique can be used to check on each round of the game how much information about the secret ship locations on the player's board is revealed. In the version pictured, the program reveals more information than it should.
Our tool can be used to test this information-flow security policy about the Battleship game client. We annotate the program inputs specifying the locations of the player's ships as secret, and the goal is to estimate how many bits of this information are transmitted over the network socket. The locations of the ships are stored in a relatively complex data structure (the KShipList class), but the details of its internal processing are not important, because the accesses used to generate the network reply are encapsulated in the return value of the single method shipTypeAt. If shipTypeAt returned a boolean value, indicating whether or not the grid square was occupied by a ship, our technique could then verify that this one bit was the only part of the network message that depended on the secret board data. However, in the version (3.3.2) of kbattleship we have examined, shipTypeAt returns not a boolean but an integer code describing the type of ship at the queried location, and this code is passed unmodified in the network message. Though the standard client does not display this extra information, an opponent with a modified game program could use it to unfairly infer the state of adjacent locations. When we informed them of this problem, the developers of kbattleship agreed it was a bug, and it is fixed in version 3.5.3.
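The bug and its fix can be sketched in simplified form (this is illustrative C, not the actual C++ KShipList or shipTypeAt code): the leaky reply forwards the ship-type code unmodified into the network message, while the fixed reply collapses it to the single hit/miss bit the protocol must reveal.

```c
/* Simplified sketch of the kbattleship leak; board cells hold 0 for
 * water or a positive code identifying the ship type at that square. */
int ship_type_at(int board[10][10], int x, int y) {
    return board[y][x];
}

/* Leaky reply: forwards the full ship-type code, so an opponent with
 * a modified client can distinguish ship types and infer the state of
 * adjacent squares. */
int reply_leaky(int board[10][10], int x, int y) {
    return ship_type_at(board, x, y);
}

/* Fixed reply: reduce the code to one hit/miss bit before it enters
 * the network message. */
int reply_fixed(int board[10][10], int x, int y) {
    return ship_type_at(board, x, y) != 0;
}
```

With the fixed reply, our technique can verify that only one bit of the secret board data flows into each response.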
We have implemented the technique described above in a tool based on Valgrind, a framework for debugging tools. Using our implementation, we have checked different security properties in each of four other open-source applications. OpenSSH is the most commonly used remote-login application on Unix systems -- we checked that an RSA-based client authentication does not reveal the secret key. OpenGroupware.org is a web-based system for collaboration between users in an enterprise -- we checked how much information about a user's calendar its appointment-scheduling feature revealed. The eVACS system is a client-server implementation of electronic voting -- we checked that the client software does not leak a barcode value used for authentication. In the X Window System commonly used on Unix, a single program called the X server manages the display hardware, and each program (X client) that wishes to display windows communicates with the server over a socket. We used our tool to detect when secret information sent to the server by one client is leaked to another, via cut and paste, or to a third party, via a buffer-overflow code injection attack.
This research is supported by DARPA and a National Defense Science and Engineering Graduate Fellowship.
Stephen McCamant and Michael D. Ernst. Quantitative Information-Flow Tracking for C and Related Languages. MIT Computer Science and Artificial Intelligence Laboratory technical report MIT-CSAIL-TR-2006-076, Cambridge, MA, USA, November 2006.
Andrew C. Myers. JFlow: Practical mostly-static information flow control. In Proceedings of the 26th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '99), pp. 228–241, San Antonio, Texas, USA, January 1999.