CSAIL Research Abstracts - 2005

An Extensible Compiler and Language for Systems Programming

Russ Cox, Frans Kaashoek & Eddie Kohler

Introduction

Systems programmers often need detailed control over the behavior of their programs, including code generation, thread and I/O scheduling, memory alignment, and much more. This requires using languages like C and C++, whose low-level execution model is close to the underlying hardware. Unfortunately, C and, to a lesser extent, C++ offer abstraction capabilities that are as low-level as the execution models, leading to hard-to-read programs and subtle bugs.

We relish the control that C gives us over the run-time behavior of the program. When we need it, having that control is terribly useful, but when we don't, the burden of working with such low-level abstractions is terribly frustrating. To escape from this paradox, we'd like to use the control that C gives us to build our own high-level abstractions custom-fitted to the projects we are working on.

We are not the first to point out that changing C can make writing systems programs easier. A large number of additions to C have been proposed and implemented, including object methods; classes and templates; polymorphism; large-scale multithreading; template-based dynamic code generation; complex floating-point numbers; fixed-point numbers, address space tags, and access to hardware registers; region-based memory management; run-time pointer checking; program invariant detection; address-space qualifiers; and more. Most C compilers add small extensions to the language as well: the GNU, Microsoft, and Plan 9 C compilers all provide small (and mostly incompatible) extensions [1].

These extensions to C are typically implemented by starting with a copy of an existing compiler and changing it to add the desired feature. The result is difficult for others to adopt and use, and the separate compiler must now be maintained in order for it to continue to be useful. Using a different compiler for each extension also means that only one extension can be used at a time. If you want complex floating-point number support as well as run-time pointer checking, you're out of luck. A different approach is to write preprocessors, but preprocessor-based extensions don't combine well either; further, simple preprocessors usually break language semantics, while complex ones duplicate compiler code, often with added bugs.
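As a small illustration of how a simple preprocessor-based extension can break language semantics, consider the familiar MAX macro. The sketch below is not taken from any of the extensions mentioned above; the function names macro_increments, function_increments, and max_int are made up for this example. Because the preprocessor substitutes text without knowing what an expression is, an argument with a side effect is evaluated twice, while the equivalent function call evaluates it once.

```c
/* A textual "extension": MAX written as a preprocessor macro. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* The same operation as an ordinary function the compiler understands. */
static int max_int(int a, int b) { return a > b ? a : b; }

/* Returns how many times i is incremented when MAX(i++, 3) is used. */
int macro_increments(void) {
    int i = 5;
    int m = MAX(i++, 3);      /* expands to ((i++) > (3) ? (i++) : (3)) */
    (void)m;
    return i - 5;             /* 2: the argument was evaluated twice */
}

/* Returns how many times i is incremented when max_int(i++, 3) is used. */
int function_increments(void) {
    int i = 5;
    int m = max_int(i++, 3);  /* the argument is evaluated exactly once */
    (void)m;
    return i - 5;             /* 1 */
}
```

Avoiding the double evaluation requires knowing the type of the argument and introducing a temporary, which is information the compiler has and a simple preprocessor does not.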

The solution, we believe, is an extensible compiler. An extensible compiler for a language like C would allow systems programmers to keep necessary control and at the same time create the high-level abstractions they need. It would also help systems programmers to create and experiment with new extensions themselves, leading to programs that are clearer, less buggy, and easier and quicker to write. Extensions could inspect and modify all the internal data structures and processes of the compiler: the parse grammar, the abstract syntax tree, type analyses, type checking, variable scoping and lifetimes, structure and stack frame layouts, call graphs, flow graphs and flow analyses, code generation, and more. By using the fruits of the compiler's labors, simple extensions can effect significant changes to the language.
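To make the idea of an extension that modifies the abstract syntax tree concrete, here is a deliberately tiny sketch. It is not the design of the compiler described here; every name in it (struct node, register_pass, lower_square, compile_and_run) is hypothetical. It assumes a toy expression language in which an extension adds a new notation, square(e), and registers a pass that lowers it to the core multiplication node, so the core evaluator never has to learn about the new construct.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy AST. K_SQUARE stands in for a node kind contributed by an extension;
 * a real design would let extensions allocate their own kinds. */
enum kind { K_CONST, K_ADD, K_MUL, K_SQUARE };

struct node {
    enum kind kind;
    int value;             /* used by K_CONST only */
    struct node *left;     /* first operand, or NULL */
    struct node *right;    /* second operand, or NULL */
};

/* An extension is modeled as a rewrite pass over the whole tree. */
typedef struct node *(*pass_fn)(struct node *);

#define MAX_PASSES 8
static pass_fn passes[MAX_PASSES];
static size_t npasses;

/* Adds a pass to the pipeline; returns 0 on success, -1 if the table is full. */
int register_pass(pass_fn fn) {
    if (npasses == MAX_PASSES) return -1;
    passes[npasses++] = fn;
    return 0;
}

struct node *new_node(enum kind kind, int value,
                      struct node *left, struct node *right) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Example extension pass: rewrite square(e) into e * e, bottom-up.
 * Sharing the operand subtree is safe here only because toy
 * expressions have no side effects. */
struct node *lower_square(struct node *n) {
    if (n == NULL) return NULL;
    n->left = lower_square(n->left);
    n->right = lower_square(n->right);
    if (n->kind == K_SQUARE) {
        n->kind = K_MUL;
        n->right = n->left;
    }
    return n;
}

/* Core "back end": evaluates core nodes only and rejects anything else. */
int eval(const struct node *n) {
    switch (n->kind) {
    case K_CONST: return n->value;
    case K_ADD:   return eval(n->left) + eval(n->right);
    case K_MUL:   return eval(n->left) * eval(n->right);
    default:      abort();   /* an extension node that was never lowered */
    }
}

/* Runs every registered pass in order, then hands the tree to the core. */
int compile_and_run(struct node *root) {
    for (size_t i = 0; i < npasses; i++)
        root = passes[i](root);
    return eval(root);
}
```

With lower_square registered, the tree for square(7) + 1 is rewritten to 7 * 7 + 1 before the core evaluator sees it. The point of the sketch is only the shape of the interaction: the extension works on the compiler's own data structure rather than on program text.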

Writing systems programs is hard enough. Programmers shouldn't have to fight the compiler too. Instead, they should be able to adapt the compiler to fit their project better, by adding new notations, new type systems or type checking rules, new analyses, and so on. Having an extensible compiler will make programming faster, easier, less tedious, and more fun. We're looking forward to it.

Project Status

As of Spring 2005, we are in the process of building and experimenting with the extensible compiler.

Research Support

Russ Cox is supported by a Fellowship from the Fannie and John Hertz Foundation.

References:

[1] Russ Cox. Extensions and Tools for the C Programming Language. http://pdos.lcs.mit.edu/~rsc/xcref.html


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)