An Extensible Compiler and Language for Systems Programming
Russ Cox, Austin Clements & Frans Kaashoek
[joint work with Eddie Kohler and Tom Bergan, UCLA]
Introduction
Systems programmers often need detailed control over the behavior of
their programs, including code generation, thread and I/O scheduling,
memory alignment, and much more. This requires using languages like C
and C++, whose low-level execution model is close to the underlying hardware.
Unfortunately, C and, to a lesser extent, C++ offer abstraction capabilities
that are as low-level as the execution models, leading to hard-to-read
programs and subtle bugs.
We relish the control that C gives us over the run-time behavior of the
program. When we need it, having that control is terribly useful, but
when we don't, the burden of working with such low-level abstractions
is terribly frustrating. To escape this dilemma, we'd like to use
the control that C gives us to build our own high-level abstractions custom-fitted
to the projects we are working on.
We are not the first to point out that changing C can make writing systems
programs easier. A large number of additions to C have been proposed
and implemented, including: object methods; classes and templates; polymorphism;
large-scale multithreading; template-based dynamic code generation; complex
floating-point numbers; fixed-point numbers, address space tags, and access
to hardware registers; region-based memory management; run-time pointer
checking; program invariant detection; address-space qualifiers; and more.
Most C compilers add small extensions to the language as well: the GNU,
Microsoft, and Plan 9 C compilers all provide small (and mostly incompatible)
extensions [1].
These extensions to C are typically implemented by starting with a copy
of an existing compiler and changing it to add the desired feature. The
result is difficult for others to adopt, and the modified compiler
must then be maintained separately if it is to remain useful. Using
a different compiler for each extension also means that only one extension
can be used at a time. If you want complex floating-point number support
as well as run-time pointer checking, you're out of luck. A different
approach is to write preprocessors, but preprocessor-based extensions
don't combine well either; further, simple preprocessors usually break
language semantics, while complex ones duplicate compiler code, often
with added bugs.
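
As a minimal illustration of how a simple preprocessor can break language semantics, consider the following C sketch. The names SQUARE_MACRO, square_fn, macro_result, function_result, and check_results are invented for this example and do not come from any of the extensions mentioned above. The macro works by text substitution, so it ignores operator precedence in its argument, while the equivalent function evaluates its argument as an ordinary expression.

```c
#include <assert.h>

/* Hypothetical "extension" implemented as plain text substitution. */
#define SQUARE_MACRO(x) x * x

/* The same abstraction as a real function, which the compiler parses
   and type-checks like any other expression. */
int square_fn(int x) { return x * x; }

/* Expands to 1 + 2 * 1 + 2, which C parses as 1 + (2 * 1) + 2. */
int macro_result(void) { return SQUARE_MACRO(1 + 2); }

/* The argument is evaluated first, so this computes 3 * 3. */
int function_result(void) { return square_fn(1 + 2); }

/* Records the two different answers for the same source-level call. */
void check_results(void) {
    assert(macro_result() == 5);
    assert(function_result() == 9);
}
```

Parenthesizing the macro body repairs this particular case, but the general problem remains: a preprocessor that does not understand the language's grammar and types cannot reliably preserve them.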
The solution, we believe, is an extensible compiler. An extensible compiler
for a language like C would allow systems programmers to keep necessary
control and at the same time create the high-level abstractions they need.
It would also help systems programmers to create and experiment with new
extensions themselves, leading to programs that are clearer, less buggy,
and easier and quicker to write. Extensions could inspect and modify all
the internal data structures and processes of the compiler: the parse
grammar, the abstract syntax tree, type analyses, type checking, variable
scoping and lifetimes, structure and stack frame layouts, call graphs,
flow graphs and flow analyses, code generation, and more. By using the
fruits of the compiler's labors, simple extensions can effect significant
changes to the language.
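
To make the idea concrete, here is a small sketch of what an extension hook might look like. It is purely hypothetical and is not the interface of our compiler: the node kinds, register_pass, run_passes, lower_square, and eval are all names assumed for illustration. An extension adds a new node kind (K_SQUARE) and registers a rewrite pass that lowers it to nodes the core already understands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST node kinds; K_SQUARE is the one added by the extension. */
enum kind { K_NUM, K_ADD, K_MUL, K_SQUARE };

struct node {
    enum kind kind;
    int value;                 /* used only by K_NUM */
    struct node *left, *right; /* operands, or NULL */
};

/* An extension pass receives a node and returns its replacement. */
typedef struct node *(*pass_fn)(struct node *);

#define MAX_PASSES 8
static pass_fn passes[MAX_PASSES];
static size_t npasses;

/* Register a pass; returns 0 on success, -1 if the table is full. */
int register_pass(pass_fn fn) {
    if (npasses == MAX_PASSES)
        return -1;
    passes[npasses++] = fn;
    return 0;
}

/* Rewrite the tree bottom-up, applying every registered pass at each node. */
struct node *run_passes(struct node *n) {
    if (n == NULL)
        return NULL;
    n->left = run_passes(n->left);
    n->right = run_passes(n->right);
    for (size_t i = 0; i < npasses; i++)
        n = passes[i](n);
    return n;
}

/* Example extension: lower SQUARE(e) to MUL(e, e). The operand subtree
   is shared rather than copied, which is acceptable only because this
   sketch has no side effects. */
struct node *lower_square(struct node *n) {
    assert(n != NULL);
    if (n->kind == K_SQUARE) {
        n->kind = K_MUL;
        n->right = n->left;
    }
    return n;
}

/* Core evaluator; it knows nothing about K_SQUARE. */
int eval(const struct node *n) {
    switch (n->kind) {
    case K_NUM: return n->value;
    case K_ADD: return eval(n->left) + eval(n->right);
    case K_MUL: return eval(n->left) * eval(n->right);
    default:    return -1; /* node kind the core does not understand */
    }
}
```

Because the pass operates on the tree rather than on program text, squaring the expression 1 + 2 evaluates to 9 once lower_square has been registered and run_passes has been applied. A textual macro expanding to x * x would instead yield 5.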
Writing systems programs is hard enough. Programmers shouldn't have to
fight the compiler too. Instead, they should be able to adapt the compiler
to fit their project better, by adding new notations, new type systems
or type checking rules, new analyses, and so on. Having an extensible
compiler will make programming faster, easier, less tedious, and more
fun. We're making it happen.
References:
[1] Russ Cox. Extensions and Tools for the C Programming
Language. http://pdos.lcs.mit.edu/~rsc/xcref.html