CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

The InfiniT Stored-Processor Computer

Heidi Pan & Krste Asanovic

Motivation

To ensure program correctness in multiprogramming, synchronization mechanisms are needed to enforce data dependencies and timing constraints between the various threads. There are two main approaches to synchronization: locks and transactions. Both can be broken down into three components: (1) acquiring the right to synchronize (lock or commit), (2) releasing the right to synchronize (unlock or clean up), and (3) waiting for the right to synchronize. In conventional multiprocessors, hardware support is provided for the acquisition and release components in the form of atomic operations, but not for the wait component. The software must constantly poll memory to determine whether the release has occurred. This is highly inefficient, as the processors should be doing useful work or sleeping instead of spinning. Furthermore, since synchronization mechanisms are so expensive, programmers must deploy them frugally, making them difficult to reason about and often leading to unwanted race conditions or deadlock.

We introduce the stored-processor architecture, which provides hardware support for the waiting mechanism. By facilitating direct communication between the threads, the waiting thread can efficiently suspend until it is woken up by the releasing thread.

Processor Design

In the stored-processor architecture, each 'processor' is just a data structure residing in globally accessible memory, encoding state such as the instruction pointer and general-purpose registers. In contrast to the conventional multiprocessor, there is no one-to-one correspondence between a processor and a physical processing engine (see Figure 1). The software dynamically creates as many processors as needed, up to an 'infinite' number of threads, hence the name InfiniT.

Figure 1. Conventional Multiprocessor vs. Store-Processor.

Since the entire parallel execution state is in memory, processors can view and change each other's shared state, just as they handle shared data in a conventional multiprocessor. The physical processing engines are augmented with specialized caches to store the processor state and active set, which are snooped by the underlying cache coherence mechanism. A processor can inactivate or migrate another processor by simply writing to the appropriate memory location.

Synchronization: Owner Pointers

InfiniT provides each word in memory with an associated owner pointer, which points to the processor owning that word (Figure 2). A multitude of lightweight synchronization mechanisms can be built on top of these owner pointers, which can be used to dynamically implement wait queues and provide an efficient mechanism to suspend and wake up threads.

Figure 2. Owner Pointers.

For example, owner pointers can be used to support producer-consumer synchronization, similar to the full-empty bit mechanism. The InfiniT ISA is extended to mark the respective store and load as the rendezvous point. If the producer arrives at the rendezvous first, it takes ownership, stores the value, then suspends. Upon its arrival, the consumer reads the value then wakes up the producer, who relinquishes the ownership. On the other hand, if the consumer arrives first, it takes ownership and suspends. When the producer arrives, it stores the value then wakes up the consumer to read the value.

InfiniT also supports unbounded transactional memory [1]. A transaction is a sequence of instructions that either commits or aborts. Once a transaction commits, all of its instructions appear to have executed atomically, but no programmer-visible state is altered if a transaction is aborted. The InfiniT ISA is extended to delineate the start and finish of a transaction. Upon the start of a transaction, the initial processor state is copied to a log, and the processor's transaction state is set to 'pending.' A load claims ownership and records the touched address in the log; a store claims ownership, records the address and the old value, and updates the memory. If the transaction completes without conflict, the processor switches to the 'committed' state and begins to reset owner pointers. Upon conflict between two pending transactions, a deadlock-free heuristic chooses the winner; a pending transaction always yields to a committed or aborted transaction. The losing processor places itself on the waiting queue of the winning processor, enters an 'aborted' state, and begins revoking ownership and restoring old values. When a processor finishes committing or aborting, it wakes up any queued processors.

Protection

Since both private and shared processor state resides in global memory, InfiniT needs efficient fine-grained memory protection between the different processors. We plan to apply the Mondriaan memory protection scheme [2], which provides an efficient, compressed permissions trie structure held in main memory, and a special compressed permissions cache in the processing engine.

Simulator

InfiniT is a radical new architecture with a large unexplored design space, and an entirely new system stack: applications, languages, compiler, runtime libraries, ISA, and micro-architecture. For our initial studies, we avoid redefining the ISA and implementing a new compiler backend. Instead, we use C/C++ with InfiniT extensions as our abstraction, so that we can use existing optimizing compilers for existing architectures, and avoid ISA interpretation overhead in our simulator. We use the Intel PIN library [3] to dynamically instrument all loads and stores to call into our detailed memory system simulator and thread manager. The dynamic instrumentation of a host execution provides the speed that enables simulation of numerous threads, while capturing important memory system aspects of the InfiniT architecture.

Research Support

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.

This project is sponsored by the Defense Advanced Research Projects Agency (DARPA) through the Department of the Interior National Business Center under grant number NBCH104009.

References

[1] C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie. Unbounded Transactional Memory. In The Proceedings of the 11th International Symposium on High Performance Computer Architecture, pp. 316--327, San Francisco, CA, February 2005.

[2] Emmett Witchel. Mondriaan Memory Protection. MIT Ph.D. Dissertation, January 2004.

[3] Pin - A Dynamic Binary instrumentation Tool. http://rogue.colorado.edu/pin

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)