Abstracts - 2006
HASim: A Hardware Partitioned Processor for Fast and Accurate Architectural Exploration
Nirav Dave, Michael Pellauer & Joel Emer
Microarchitectural performance simulators are a common first-stage testbed for exploring new architectural ideas. To be effective, such simulators require three basic properties: speed of design, confidence in their correctness, and speed of simulation. Previous software implementations of such simulators invariably achieve at most two of these goals. For accuracy, software simulators must model a great deal of state, which makes them inherently slow. Even with fidelity-reducing optimizations, they still run four orders of magnitude slower than the designs being modeled.
One accepted approach in software simulators is to split the model into two partitions: a functional partition and a timing partition. The functional partition is responsible for executing the functional aspects of the model's instructions, that is, computing their results. Unsurprisingly, the timing partition is responsible for determining the timing of each stage of the instructions' execution. This partitioning allows designers to reuse the code that performs the functional calculations, and it simplifies the timing partition, which only has to deal with timing and resource management.
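The following Python sketch illustrates the partitioning idea in miniature. The toy instruction set, the `FunctionalModel` class, and the per-opcode latency tables are hypothetical and are not taken from HASim or any existing simulator; the point is only that two different timing models reuse one functional model unchanged, and that the timing side never needs to know what an instruction computes.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalModel:
    """Functional partition (hypothetical toy ISA): knows instruction semantics, not time."""
    regs: list = field(default_factory=lambda: [0] * 8)
    mem: dict = field(default_factory=dict)

    def execute(self, inst):
        op, rd, a, b = inst
        if op == "addi":                      # rd <- rs + imm
            self.regs[rd] = self.regs[a] + b
        elif op == "add":                     # rd <- rs1 + rs2
            self.regs[rd] = self.regs[a] + self.regs[b]
        elif op == "ld":                      # rd <- mem[rs + offset]
            self.regs[rd] = self.mem.get(self.regs[a] + b, 0)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        return op                             # the timing side only needs the class


def run_timing_model(program, latency):
    """Timing partition: charges cycles per instruction class, delegates semantics."""
    fm = FunctionalModel(mem={100: 7})
    cycles = 0
    for inst in program:
        cycles += latency[fm.execute(inst)]
    return cycles, fm.regs


if __name__ == "__main__":
    prog = [("addi", 1, 0, 100), ("ld", 2, 1, 0), ("add", 3, 2, 2)]
    # Two different timing models reuse the same functional model unchanged.
    print(run_timing_model(prog, {"addi": 1, "add": 1, "ld": 1}))  # (3, [0, 100, 7, 14, 0, 0, 0, 0])
    print(run_timing_model(prog, {"addi": 1, "add": 1, "ld": 3}))  # (5, [0, 100, 7, 14, 0, 0, 0, 0])
```

Both timing models produce identical architectural state; only the cycle counts differ, which is exactly the property that lets the functional code be amortized.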
HASim takes the approach of running the performance simulation in hardware, specifically on FPGAs. By putting the model onto an FPGA, we expect to operate in the range of many MIPS: a 100x improvement over the best software simulators. To address speed of model design, our approach adopts the software notion of partitioning the functional and timing aspects of the model. As in the software approach, the design effort of the functional partition can be amortized across many timing models. In addition, since the timing partition does not have to be concerned with the actual implementation of the instruction set semantics, it is easier to design and more likely to meet the physical constraints of an FPGA. To further aid design efficiency, HASim is implemented in Bluespec SystemVerilog, which allows the model to be approached at a higher level of abstraction. Additionally, Bluespec's strong support for parameterization aids the verification process.
HASim breaks the functional partition into a simple pipeline with the following stages: Fetch, Decode, Execute, Memory, Local Commit, and Global Commit. Instructions pass from stage to stage as directed by the timing partition, which drives each stage through a request/response method pair that the stage provides, as shown in Figure 1.
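A minimal Python sketch of this stage interface follows, assuming a hypothetical toy ISA and modeling each request/response method pair as a `request(stage, arg)` call paired with a `response(stage)` call. The per-stage behavior (for example, that Local Commit writes the register file and Global Commit frees the instruction's record) is an assumption made for illustration, not a description of HASim's Bluespec implementation. The driver at the end is the simplest possible timing partition: it walks one instruction through all six stages before fetching the next.

```python
from collections import deque

STAGES = ("fetch", "decode", "execute", "memory", "local_commit", "global_commit")


class FunctionalPipeline:
    """Hypothetical functional partition with one request/response pair per stage.

    The timing partition decides *when* each request is issued; this class only
    decides *what* each stage does to the instruction named by a token.
    """

    def __init__(self, program, mem):
        self.program, self.mem = program, dict(mem)
        self.regs = [0] * 8
        self.inflight = {}                              # token -> instruction record
        self.next_token = 0
        self.resp_q = {s: deque() for s in STAGES}      # completed work awaiting response

    def request(self, stage, arg):
        self.resp_q[stage].append(getattr(self, "_" + stage)(arg))

    def response(self, stage):
        return self.resp_q[stage].popleft()

    def _fetch(self, pc):
        tok = self.next_token
        self.next_token += 1
        self.inflight[tok] = {"inst": self.program[pc]}
        return tok

    def _decode(self, tok):
        r = self.inflight[tok]
        op, rd, a, b = r["inst"]
        vb = b if op in ("addi", "ld") else self.regs[b]  # immediate or register operand
        r.update(op=op, rd=rd, va=self.regs[a], vb=vb)
        return tok

    def _execute(self, tok):
        r = self.inflight[tok]
        r["val"] = r["va"] + r["vb"]                    # ALU result or effective address
        return tok

    def _memory(self, tok):
        r = self.inflight[tok]
        if r["op"] == "ld":
            r["val"] = self.mem.get(r["val"], 0)
        return tok

    def _local_commit(self, tok):
        r = self.inflight[tok]
        self.regs[r["rd"]] = r["val"]                   # assumption: update register state
        return tok

    def _global_commit(self, tok):
        del self.inflight[tok]                          # assumption: free the record
        return tok


def unpipelined_timing_model(fp, n_insts):
    """Simplest timing partition: one stage per modeled cycle, one instruction at a time."""
    cycle = 0
    for pc in range(n_insts):
        arg = pc
        for stage in STAGES:
            fp.request(stage, arg)
            arg = fp.response(stage)                    # token flows to the next stage
            cycle += 1
    return cycle
```

A pipelined or superscalar timing partition would call the same six pairs, only interleaving requests for different tokens; the functional class would not change.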
Figure 1. The HASim Design
To handle instruction reordering in the functional partition, each stage holds a table from which unprocessed instructions can be taken out of order. To support this, the functional partition must implement register and memory renaming at least as deep as the modeled microarchitecture. To handle misspeculation, a broadcast abort operation kills wrong-path instructions.
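The reordering and abort mechanisms can be sketched in Python as follows. This is an illustration under stated assumptions, not HASim's design: Fetch and Decode are folded into one `decode` call, only the Execute table is shown, only registers (not memory) are renamed, tokens are assumed to increase in program order, and physical registers are never recycled. Killing every token younger than the mispredicted one and undoing their renames youngest-first restores the register map to its state at the abort point.

```python
class RenamingFunctionalPartition:
    """Hypothetical sketch: register renaming, out-of-order execute, broadcast abort.

    Tokens increase in program order. Each destination is renamed to a fresh
    physical register, so the timing partition may execute instructions from
    the table in any order once their sources are ready.
    """

    def __init__(self, n_arch=8):
        self.phys = {p: 0 for p in range(n_arch)}      # physical reg -> value
        self.ready = {p: True for p in range(n_arch)}  # physical reg -> value valid?
        self.rename = list(range(n_arch))              # arch reg -> newest physical reg
        self.next_phys = n_arch                        # never recycled, for brevity
        self.next_token = 0
        self.table = {}   # token -> record, waiting to execute (any order)
        self.done = {}    # token -> record, executed but not yet committed

    def decode(self, inst):
        op, rd, a, b = inst
        srcs = [self.rename[a]] + ([self.rename[b]] if op == "add" else [])
        dst, self.next_phys = self.next_phys, self.next_phys + 1
        self.ready[dst] = False
        tok, self.next_token = self.next_token, self.next_token + 1
        self.table[tok] = {"op": op, "rd": rd, "srcs": srcs, "imm": b,
                           "dst": dst, "prev": self.rename[rd]}
        self.rename[rd] = dst
        return tok

    def execute(self, tok):
        """Execute `tok` if its sources are ready; otherwise report failure."""
        r = self.table[tok]
        if not all(self.ready[p] for p in r["srcs"]):
            return False                               # timing partition retries later
        vals = [self.phys[p] for p in r["srcs"]]
        self.phys[r["dst"]] = vals[0] + (vals[1] if r["op"] == "add" else r["imm"])
        self.ready[r["dst"]] = True
        self.done[tok] = self.table.pop(tok)
        return True

    def commit(self, tok):
        del self.done[tok]                             # oldest instruction retires

    def abort(self, tok):
        """Broadcast abort: kill every instruction younger than `tok`."""
        for t in sorted([*self.table, *self.done], reverse=True):
            if t <= tok:
                break
            r = self.table.pop(t) if t in self.table else self.done.pop(t)
            self.rename[r["rd"]] = r["prev"]           # undo renames youngest-first

    def read(self, arch_reg):
        return self.phys[self.rename[arch_reg]]
```

Because the timing partition chooses the order in which it calls `execute`, the functional partition only has to refuse requests whose sources are not yet ready. For example, after decoding `r1 = 5`, `r2 = 3`, and `r3 = r1 + r2`, an early `execute` of the add returns `False`, and it succeeds once the two older instructions have executed.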
As a potential optimization, we are considering allowing the emulated time of each stage to slip with respect to the others. To prevent the timing infidelities this could introduce, we would require every interaction with the functional partition to be tagged with its associated emulated clock tick.
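One way to see why tick tags help is a memory whose every access carries the emulated clock tick at which the modeled machine performs it. In the Python sketch below (a hypothetical illustration, not HASim code), a stage whose clock has slipped ahead can store a value at tick 12 without that value leaking into a load that the modeled machine performs at tick 10.

```python
import bisect
from collections import defaultdict


class TickTaggedMemory:
    """Hypothetical sketch: memory whose every access is tagged with the emulated
    clock tick at which the modeled machine performs it. Stages whose emulated
    clocks have slipped relative to one another still see a consistent view."""

    def __init__(self):
        self.history = defaultdict(list)   # addr -> list of (tick, value), sorted by tick

    def store(self, tick, addr, value):
        bisect.insort(self.history[addr], (tick, value))

    def load(self, tick, addr, default=0):
        """Return the value of the latest store performed strictly before `tick`."""
        h = self.history[addr]
        i = bisect.bisect_left(h, (tick,))  # first entry with entry_tick >= tick
        return h[i - 1][1] if i else default


if __name__ == "__main__":
    m = TickTaggedMemory()
    m.store(5, 0x40, 1)
    m.store(12, 0x40, 2)     # issued early by a stage whose clock slipped ahead
    print(m.load(10, 0x40))  # 1: the tick-12 store is not yet visible at tick 10
    print(m.load(13, 0x40))  # 2
```

A real implementation would also need to discard history entries older than the slowest stage's current tick; that bookkeeping is omitted here.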
Progress and Future Work
Currently, we have a working parameterized, single-issue functional partition for a toy ISA. We hope to implement the full x86 ISA. We are also considering incorporating a number of different ISAs into a single reusable partition via a universal micro-operation ISA.
We will also explore how best to implement timing partitions for superscalar and speculative architectures. Support for multiprocessor systems could be achieved by duplicating or sharing functional partitions. We also hope to use large FPGA platforms to run models that are otherwise difficult to simulate with current software techniques, e.g., those with multi-megabyte caches.
Another avenue of interest is combining this project with UNUM. Doing so may provide a way to refine timing-accurate models incrementally into more hardware-accurate models, which could be used for more in-depth analysis.
Kenneth C. Barr, Ramon Matas-Navarro, Christopher Weaver, Toni Juan, and Joel Emer. In Proceedings of the 3rd Annual Boston Area Architecture Workshop, Providence, RI, 2005.
Kenneth C. Barr, Heidi Pan, Michael Zhang, and Krste Asanovic. Accelerating a Multiprocessor Simulation with a Memory Timestamp Record. In International Symposium on Performance Analysis of Systems and Software, Austin, TX, 2005.
Bluespec, Inc., Waltham, MA. Bluespec SystemVerilog Version 3.8 Reference Guide, November 2004.
Derek Chiou, Huzefa Sunjeliwala, Dam Sunwoo, John Xu, and Nikhil Patil. FPGA-based Fast, Cycle-Accurate, Full-System Simulators. Technical Report UTFAST-2006-01, Austin, TX, 2006.
Nirav Dave and Michael Pellauer. UNUM: A General Microprocessor Framework Using Guarded Atomic Actions. In Proceedings of the Workshop on Architecture Research using FPGA Platforms held at HPCA-11, San Francisco, CA, February 2005.
Joel Emer, Pritpal Ahuja, Eric Borch, Artur Klauser, Chi-Keung Luk, Srilatha Manne, Shubhendu S. Mukherjee, Harish Patil, Steven Wallace, Nathan Binkert, Roger Espasa, and Toni Juan. Asim: A Performance Model Framework. Computer, 35(2), pp. 68-72, 2002.