MIT CSAIL Research Abstracts

Research Abstracts - 2006

The Extreme Benchmark Suite

Steve Gerding & Krste Asanovic

Introduction

The Extreme Benchmark Suite (XBS) [1] supports performance measurement of highly parallel "extreme" processors, many of which are designed to replace custom hardware implementations. XBS is structured to avoid many of the problems that arise when existing benchmark suites are used with nonstandard and experimental architectures. In particular, it aims to provide a fair comparison across a wide range of architectures, from general-purpose processors to hard-wired ASIC implementations. XBS has a clean modular structure to reduce porting effort, and it is designed to be usable with slow cycle-accurate simulators. To allow better test coverage, XBS separates validation tests from performance measurement workloads. To simplify power measurement on real hardware or simulators, it also provides separate power measurement workloads.
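The separation of validation tests from performance workloads can be pictured with the minimal Python sketch below. It is not XBS code. The `Workload` record, the `purpose` tags, and the use of matrix transpose as the kernel are illustrative assumptions. Many small edge-case inputs are checked for correctness only, while one large input is timed only, so each set can be sized for its own job.

```python
import time
from dataclasses import dataclass

@dataclass
class Workload:
    """Hypothetical workload record; not an XBS data format."""
    name: str
    purpose: str  # assumed tags: "validation" or "performance"
    rows: int
    cols: int

def transpose(m):
    """Kernel under test: transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def make_matrix(rows, cols):
    """Deterministic input: element (r, c) holds a unique value."""
    return [[r * cols + c for c in range(cols)] for r in range(rows)]

WORKLOADS = [
    # Many small, awkward shapes for coverage; cheap to check exhaustively.
    Workload("1x1", "validation", 1, 1),
    Workload("single row", "validation", 1, 7),
    Workload("single column", "validation", 7, 1),
    Workload("non-square", "validation", 3, 5),
    # One large, representative input used only for timing.
    Workload("large", "performance", 512, 512),
]

def run_validation(workloads):
    """Check each validation workload element by element; return failing names."""
    failures = []
    for w in workloads:
        if w.purpose != "validation":
            continue
        m = make_matrix(w.rows, w.cols)
        t = transpose(m)
        ok = (
            len(t) == w.cols
            and all(len(row) == w.rows for row in t)
            and all(t[c][r] == m[r][c] for r in range(w.rows) for c in range(w.cols))
        )
        if not ok:
            failures.append(w.name)
    return failures

def run_performance(workloads):
    """Time each performance workload; correctness is covered by validation."""
    times = {}
    for w in workloads:
        if w.purpose != "performance":
            continue
        m = make_matrix(w.rows, w.cols)
        start = time.perf_counter()
        transpose(m)
        times[w.name] = time.perf_counter() - start
    return times

print(run_validation(WORKLOADS))         # → []
print(list(run_performance(WORKLOADS)))  # → ['large']
```

With this arrangement, adding another edge case means adding one more validation entry, and the timed run does not grow.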

Benchmark Characterization

Benchmarks are organized into three levels: kernels, applications, and scenarios.

Kernels represent small compute-intensive pieces of code that perform a single algorithm. Examples include FFT, matrix multiply, Quicksort, DCT, RSA encryption, and image convolution.
Applications combine multiple kernels to perform an identifiable task such as compressing an image in JPEG format or decoding an MP3 audio file.
Scenarios combine multiple simultaneously running applications to measure the performance of processors on typical workloads. An example is running a WCDMA link while decoding an MPEG video file.
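The three levels nest naturally: an application is built from kernels, and a scenario runs several applications at once. The minimal Python sketch below models that containment. The class names and the scenario are hypothetical. The kernel decompositions use names from the benchmark list in this abstract, but they are illustrative and may not match how XBS actually composes its applications.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Kernel:
    """A single algorithm, e.g. an FFT or a DCT."""
    name: str

@dataclass
class Application:
    """An identifiable task built from several kernels."""
    name: str
    kernels: list[Kernel] = field(default_factory=list)

@dataclass
class Scenario:
    """Several applications that run simultaneously."""
    name: str
    applications: list[Application] = field(default_factory=list)

    def kernels(self) -> set[str]:
        """Names of all distinct kernels exercised by this scenario."""
        return {k.name for app in self.applications for k in app.kernels}

# Illustrative decompositions, not taken from the XBS sources.
jpeg_enc = Application("JPEG Compressor", [
    Kernel("RGB to YCC Color Conversion"),
    Kernel("DCT"),
    Kernel("JPEG Coefficient Quantization"),
    Kernel("JPEG Entropy Encoding"),
])
wifi_tx = Application("802.11a Transmitter", [
    Kernel("802.11a Scrambler"),
    Kernel("Convolutional Encoder"),
    Kernel("802.11a Interleaver"),
    Kernel("802.11a Modulator"),
    Kernel("IFFT"),
])

# Hypothetical scenario: compress a photo while transmitting over 802.11a.
photo_upload = Scenario("Photo upload", [jpeg_enc, wifi_tx])
print(len(photo_upload.kernels()))  # → 9
```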

A benchmark suite is the subset of benchmarks relevant to a particular application area. The purpose of a suite is to ensure that processors are compared across the range of kernels, applications, and scenarios that will be encountered in a given device, and to avoid "cherry picking" benchmarks that happen to suit a given design. Examples include a wireless telecom suite, a network router suite, a consumer media suite, and a mobile device suite. The same benchmark might appear in several suites, possibly with different input workloads. A suite may also include a workload weighting so that a single number can characterize overall performance on the suite.
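The abstract does not say how the workload weighting is reduced to one number. One common choice is a weighted geometric mean of per-benchmark speedups over a reference platform. It is assumed here purely for illustration, as a minimal sketch. The benchmark names and times below are placeholders, not XBS measurements.

```python
import math

def suite_score(ref_times, test_times, weights):
    """Weighted geometric mean of speedups over a reference platform.

    Each argument maps a benchmark name to a value. Speedup is
    ref_time / test_time, so larger scores are better. Weights are
    normalized by their sum, so they need not add up to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    log_sum = sum(
        w * math.log(ref_times[name] / test_times[name])
        for name, w in weights.items()
    )
    return math.exp(log_sum / total)

# Placeholder numbers for two hypothetical benchmarks (not measured data).
ref = {"bench_a": 8.0, "bench_b": 2.0}
test = {"bench_a": 4.0, "bench_b": 0.25}
print(suite_score(ref, test, {"bench_a": 1, "bench_b": 1}))  # speedups 2 and 8, score about 4.0
```

A geometric mean treats a 2x gain and a 2x loss symmetrically. It also ranks two platforms the same way whichever platform is chosen as the reference, which an arithmetic mean of speedups does not guarantee.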

Benchmark Implementation

All benchmarks have the structure shown in Figure 1. An input generator produces a benchmark input file or set of files. In some cases, standard input files are used in place of the generator. The implementation under test has a harness that applies the input data and writes the results to an output file. An output checker then reads the input and output files and verifies that the implementation has produced a valid result.

Figure 1: XBS benchmark structure.
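The structure in Figure 1 can be sketched in Python using vector-vector add, one of the simpler kernels listed later. The following details are all assumptions: the file format (the length, then both input vectors, one integer per line), the function names, and the fixed random seed. In XBS the harness would run on the device or simulator under test, not in the same process as the generator and checker.

```python
import os
import random
import tempfile

def generate_input(path, n, seed=0):
    """Input generator: write n, then vector a, then vector b, one integer per line."""
    rng = random.Random(seed)
    values = [rng.randint(-1000, 1000) for _ in range(2 * n)]
    with open(path, "w") as f:
        f.write(f"{n}\n")
        f.writelines(f"{v}\n" for v in values)

def vvadd(a, b):
    """Core kernel: the only code that would need porting to the device."""
    return [x + y for x, y in zip(a, b)]

def harness(in_path, out_path):
    """Test harness: read the input stream, run the kernel, write the output stream."""
    with open(in_path) as f:
        n = int(f.readline())
        data = [int(line) for line in f]
    result = vvadd(data[:n], data[n:2 * n])
    with open(out_path, "w") as f:
        f.writelines(f"{v}\n" for v in result)

def check_output(in_path, out_path):
    """Output checker: recompute a reference result from the input and compare."""
    with open(in_path) as f:
        n = int(f.readline())
        data = [int(line) for line in f]
    expected = [data[i] + data[n + i] for i in range(n)]
    with open(out_path) as f:
        actual = [int(line) for line in f]
    return actual == expected

with tempfile.TemporaryDirectory() as d:
    inp, out = os.path.join(d, "in.txt"), os.path.join(d, "out.txt")
    generate_input(inp, n=1000)
    harness(inp, out)
    print(check_output(inp, out))  # → True
```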

The input generator and the output checker are separate programs designed to run on standard workstations. In a conventional benchmark program, most of the code is concerned with preparing input and validating output, which increases porting effort even though this benchmarking-specific code is not needed in a production implementation. Separating this code out reduces the complexity of porting a benchmark. In particular, it can use the full facilities of a standard workstation without regard for devices that, for example, have non-standard I/O or lack floating-point support. Only the core code has to be ported to the device under test, together with a simple test harness that reads in the input data stream and writes out the output data stream. This approach also reduces simulation time for a simulated device, since input generation and output checking run natively on the workstation rather than inside the slow simulator.
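Because the checker runs on a workstation, it can use floating point even when the device cannot. The sketch below shows this for the FIR kernel, where an integer-only device kernel is checked against a double-precision reference. It assumes a Q15 fixed-point FIR filter and an error bound of one least-significant bit. The Q15 format, the tolerance, and the function names are illustrative assumptions, not details of the XBS FIR benchmark.

```python
import random

Q = 15  # Q15: an integer coefficient c represents the real value c / 2**15

def fir_q15(x, h):
    """Device-side kernel: integer-only FIR, y[n] = (sum_k h[k] * x[n-k]) >> 15.

    Samples before x[0] are treated as zero. The arithmetic right shift
    floors, so each output lies within one LSB below the exact result.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc >> Q)
    return y

def check_fir(x, h, y, tol=1.0):
    """Workstation-side checker: floating-point reference with a tolerance."""
    if len(y) != len(x):
        return False
    taps = [c / 2**Q for c in h]
    for n, out in enumerate(y):
        ref = sum(taps[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        if abs(out - ref) > tol:
            return False
    return True

rng = random.Random(1)
x = [rng.randint(-32768, 32767) for _ in range(256)]  # 16-bit input samples
h = [rng.randint(-4096, 4096) for _ in range(16)]     # Q15 filter taps
print(check_fir(x, h, fir_q15(x, h)))  # → True
```

Because of the tolerance, a truncating fixed-point kernel and a floating-point reference can be compared directly. A device implementation that rounds instead of truncating would pass the same check.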

Current State of XBS and Future Work

The XBS benchmarks are currently being developed and ported to different architectures. Once a significant number of benchmarks have been implemented and timed on multiple platforms, the XBS benchmarks, along with their times, will be made available on the web. This will allow members of the computer architecture community to compare different platforms, post XBS times for their own processors, and add new benchmarks to XBS.

The platforms on which XBS benchmarks are initially being implemented are:

The SCALE vector-thread architecture [2] currently being developed at MIT
The family of Intel x86 processors, assembly-optimized using the Intel IPP libraries [3]
Hard-wired ASICs, using the Verilog and Bluespec hardware description languages

At present, the following benchmarks have been implemented within the XBS framework and have been ported to some or all of the aforementioned platforms:

Kernel Benchmarks: 802.11a Scrambler, 802.11a Interleaver, 802.11a Modulator, 802.11a Synchronizer, 802.11a Detector/Deinterleaver, Convolutional Encoder, Viterbi Decoder, IFFT, FFT, FFT Radix 4 Block, JPEG Coefficient Quantization, JPEG Entropy Encoding, RGB to YCC Color Conversion, IDCT, DCT, Vector-Vector Add, Matrix Transpose, FIR

Application Benchmarks: 802.11a Transmitter, 802.11a Receiver, JPEG Compressor, JPEG Decompressor, H.264 Decoder

Research Support

This work was partially supported by NSF CAREER Award CCR-0093354, the Cambridge-MIT Institute, and an equipment donation from Intel Corporation.

References

[1] Steven Gerding. The Extreme Benchmark Suite: Measuring High-Performance Embedded Systems. Master's thesis, Massachusetts Institute of Technology, September 2005.

[2] Ronny Krashinsky, Christopher Batten, Mark Hampton, Steve Gerding, Brian Pharris, Jared Casper, and Krste Asanovic. The Vector-Thread Architecture. In 31st International Symposium on Computer Architecture, Munich, Germany, June 2004.

[3] Intel Integrated Performance Primitives. http://www.intel.com/software/products/ipp/

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
Tel: +1-617-253-0073 - publications@csail.mit.edu