
The Extreme Benchmark Suite

Steve Gerding & Krste Asanovic

Introduction

The Extreme Benchmark Suite (XBS) is designed to support performance measurement of highly parallel "extreme" processors, many of which are designed to replace custom hardware implementations. XBS is designed to avoid many of the problems that occur when using existing benchmark suites with nonstandard and experimental architectures. In particular, XBS is intended to provide a fair comparison of a wide range of architectures, from general-purpose processors to hard-wired ASIC implementations. XBS has a clean modular structure to reduce porting effort, and is designed to be usable with slow cycle-accurate simulators. To allow better coverage, XBS separates validation tests from performance measurement workloads. To simplify power measurement on real hardware or simulators, XBS provides separate power measurement workloads.

Benchmark Characterization

Benchmarks are organized into three levels: kernels, applications, and scenarios.

  • Kernels represent small compute-intensive pieces of code that perform a single algorithm. Examples include FFT, matrix multiply, Quicksort, DCT, RSA encryption, and image convolution (a minimal kernel core is sketched after this list).
  • Applications combine multiple kernels to perform an identifiable task such as compressing an image in JPEG format or decoding an MP3 audio file.
  • Scenarios combine multiple simultaneously running applications to measure the performance of processors on typical workloads. Examples include running a WCDMA link while decoding an MPEG video file.
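To make the kernel level concrete: once input preparation and output checking are stripped away, the core of a kernel such as vector-vector add reduces to a short loop. The function name, element type, and calling convention in the sketch below are illustrative assumptions and are not taken from the XBS sources.

    /* Illustrative sketch of a kernel-level core (vector-vector add).
       The name, element type, and calling convention are assumptions,
       not the actual XBS code. */
    void vvadd(const float *a, const float *b, float *c, unsigned n)
    {
        for (unsigned i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }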

A benchmark suite is a subset of the benchmarks in a particular application area. The purpose of a suite is to ensure processors are compared across a range of benchmark kernels, applications, and scenarios that will be encountered in a given device, and to avoid "cherry picking" of benchmarks that happen to work well with a given design. Examples include a wireless telecom suite, a network router suite, a consumer media suite, and a mobile device suite. The same benchmark might appear in several suites, though possibly with different input workloads. The suite may include a workload weighting that allows a single number to be used to characterize overall performance on the suite.
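The abstract does not specify how the workload weighting combines per-benchmark results into a single number; one plausible scheme, sketched below purely as an assumption, is a weighted geometric mean of speedups relative to a reference platform.

    #include <math.h>

    /* Hypothetical suite score: weighted geometric mean of per-benchmark
       speedups relative to a reference platform. Both the geometric mean
       and the weighting scheme are assumptions, not part of XBS. */
    double suite_score(const double *speedup, const double *weight, unsigned n)
    {
        double log_sum = 0.0, weight_sum = 0.0;
        for (unsigned i = 0; i < n; i++) {
            log_sum += weight[i] * log(speedup[i]);
            weight_sum += weight[i];
        }
        return exp(log_sum / weight_sum);
    }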

Benchmark Implementation

All benchmarks have the structure shown in Figure 1. An input generator generates a benchmark input file, or set of files. In some cases, the input generator is replaced with standard input files. The implementation under test has a harness to apply the input data and extract output data into an output file. An output checker reads in the input and output files and verifies that the implementation has produced a valid result.

Figure 1: XBS benchmark structure.
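As a rough illustration of the harness role in Figure 1, the sketch below reads a generated input file, invokes the ported kernel core, and writes the raw output for the workstation-side checker. The file names, file layout, and kernel entry point (the vvadd sketch above) are assumptions for illustration, not the actual XBS harness.

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal harness sketch for the structure in Figure 1. File names,
       file layout, and the kernel entry point are illustrative assumptions. */
    extern void vvadd(const float *a, const float *b, float *c, unsigned n);

    int main(void)
    {
        unsigned n;
        FILE *in = fopen("vvadd.in", "rb");    /* produced by the input generator */
        FILE *out = fopen("vvadd.out", "wb");  /* consumed by the output checker */
        if (!in || !out) return 1;

        fread(&n, sizeof n, 1, in);
        float *a = malloc(n * sizeof *a);
        float *b = malloc(n * sizeof *b);
        float *c = malloc(n * sizeof *c);
        fread(a, sizeof *a, n, in);
        fread(b, sizeof *b, n, in);

        vvadd(a, b, c, n);   /* the only code that must run on the device under test */

        fwrite(c, sizeof *c, n, out);
        fclose(in);
        fclose(out);
        return 0;
    }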

The input generator and the output checker are separate programs designed to run on standard workstations. The majority of the code in a conventional benchmark program is concerned with preparing input and validating output, which increases the effort of porting code. This benchmarking-specific code is not required in a production implementation. By separating out this code, we reduce the complexity of porting a benchmark. In particular, the full facilities of a standard workstation can be used in this code without worrying about porting to devices that, for example, have non-standard I/O or lack floating-point. Only the core code has to be ported to run on the device under test, with a simple test harness to read in and write out the input and output data streams. This approach also reduces simulation time for a simulated device.
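For example, a workstation-side output checker for the hypothetical vector-vector add harness above might re-read the input, recompute a reference result with full workstation facilities, and compare it against the device's output within a tolerance. The file formats and tolerance below are assumptions, not part of XBS.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Output checker sketch: recompute a reference result on the workstation
       and compare against the device output. Formats and tolerance are
       illustrative assumptions. */
    int main(void)
    {
        unsigned n;
        FILE *in = fopen("vvadd.in", "rb");
        FILE *out = fopen("vvadd.out", "rb");
        if (!in || !out) return 1;

        fread(&n, sizeof n, 1, in);
        float *a = malloc(n * sizeof *a);
        float *b = malloc(n * sizeof *b);
        float *c = malloc(n * sizeof *c);
        fread(a, sizeof *a, n, in);
        fread(b, sizeof *b, n, in);
        fread(c, sizeof *c, n, out);

        for (unsigned i = 0; i < n; i++) {
            if (fabsf(a[i] + b[i] - c[i]) > 1e-5f) {
                printf("FAIL at element %u\n", i);
                return 1;
            }
        }
        printf("PASS\n");
        return 0;
    }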

Current State of XBS and Future Work

The XBS benchmarks are currently being developed and ported to different architectures. Once a significant number of benchmarks have been implemented and timed on multiple platforms, the XBS benchmarks, along with their times, will be made available on the web. This will allow members of the computer architecture community to compare different platforms, post XBS times for their own processors, and add new benchmarks to XBS.

The platforms on which the XBS benchmarks are initially being implemented are:

  • The SCALE vector-thread architecture [1] currently being developed at MIT
  • The family of Intel x86 processors, assembly optimized using the Intel IPP Libraries [2]
  • Hard-wired ASIC implementations, using the Verilog and Bluespec HDLs

At present, the following benchmarks have been implemented within the XBS framework and have been ported to some or all of the aforementioned platforms:

Kernel Benchmarks

  • FIR
  • Convolutional encoder
  • iDCT
  • Vector-vector add
  • Rijndael
  • Matrix transpose
  • iFFT

Application Benchmarks

  • JPEG compressor
  • JPEG decompressor
  • 802.11a transmitter

Research Support

Effort sponsored by the Defense Advanced Research Projects Agency (DARPA) through the Department of the Interior National Business Center under grant number NBCH104009.

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.

References

[1] Ronny Krashinsky, Christopher Batten, Mark Hampton, Steve Gerding, Brian Pharris, Jared Casper, and Krste Asanovic. The Vector-Thread Architecture. In Proceedings of the 31st International Symposium on Computer Architecture, Munich, Germany, June 2004.

[2] Intel Integrated Performance Primitives. http://www.intel.com/software/products/ipp/
