CSAIL Research Abstracts - 2005

METERG: Architectural Support for Worst-case Performance Guarantees on Multiprocessor Platforms

Jae W. Lee & Krste Asanovic

Introduction

While architectural innovations to optimize common cases have boosted system performance enormously in terms of average execution time, they have had less impact on worst-case performance. Consequently, there is a large and increasing gap between the average and worst-case performance of processors that use sophisticated architectural techniques such as caches, complex branch predictors, dynamic scheduling, deep speculation, and prefetching. These techniques cannot be easily adopted in systems with real-time constraints, such as streaming video and real-time controllers, which require tasks to complete before a deadline.

Most current worst-case performance analyses rely to some extent on static timing analysis. In this approach, a performance model of each component in the system is set up, and the program in question is analyzed statically against these models to bound its worst-case behavior. While this technique works well for simple systems [1], it cannot be easily applied to more complex uniprocessors, let alone multiprocessors.

In conventional worst-case performance analyses [2, 3], the only source of performance fluctuation is variation in input data, which affects a program's control and data flow paths. In a system with multiple processing cores, however, the dynamics of concurrent processes on other processors become another major source of performance fluctuation. According to our study, the execution time of a code sequence with identical input data can increase by over 100% on an 8-processor system, solely because of contention for shared resources such as memory modules and interconnection networks. Most Quality-of-Service (QoS) research has focused on providing bandwidth or latency guarantees for a single subsystem (e.g., memory or I/O), not on the overall end-to-end performance that system users ultimately care about.

To overcome these limitations, we introduce the METERG system, whose name is an acronym for measurement-time enforcement and runtime guarantee. The goal of the METERG framework is to guarantee worst-case end-to-end performance in the presence of contention for shared resources, without complex static analysis. It is not our goal to completely replace static analysis-based techniques, which remain useful for obtaining an absolute performance lower bound in a simple system across all possible input data. Instead, our work can augment or replace them in cases they cannot cover, by producing a reasonable performance lower bound in more complex systems.

Figure 1. METERG System Overview.
Approach

Figure 1 shows an overview of a METERG system. While the instruction set architecture (ISA) provides an abstraction for the behavior of the underlying hardware, it does not reveal any information about its performance. The METERG framework provides a simple but useful abstraction that lets end users reason about the end-to-end performance of a program running on one processor of a multiprocessor system. In the following description, we assume a multiprogrammed environment in which the program of interest monopolizes all CPU time of the processor it runs on, and is not preempted or scheduled out by the OS until it finishes execution.

Figure 2 depicts the performance abstraction provided by the METERG system. It implements a black-box performance function F(Prog, Input, OpMode, x, Context). The program's performance is determined by four parameters provided by the user and one parameter implicitly given by the system itself:

Prog = Program binary to run
Input = Input data to run the program with
OpMode = Operation mode (either enforcement or deployment mode - explained later)
x = Resource allocation parameter (specifying an amount of resources to be reserved for this program)
Context = System's context (implicit; dynamics of other concurrent processes)
Figure 2. Performance Abstraction of METERG Layer.
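
To make this abstraction concrete, here is a minimal sketch in Python of the explicit arguments of F; the type names (OpMode, RunParams) are ours, introduced only for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto

class OpMode(Enum):
    """Operation mode of a METERG execution."""
    ENFORCEMENT = auto()  # x is treated as an upper bound on resources
    DEPLOYMENT = auto()   # x is treated as a lower bound on resources

@dataclass(frozen=True)
class RunParams:
    """The four explicit, user-supplied arguments of F. The fifth
    argument, Context, is implicit: it is whatever the other
    concurrent processes happen to be doing, and the user never
    controls it."""
    prog: str        # program binary to run
    input_data: str  # input data to run the program with
    op_mode: OpMode  # enforcement or deployment
    x: float         # resource allocation parameter (e.g. a bandwidth share)
```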

In general, it is intractable to calculate or model the performance function F(.) itself. Instead, the METERG system enforces the following inequality on F(.):

F(Prog, Input, Deployment, x, Context_D) ≥ F(Prog, Input, Enforcement, x, Context_E), for any contexts Context_D and Context_E    (1)

Inequality (1) can be interpreted as follows: for a given program, input data, and resource allocation parameter, performance in a deployment-mode execution is always equal to or higher than performance in an enforcement-mode execution, regardless of the dynamics of other concurrent processes.

Because this inequality holds in a METERG system, we can run an application in enforcement mode, measure its performance, and use the result as a worst-case performance estimate. The system guarantees that a later execution of the same code sequence in deployment mode with the same parameters will perform as well as, or better than, the earlier execution in enforcement mode.
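
The resulting methodology is: measure once in enforcement mode, then deploy with a guarantee. A minimal sketch of that workflow, assuming a hypothetical run_on_meterg() hook (not part of the framework as described) that launches a binary in a given mode and returns its execution time in cycles:

```python
from enum import Enum, auto

class OpMode(Enum):
    ENFORCEMENT = auto()
    DEPLOYMENT = auto()

def run_on_meterg(prog: str, input_data: str, mode: OpMode, x: float) -> int:
    """Hypothetical platform hook: run `prog` on `input_data` in the
    given mode with resource allocation parameter x; return the
    measured execution time in cycles."""
    raise NotImplementedError("platform-specific")

def measure_wcet_bound(prog: str, input_data: str, x: float) -> int:
    """Measurement time: one enforcement-mode run. Inequality (1)
    makes this an upper bound on any later deployment-mode run with
    the same Prog, Input, and x, regardless of other processes."""
    return run_on_meterg(prog, input_data, OpMode.ENFORCEMENT, x)

def deploy_if_schedulable(prog: str, input_data: str, x: float,
                          deadline_cycles: int) -> None:
    """Run time: admit the task only if the measured bound meets the
    deadline; the deployment-mode run is then guaranteed to meet it."""
    if measure_wcet_bound(prog, input_data, x) > deadline_cycles:
        raise RuntimeError("deadline cannot be guaranteed at this x")
    run_on_meterg(prog, input_data, OpMode.DEPLOYMENT, x)
```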

This property of the METERG system is enforced in hardware by two components.

Guaranteed QoS blocks (corresponding to the function g(.) in Figure 2). These are similar to conventional QoS-capable components that can provide a certain bandwidth and/or latency guarantee. They take one or more resource allocation parameters per application to control the allocation of their resources. For example, a shared bus with guaranteed QoS may take a single parameter that determines the fraction of available bandwidth to allocate to each application, and set up its bus arbiter accordingly.

One requirement unique to the guaranteed QoS blocks in a METERG system is that they support two operation modes for each application -- enforcement mode and deployment mode. The operation mode specifies how the block interprets the resource allocation parameter(s): in enforcement mode, the block allocates no more resources than the parameters specify; in deployment mode, no fewer. In other words, the resource allocation parameter is interpreted as an upper bound on allocated resources in the former, but as a lower bound in the latter. Consequently, for a given set of parameters, an application in deployment mode is allocated at least as many resources as in enforcement mode. Also, the access time of the block in deployment mode must always be no greater than that in enforcement mode.
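
As an illustration, a sketch of the two mode semantics in a toy time-slotted bus arbiter; treating the parameter x as a fraction of bus slots is our simplification:

```python
from enum import Enum, auto

class OpMode(Enum):
    ENFORCEMENT = auto()
    DEPLOYMENT = auto()

class DualModeArbiter:
    """Toy bus arbiter: each application is registered with a share x
    of the bus slots and an operation mode."""

    def __init__(self) -> None:
        self.apps: dict[str, tuple[float, OpMode]] = {}

    def register(self, app: str, x: float, mode: OpMode) -> None:
        self.apps[app] = (x, mode)

    def grant(self, app: str, requested: float, spare: float) -> float:
        """Fraction of slots granted in one arbitration epoch.
        `spare` is bandwidth left unused by other applications."""
        x, mode = self.apps[app]
        if mode is OpMode.ENFORCEMENT:
            # Upper bound: never exceed x, even if the bus is idle.
            return min(requested, x)
        # Lower bound: x is always reserved, and spare slots may be
        # granted on top, so the grant never falls below min(requested, x).
        return min(requested, x + spare)
```

For the same x and request stream, a deployment-mode grant is never smaller than the corresponding enforcement-mode grant, which is what makes deployment-mode access times no greater.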

Processor with Performance Monotonicity (corresponding to the function h(.) in Figure 2). In modern out-of-order processors, a smaller access time to a system component does not necessarily lead to better performance. For example, a shorter memory access time can change the instruction schedule so that valid cache lines are evicted before they are used, degrading end-to-end performance. It is not unusual for complex processors to exhibit anomalies in which more resources result in worse performance. The METERG framework therefore requires a processor's performance to increase monotonically with the allocation of additional resources in the guaranteed QoS blocks.

According to [4], if all common resources (functional units, registers, buses, read/write ports, etc.) are accessed in order, no timing anomalies can occur. Therefore, we can adopt microarchitectural optimizations such as caches, sophisticated branch predictors, multiple issue, and prefetching, as long as we do not violate the in-order access requirement.
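
Performance monotonicity is a checkable property of a platform model. A sketch of such a check, assuming a hypothetical cycle-accurate simulate() model:

```python
def simulate(prog: str, input_data: str, x: float) -> int:
    """Hypothetical cycle-accurate model: execution time in cycles
    when the guaranteed QoS blocks allocate resource share x."""
    raise NotImplementedError("model-specific")

def check_monotonicity(prog: str, input_data: str,
                       allocations: list[float]) -> bool:
    """More resources must never hurt: execution time must be
    non-increasing as x grows. A violation signals a timing anomaly
    of the kind ruled out by in-order access to shared resources [4]."""
    times = [simulate(prog, input_data, x) for x in sorted(allocations)]
    return all(later <= earlier
               for earlier, later in zip(times, times[1:]))
```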

For different input data, inequality (1) does not hold in general, because the program can take different control and data flow paths. The METERG framework can still be useful in estimating worst-case performance, however, for the following reasons. First, it can augment conventional static analysis-based techniques by providing a safe system substrate on which they can be applied without worrying about performance loss due to contention. Second, an enforcement-mode execution with a different input set itself gives a reasonable, if not absolute, performance lower bound, because there is a substantial performance margin between executions in the two modes, arising from the extra resources allocated in deployment mode and the selective activation of sophisticated processor features.

The METERG framework also has implications for the design of guaranteed QoS components. The only requirement on guaranteed QoS components in the METERG system is that the access time in deployment mode be no greater than in enforcement mode. Only the inequality between the two modes matters, not their absolute values. Consequently, METERG-compatible components are likely to be easier to design than conventional guaranteed QoS blocks. For example, it is more difficult to service every memory access within, say, 100 ns than to ensure that each memory access in deployment mode takes no longer than the corresponding access in enforcement mode.
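
The difference between the two requirements can be stated as two checks over access-latency traces; pairing accesses one-to-one across the two modes is our simplification for illustration:

```python
def meets_absolute_bound(latencies_ns: list[float],
                         bound_ns: float = 100.0) -> bool:
    """Conventional guaranteed QoS: every access under a fixed bound,
    no matter what the rest of the system is doing."""
    return all(t <= bound_ns for t in latencies_ns)

def meets_relative_bound(deploy_ns: list[float],
                         enforce_ns: list[float]) -> bool:
    """METERG-compatible QoS: each deployment-mode access no slower
    than the corresponding enforcement-mode access; the absolute
    latencies are unconstrained."""
    return all(d <= e for d, e in zip(deploy_ns, enforce_ns))
```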

Research Support

This work is partially supported by NSF CAREER Award CCR-0093354, the Cambridge-MIT Institute, and an equipment donation from Intel Corporation.

References

[1] A. Anantaraman, K. Seth, K. Patil, E. Rotenberg, and F. Mueller. Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems. In Proceedings of the 30th International Symposium on Computer Architecture (ISCA), pp. 350--361, June 2003.

[2] R. Arnold, F. Mueller, D. B. Whalley, and M. Harmon. Bounding Worst-Case Instruction Cache Performance. In Proceedings of the 15th Real-Time Systems Symposium (RTSS), December 1994.

[3] S. Lim, Y. Bae, G. Jang, B. Rhee, S. Min, C. Park, H. Shin, K. Park, and C. Kim. An Accurate Worst-Case Timing Analysis Technique for RISC Processors. In Proceedings of the 15th Real-Time Systems Symposium (RTSS), pp. 97--108, December 1994.

[4] T. Lundqvist and P. Stenström. Timing Anomalies in Dynamically Scheduled Microprocessors. In Proceedings of the 20th Real-Time Systems Symposium (RTSS), 1999.
