Abstracts - 2006
METERG: Measurement-Based End-to-End Performance Estimation Technique in QoS-Capable Multiprocessors
Jae W. Lee & Krste Asanovic
Multiprocessor systems present serious challenges in the design of real-time systems because the execution time of an instruction sequence varies much more widely than on a uniprocessor. This variation results partly from complex non-deterministic interactions among multiple concurrent processes that share common system resources such as the memory system. Although such non-determinism can be tightly controlled by adding conventional Quality-of-Service (QoS) support, it is generally difficult to find the minimal hardware resource reservation settings that meet a given user-level performance goal. That is, translating a high-level performance goal for a complex application (e.g., transactions per second) into minimal settings for the hardware QoS components (e.g., memory bandwidth and latency) is challenging and usually intractable.
One approach to finding the minimal resource reservation is analysis-based [2, 3]. To estimate the minimal amount of resources needed to meet the performance goal, the user either relies on high-level knowledge of the program or calculates it using program analysis, hardware performance modeling, or both. This approach is not always feasible and is usually very expensive given the complexity of modern hardware and software systems. Another approach is measurement-based [4]. In this approach, the user measures the performance of the code running on the target system under varying settings of the resource controls. Unfortunately, a conventional QoS-capable shared resource usually distributes unused resources among the sharers to maximize the overall system's throughput. Consequently, even with an identical setting of the resource reservation parameters, the observed performance of an instruction sequence fluctuates widely depending on how much additional resource it receives.
Therefore, we propose a new measurement-based technique, METERG (Measurement-Time Enforcement and Runtime Guarantee), in which QoS blocks are modified to support two modes of operation: during performance measurement, resource guarantees in the QoS blocks are treated as upper bounds, while during deployment they are treated as lower bounds. The METERG QoS system lets us estimate by measurement the maximum execution time of the instruction sequence produced by a program and input-data pair under worst-case resource contention. In this way, we can guarantee the measured performance during operation. In our simulation results for an example case, the estimated performance lower bound is about 20% worse than the actual (observed) worst-case performance.
In the METERG QoS system, each process requiring QoS support (QoS process for brevity) can run in two operation modes: enforcement and deployment. In enforcement mode, a QoS process cannot take more than its guaranteed resource share from each QoS block even when there are additional resources available. In deployment mode, however, the process is allowed to use any available additional resources. If a QoS block supports the two operation modes as described above, we call it a METERG QoS block. If every shared resource within a system is a METERG QoS block, the system is termed a METERG system.
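The two operation modes can be sketched as a toy arbiter for a single shared resource. This is an illustrative model, not the paper's implementation: the class and method names are hypothetical, and the policy of splitting slack equally among sharers in deployment mode is an assumption made for the example.

```python
from enum import Enum

class OpMode(Enum):
    ENFORCEMENT = 0  # strict upper bound: never exceed the reservation
    DEPLOYMENT = 1   # lower bound: reservation plus unclaimed slack

class MetergQosBlock:
    """Toy model of a METERG QoS block arbitrating one shared resource.

    reservations[pid] is the fraction (0..1) reserved for process pid.
    The equal-split slack policy below is an illustrative assumption.
    """
    def __init__(self, reservations):
        assert sum(reservations.values()) <= 1.0, "oversubscribed block"
        self.reservations = dict(reservations)

    def grant(self, pid, mode):
        """Resource fraction granted to `pid` in one arbitration round."""
        reserved = self.reservations.get(pid, 0.0)
        if mode is OpMode.ENFORCEMENT:
            # Enforcement: cap at the reservation even when slack exists.
            return reserved
        # Deployment: reservation plus an equal share of unclaimed resource.
        slack = 1.0 - sum(self.reservations.values())
        return reserved + slack / len(self.reservations)

block = MetergQosBlock({"p1": 0.25, "p2": 0.25})
assert block.grant("p1", OpMode.ENFORCEMENT) == 0.25  # capped at x = 0.25
assert block.grant("p1", OpMode.DEPLOYMENT) > 0.25    # gets extra slack
```

The key invariant is that an enforcement-mode grant never exceeds a deployment-mode grant for the same reservation, which is what makes an enforcement-mode measurement a safe lower bound on deployed performance.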
With a METERG system, a user first measures execution time of a given code sequence with a given input in enforcement mode with given resource reservation parameters. The METERG system then guarantees that a later execution of the same code sequence in deployment mode will perform as well as or better than the previous execution in enforcement mode, provided the same parameters are used for resource reservation.
Figure 1 depicts the METERG system model. There are n processors and m METERG QoS blocks (e.g., memory, I/Os). These QoS blocks are shared among all n processors, and the j-th METERG QoS block can reserve a certain amount of resource for Processor i as specified by xij (where 0 ≤ xij ≤ 1 by convention). Unlike conventional guaranteed QoS blocks, they accept an extra parameter, OpModei, from a processor to request the QoS operation mode. If the processor requests enforcement mode, the strict upper bound on runtime usage of a resource is enforced in every METERG QoS block. If the processor requests deployment mode, it can use additional otherwise unclaimed resources.
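The reservation parameters above form an n-by-m matrix, and the natural feasibility check is that no QoS block is oversubscribed. The following sketch, with hypothetical values chosen for illustration, makes that check concrete.

```python
# Hypothetical reservation matrix x[i][j]: row i = processor, column j =
# METERG QoS block (e.g., memory controller, I/O device). A block is
# feasible (not oversubscribed) when its column sums to at most 1.
n, m = 4, 2
x = [
    [0.25, 0.20],  # Processor 0
    [0.25, 0.20],  # Processor 1
    [0.25, 0.20],  # Processor 2
    [0.20, 0.20],  # Processor 3
]

def feasible(x, j):
    """True if QoS block j's reservations sum to at most 1."""
    return sum(row[j] for row in x) <= 1.0

assert all(feasible(x, j) for j in range(m))
```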
In order to reflect a tradeoff between safety and tightness of performance estimation, we introduce two types of enforcement mode: relaxed and strict enforcement modes. In relaxed enforcement mode, only the bandwidth given to a processor in deployment mode is guaranteed to be no less than what would have been given in enforcement mode (Condition 1). In strict enforcement mode, both bandwidth (Condition 1) and latency guarantees (Condition 2) are enforced. That is, the latency in accessing a shared resource in enforcement mode is guaranteed to be no smaller than that in deployment mode. Consequently, the estimated lower bound on the execution time of a program running in strict enforcement mode is safer than that in relaxed enforcement mode but is not as tight.
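The two conditions can be stated compactly as predicates over the bandwidth granted and the latency observed in each mode. The function names here are hypothetical shorthand for the conditions described above, not an API from the paper.

```python
def satisfies_relaxed(bw_enf, bw_dep):
    # Condition 1: bandwidth in deployment mode is never below the
    # bandwidth that enforcement mode would have granted.
    return bw_dep >= bw_enf

def satisfies_strict(bw_enf, bw_dep, lat_enf, lat_dep):
    # Condition 2 (added in strict mode): the per-access latency seen in
    # enforcement mode upper-bounds the latency seen in deployment mode.
    return satisfies_relaxed(bw_enf, bw_dep) and lat_enf >= lat_dep
```

Strict mode checks both conditions, so its measured execution time pessimizes both bandwidth and latency, which is why its bound is safer but looser.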
To evaluate the effectiveness of the METERG framework, we have added a strict METERG memory system to a full-system, execution-driven multiprocessor simulator based on the Bochs IA-32 emulator. Although our simulator supports METERG QoS only in the memory subsystem, we believe the QoS support can be extended to other shared I/O devices (e.g., disk, network). For our evaluation we use a synthetic benchmark, called memread, that mimics an application whose performance is bounded by memory system performance. It runs an infinite loop that sequentially accesses a large memory block to generate cache misses, with a small amount of bookkeeping computation in each iteration.
Figure 2 depicts the performance of memread in various configurations. In (a), as the number of concurrent processes increases from 1 to 8, the performance of a process degrades by 46% without QoS support (BE case), but only by 9% in deployment mode (DEP case). The estimate from a measurement in enforcement mode (ENF case) bounds the worst-case performance degradation for the given resource reservation at 31%. The performance gap between enforcement- and deployment-mode executions can be explained by at least two factors. First, the process receives extra bandwidth in deployment mode that it would not receive in enforcement mode. Second, regardless of the actual severity of contention, every memory access in enforcement mode is charged the longest latency it could experience in deployment mode. In (b), we observe that the performance estimate in strict enforcement mode becomes tighter as the resource allocation parameter (x1) increases.
Figure 3 shows the interactions of multiple concurrent QoS processes. The figure demonstrates that, even if we increase the number of QoS processes from 1 to 4, the performance of QoS processes in deployment mode degrades very little (by less than 2%) for a given parameter (xi=0.25) as long as the system is not oversubscribed. The performance lower bound estimated by an enforcement-mode execution is strictly guaranteed. (The amount of reserved resource for each process is given in the resource allocation vector. Note that xi=0 means no resource is reserved for Processor i (best-effort), and that we use x4=0.20 rather than x4=0.25 in the case of 4 QoS + 4 BE so as not to starve the best-effort processes.)
This work was partially supported by NSF CAREER Award CCR-0093354, the Cambridge-MIT Institute, and an equipment donation from Intel Corporation.
[1] Jae W. Lee and Krste Asanovic. METERG: Measurement-Based End-to-End Performance Estimation Technique in QoS-Capable Multiprocessors. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), San Jose, CA, April 2006.
[2] Aravindh Anantaraman, Kiran Seth, Kaustubh Patil, Eric Rotenberg, and Frank Mueller. Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems. In Proceedings of the 30th International Symposium on Computer Architecture (ISCA), pp. 350-361, June 2003.
[3] R. Arnold, F. Mueller, D. B. Whalley, and M. Harmon. Bounding Worst-Case Instruction Cache Performance. In Proceedings of the 15th Real-Time Systems Symposium (RTSS), December 1994.
[4] S. Petters. Comparison of Trace Generation Methods for Measurement-Based WCET Analysis. In Proceedings of the 3rd International Workshop on Worst-Case Execution Time Analysis, July 2003.