# **Application Slowdown Model**

# Quantifying and Controlling Impact of Interference at Shared Caches and Main Memory

Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, Onur Mutlu



Carnegie Mellon





# Problem: Interference at Shared Resources



# Impact of Shared Resource Interference



**Our Goal: Achieve High and Predictable Performance** 

### 1. Quantify Impact of Interference - Slowdown

- Key Observation
- Estimating Cache Access Rate Alone
- ASM: Putting it All Together
- Evaluation

### 2. Control Slowdown

- Slowdown-aware Cache Capacity Allocation
- Slowdown-aware Memory Bandwidth Allocation
- Coordinated Cache/Memory Management

# Quantifying Impact of Shared Resource Interference



## Slowdown: Definition

# Approach: Impact of Interference on Performance



Our Approach: Estimate impact of interference aggregated over requests

# Observation: Shared Cache Access Rate is a Proxy for Performance

#### Performance ∝ Shared Cache Access Rate



# Estimating Cache Access Rate Alone



#### **Challenge 1:**

Main memory bandwidth interference

#### **Challenge 2:**

Shared cache capacity interference

# Estimating Cache Access Rate Alone



#### **Challenge 1:**

Main memory bandwidth interference

# Highest Priority Minimizes Memory Bandwidth Interference

Can minimize impact of main memory interference by giving the application highest priority at the memory controller

(Subramanian et al., HPCA 2013)

1. Highest priority minimizes interference
2. Enables estimation of miss service time (used to account for shared cache interference)

# Estimating Cache Access Rate Alone



#### **Challenge 2:**

Shared cache capacity interference

## Cache Capacity Contention



Applications evict each other's blocks from the shared cache

# Shared Cache Interference is Hard to Minimize Through Priority



- Long warmup
- Lots of interference to other applications

# Our Approach: Quantify and Remove Cache Interference

1. Quantify impact of shared cache interference

2. Remove impact of shared cache interference on CAR<sub>Alone</sub> estimates

## 1. Quantify Shared Cache Interference



# 2. Remove Cycles to Serve Contention Misses from CAR<sub>Alone</sub> Estimates

Cache Contention Cycles = #Contention Misses × Average Miss Service Time

- #Contention Misses: from the auxiliary tag store, when the application is given high priority
- Average Miss Service Time: measured when the application is given high priority

Remove cache contention cycles when estimating Cache Access Rate Alone (CAR<sub>Alone</sub>)
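The product above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the counter values are hypothetical and not taken from the talk.

```python
def cache_contention_cycles(contention_misses: int,
                            avg_miss_service_time: float) -> float:
    """Cycles attributed to misses caused by sharing the cache.

    Both inputs are assumed to be collected while the application
    has highest priority at the memory controller.
    """
    if contention_misses < 0 or avg_miss_service_time < 0:
        raise ValueError("inputs must be non-negative")
    return contention_misses * avg_miss_service_time


# Hypothetical counter values for one application.
cycles = cache_contention_cycles(contention_misses=120,
                                 avg_miss_service_time=200.0)
print(cycles)  # 24000.0
```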

# Accounting for Memory and Shared Cache Interference

Accounting for memory interference

Accounting for memory and cache interference
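The two accounting steps can be sketched as follows. The exact formulas are an assumption reconstructed from the surrounding slides: the access rate is taken as requests per cycle during high-priority periods, and cache interference is handled by subtracting the cache contention cycles from the cycle count. All names and numbers are hypothetical.

```python
def car_alone_memory_only(requests_high_prio: int,
                          high_prio_cycles: int) -> float:
    """Accounts for memory interference only: access rate observed
    while the application has highest priority at the controller."""
    if high_prio_cycles <= 0:
        raise ValueError("high_prio_cycles must be positive")
    return requests_high_prio / high_prio_cycles


def car_alone(requests_high_prio: int,
              high_prio_cycles: int,
              contention_cycles: float) -> float:
    """Accounts for memory and cache interference: cycles spent
    serving contention misses are removed from the denominator
    (assumed form of the correction)."""
    effective_cycles = high_prio_cycles - contention_cycles
    if effective_cycles <= 0:
        raise ValueError("contention cycles exceed high-priority cycles")
    return requests_high_prio / effective_cycles


# Hypothetical counters for one application.
print(car_alone_memory_only(1000, 100_000))
print(car_alone(1000, 100_000, 20_000))
```

Removing contention cycles shrinks the denominator, so the second estimate is never lower than the first.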

## Application Slowdown Model (ASM)

$$\text{Slowdown} = \frac{\text{Cache Access Rate}_{\text{Alone}}\ (\text{CAR}_{\text{Alone}})}{\text{Cache Access Rate}_{\text{Shared}}\ (\text{CAR}_{\text{Shared}})}$$
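As a worked instance of this ratio, a minimal Python sketch follows. The function name and the sample rates (expressed in accesses per 1000 cycles) are hypothetical.

```python
def estimate_slowdown(car_alone: float, car_shared: float) -> float:
    """Slowdown as the ratio of the estimated alone access rate
    to the measured shared access rate."""
    if car_alone < 0 or car_shared <= 0:
        raise ValueError("access rates must be positive")
    return car_alone / car_shared


# Hypothetical rates, in shared cache accesses per 1000 cycles.
print(estimate_slowdown(car_alone=12.5, car_shared=5.0))  # 2.5
```

A value of 2.5 would mean the application is estimated to run 2.5 times slower than it would alone.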

## **ASM: Interval Based Operation**



## A More Accurate and Simple Model

- More accurate: Takes into account request overlap behavior
  - Implicit through aggregate estimation of cache access rate and miss service time
  - Unlike prior works that estimate per-request interference
- Simpler hardware: Amenable to set sampling in the auxiliary tag store
  - Need to measure only contention miss count
  - Unlike prior works that need to know if each request is a contention miss or not
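To illustrate why only an aggregate contention miss count is needed, here is a minimal sketch of a set-sampled auxiliary tag store. It assumes LRU replacement, treats an access as a contention miss when it hits in the tag store but missed in the shared cache, and scales the sampled count linearly by the sampling factor. The class, its parameters, and the scaling rule are illustrative assumptions, not the authors' hardware design.

```python
from collections import OrderedDict


class SampledAuxTagStore:
    """Tracks tags for a sampled subset of sets, as if the
    application had the whole cache to itself (LRU assumed)."""

    def __init__(self, ways: int, sample_every: int):
        self.ways = ways
        self.sample_every = sample_every
        self.sets = {}  # set index -> OrderedDict of tags in LRU order
        self.sampled_contention_misses = 0

    def access(self, set_index: int, tag: int, shared_cache_hit: bool) -> None:
        if set_index % self.sample_every != 0:
            return  # set is not sampled; nothing is tracked
        lru = self.sets.setdefault(set_index, OrderedDict())
        alone_hit = tag in lru
        if alone_hit:
            lru.move_to_end(tag)  # mark as most recently used
        else:
            lru[tag] = True
            if len(lru) > self.ways:
                lru.popitem(last=False)  # evict least recently used
        # Would have hit alone, but missed in the shared cache.
        if alone_hit and not shared_cache_hit:
            self.sampled_contention_misses += 1

    def estimated_contention_misses(self) -> int:
        # Linear scaling from sampled sets to all sets (assumption).
        return self.sampled_contention_misses * self.sample_every


ats = SampledAuxTagStore(ways=2, sample_every=4)
ats.access(0, 0xA, shared_cache_hit=False)  # cold miss, not contention
ats.access(0, 0xA, shared_cache_hit=False)  # contention miss
print(ats.estimated_contention_misses())  # 4
```

Only a single counter is read out at the end of an interval; no per-request classification is stored.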

# Previous Work on Slowdown Estimation

- Previous work on slowdown estimation
  - STFM (Stall Time Fair Memory) Scheduling [Mutlu et al., MICRO '07]
  - FST (Fairness via Source Throttling) [Ebrahimi et al., ASPLOS '10]
  - Per-thread Cycle Accounting [Du Bois et al., HiPEAC '13]

Basic Idea: Count interference cycles experienced by each request

## Methodology

- Configuration of our simulated system
  - 4 cores
  - 1 channel, 8 banks/channel
  - DDR3 1333 DRAM
  - 2MB shared cache

- Workloads
  - SPEC CPU2006 and NAS
  - 100 multiprogrammed workloads

## Model Accuracy Results



Average error of ASM's slowdown estimates: 10%

Previous models have 29%/40% average error

## Cache Capacity Partitioning



Previous partitioning schemes mainly focus on miss count reduction

Problem: Miss count reduction does not directly translate to performance and slowdowns

# ASM-Cache: Slowdown-aware Cache Capacity Partitioning

- Goal: Achieve high fairness and performance through slowdown-aware cache partitioning
- Key Idea: Allocate more cache space to applications whose slowdowns reduce the most with more cache space
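The key idea can be sketched as a plain greedy allocator that hands out one cache way at a time to the application whose estimated slowdown drops the most. This is a simplification chosen for illustration; the partitioning algorithm actually used by ASM-Cache may differ. The function name, the `slowdown_at` table, and its values are hypothetical.

```python
def partition_ways(slowdown_at: dict, total_ways: int, min_ways: int = 1) -> dict:
    """Greedy way allocation driven by slowdown reduction.

    slowdown_at[app][w] is the estimated slowdown of `app` when it
    is given `w` ways (index 0 .. total_ways).
    """
    apps = list(slowdown_at)
    alloc = {app: min_ways for app in apps}
    remaining = total_ways - min_ways * len(apps)
    if remaining < 0:
        raise ValueError("not enough ways for the minimum allocation")

    def gain(app):
        # Slowdown reduction from giving `app` one more way.
        w = alloc[app]
        return slowdown_at[app][w] - slowdown_at[app][w + 1]

    for _ in range(remaining):
        best = max(apps, key=gain)
        alloc[best] += 1
    return alloc


# Hypothetical slowdown estimates for 0..4 ways.
estimates = {
    "A": [4.0, 3.0, 2.0, 1.5, 1.4],    # cache-sensitive
    "B": [1.6, 1.5, 1.45, 1.42, 1.4],  # cache-insensitive
}
print(partition_ways(estimates, total_ways=4))  # {'A': 3, 'B': 1}
```

In this example the cache-sensitive application receives the extra ways because each additional way lowers its slowdown far more than it lowers the other's.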

## Memory Bandwidth Partitioning



Goal: Achieve high fairness and performance through slowdown-aware bandwidth partitioning

# ASM-Mem: Slowdown-aware Memory Bandwidth Partitioning

- Key Idea: Prioritize an application proportionally to its slowdown

$$\text{High Priority Fraction}_{i} = \frac{\text{Slowdown}_{i}}{\sum_{j} \text{Slowdown}_{j}}$$

- Application i's requests are prioritized at the memory controller for its fraction of time
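The fraction above is a simple normalization, sketched here in Python. The function name and the example slowdowns are hypothetical.

```python
def high_priority_fractions(slowdowns: dict) -> dict:
    """Share of time each application is given highest priority,
    proportional to its estimated slowdown."""
    total = sum(slowdowns.values())
    if total <= 0:
        raise ValueError("slowdowns must sum to a positive value")
    return {app: s / total for app, s in slowdowns.items()}


# Hypothetical slowdown estimates for two applications.
print(high_priority_fractions({"A": 3.0, "B": 1.0}))
# {'A': 0.75, 'B': 0.25}
```

The more slowed-down application gets the larger share of high-priority time, which pushes slowdowns toward each other.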

# Coordinated Resource Allocation Schemes



1. Employ ASM-Cache to partition cache capacity
2. Drive ASM-Mem with slowdowns from ASM-Cache

### Fairness and Performance Results



The coordinated scheme reduces unfairness by 14% on 1-channel systems and 8% on 2-channel systems compared to PARBS+UCP, with similar performance

## Other Results in the Paper

- Distribution of slowdown estimation error
- Sensitivity to system parameters
  - Core count, memory channel count, cache size
- Sensitivity to model parameters
- Impact of prefetching
- Case study showing ASM's potential for providing slowdown guarantees

## Summary

- Problem: Uncontrolled memory interference causes high and unpredictable application slowdowns
- Goal: Quantify and control slowdowns
- Key Contribution:
  - ASM: An accurate slowdown estimation model
  - Average error of ASM: 10%
- Key Ideas:
  - Shared cache access rate is a proxy for performance
  - Cache Access Rate<sub>Alone</sub> can be estimated by minimizing memory interference and quantifying cache interference
- Applications of Our Model
  - Slowdown-aware cache and memory management to achieve high performance, fairness and performance guarantees
- Source Code Release by January 2016
