

> University of Maryland

Department of Electrical and Computer Engineering

# Comparative Analysis of Contemporary Cache Power Reduction Techniques

> Ph.D. Dissertation Proposal
> Samuel V. Rodriguez






## Motivation

- Thermal Design Power (TDP) is now a priority specification
- AMD currently can't compete in "thin and light" notebooks because of its higher TDPs
- AMD's power advantage in initial dual-core offerings
- *An entire Intel Pentium 4 design was recently cancelled because of higher-than-expected TDPs*



Photograph taken from Gurumurthi






> Fraction of die area and transistor count dedicated to caches is increasing

Photograph taken from Weiss2002




# **Presentation Outline**

- Motivation (finished)
- Background
  - Power Dissipation
  - Cache/SRAM Implementation
- Contemporary Cache Power Reduction Schemes
- Proposed Work
- Q&A



Graph from Kim2004










## **Background** (Power Dissipation)

- $P_{dyn} \propto N \times C \times V_{DD}^2 \times f$
  - ↑↑↑ $N$: number of transistors
  - ↑↑ $f$: frequency
  - ↓↓ $C$: device capacitance
  - ↓ $V_{DD}$: supply voltage
- Dynamic power trend: slow increase
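
The proportionality above can be turned into a quick calculation. The sketch below is illustrative only: the function name `relative_dynamic_power` and the per-generation scaling factors are assumptions chosen for the example, not figures from this talk.

```python
def relative_dynamic_power(n_scale, c_scale, vdd_scale, f_scale):
    """Relative change in dynamic power, using P_dyn ∝ N * C * V_DD^2 * f."""
    return n_scale * c_scale * vdd_scale ** 2 * f_scale


# Hypothetical per-generation scaling factors (assumed for illustration):
# transistor count doubles, device capacitance drops to 0.7x,
# supply voltage drops to 0.9x, and frequency rises to 1.4x.
ratio = relative_dynamic_power(n_scale=2.0, c_scale=0.7, vdd_scale=0.9, f_scale=1.4)
print(f"Dynamic power changes by a factor of {ratio:.2f}")
```

Because $V_{DD}$ enters quadratically, even a modest supply reduction offsets much of the growth in $N$ and $f$.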








## Background (Power Dissipation)

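- Subthreshold leakage is increasing:

  $$I_{d,sat} \propto (V_{gs} - V_{th}) = (V_{DD} - V_{th})$$

- As $V_{DD}$ scales down, $V_{th}$ must also be lowered to preserve drive current, and a lower $V_{th}$ raises subthreshold leakage
- Increase: 5x per generation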
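
The sensitivity of leakage to the threshold voltage can be sketched with the standard subthreshold model, in which leakage grows by one decade for every $S$ millivolts of $V_{th}$ reduction ($S$ is the subthreshold swing). The swing of 100 mV/decade and the 70 mV threshold reduction below are assumed values for illustration, not numbers from this talk.

```python
def leakage_increase(delta_vth_mv, swing_mv_per_decade=100.0):
    """Factor by which subthreshold leakage grows when V_th drops by delta_vth_mv.

    Uses the standard model I_sub ∝ 10^(-V_th / S).
    """
    return 10 ** (delta_vth_mv / swing_mv_per_decade)


# Assumed example: a 70 mV threshold reduction at S = 100 mV/decade
# gives roughly a 5x increase in subthreshold leakage.
print(f"{leakage_increase(70.0):.1f}x")
```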




# Background (Power Dissipation)

- Gate leakage
  - $T_{ox}$ scaling increases gate leakage caused by oxide tunneling
- Gate leakage increase: *500x* per generation!

















# **Presentation Outline**

- Motivation (finished)
- Background (finished)
  - Power Dissipation
  - Cache/SRAM Implementation
- Contemporary Cache Power Reduction Schemes
- Proposed Work
- Q&A



## Cache Power Reduction Techniques


| Scheme              | Dynamic / Static? | Est. Power Savings | Exec-time increase? | State-retentive? |
|---------------------|-------------------|--------------------|---------------------|------------------|
| Gated-Vdd           | Static            | N/A \*             | YES                 | NO               |
| Cache decay         | Static            | 80%                | YES                 | NO               |
| DRG-cache           | Static            | 39%-59%            | NO                  | YES              |
| Drowsy cache        | Static            | 60%-75%            | YES                 | YES              |
| Near-OPT precharge  | Static            | N/A \*\*           | YES                 | N/A              |
| Way-halting         | Dynamic           | 55%                | NO                  | N/A              |
| Data size detection | Dynamic           | N/A                | NO                  | N/A              |

\* Paper only cites 62% energy-delay savings

\*\* Paper only cites 92% reduction of bitline discharge



## Cache Power Reduction Techniques (cont.)


| Scheme              | Miss ratio increase? | Access time increase? | Variable load-hit latency? | µARCH transparent? | Additional noise problems? |
|---------------------|----------------------|-----------------------|----------------------------|--------------------|----------------------------|
| Gated-Vdd           | YES                  | YES                   | NO                         | NO                 | NO                         |
| Cache decay         | YES                  | YES                   | NO                         | NO                 | NO                         |
| DRG-cache           | NO                   | YES                   | NO                         | YES                | YES                        |
| Drowsy cache        | NO                   | YES                   | YES                        | NO                 | YES                        |
| Near-OPT precharge  | NO                   | NO\*                  | YES                        | NO                 | NO                         |
| Way-halting         | NO                   | NO\*                  | NO                         | YES                | NO                         |
| Data size detection | NO                   | YES                   | NO                         | YES                | NO                         |

\* With proper design








#### **Cache Power Reduction Techniques**

### 1. Gated-Vdd (microarchitecture: Dynamically ResIzable [DRI] Cache)

- Mask out part of the index to dynamically resize the cache
- Make this decision based on the cache hit ratio
- Energy-delay reduced by 62%






Example: If MASK removes the upper 2 bits of the index, only the lower quarter of the cache's sets can be accessed (all other sets are gated off).
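
The index masking can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical cache with 64-byte blocks and 1024 sets; the function name `masked_set_index` and these parameters are assumptions for the example and are not taken from the DRI-cache paper.

```python
BLOCK_OFFSET_BITS = 6   # assumed 64-byte blocks
INDEX_BITS = 10         # assumed 1024 sets at full size


def masked_set_index(address, masked_upper_bits):
    """Set index after masking off the top `masked_upper_bits` index bits."""
    full_index = (address >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    size_mask = (1 << (INDEX_BITS - masked_upper_bits)) - 1
    return full_index & size_mask


# With the upper 2 index bits masked, every address maps into sets 0..255,
# i.e. one quarter of the 1024 sets; the remaining sets can be gated off.
address = 0xFFFC0  # address whose full index is 1023
print(masked_set_index(address, 0))  # full-size cache
print(masked_set_index(address, 2))  # cache resized to one quarter
```

In a real design the masked index bits would still have to be compared along with the tag so that blocks sharing a set remain distinguishable; the sketch omits that bookkeeping.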











### 3. Data Retention Ground (DRG) (circuit)

- DRG gates the ground of the memory cells (MCs)
- With careful sizing, state can be preserved!
- Technique is transparent to the microarchitecture!
- Power is reduced by 39% to 59%




#### Cache Power Reduction Techniques

### 5. Near-optimal Precharging

- Bitline leakage burns power even in unused cache subarrays (additional power is needed during the precharge phase)
- In any given time interval, only a small fraction of the subarrays is actually used
- Bitline discharge reduced by 92%








#### Cache Power Reduction Techniques

### 6. Way-halting cache

- Perform early miss detection to stop access to cache ways that are certain to miss
- Early miss detection is performed by offloading a few tag bits into a faster array that performs tag comparison early in the access
- Power reduced by 55%
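
The early miss detection used by the way-halting cache can be sketched as a simplified software model. It assumes a 4-way set and a 4-bit halt tag taken from the low-order tag bits; the names `HALT_BITS`, `halt_tag`, and `ways_to_access` are hypothetical, and the real technique is implemented in hardware.

```python
HALT_BITS = 4  # assumed number of low-order tag bits kept in the fast halt array


def halt_tag(tag):
    """Low-order tag bits stored in the small, fast halt-tag array."""
    return tag & ((1 << HALT_BITS) - 1)


def ways_to_access(stored_tags, lookup_tag):
    """Indices of ways whose halt tag matches; all other ways are halted.

    A way whose halt tag differs cannot hold the requested block, so its
    main tag and data arrays need not be read.
    """
    wanted = halt_tag(lookup_tag)
    return [way for way, tag in enumerate(stored_tags) if halt_tag(tag) == wanted]


# One 4-way set holding four hypothetical tags.
stored = [0x1A3, 0x2B7, 0x0C3, 0x3F8]
print(ways_to_access(stored, 0x0C3))  # ways 0 and 2 share halt tag 0x3
print(ways_to_access(stored, 0x155))  # no halt tag matches: certain miss
```

A halt-tag match does not guarantee a hit (way 0 above matches but holds a different block), so the full tag comparison is still needed for the ways that remain enabled.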






#### **Cache Power Reduction Techniques**

### 7. Data Size Detection

- Not every operand uses the full word length (e.g. ~94% of the operands in 64-bit Alpha SPECint95 benchmarks use 32 bits or fewer)
- Keep track of this information to turn off the upper bits of the datapath (saving wordline, bitline, and sense-amp power)
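
The size check itself can be sketched as follows. This is an assumed software model, not the proposed hardware: an operand counts as narrow when its 64-bit two's-complement value is just a sign extension of its low 32 bits. The function names and the operand trace are hypothetical.

```python
def fits_in_32_bits(value):
    """True if a signed 64-bit value is a sign extension of its low 32 bits."""
    return -(1 << 31) <= value < (1 << 31)


def narrow_fraction(operands):
    """Fraction of operands whose upper 32 bits carry no information."""
    return sum(fits_in_32_bits(v) for v in operands) / len(operands)


# Hypothetical operand trace (signed 64-bit integers), for illustration only.
trace = [0, 42, -1, 1 << 20, -(1 << 31), 1 << 40, (1 << 31) - 1, -(1 << 50)]
print(narrow_fraction(trace))
```

For each narrow operand, the upper half of the datapath could in principle stay off during the access.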







# **Proposed Work**

- Detailed comparative study of discussed low-power cache techniques (and various combinations)
- Metrics of comparison:
  - Power dissipation (including overheads)
  - Performance penalty (IPC and access time)
  - Die area overhead
  - Complexity




# **Proposed Work**

- Contributions
  - Every scheme is put on the same playing field
  - Schemes are made up to date with the use of predictive 65nm/45nm technology
  - Improved evaluation accuracy
    - Gate leakage is now accounted for
    - Careful accounting for overheads
    - Use of a state-of-the-art memory system model
  - Data Size Detection is proposed

# **Q&A**

## Thank You