## **Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery**

Yu Cai, **Yixin Luo**, Saugata Ghose, Erich F. Haratsch\*, Ken Mai, Onur Mutlu

Carnegie Mellon University, \*Seagate Technology



#### **Executive Summary**

- Read disturb errors limit flash memory lifetime today
  - A read applies a high pass-through voltage ($V_{pass}$) to multiple pages
- We characterize read disturb on real NAND flash chips
  - Slightly lowering V<sub>pass</sub> greatly reduces read disturb errors
  - Some flash cells are more prone to read disturb
- Technique 1: Mitigate read disturb errors online
  - $V_{pass}$ Tuning dynamically finds and applies a lowered $V_{pass}$
  - Flash memory lifetime improves by 21%
- Technique 2: Recover after failure to prevent data loss
  - Read Disturb Oriented Error Recovery (RDR) selectively corrects cells more susceptible to read disturb errors
  - Reduces raw bit error rate (RBER) by up to 36%

#### Outline

- Background (Problem and Goal)
- Key Experimental Observations
- Mitigation: V<sub>pass</sub> Tuning
- Recovery: Read Disturb Oriented Error Recovery
- Conclusion

## NAND Flash Memory Background



## Flash Cell Array



#### Flash Cell



Floating Gate Transistor (Flash Cell)

#### Flash Read



## Flash Pass-Through



## Read from Flash Cell Array



#### Read Disturb Problem: "Weak Programming" Effect



Read disturb errors: Reading from one page can alter the values stored in other unread pages

# Goal: Mitigate and Recover from Read Disturb Errors

#### Methodology

FPGA-based flash memory testing platform [Cai+, FCCM '11]



- Real 20- to 24-nm MLC NAND flash chips
- 0 to 1M read disturbs
- 0 to 15K Program/Erase Cycles (PEC)

## Read Disturb Effect on V<sub>th</sub> Distribution



### Other Experimental Observations

- Lower threshold voltage states are affected more by read disturb
- Wear-out increases read disturb effect

## Key Observation 1: Slightly lowering V<sub>pass</sub> greatly reduces read disturb errors



## Read Disturb Mitigation: V<sub>pass</sub> Tuning

- Key Idea: Dynamically find and apply a lowered $V_{pass}$
- Trade-off for lowering V<sub>pass</sub>
  - + Allows more read disturbs
  - − Induces more read errors

## Read Errors Induced by V<sub>pass</sub> Reduction



### Utilizing the Unused ECC Capability



1. A large unused ECC correction capability can be used to tolerate read errors
2. Unused ECC capability decreases over time

Dynamically adjust $V_{pass}$ so that read errors fully utilize the unused ECC capability
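The budgeting idea can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the error counts are assumed to be per-codeword bit counts rather than measurements from the slides.

```python
def unused_ecc_capability(ecc_capability: int, current_errors: int) -> int:
    """Margin left in a codeword: correctable errors minus errors already present."""
    return max(ecc_capability - current_errors, 0)


def vpass_reduction_is_tolerable(ecc_capability: int,
                                 current_errors: int,
                                 extra_read_errors: int) -> bool:
    """True if the read errors added by a lowered V_pass still fit in the margin."""
    margin = unused_ecc_capability(ecc_capability, current_errors)
    return extra_read_errors <= margin
```

Because the margin shrinks as a block accumulates errors, the tolerable reduction in $V_{pass}$ has to be re-evaluated over time.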

## V<sub>pass</sub> Reduction Trade-Off Summary

- Conservatively set V<sub>pass</sub> to a high voltage
  - − Accumulates more read disturb errors by the end of each refresh interval
  - + No read errors
- Dynamically adjust $V_{pass}$ according to the unused ECC capability
  - + Minimizes read disturb errors
  - − Read errors must be kept within what ECC can tolerate
    - If read errors exceed the ECC capability, read again with a higher $V_{pass}$ to correct them
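The fallback in the last bullet, rereading with a higher $V_{pass}$ when ECC cannot correct the page, might look like the sketch below. It assumes $V_{pass}$ is expressed in integer DAC steps and that a hypothetical `count_errors(vpass)` callback reports the error count of a read at that setting. Neither detail comes from the slides.

```python
from typing import Callable


def read_with_retry(count_errors: Callable[[int], int],
                    tuned_vpass: int,
                    nominal_vpass: int,
                    ecc_capability: int,
                    step: int = 1) -> int:
    """Return the first V_pass (starting from the tuned value) whose read is correctable."""
    vpass = tuned_vpass
    while True:
        if count_errors(vpass) <= ecc_capability:
            return vpass  # ECC can correct this read
        if vpass >= nominal_vpass:
            # Even the conservative setting fails; a recovery mechanism is needed.
            raise RuntimeError("read is uncorrectable even at nominal V_pass")
        # Read again with a higher V_pass, never exceeding the nominal value.
        vpass = min(vpass + step, nominal_vpass)
```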

## V<sub>pass</sub> Tuning Steps

- Perform once for each block every day:
  1. Estimate the unused ECC capability
  2. Aggressively reduce $V_{pass}$ until read errors exceed the ECC capability
  3. Gradually increase V<sub>pass</sub> until read errors fall just below the ECC capability
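A minimal sketch of the three steps, under stated assumptions: $V_{pass}$ is an integer number of DAC steps, `count_errors` is a hypothetical callback returning the error count of the block's worst-case page at a given setting, and "within ECC capability" is taken to mean an error count less than or equal to it. The step sizes and the `min_vpass` floor are illustrative, not values from the slides.

```python
from typing import Callable


def tune_vpass(count_errors: Callable[[int], int],
               nominal_vpass: int,
               min_vpass: int,
               ecc_capability: int,
               coarse_step: int = 8,
               fine_step: int = 1) -> int:
    """Find a lowered V_pass whose read errors still fit within ECC capability."""
    # Step 1: estimate unused ECC capability at the nominal setting.
    margin = ecc_capability - count_errors(nominal_vpass)
    if margin <= 0:
        return nominal_vpass  # nothing to spend on a lower V_pass

    # Step 2: aggressively lower V_pass until errors exceed ECC capability
    # (or the hypothetical floor is reached).
    vpass = nominal_vpass
    while (vpass - coarse_step >= min_vpass
           and count_errors(vpass) <= ecc_capability):
        vpass -= coarse_step

    # Step 3: gradually raise V_pass until errors are correctable again.
    while vpass < nominal_vpass and count_errors(vpass) > ecc_capability:
        vpass += fine_step
    return min(vpass, nominal_vpass)
```

The coarse-then-fine search keeps the number of test reads small. A real controller would also have to store the chosen per-block value.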

## Evaluation of V<sub>pass</sub> Tuning

- 19 real workload I/O traces
- Assume 7-day refresh period
- Similar methodology as before to determine acceptable $V_{pass}$ reduction
- Overhead for a 512 GB flash drive:
  - 128 KB storage overhead for the per-block $V_{pass}$ setting and worst-case page
  - 24.34 sec/day average V<sub>pass</sub> Tuning overhead
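To put the two overhead figures in proportion, the arithmetic below expresses them as fractions of the drive capacity and of a day. It assumes binary units (1 KB = 1024 B, 1 GB = 1024³ B), which the slides do not specify.

```python
KB = 1024
GB = 1024 ** 3
SECONDS_PER_DAY = 24 * 60 * 60

# Storage overhead relative to drive capacity.
storage_fraction = (128 * KB) / (512 * GB)

# Daily tuning time relative to the length of a day.
time_fraction = 24.34 / SECONDS_PER_DAY

print(f"storage overhead: {storage_fraction:.2e} of capacity")
print(f"tuning time: {time_fraction:.4%} of a day")
```

Under these assumptions the metadata occupies 2⁻²² of the drive, and tuning takes roughly 0.028% of each day.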

## V<sub>pass</sub> Tuning Lifetime Improvements



Average lifetime improvement: 21.0%

#### Read Disturb Resistance



## Observation 2: Some Flash Cells Are More Prone to Read Disturb

After 250K read disturbs:



#### Read Disturb Oriented Error Recovery (RDR)

- Triggered by an uncorrectable flash error
  - Back up all valid data in the faulty block
  - Disturb the faulty page 100K more times
  - Compare V<sub>th</sub> values before and after the read disturbs
  - Select cells susceptible to flash errors ($V_{ref}-\sigma < V_{th} < V_{ref}+\sigma$)
  - Predict the state of these susceptible cells
    - Cells with larger $V_{th}$ shifts are disturb-prone → Higher $V_{th}$ state
    - Cells with smaller V<sub>th</sub> shifts are disturb-resistant → Lower V<sub>th</sub> state
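The selection and classification steps can be sketched as follows. Several details are assumptions made for illustration: the `CellReading` type, the `shift_threshold` parameter that separates large from small shifts, and applying the $\sigma$ window to the reading taken before the extra disturbs. The sketch only labels cells as disturb-prone or disturb-resistant, and the state prediction then follows the bullets above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CellReading:
    vth_before: float  # V_th measured before the extra read disturbs
    vth_after: float   # V_th measured after the extra read disturbs


def classify_cells(cells: List[CellReading],
                   v_ref: float,
                   sigma: float,
                   shift_threshold: float) -> List[Optional[str]]:
    """Label each susceptible cell; cells outside the window get None."""
    labels: List[Optional[str]] = []
    for cell in cells:
        # Only cells within sigma of the read reference voltage are candidates.
        if not (v_ref - sigma < cell.vth_before < v_ref + sigma):
            labels.append(None)
            continue
        shift = cell.vth_after - cell.vth_before
        labels.append("disturb-prone" if shift > shift_threshold
                      else "disturb-resistant")
    return labels
```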

#### RDR Evaluation



Reduces the total error count by up to 36% at 1M read disturbs

ECC can be used to correct the remaining errors
