



#### JULY 21-24 • SAN JOSE, CA

### ONFi: Achieving Breakthrough NAND Performance

Amber Huffman, Principal Engineer, Intel Corporation



# Agenda

- ONFi Workgroup Update
  - Mission and membership
  - ONFi 2.0 specification completed
  - NAND connector specification completed
  - JEDEC collaboration established
- High-speed NAND interface details
- Extending the high-speed interface










# Mission... Possible.

- NAND has been the only commodity memory with no standard interface
  - Command set, timings, and pin-out are similar among vendors
- NAND has been ripe for standardization due to a few inflection points:
  - 1) Explosion in use of NAND for MP3 players, phones, caches, and SSDs
  - 2) Increase in number of NAND vendors serving the market (2 to 6+)
- ONFi Workgroup was formed in May 2006 to address this gap
  - ONFI = Open NAND Flash Interface
- ONFi revision 1.0 delivered in December 2006, defines:
  - Uniform NAND electrical and protocol interface
    - Raw NAND component interface for embedded use
    - Includes timings, electrical, protocol
    - Standardized base command set
  - Uniform mechanism for device to report its capabilities to the host

ONFI 1.0 sets a solid foundation for NAND (r)evolution.

DENALI SOFTWARE, INC



OPEN NAND FLASH INTERFACE

Micron, Intel, Sony, Hynix, Spansion

Members

A-Data, Aleph One, Arasan Chip Systems, Avid Electronics, Chipsbank, Data I/O, Entorian, Foxconn, Hagiwara Sys-Com, Hyperstone, Inphi, Jinvani Systech, Lotes, Marvell, Moai Electronics, Orient Semiconductor, POI, Sandforce, Sigmatel, Silicon Storage Tech, Smart Modular Tech., Synopsys, Telechips, Transcend Information, University of York

Afa Technologies, Anobit Tech., ASMedia Technology, BitMicro, Cypress, Datalight, FCI, Fusion Media Tech, HiperSem, InCOMM, Intelliprop, Kingston Technology, LSI, Mentor Graphics, Molex, P.A. Semi, Prolific Technology, Seagate, Silicon Integrated Systems, STEC, Solid State System, Tandon, Teradyne, Inc., Tyco, Virident Systems

Alcor Micro, Apacer, ATI, Biwin Technology, DataFab Systems, Denali Software, FormFactor, Genesys Logic, Hitachi GST, Indilinx, ITE Tech, Lauron Technologies, Macronix, Metaram, NVidia, Powerchip Semi., Qimonda, Shenzhen Netcom, Silicon Motion, Skymedi, Super Talent Electronics, Tanisys, Testmetrix, UCA Technology, WinBond

ONFI continues to grow with over 80 members.

\*Other names and brands may be claimed as the property of others


## **ONFi 2.0 Delivers the Speed You Need**

- ONFi 2.0 was published on February 27<sup>th</sup>
  - Available at www.onfi.org
- Adds a synchronous DDR interface option for high speed
  - Up to 133 MT/s in current generation
  - 3.3V and 1.8V VccQ options
  - BGA package optimized for high speed
- Many more details to come later in the presentation...



ONFI 2.0 triples the legacy NAND interface speed.


### **Towards NAND Ubiquity in PCs...**

- As NAND becomes ubiquitous in PCs, the NAND controller will be integrated with the platform (rather than on a PCIe add-in card)
- It is not necessarily desirable to solder NAND to the platform, though
  - Need to offer customers capacity and feature choices
  - Dynamically changing price of NAND
- Desirable to have an "unregistered, unbuffered DIMM" for NAND
  - Provides a cost effective method to offer choice with the platform



# **Connector Specification Complete!**

- The ONFi Workgroup published Revision 1.0 of the connector specification on April 23<sup>rd</sup>
  - Available at www.onfi.org
- The ONFi connector leverages existing memory connectors
  - Avoids major tooling costs
  - Re-uses electrical verification
  - Ensures low cost and fast TTM



The ONFI connector and module are key building blocks for pervasive use of NAND in PC platforms.


# **JEDEC and ONFi Launch Collaboration**

- The ONFi Workgroup is pleased to team up with JEDEC on NAND standardization moving forward
- ONFi is submitting the ONFi 2.0 specification as part of the joint effort







# Agenda

- ONFi Workgroup Update
  - Mission and membership
  - ONFI 2.0 specification completed
  - NAND connector specification completed
  - JEDEC collaboration established
- High-speed NAND interface details
- Extending the high-speed interface





# **The Outdated Legacy NAND Interface**

- NAND performance is determined by two elements
  - NAND array access time
  - Data transfer time across the bus
- For legacy NAND reads, the dominant factor is the bus!
  - Performance is limited to 40 MB/s
  - With interface improvements data could be read at over 150 MB/s
- The issue gets <u>significantly worse</u> as page size increases

Legacy NAND Interface Bottleneck



Even with pipelined reads, the NAND array sits idle for 80+ µs while data is transferred to the host...
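The bottleneck can be checked with simple arithmetic. The sketch below is a minimal illustration rather than anything taken from the ONFi specification: the page sizes and the 25 µs array access time (`ARRAY_READ_US`) are assumed values chosen for illustration, and only the 40 MB/s legacy bus rate comes from the slide.

```python
# Assumed, illustrative values (not from the slides).
ARRAY_READ_US = 25.0      # hypothetical NAND array access time per page
LEGACY_BUS_MB_S = 40.0    # legacy interface limit quoted on the slide


def transfer_time_us(page_bytes: int, bus_mb_s: float) -> float:
    """Time to move one page across the bus, in microseconds.

    bytes / (MB/s) yields microseconds when 1 MB = 10**6 bytes.
    """
    return page_bytes / bus_mb_s


def array_idle_us(page_bytes: int, bus_mb_s: float,
                  array_read_us: float = ARRAY_READ_US) -> float:
    """With pipelined reads the array fetches the next page while the
    current one is transferred; it idles for the remaining bus time."""
    return max(0.0, transfer_time_us(page_bytes, bus_mb_s) - array_read_us)


if __name__ == "__main__":
    for page_bytes in (2048, 4096, 8192):
        print(page_bytes,
              round(transfer_time_us(page_bytes, LEGACY_BUS_MB_S), 1),
              round(array_idle_us(page_bytes, LEGACY_BUS_MB_S), 1))
```

Under these assumptions a 4 KB page needs about 102 µs on the legacy bus, so the array idles for roughly 77 µs, the same order as the figure quoted above. Doubling the page size doubles the transfer time while the array time stays fixed.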



### ONFi 2.0 To The Rescue...

### **Project Goal**

- Develop a <u>scalable</u> and <u>backwards compatible</u> high speed interface that <u>does not require a DLL</u> on the NAND device
  - Performance goal: 133 MT/s initially, with scalability to at least 400 MT/s over several generations

### The Result

- ONFi 2.0 source synchronous data interface
  - Scalable
  - Backwards compatible
  - No DLL on the device

Interface Roadmap

| Generation | Interface speed |
|------------|-----------------|
| Legacy     | 40 MB/s         |
| Gen1       | ~ 133 MB/s      |
| Gen2       | ~ 266 MB/s      |
| Gen3       | 400 MB/s +      |



# **Enabling a Seamless Transition**

- Source synchronous is backwards compatible with the legacy NAND interface, which:
  - a) Enables an orderly discovery process
  - b) Allows NAND parts to support both interfaces during the transition
  - c) Allows the host to support either type of NAND easily
- NAND pins are re-purposed when source synchronous is selected
  - WE# is used as a clock for data input and output (clock used when I/O is active)
  - RE# is used to indicate the direction of data transfer and bus ownership
  - A strobe (DQS) is added for latching data input and output (the only new signal)
- The pins were named using traditional DRAM nomenclature to make the interface easier to understand for those with a DRAM background

| Traditional symbol | Source synchronous symbol | Type  | Description                            |
|--------------------|---------------------------|-------|----------------------------------------|
| I/O[7:0]           | DQ[7:0]                   | I/O   | Data inputs/outputs                    |
| -                  | DQS                       | I/O   | Data strobe                            |
| WE#                | CLK                       | Input | Write enable => Clock                  |
| RE#                | W/R#                      | Input | Read enable => Write / Read# direction |



### **Source Synchronous Discovery**

#### Using asynchronous SDR:

- Read ID is used to identify that the device supports ONFI
- Read Parameter page identifies that source synchronous is supported
- The host selects source synchronous using Set Features
- The host then enjoys using the high-speed source synchronous interface

#### Parameter Page Information

| Byte    | O/M | Description                                      |
|---------|-----|--------------------------------------------------|
| 6-7     | M   | Features supported                               |
|         |     | Bits 6-15: Reserved (0)                          |
|         |     | Bit 5: 1 = supports source synchronous           |
|         |     | Bit 4: 1 = supports odd to even page Copyback    |
|         |     | Bit 3: 1 = supports interleaved operations       |
|         |     | Bit 2: 1 = supports non-sequential page programming |
|         |     | Bit 1: 1 = supports multiple LUN operations      |
|         |     | Bit 0: 1 = supports 16-bit data bus width        |
| 141-142 | O   | Source synchronous timing mode support           |
|         |     | Bits 4-15: Reserved (0)                          |
|         |     | Bit 3: 1 = supports timing mode 3                |
|         |     | Bit 2: 1 = supports timing mode 2                |
|         |     | Bit 1: 1 = supports timing mode 1                |
|         |     | Bit 0: 1 = supports timing mode 0                |
| 143     | O   | Source synchronous features                      |
|         |     | Bits 2-7: Reserved (0)                           |
|         |     | Bit 1: 1 = typical capacitance values present    |
|         |     | Bit 0: tCAD value to use                         |
| 144-145 | O   | CLK input pin capacitance, typical               |
| 146-147 | O   | I/O pin capacitance, typical                     |
| 148-149 | O   | Input pin capacitance, typical                   |
| 150     | M   | Input pin capacitance, maximum                   |

| Timing Mode     | Mode 0 | Mode 1 | Mode 2 | Mode 3 | Unit |
|-----------------|--------|--------|--------|--------|------|
| tCK             | 50     | 30     | 20     | 15     | ns   |
| CLK frequency   | ~20    | ~33    | ~50    | ~66    | MHz  |
| Interface speed | 40     | 66     | 100    | 133    | MT/s |
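The parameter page fields above can be decoded with a few bit tests. The following is a minimal sketch, not a driver: the function names, the 256-byte example page, and its contents are hypothetical, and it assumes multi-byte fields are stored least significant byte first. The byte offsets, bit positions, and tCK values are the ones listed in the tables.

```python
# tCK per source synchronous timing mode, in ns (from the timing mode table).
TCK_NS = {0: 50, 1: 30, 2: 20, 3: 15}

FEATURE_SOURCE_SYNC_BIT = 5   # bytes 6-7, bit 5


def decode_source_sync(page: bytes) -> dict:
    """Extract source synchronous support from a raw parameter page.

    Assumption: multi-byte fields are least significant byte first.
    """
    features = int.from_bytes(page[6:8], "little")
    mode_bits = int.from_bytes(page[141:143], "little")
    supported = bool(features & (1 << FEATURE_SOURCE_SYNC_BIT))
    modes = [m for m in TCK_NS if mode_bits & (1 << m)] if supported else []
    return {"source_sync": supported, "modes": modes}


def interface_speed_mt_s(mode: int) -> float:
    """DDR moves two transfers per clock period: 2 / tCK."""
    return 2 * 1000 / TCK_NS[mode]


def make_example_page() -> bytes:
    """Hypothetical page contents, for demonstration only."""
    page = bytearray(256)
    page[6] = 1 << FEATURE_SOURCE_SYNC_BIT   # supports source synchronous
    page[141] = 0b1111                       # timing modes 0-3 supported
    return bytes(page)


if __name__ == "__main__":
    info = decode_source_sync(make_example_page())
    fastest = max(info["modes"])
    print(info, fastest, round(interface_speed_mt_s(fastest), 1))
```

A host would pick the fastest mode that both it and the device support, then select it with Set Features; the Set Features encoding is not shown on the slide and is left out here.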



# Commands

- Asynchronous SDR and source synchronous DDR command issue is very similar
- tCAD timing parameter ensures 25 ns or 45 ns is provided to process each command and address
  - Avoids redoing command state machine
  - Avoids host needing to pulse WE# at different rates for cmd/addr and data
- Traditional parameters, such as tCCS, tRHW, and tWHR, are still used
- Data strobe (DQS) is not used for command/address cycles
- Reset (FFh) always issued in asynchronous SDR mode to reset the device
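To see what tCAD means for a host running at a fixed clock, the sketch below counts how many whole CLK periods cover each tCAD value in each timing mode. It assumes the host spaces command and address cycles by an integer number of clock periods, which is an illustrative simplification; the function name is hypothetical, and the 25 ns, 45 ns, and tCK values come from the slides.

```python
TCK_NS = {0: 50, 1: 30, 2: 20, 3: 15}   # from the timing mode table
TCAD_FAST_NS = 25
TCAD_SLOW_NS = 45


def tcad_clocks(tcad_ns: int, tck_ns: int) -> int:
    """Smallest whole number of CLK periods covering tCAD (integer ceiling)."""
    return -(-tcad_ns // tck_ns)


if __name__ == "__main__":
    for mode, tck in TCK_NS.items():
        print(mode, tcad_clocks(TCAD_FAST_NS, tck), tcad_clocks(TCAD_SLOW_NS, tck))
```

Because the spacing is expressed in time rather than clocks, the same command state machine timing holds as the clock gets faster; only the cycle count changes.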





### **Data Phase**

- Data is transferred on each edge of DQS
- DQS marks where the receiver should latch the data
- ALE/CLE takes on new meaning
  - 00b: Idle
  - 11b: Data



# Data Output (Reads from Device)

- Device returns two bytes for each CLK period where ALE/CLE are 11b
- Data output is latched by the host on each edge of the strobe (DQS)
  - DQ and DQS are transmitted edge aligned for ease of NAND implementation
- Device may take up to 20 ns (tDQSCK) to output data
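The two-bytes-per-clock rule makes burst length easy to estimate. This is a rough sketch under stated assumptions: the function names are hypothetical, the 4096-byte burst is an example size, and the model simply charges the 20 ns tDQSCK latency once at the start of the burst and ignores any other overhead.

```python
def data_clocks(num_bytes: int) -> int:
    """CLK periods with ALE/CLE = 11b needed to return num_bytes
    (two bytes per clock period, rounded up)."""
    return (num_bytes + 1) // 2


def burst_time_us(num_bytes: int, tck_ns: float,
                  tdqsck_ns: float = 20.0) -> float:
    """Assumed model: worst-case output latency once, then the data clocks."""
    return (tdqsck_ns + data_clocks(num_bytes) * tck_ns) / 1000.0


if __name__ == "__main__":
    # 4096 bytes at timing mode 3 (tCK = 15 ns)
    print(data_clocks(4096), round(burst_time_us(4096, 15), 2))
```

At tCK = 15 ns the example burst takes about 31 µs, compared with roughly 102 µs for the same data at the legacy 40 MB/s rate.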



# **Data Input (Writes to Device)**

- Data input is latched on each edge of the strobe (DQS)
  - A strobe corresponds to each clock period where ALE/CLE is 11b
  - A bi-directional strobe is used to latch data so that loading stays matched, which keeps the solution scalable



### Impedance, Slew Rates and Robustness

- Interoperability at higher speeds is critical, especially with the connector
  - Important to make sure design can work across long trace lengths and deal with discontinuities introduced
- Input slew rates, output slew rates, and impedance values are specified
  - Ensures robust interoperable designs can be delivered



| Description | VOUT to VssQ | Maximum | Nominal | Minimum | Unit |
|-------------|--------------|---------|---------|---------|------|
| R_pulldown  | 0.2 x VccQ   | 95.0    | 39.0    | 21.5    | Ohms |
| R_pulldown  | 0.5 x VccQ   | 90.0    | 50.0    | 26.0    | Ohms |
| R_pulldown  | 0.8 x VccQ   | 126.5   | 66.5    | 31.5    | Ohms |
| R_pullup    | 0.2 x VccQ   | 126.5   | 66.5    | 31.5    | Ohms |
| R_pullup    | 0.5 x VccQ   | 90.0    | 50.0    | 26.0    | Ohms |
| R_pullup    | 0.8 x VccQ   | 95.0    | 39.0    | 21.5    | Ohms |



# **TSOP** and Its Limitations...

- TSOP was extended in a straightforward manner to support source synchronous
  - DQS was added on pin 35, all other pins in same location
  - VccQ and VssQ locations also defined
- However, TSOP is not suitable as speed continues to scale past 100 MT/s...
  - Package has high input capacitance due to single-sided bond pads
  - Typically up to four die can be stacked in the package leading to higher capacitance
  - Lower cost package construction than DRAM TSOP

48-pin TSOP and 48-pin WSOP pinout:

| Ssync | Async | Pin | Pin | Async | Ssync |
|-------|-------|-----|-----|-------|-------|
| R     | R     | 1   | 48  | VssQ  | VssQ  |
| R     | R     | 2   | 47  | R     | R     |
| R     | R     | 3   | 46  | R     | R     |
| R/B4# | R/B4# | 4   | 45  | R     | R     |
| R/B3# | R/B3# | 5   | 44  | IO7   | DQ7   |
| R/B2# | R/B2# | 6   | 43  | IO6   | DQ6   |
| R/B1# | R/B1# | 7   | 42  | IO5   | DQ5   |
| W/R#  | RE#   | 8   | 41  | IO4   | DQ4   |
| CE1#  | CE1#  | 9   | 40  | R     | R     |
| CE2#  | CE2#  | 10  | 39  | VccQ  | VccQ  |
| R     | R     | 11  | 38  | VSP1  | VSP1  |
| Vcc   | Vcc   | 12  | 37  | Vcc   | Vcc   |
| Vss   | Vss   | 13  | 36  | Vss   | Vss   |
| CE3#  | CE3#  | 14  | 35  | VSP2* | DQS   |
| CE4#  | CE4#  | 15  | 34  | VccQ  | VccQ  |
| CLE   | CLE   | 16  | 33  | R     | R     |
| ALE   | ALE   | 17  | 32  | IO3   | DQ3   |
| CLK   | WE#   | 18  | 31  | IO2   | DQ2   |
| WP#   | WP#   | 19  | 30  | IO1   | DQ1   |
| VSP3  | VSP3  | 20  | 29  | IO0   | DQ0   |
| R     | R     | 21  | 28  | R     | R     |
| R     | R     | 22  | 27  | R     | R     |
| R     | R     | 23  | 26  | R     | R     |
| R     | R     | 24  | 25  | VssQ  | VssQ  |

# **BGA Package**

 ONFi 2.0 defines a BGA package optimized for source synchronous

#### Attributes:

- Dual 8-bit interface
- More power/ground balls for lower noise
- Signals arranged with excellent signal integrity in mind
- 1mm ball spacing for low cost PCB assembly
- Two package outline options supported for increasing densities

MEMCON08

|   | 1 | 2     | 3     | 4      | 5      | 6      | 7      | 8     | 9     | 10 |
|---|---|-------|-------|--------|--------|--------|--------|-------|-------|----|
| A | R | R     |       |        |        |        |        |       | R     | R  |
| B | R |       |       |        |        |        |        |       |       | R  |
| C |   |       |       |        |        |        |        |       |       |    |
| D |   | R     | RFT   | VSP3-2 | WP2#   | VSP2-2 | VSP1-2 | RFT   | R     |    |
| E |   | R     | RFT   | VSP3-1 | WP1#   | VSP2-1 | VSP1-1 | RFT   | R     |    |
| F |   | VCC   | VCC   | VCC    | VCC    | VCC    | VCC    | VCC   | VCC   |    |
| G |   | VSS   | VSS   | VSS    | VSS    | VSS    | VSS    | VSS   | VSS   |    |
| H |   | VSSQ  | VCCQ  | VREFQ2 | VREFQ1 | R/B2#  | R/B4#  | VCCQ  | VSSQ  |    |
| J |   | DQ0-2 | DQ2-2 | ALE2   | CE4#   | R/B#   | R/B3#  | DQ5-2 | DQ7-2 |    |
| K |   | DQ0-1 | DQ2-1 | ALE1   | CE3#   | CE2#   | CE#    | DQ5-1 | DQ7-1 |    |
| L |   | VCCQ  | VSSQ  | VCCQ   | CLE2   | W/R2#  | VCCQ   | VSSQ  | VCCQ  |    |
| M |   | DQ1-2 | DQ3-2 | VSSQ   | CLE1   | W/R1#  | VSSQ   | DQ4-2 | DQ6-2 |    |
| N |   | DQ1-1 | DQ3-1 | DQS2#  | DQS2   | CLK2#  | CLK2   | DQ4-1 | DQ6-1 |    |
| P |   | VSSQ  | VCCQ  | DQS1#  | DQS1   | CLK1#  | CLK1   | VCCQ  | VSSQ  |    |
| R |   |       |       |        |        |        |        |       |       |    |
| T | R |       |       |        |        |        |        |       |       | R  |
| U | R | R     |       |        |        |        |        |       | R     | R  |

# Agenda

- ONFi Workgroup Update
  - Mission and membership
  - ONFi 2.0 specification completed
  - NAND connector specification completed
  - JEDEC collaboration established
- High-speed NAND interface details
- Extending the high-speed interface





### **There is Head Room**

 Why stop at 133 MT/s? Successful operation at 166 MT/s for 8 NAND die with a connector!






# Enter ONFi 2.1...

- Work on the 2.1 specification started early this year, with target completion in 2H'08
- Recognizes head room by defining 166 MT/s and 200 MT/s timing modes
- Includes additional new features, like an interleaved read command, to continue moving the industry forward

| Timing Mode   | Mode 0 | Mode 1 | Mode 2 | Mode 3 | Mode 4 | Mode 5 | Unit |
|---------------|--------|--------|--------|--------|--------|--------|------|
| tCK           | 50     | 30     | 20     | 15     | 12     | 10     | ns   |
| CLK frequency | ~20    | ~33    | ~50    | ~66    | ~83    | ~100   | MHz  |

| Parameter       | Mode 0 Min | Mode 0 Max | Mode 1 Min | Mode 1 Max | Mode 2 Min | Mode 2 Max | Mode 3 Min | Mode 3 Max | Mode 4 Min | Mode 4 Max | Mode 5 Min | Mode 5 Max | Unit |
|-----------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|
| tAC             | -   | 20  | -   | 20  | -   | 20  | -   | 20  | -   | 20  | -   | 20  | ns   |
| tADL            | 100 | -   | 100 | -   | 70  | -   | 70  | -   | 70  | -   | 70  | -   | ns   |
| tCADf           | 25  | -   | 25  | -   | 25  | -   | 25  | -   | 25  | -   | 25  | -   | ns   |
| tCADs           | 45  | -   | 45  | -   | 45  | -   | 45  | -   | 45  | -   | 45  | -   | ns   |
| tCAH            | 10  | -   | 5   | -   | 4   | -   | 3   | -   | 2.5 | -   | 2   | -   | ns   |
| tCALH           | 10  | -   | 5   | -   | 4   | -   | 3   | -   | 2.5 | -   | 2   | -   | ns   |
| tCALS           | 10  | -   | 5   | -   | 4   | -   | 3   | -   | 2.5 | -   | 2   | -   | ns   |
| tCAS            | 10  | -   | 5   | -   | 4   | -   | 3   | -   | 2.5 | -   | 2   | -   | ns   |
| tCH             | 10  | -   | 5   | -   | 4   | -   | 3   | -   | 2.5 | -   | 2   | -   | ns   |
| tCK(avg) or tCK | 50  | -   | 30  | -   | 20  | -   | 15  | -   | 12  | -   | 10  | -   | ns   |

tCK(abs), all modes (ns): Minimum: tCK(avg) + tJIT(per) min; Maximum: tCK(avg) + tJIT(per) max
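The tCK(abs) rule and the resulting data rates can be worked through numerically. In this minimal sketch the jitter values passed to `tck_abs_bounds` are hypothetical placeholders, since the slide does not give tJIT(per) figures; the function names are likewise illustrative, and the tCK(avg) values come from the table.

```python
# tCK(avg) per timing mode, in ns (from the ONFi 2.1 timing table).
TCK_AVG_NS = {0: 50, 1: 30, 2: 20, 3: 15, 4: 12, 5: 10}


def tck_abs_bounds(mode: int, tjit_per_min_ns: float,
                   tjit_per_max_ns: float) -> tuple:
    """Apply the table's rule: tCK(abs) = tCK(avg) + tJIT(per) min/max."""
    tck = TCK_AVG_NS[mode]
    return tck + tjit_per_min_ns, tck + tjit_per_max_ns


def data_rate_mt_s(mode: int) -> float:
    """Two transfers per clock period (DDR)."""
    return 2 * 1000 / TCK_AVG_NS[mode]


if __name__ == "__main__":
    # Hypothetical jitter of -0.5 ns / +0.5 ns, for illustration only.
    print(tck_abs_bounds(5, -0.5, 0.5))
    print(round(data_rate_mt_s(4), 1), data_rate_mt_s(5))
```

Modes 4 and 5 work out to about 166.7 MT/s and 200 MT/s, matching the two new speeds named above.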




# Summary

- ONFi source synchronous is designed to preserve backwards compatibility, ease NAND transitions, and scale in speed across multiple generations
- Benefits of ONFi 2.0 solution:
  - Delivers 133 MT/s in first generation, scales up to 400 MT/s through straightforward techniques (like complementary signals)
  - Lower power by separating Vcc and VccQ, and lower VccQ (1.8V)
  - Backwards compatible with legacy NAND interface, including with TSOP
  - Standard BGA package designed to overcome speed limitations of TSOP
  - Ensures that NAND controllers can be confidently designed for future NAND devices through mechanisms like Read Parameter Page
- ONFi 2.1 scales performance to 166 MT/s and 200 MT/s

Achieve breakthrough performance with ONFI 2.0 today! For more information, visit www.onfi.org.

