Non-Volatile Memory (NVM) - VLSC - EPFL

New memory technologies are changing the computer systems landscape. Motivated by the power and volatility limitations of Dynamic Random Access Memory (DRAM), non-volatile memory (NVM) technologies such as ReRAM, PCM, and STT-RAM are being deployed in server and commodity computers. Memories built from these technologies are non-volatile and can be accessed directly at byte or word granularity. Thus, by putting these new memories on the CPU's memory bus, the CPU can directly read and write non-volatile memory using load and store instructions. These memories erase the classical dichotomy between slow, non-volatile disks or SSDs and fast, volatile memory, greatly expanding the possible uses of durability mechanisms.

Taking advantage of non-volatility is not as simple as just writing data to NVM. Without programming support, it is challenging to write correct, efficient code that permits recovery after a power failure since the restart mechanism must find a consistent state in the durable storage. This problem is well-known in the database community, and a significant portion of a DB system is devoted to ensuring recoverability after failures.

NVM differs, however, because its writes are fine-grain, low-cost, and go directly to memory, leaving little opportunity for software intervention. A further complication is that a processor’s memory system reorders writes to NVM, making it challenging to ensure that program state, even when consistent in memory, is recorded consistently to durable storage. In the interest of high performance, processors employ caches and write buffers and store values to memory at unpredictable times. Consequently, stored values may not reach NVM in the same order in which a program executes them, which complicates capturing a consistent snapshot in durable storage.
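The ordering problem described above can be illustrated with a toy model. The sketch below is not real NVM code: `ToyNvm`, `publish_ordered`, and `publish_flag_first` are hypothetical names, and the model assumes that a store becomes durable only when it is explicitly flushed, whereas real hardware may also write a cache line back on its own at any time.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model (an assumption for illustration): stores land in a volatile
// cache and reach "NVM" only when flushed.
class ToyNvm {
public:
    void store(const std::string& addr, std::uint64_t v) { cache_[addr] = v; }

    // Write one cached location back to durable storage.
    void flush(const std::string& addr) {
        auto it = cache_.find(addr);
        if (it != cache_.end()) {
            durable_[addr] = it->second;
            cache_.erase(it);
        }
    }

    // Power failure: everything still in the cache is lost.
    void crash() { cache_.clear(); }

    // Value that would be found in NVM after a restart (0 if never written).
    std::uint64_t durable(const std::string& addr) const {
        auto it = durable_.find(addr);
        return it == durable_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, std::uint64_t> cache_;
    std::map<std::string, std::uint64_t> durable_;
};

// Safe publication: the payload is made durable before the valid flag.
inline void publish_ordered(ToyNvm& m, std::uint64_t payload) {
    m.store("payload", payload);
    m.flush("payload");
    assert(m.durable("payload") == payload);  // invariant before setting the flag
    m.store("valid", 1);
    m.flush("valid");
}

// Unsafe publication: models the flag being written back before the payload.
inline void publish_flag_first(ToyNvm& m, std::uint64_t payload) {
    m.store("payload", payload);
    m.store("valid", 1);
    m.flush("valid");  // the flag reaches NVM while the payload is only cached
}
```

Calling `crash()` after `publish_flag_first` leaves a durable `valid` flag that guards a payload that never reached NVM, which is exactly the inconsistent snapshot a recovery procedure must not see. The ordered variant avoids it at the cost of an extra flush on the critical path.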

Achieving performance comparable to well-designed data structures in non-persistent (transient) memory is difficult, primarily because of the cost of ensuring the order in which memory writes reach NVM. Often, this requires flushing data to NVM and waiting a full memory round-trip time. We introduce two new techniques. Fine-Grained Checkpointing ensures a consistent, quickly recoverable data structure in NVM after a system failure. In-Cache-Line Logging is an undo-logging technique that enables recovery of earlier state without requiring cache-line flushes in the normal case. We implemented these techniques for the Masstree data structure, making it persistent and demonstrating both the ease of applying them to a highly optimized system and their low runtime overhead (5.9-15.4%).
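As a rough illustration of how an undo log can share a cache line with the data it protects, consider the sketch below. It is a simplification under stated assumptions, not the implementation from the paper. `LoggedCell`, `update`, and `recover` are hypothetical names. The sketch assumes that stores to a single cache line become durable in program order, and that a checkpoint at the end of each epoch makes all earlier updates durable. Epoch numbers start at 1.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// One 64-byte cache line holding a value together with its own undo log.
// Assumption: if the new `value` reached NVM, the log fields stored before
// it (in the same line) reached NVM as well.
struct alignas(64) LoggedCell {
    std::uint64_t value = 0;
    std::uint64_t undo_value = 0;  // value at the start of undo_epoch
    std::uint64_t undo_epoch = 0;  // epoch in which undo_value was saved (0 = none)
};
static_assert(sizeof(LoggedCell) == 64, "cell must occupy exactly one cache line");

// Update during `epoch`: log the old value once per epoch, then overwrite.
inline void update(LoggedCell& c, std::uint64_t new_value, std::uint64_t epoch) {
    assert(epoch != 0);  // 0 is reserved for "no log entry"
    if (c.undo_epoch != epoch) {
        c.undo_value = c.value;
        c.undo_epoch = epoch;
    }
    // Compiler barrier: keep the data store after the log stores.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    c.value = new_value;  // no cache-line flush on this path
}

// Recovery: roll back a cell that was modified in the interrupted epoch.
inline void recover(LoggedCell& c, std::uint64_t failed_epoch) {
    if (c.undo_epoch == failed_epoch) {
        c.value = c.undo_value;
        c.undo_epoch = 0;
    }
}
```

Only the first update to a cell in each epoch pays for logging, and none of the updates wait for a flush. After a failure in epoch `e`, `recover(cell, e)` returns each touched cell to its value at the last checkpoint, while cells last modified in earlier epochs are left alone.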

Preserving the consistent state of an application at termination is, however, only part of recovery. The application will use the persistent heap after a restart, which means that the heap must be restored to a proper state before the application resumes. Transient values need to be removed from durable objects. Pointers in a durable object may also become invalid after a crash if the durable heap is mapped to a different address during recovery. Rectifying these problems after a crash puts a significant burden on a programmer.
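One common way to keep references valid when the heap is mapped at a different address is to store offsets from the heap base instead of raw addresses. The sketch below illustrates that general idea. It is not the mechanism of any particular framework, and `OffsetPtr`, `Node`, `build_heap`, and `sum_keys` are hypothetical names. An ordinary byte vector stands in for the persistent heap.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <vector>

// A base-relative reference: a byte offset into the heap, not an address.
template <typename T>
struct OffsetPtr {
    std::uint64_t off = 0;  // 0 means null, so offset 0 is never handed out

    T* get(unsigned char* base) const {
        return off == 0 ? nullptr : reinterpret_cast<T*>(base + off);
    }
};

struct Node {
    std::int64_t key;
    OffsetPtr<Node> next;
};

// Build a two-node list at offsets 16 and 32 of a raw heap image.
inline std::vector<unsigned char> build_heap() {
    std::vector<unsigned char> heap(64, 0);
    Node* first = new (heap.data() + 16) Node{1, OffsetPtr<Node>{}};
    new (heap.data() + 32) Node{2, OffsetPtr<Node>{}};
    first->next.off = 32;
    return heap;
}

// Sum the keys of the list rooted at offset 16, wherever the heap is mapped.
inline std::int64_t sum_keys(unsigned char* base) {
    std::int64_t sum = 0;
    for (Node* n = OffsetPtr<Node>{16}.get(base); n != nullptr; n = n->next.get(base)) {
        sum += n->key;
    }
    return sum;
}
```

Copying the heap image into a second buffer models a restart that maps the heap elsewhere: `sum_keys` gives the same result for both buffers because no absolute address is stored in the image. The trade-off is that every dereference needs the current base address.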

Existing NVM programming frameworks are intended for non-object-oriented languages such as C and do not gracefully support an object-oriented language such as C++. They lack support for standard features — transient fields, function pointers, and virtual methods — resulting in error-prone programming practices. We propose a new NVM language extension and runtime system that supports object-oriented programming and alleviates the programming pitfalls of prior approaches. At the heart of our approach is object reconstruction, which transparently restores a persistent object’s state during process restart.
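To make the idea of reconstruction concrete, the sketch below shows a hand-written approximation of the work that would otherwise be automated. It does not reflect the proposed language extension or its runtime. `Sensor`, `reconstruct`, and `restart` are hypothetical names, and the restart is simulated by copying the object's raw bytes. Virtual methods are left out because restoring a vtable pointer portably needs compiler or runtime support.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Sensor {
    std::int64_t samples;                 // persistent: must survive a restart
    std::int64_t (*scale)(std::int64_t);  // function pointer: valid only in the process that set it
    std::int64_t* cache;                  // transient: points into volatile memory

    static std::int64_t times_ten(std::int64_t x) { return x * 10; }

    // Hypothetical hook that a runtime would call on each persistent object
    // at restart: persistent fields are kept, everything else is rebuilt.
    void reconstruct() {
        scale = &times_ten;
        cache = nullptr;
    }
};

// Simulated restart: only the raw bytes of the object survive, so the
// process-local addresses stored in them are stale until reconstruction.
inline Sensor restart(const Sensor& before) {
    Sensor after;
    std::memcpy(&after, &before, sizeof(Sensor));
    after.reconstruct();
    return after;
}
```

After `restart`, `samples` keeps its old value, `cache` is null instead of dangling, and `scale` points at code in the current process. Writing such a hook by hand for every class is the kind of burden that transparent reconstruction is meant to remove.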

Papers:

Fine-Grain Checkpointing with In Cache Line Logging
Nachshon Cohen, David Aksun, Hillel Avni, James Larus
ASPLOS 2019, Providence RI, April 2019.
https://infoscience.epfl.ch/record/263802?ln=en

Efficient Logging in Non-Volatile Memory by Exploiting Coherency Protocols
Nachshon Cohen, Michal Friedman, James Larus
OOPSLA at SPLASH 2017, Vancouver, Canada, October 25-27, 2017.
http://infoscience.epfl.ch/record/231400

NVM ReConstruction: Object-Oriented Recovery for Non-Volatile Memory
Nachshon Cohen, David Aksun, James Larus
OOPSLA at SPLASH 2018, Boston MA, November 7-9, 2018.
https://infoscience.epfl.ch/record/256836?&ln=en