Run key-value store with NO-SYNC mode #29819

rjl493456442 · 2024-05-22T08:20:54Z

Within Geth, two types of storage engines are used: key-value stores (LevelDB or Pebble) and append-only flat files (Freezer).

Currently, we employ a strict policy to ensure the best data durability. For instance, we perform a WAL fsync after each write to Pebble and fsync all relevant index files and data files after each write to Freezer. However, there is always a tradeoff: performing fsync frequently can degrade performance.

In this ticket, I want to explore the idea that Geth could relax its data durability guarantee and still survive an unclean shutdown by relying on a recovery mechanism.

Before diving into the concrete solution, let's recap the storage engine properties we have.

Data durability

Key-value stores offer different options for data durability, typically two modes:

(a) SYNC mode

The Write-Ahead Log (WAL) is written and synced for each write operation. This makes sure all written data has been transferred to disk before the write is reported as complete.

(b) NO-SYNC mode

In LevelDB, no-sync means the write operation is marked as completed without a WAL fsync. Recent writes could be lost if the machine crashes, but nothing is lost if only the process crashes, because the WAL data has already been handed to the operating system.

In Pebble, no-sync means the write operation is marked as completed without waiting for the data to be written to the WAL. The main difference from LevelDB is that recent writes can be lost even if only the process crashes.

In short, if the key-value store runs in NO-SYNC mode, recent writes could be lost due to an unclean shutdown.
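The durability behaviour above can be summarised as a small model, sketched here in Go. The `Stage` values and the `walStage`/`survives` helpers are illustrative assumptions that encode only the claims made in this section; they are not part of LevelDB's, Pebble's, or Geth's APIs.

```go
package main

import "fmt"

// Engine identifies the key-value store backend.
type Engine int

const (
	LevelDB Engine = iota
	Pebble
)

// Stage is where the most recent WAL record sits when the write call returns.
type Stage int

const (
	InProcess   Stage = iota // still buffered inside the database process
	InPageCache              // handed to the OS, but not yet fsync'd
	OnDisk                   // fsync'd to stable storage
)

// walStage encodes the behaviour described above for each engine and mode.
func walStage(e Engine, sync bool) Stage {
	switch {
	case sync:
		return OnDisk // SYNC mode: the WAL is fsync'd before returning
	case e == Pebble:
		return InProcess // Pebble NO-SYNC: does not wait for the WAL write
	default:
		return InPageCache // LevelDB NO-SYNC: WAL written but not fsync'd
	}
}

// survives reports whether a record at stage s outlives the given crash.
func survives(s Stage, machineCrash bool) bool {
	switch s {
	case OnDisk:
		return true
	case InPageCache:
		return !machineCrash // the page cache is lost only if the machine goes down
	default:
		return false // process memory is lost on any crash
	}
}

func main() {
	names := map[Engine]string{LevelDB: "LevelDB", Pebble: "Pebble"}
	for _, e := range []Engine{LevelDB, Pebble} {
		s := walStage(e, false)
		fmt.Printf("%s NO-SYNC: survives process crash=%v, machine crash=%v\n",
			names[e], survives(s, false), survives(s, true))
	}
}
```

Running it reports that LevelDB NO-SYNC writes survive a process crash but not a machine crash, while Pebble NO-SYNC writes survive neither.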

Write order

Within each storage engine, ensuring a valid write order is straightforward. For instance, the key-value store uses a Write-Ahead Log (WAL) to maintain the order of writes, even after an unclean shutdown. While some recent writes might be lost due to a process or machine crash, the order is preserved: the writes that survive always form a prefix of the original write sequence. In the case of the Freezer, entries are written to files in an append-only manner, so corrupted entries at the tail can simply be truncated. Thus, we can be confident that each storage engine maintains order correctness.
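The prefix property can be illustrated with a toy log in Go. The record framing (length, CRC32, payload) and the `encodeRecord`/`replay` helpers are assumptions for illustration and do not reproduce LevelDB's or Pebble's on-disk WAL format. The key idea is that replay stops at the first torn or corrupted record, so recovery never yields a later write without all earlier ones.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// encodeRecord frames a payload as [length uint32][crc32 uint32][payload].
// This framing is a simplified stand-in for a real WAL format.
func encodeRecord(payload []byte) []byte {
	rec := make([]byte, 8+len(payload))
	binary.LittleEndian.PutUint32(rec[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(rec[4:8], crc32.ChecksumIEEE(payload))
	copy(rec[8:], payload)
	return rec
}

// replay decodes records in order and stops at the first torn or corrupted
// one, so the surviving records always form a prefix of what was written.
func replay(wal []byte) [][]byte {
	var out [][]byte
	for len(wal) >= 8 {
		n := int(binary.LittleEndian.Uint32(wal[0:4]))
		sum := binary.LittleEndian.Uint32(wal[4:8])
		if len(wal)-8 < n {
			break // torn write at the tail: the record was only partly written
		}
		payload := wal[8 : 8+n]
		if crc32.ChecksumIEEE(payload) != sum {
			break // corrupted record: drop it and everything after it
		}
		out = append(out, payload)
		wal = wal[8+n:]
	}
	return out
}

func main() {
	var wal []byte
	for _, p := range []string{"a", "bb", "ccc"} {
		wal = append(wal, encodeRecord([]byte(p))...)
	}
	// Simulate a crash that cut off the last two bytes of the log.
	for _, rec := range replay(wal[:len(wal)-2]) {
		fmt.Println(string(rec))
	}
}
```

The example prints `a` and `bb`: the partially written third record is discarded, but the first two are recovered in order.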

However, preserving the write order across different storage engines is more challenging. In theory, a single Write-Ahead Log (WAL) spanning all storage engines would be needed to order writes between them; we don't have one today.

In summary, losing recent writes is acceptable, but we must ensure that Geth has a corresponding recovery mechanism to recover from an unclean shutdown.

Now let's explore the recovery mechanisms in Geth.

Chain Freezer

The chain freezer is used to store ancient chain segments. There are two scenarios where we use the chain freezer:

(a) Old chain segments are migrated from the key-value store to the freezer. The data is written to the freezer first, followed by an fsync, and only then deleted from the key-value store. In this scenario, the worst case is that the key-value delete operation is lost, leaving dangling chain data behind in the key-value store. However, the data is guaranteed to have been correctly migrated to the freezer. Therefore, we don't need SYNC mode for the key-value store here.
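Scenario (a) can be modelled in a few lines of Go. The `node` type and its `migrate`, `readable`, and `dangling` methods are hypothetical; the model treats the freezer append as durable and optionally drops the NO-SYNC key-value deletes to mimic a crash. In the model, every block remains readable afterwards, and the only side effect is redundant data left in the key-value store.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a toy model of scenario (a): blocks [0, frozen) live in the freezer,
// which is fsync'd, and kv holds the blocks still present in the key-value store.
type node struct {
	frozen uint64
	kv     map[uint64]bool
}

// migrate moves blocks [frozen, limit) into the freezer. The freezer append is
// treated as durable; loseDeletes drops the NO-SYNC key-value deletes to mimic
// a crash right after the freezer fsync.
func (n *node) migrate(limit uint64, loseDeletes bool) {
	start := n.frozen
	n.frozen = limit // freezer write followed by fsync
	if loseDeletes {
		return
	}
	for b := start; b < limit; b++ {
		delete(n.kv, b)
	}
}

// readable reports whether block b can still be served from either engine.
func (n *node) readable(b uint64) bool {
	return b < n.frozen || n.kv[b]
}

// dangling lists blocks left in the key-value store that the freezer already covers.
func (n *node) dangling() []uint64 {
	var out []uint64
	for b := range n.kv {
		if b < n.frozen {
			out = append(out, b)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	n := &node{kv: map[uint64]bool{}}
	for b := uint64(0); b < 6; b++ {
		n.kv[b] = true
	}
	n.migrate(3, true) // crash: the key-value deletes are lost
	fmt.Println("dangling:", n.dangling())
	fmt.Println("block 1 readable:", n.readable(1))
}
```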

(b) Chain segments are written directly to the freezer during snap sync.

Again, chain segments are written first, followed by an fsync, and then the key-value store is updated (e.g., the chain head). In this scenario, the worst case is that the key-value write (the chain head update) is lost, and the extra chain segment above the chain head is truncated on the next start. Therefore, we don't need SYNC mode for the key-value store here.
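A similar toy model covers scenario (b). `chainDB`, `importSegment`, and `recoverOnStart` are hypothetical names; the sketch assumes the freezer stores blocks `0..ancients-1` and that the key-value store records the head block number.

```go
package main

import "fmt"

// chainDB is a toy model of scenario (b): the freezer holds blocks
// [0, ancients) and the key-value store records the chain head number.
type chainDB struct {
	ancients uint64 // number of items in the freezer (fsync'd)
	head     uint64 // chain head stored in the key-value store (NO-SYNC)
}

// importSegment appends n blocks to the freezer, then updates the head in the
// key-value store; crashBeforeHead simulates losing that head update.
func (c *chainDB) importSegment(n uint64, crashBeforeHead bool) {
	c.ancients += n // freezer write followed by fsync
	if !crashBeforeHead {
		c.head = c.ancients - 1
	}
}

// recoverOnStart truncates any freezer items above the recorded chain head.
func (c *chainDB) recoverOnStart() {
	if c.ancients > c.head+1 {
		c.ancients = c.head + 1
	}
}

func main() {
	c := &chainDB{ancients: 1, head: 0} // genesis only
	c.importSegment(100, false)
	c.importSegment(50, true) // head update lost in the crash
	c.recoverOnStart()
	fmt.Println("ancients:", c.ancients, "head:", c.head)
}
```

Here the second segment's head update is lost, so recovery truncates the freezer back to 101 items (blocks 0 to 100), matching the recorded head.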

State Freezer

The state freezer is used to store state diffs caused by state transitions. Whenever a diff layer is merged into the bottom-most layer (still in memory), the associated state diffs are written to the freezer. If an unclean shutdown occurs, the state data in memory will be lost, and state diffs above the persistent state will be truncated.

However, we need to ensure that the state freezer is fsync'd before the persistent state is modified. Otherwise, in the pessimistic case, the key-value write reaches disk while the freezer write does not, leaving the persistent state ahead of the state freezer. Truncation cannot repair such a gap, so the state could be corrupted.

In summary, Geth can survive an unclean shutdown (process crash or machine crash) even without SYNC-mode protection for the key-value store. We don't need to enforce a strict write-order policy across the two storage engines. Instead, we just need to ensure that any extra data in the freezer can be correctly truncated during recovery.
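The ordering rule from the State Freezer section can be checked exhaustively in a toy model. The Go sketch below (all names hypothetical) crashes after every prefix of a commit sequence and, pessimistically, lets every unsynced write either reach disk or not, then checks that the key-value store never holds a state whose history is missing from the freezer.

```go
package main

import "fmt"

// op is one step of committing a single state transition.
type op int

const (
	freezerWrite op = iota // append the state history to the freezer (not yet durable)
	freezerSync            // fsync the freezer
	kvWrite                // NO-SYNC write of the new persistent state to the key-value store
)

// safe crashes after every prefix of ops and, pessimistically, lets each unsynced
// write either reach disk or not. It reports whether the key-value store can never
// hold the new state while the freezer lacks its history.
func safe(ops []op) bool {
	for k := 0; k <= len(ops); k++ {
		var fWritten, fSynced, kvWritten bool
		for _, o := range ops[:k] {
			switch o {
			case freezerWrite:
				fWritten = true
			case freezerSync:
				fSynced = fWritten
			case kvWrite:
				kvWritten = true
			}
		}
		for _, fFlushed := range []bool{false, true} {
			for _, kvFlushed := range []bool{false, true} {
				freezerHas := fSynced || (fWritten && fFlushed)
				kvHas := kvWritten && kvFlushed
				if kvHas && !freezerHas {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	fmt.Println(safe([]op{freezerWrite, freezerSync, kvWrite})) // freezer fsync'd first
	fmt.Println(safe([]op{freezerWrite, kvWrite, freezerSync})) // key-value write before the fsync
}
```

With the freezer fsync'd first, the invariant holds at every crash point (`true`); moving the key-value write before the freezer fsync lets the key-value store get ahead (`false`).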

rjl493456442 added the type:feature label May 22, 2024
