Update Time Travelling Queries Guide #161

105 changes: 85 additions & 20 deletions docs/defradb/guides/time-traveling-queries.md
---
sidebar_label: Time Traveling Queries Guide
sidebar_position: 40
---
# Time Traveling Queries in DefraDB

## Overview
Time-traveling queries let users query previous states of a document through the same query interface used for current data: the query returns the data as it appeared at a specific commit. Because every state of a document is derived from the states that came before it, these queries make it possible to "go back in time" and inspect or verify any earlier version with minimal changes to the query itself, regardless of how many updates were made or who made them. In fact, there is very little difference between a regular query and a time-traveling query, since both apply almost the same logic to fetch their results.

## Background
Applications built with **local-first software** frequently require the ability to inspect how a piece of information evolved over time. This is especially critical in environments where data is collaboratively managed and replicated across edge compute devices. Traditional data management systems fall short in this area—once a document is updated, the prior state is typically lost unless manually backed up.

In the traditional Web2 stack, databases such as Postgres or MySQL usually keep the current state as the only state. Once a user makes an update, the previous state is overwritten and can only be recovered by restoring a backup or snapshot, and because those backups are typically taken only once an hour, day, or month, the ability to introspect each individual update is lost. Time-traveling queries provide an edge here because DefraDB's data model is independent of whatever snapshot or backup mechanism is used for routine maintenance and administration: every update is a function of all previous updates, so inspecting a past state of a document is no different from inspecting its present state.
**DefraDB** solves this problem by introducing **Time Traveling Queries**—a feature that allows entities to access the exact state of a document as it existed at any point in its version history. By leveraging cryptographic **Content Identifiers (CIDs)** and Conflict-free Replicated Data Types (CRDTs), DefraDB empowers edge applications to query historical document states natively and efficiently.

---

## What Are Time Traveling Queries?

A **Time Traveling Query** is a standard query with an added `cid` parameter that instructs DefraDB to return the document's state at that specific version, so very little work is required from the developer to turn a regular query into a time-traveling one.

Every update a document goes through is assigned a version identifier known as a Content Identifier (CID). Rather than a human-assigned label such as "Version 1" or "Version 3.1 alpha", a CID is a constant-size, content-addressed hash derived from the document's data, which guarantees immutability and consistency across distributed environments. Each update produces a new CID for the document as a whole, as well as new CIDs for the fields it touches. To query backward through a document's history, the developer queries the document by its doc key (docID) as usual and supplies the CID of the version they want as an additional argument.

```graphql
# Fetch the User with the given docID, in the state it was in
# at the commit matching the given CID.
# (The docID value and selected fields below are illustrative placeholders.)
query {
  User (
    cid: "bafybeieqnthjlvr64aodivtvtwgqelpjjvkmceyz4aqerkk5h23kjoivmu",
    docID: "bae-52b9170d-b77a-5887-b877-cbdbb99b009f"
  ) {
    name
    age
  }
}
```

This retrieves the `User` document as it existed when the given CID was generated, regardless of any subsequent changes.
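
To find a CID to pass into a time-traveling query, you can list a document's commit history first. The sketch below assumes DefraDB's `commits` query with a `docID` argument and `cid`/`height` fields; exact argument and field names can differ between DefraDB versions, and the docID shown is a placeholder.

```graphql
# List the commit history of a document so a CID can be chosen
# for a time-traveling query. (docID is an illustrative placeholder.)
query {
  commits(docID: "bae-52b9170d-b77a-5887-b877-cbdbb99b009f") {
    cid
    height
  }
}
```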

---

## Why It Matters

### Without Time Traveling Queries

In most cloud-based or traditional systems, document updates overwrite prior values. Even when backups or snapshots are used, they often lack precision, and access involves significant latency or operational complexity. This leads to:

- Lost visibility into the lifecycle of a document.
- No way to validate historical states for auditing or debugging.
- Difficulty satisfying compliance or traceability requirements.

### With Time Traveling Queries

By incorporating Time Traveling Queries into a local-first database model, DefraDB allows applications to:

- Inspect changes over time with fine-grained detail.
- Reproduce document states for validation, debugging, or historical analysis.
- Reduce reliance on separate backup mechanisms.
- Enable embedded edge software to operate independently while retaining auditability.

---

## How It Works

Under the hood, DefraDB uses **Merkle CRDTs**, a data structure that combines the benefits of Merkle Directed Acyclic Graphs (DAGs) with Conflict-free Replicated Data Types. Each update to a document is recorded as a delta payload: the small set of changes needed to move from the previous state to the next one (for example, if a collaborator renames a document, the delta payload is simply the new name). These deltas form an append-only update graph that chains every version back to the genesis state, and each field of the document has its own update graph alongside the document-level one. Nothing is ever lost from this history: even deleting information is recorded as just another update in the graph.

When a Time Traveling Query is issued:

1. The engine identifies the target CID in the document's DAG and treats it as the **target state**.
2. Because the Merkle DAG only points backward in time, the engine first walks the chain back from the target state to the document's **genesis state**.
3. It then replays each delta payload forward from genesis until it reaches the target state again, producing the serialized, externally visible state of the document at that version.
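
The backward-pointing chain can be inspected directly from the commit log. This is a sketch under the same assumptions as the `commits` example above, additionally assuming a `links` field that exposes the parent commit(s) of each commit (naming may vary by version):

```graphql
# Walk the update graph: each commit links back to the commit(s)
# it builds on, terminating at the genesis commit.
query {
  commits(docID: "bae-52b9170d-b77a-5887-b877-cbdbb99b009f") {
    cid
    height
    links {
      cid
      name
    }
  }
}
```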

For more detail on the underlying data structure, see the [Merkle CRDT Guide](./merkle-crdt.md).

### Key Technical Features

- **Immutable Snapshots**: CIDs are content-addressed hashes, guaranteeing the integrity of every version.
- **Field-Level Tracking**: Each field in a document has its own delta chain, allowing for extremely fine-grained reconstruction.
- **Embedded Replay Engine**: Even if a device is offline or disconnected from the infrastructure, local compute can reconstruct state from stored deltas.

---

## Use Cases

- **Application Debugging**: Quickly identify what changed and when.
- **Data Auditing**: Retrieve document state for compliance or historical verification.
- **Collaborative Editing**: Support undo functionality or version comparison within local-first applications.
- **Edge-Based Automation**: Allow devices embedded with local software to reason about historical context even in isolation.

---

## Limitations

### 1. No Relational Time Travel (Yet)

Currently, Time Traveling Queries apply only to individual documents. Consider a `Book` that belongs to a single `Author`, while an `Author` can have many `Book`s: a regular query can traverse that relationship and return the book together with its author, but retrieving a `Book` document at a past version will not also return its related `Author` as it existed at the same time, because the query cannot reach beyond the exact document state it targets. Relationship-aware time travel is on the roadmap.
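
As a purely hypothetical illustration (the `Book`/`Author` schema, field names, and IDs below are invented for this example), a regular query can resolve the nested `author` selection, but adding a `cid` argument only rewinds the `Book` itself:

```graphql
# Hypothetical schema, shown only to illustrate the limitation.
# With the cid argument the Book is returned as it existed at that
# version, but the nested author is not rewound to the same point in time.
query {
  Book (
    cid: "bafybeieqnthjlvr64aodivtvtwgqelpjjvkmceyz4aqerkk5h23kjoivmu",
    docID: "bae-00000000-0000-0000-0000-000000000000"
  ) {
    title
    author {
      name
    }
  }
}
```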

### 2. Performance Overhead

Time Traveling Queries reconstruct state by replaying changes from the document's genesis. Only the current state is kept as a cached, serialized version, so the cost of a time-traveling query depends on the size of the document's update graph and, in particular, on the number of updates between the genesis state and the target state: the more updates sit between the two, the longer the walk back and the replay forward will take. Optimization strategies, such as periodic snapshots, can reduce this traversal cost.
---

## Future Enhancements

- **Snapshot Support**: Developers will be able to configure periodic state snapshots (e.g., every 1,000 deltas). Reconstruction then only needs to walk back to the nearest snapshot rather than to the genesis state: to query the 1,010th update with a snapshot cached at the 1,000th, the engine steps roughly 10 updates back and 10 forward instead of replaying the entire history. This trades a small increase in data size for faster query execution.
- **Relational Time Travel**: Planned enhancements will allow traversing document relationships using doc keys within the CRDT graphs. For example, a book represented by doc key A and its author represented by doc key B are connected through a relationship field recorded in the update graph, and a future query will be able to follow that link historically. This will enable fetching a consistent multi-document state as it existed at a specific time.

---

## FAQs

**Q: Is this like Git for documents?**
A: Conceptually, yes. Just like Git uses commit hashes to represent source code versions, DefraDB uses CIDs to represent document states.

**Q: What’s the cost of storing all historical data?**
A: Each update is stored as a delta. These are highly compressed and space-efficient. Additionally, snapshotting can reduce retrieval times without significantly increasing size.

**Q: Can I use this from a frontend app?**
A: Yes. Time Traveling Queries use the same query structure with the addition of a `cid`. They are fully accessible through any client that supports GraphQL-like syntax.

**Q: How are deletions handled?**
A: Deletions are treated as regular updates. They remain part of the document’s historical graph and are retrievable just like any other version.

**Q: Can I query linked documents in historical context?**
A: Not yet. However, relational graph traversal in time travel is a planned feature.

**Q: How does this help in local and embedded environments?**
A: Devices with embedded local software can retain full document histories and execute Time Traveling Queries without relying on cloud connections, ideal for environments with intermittent connectivity.

---

## Conclusion

Time Traveling Queries bring powerful historical insights into local-first applications. Whether you're building collaborative editing tools, audit-traceable systems, or privacy-preserving edge infrastructure, this feature offers a novel and efficient way to access the past—securely and locally. By enabling immutable, CID-based versioning, DefraDB gives applications embedded with local compute the ability to understand not just the current state of data, but how it evolved.