Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Support for database schema version ranges #9933

Merged
merged 8 commits into from
Jun 11, 2021
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@
- [Git Usage](dev/git.md)
- [Testing]()
- [OpenTracing](opentracing.md)
- [Database Schemas](development/database_schema.md)
- [Synapse Architecture]()
- [Log Contexts](log_contexts.md)
- [Replication](replication.md)
Expand All @@ -84,4 +85,4 @@
- [Scripts]()

# Other
- [Dependency Deprecation Policy](deprecation_policy.md)
- [Dependency Deprecation Policy](deprecation_policy.md)
95 changes: 95 additions & 0 deletions docs/development/database_schema.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Synapse database schema files

Synapse's database schema is stored in the `synapse.storage.schema` module.

## Logical databases

Synapse supports splitting its datastore across multiple physical databases (which can
be useful for large installations), and the schema files are therefore split according
to the logical database they are apply to.
richvdh marked this conversation as resolved.
Show resolved Hide resolved

At the time of writing, the following "logical" databases are supported:

* `state` - used to store Matrix room state (more specifically, `state_groups`,
their relationships and contents.)
richvdh marked this conversation as resolved.
Show resolved Hide resolved
* `main` - stores everything else.

Addionally, the `common` directory contains schema files for tables which must be
richvdh marked this conversation as resolved.
Show resolved Hide resolved
present on *all* physical databases.

## Synapse schema versions

Synapse manages its database schema via "schema versions". These are mainly used to
help avoid confusion if the Synapse codebase is rolled back after the database is
updated. They work as follows:

* The Synapse codebase defines a constant `synapse.storage.schema.SCHEMA_VERSION`
which represents the expectations made about the database by that version. For
example, as of Synapse v1.36, this is `59`.

* The database stores a "compatibility version" in
`schema_compat_version.compat_version` which defines the `SCHEMA_VERSION` of the
oldest version of Synapse which will work with the database. On startup, if
`compat_version` is found to be newer than `SCHEMA_VERSION`, Synapse will refuse to
start.

Synapse automatically updates this field from
`synapse.storage.schema.SCHEMA_COMPAT_VERSION`.

* Whenever a backwards-incompatible change is made to the database format (normally
via a `delta` file), `synapse.storage.schema.SCHEMA_COMPAT_VERSION` is also updated
so that administrators can not accidentally roll back to a too-old version of Synapse.

Generally, the goal is to maintain compatibility with at least one or two previous
releases of Synapse, so any substantial change tends to require multiple releases and a
bit of forward-planning to get right.

As a worked example: we want to remove the `room_stats_historical` table. Here is how it
might pan out.

1. Replace any code that *reads* from `room_stats_historical` with alternative
implementations, but keep writing to it in case of rollback to an earlier version.
Also, increase `synapse.storage.schema.SCHEMA_VERSION`. In this
instance, there is no existing code which reads from `room_stats_historical`, so
our starting point is:

v1.36.0: `SCHEMA_VERSION=59`, `SCHEMA_COMPAT_VERSION=59`

2. Next (say in Synapse v1.37.0): remove the code that *writes* to
`room_stats_historical`, but don’t yet remove the table in case of rollback to
v1.36.0. Again, we increase `synapse.storage.schema.SCHEMA_VERSION`, but
because we have not broken compatibility with v1.36, we do not yet update
`SCHEMA_COMPAT_VERSION`. We now have:

v1.37.0: `SCHEMA_VERSION=60`, `SCHEMA_COMPAT_VERSION=59`.

3. Later (say in Synapse v1.38.0): we can remove the table altogether. This will
break compatibility with v1.36.0, so we must update `SCHEMA_COMPAT_VERSION` accordingly.
There is no need to update `synapse.storage.schema.SCHEMA_VERSION`, since there is no
change to the Synapse codebase here. So we end up with:

v1.38.0: `SCHEMA_VERSION=60`, `SCHEMA_COMPAT_VERSION=60`.

If in doubt about whether to update `SCHEMA_VERSION` or not, it is generally best to
lean towards doing so.

## Full schema dumps

In the `full_schemas` directories, only the most recently-numbered snapshot is used
(`54` at the time of writing). Older snapshots (eg, `16`) are present for historical
reference only.

### Building full schema dumps

If you want to recreate these schemas, they need to be made from a database that
has had all background updates run.

To do so, use `scripts-dev/make_full_schema.sh`. This will produce new
`full.sql.postgres` and `full.sql.sqlite` files.

Ensure postgres is installed, then run:

./scripts-dev/make_full_schema.sh -p postgres_username -o output_dir/

NB at the time of writing, this script predates the split into separate `state`/`main`
databases so will require updates to handle that correctly.
14 changes: 13 additions & 1 deletion synapse/storage/prepare_database.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
from synapse.storage.database import LoggingDatabaseConnection
from synapse.storage.engines import BaseDatabaseEngine
from synapse.storage.engines.postgres import PostgresEngine
from synapse.storage.schema import SCHEMA_VERSION
from synapse.storage.schema import SCHEMA_COMPAT_VERSION, SCHEMA_VERSION
from synapse.storage.types import Cursor

logger = logging.getLogger(__name__)
Expand Down Expand Up @@ -374,6 +374,18 @@ def _upgrade_existing_database(
assert config is not None
check_database_before_upgrade(cur, database_engine, config)

# update schema_compat_version before we run any upgrades, so that if synapse
# gets downgraded again, it won't try to run against the upgraded database.
if (
current_schema_state.compat_version is None
or current_schema_state.compat_version < SCHEMA_COMPAT_VERSION
):
cur.execute("DELETE FROM schema_compat_version")
cur.execute(
"INSERT INTO schema_compat_version(compat_version) VALUES (?)",
(SCHEMA_COMPAT_VERSION,),
)

start_ver = current_schema_state.current_version

# if we got to this schema version by running a full_schema rather than a series
Expand Down
90 changes: 2 additions & 88 deletions synapse/storage/schema/README.md
Original file line number Diff line number Diff line change
@@ -1,90 +1,4 @@
# Synapse Database Schemas

This directory contains the schema files used to build Synapse databases.

Synapse supports splitting its datastore across multiple physical databases (which can
be useful for large installations), and the schema files are therefore split according
to the logical database they are apply to.

At the time of writing, the following "logical" databases are supported:

* `state` - used to store Matrix room state (more specifically, `state_groups`,
their relationships and contents.)
* `main` - stores everything else.

Addionally, the `common` directory contains schema files for tables which must be
present on *all* physical databases.

## Synapse schema versions

Synapse manages its database schema via "schema versions". These are mainly used to
help avoid confusion if the Synapse codebase is rolled back after the database is
updated. They work as follows:

* The Synapse codebase defines a constant `synapse.storage.schema.SCHEMA_VERSION`
which represents the expectations made about the database by that version. For
example, as of Synapse v1.33, this is `59`.

* The database stores a "compatibility version" in
`schema_compat_version.compat_version` which defines the `SCHEMA_VERSION` of the
oldest version of Synapse which will work with the database. On startup, if
`compat_version` is found to be newer than `SCHEMA_VERSION`, Synapse will refuse to
start.

* Whenever a backwards-incompatible change is made to the database format (normally
via a `delta` file), `schema_compat_version.compat_version` is also updated so that
administrators can not accidentally roll back to a too-old version of Synapse.

Generally, the goal is to maintain compatibility with at least one or two previous
releases of Synapse, so any substantial change tends to require multiple releases and a
bit of forward-planning to get right.

As a worked example: we want to remove the `room_stats_historical` table. Here is how it
might pan out.

1. Replace any code that *reads* from `room_stats_historical` with alternative
implementations, but keep writing to it in case of rollback to an earlier version.
Also, increase `synapse.storage.schema.SCHEMA_VERSION`. In this
instance, there is no existing code which reads from `room_stats_historical`, so
our starting point is:

v1.33.0: `SCHEMA_VERSION=59`, `compat_version=59`

2. Next (say in Synapse v1.34.0): remove the code that *writes* to
`room_stats_historical`, but don’t yet remove the table in case of rollback to
v1.33.0. Again, we increase `synapse.storage.schema.SCHEMA_VERSION`, but
because we have not broken compatibility with v1.33, we do not yet update
`compat_version`. We now have:

v1.34.0: `SCHEMA_VERSION=60`, `compat_version=59`.

3. Later (say in Synapse v1.36.0): we can remove the table altogether. This will
break compatibility with v1.33.0, so we must update `compat_version` accordingly.
There is no need to update `synapse.storage.schema.SCHEMA_VERSION`, since there is no
change to the Synapse codebase here. So we end up with:

v1.36.0: `SCHEMA_VERSION=60`, `compat_version=60`.

If in doubt about whether to update `SCHEMA_VERSION` or not, it is generally best to
lean towards doing so.

## Full schema dumps

In the `full_schemas` directories, only the most recently-numbered snapshot is useful
(`54` at the time of writing). Older snapshots (eg, `16`) are present for historical
reference only.

## Building full schema dumps

If you want to recreate these schemas, they need to be made from a database that
has had all background updates run.

To do so, use `scripts-dev/make_full_schema.sh`. This will produce new
`full.sql.postgres` and `full.sql.sqlite` files.

Ensure postgres is installed, then run:

./scripts-dev/make_full_schema.sh -p postgres_username -o output_dir/

NB at the time of writing, this script predates the split into separate `state`/`main`
databases so will require updates to handle that correctly.
This directory contains the schema files used to build Synapse databases. For more
information, see /docs/development/database_schema.md.
8 changes: 8 additions & 0 deletions synapse/storage/schema/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,11 @@
See `README.md <synapse/storage/schema/README.md>`_ for more information on how this
works.
"""


SCHEMA_COMPAT_VERSION = 59
"""Limit on how far the synapse codebase can be rolled back without breaking db compat

This value is stored in the database, and checked on startup. If the value in the
database is greater than SCHEMA_VERSION, then Synapse will refuse to start.
"""
17 changes: 0 additions & 17 deletions synapse/storage/schema/common/delta/59/13schema_compat_version.sql

This file was deleted.