
Commit a74717f

rkistner and kobiebotha authored
Expand docs on self-hosting production setup (#98)
* Expand docs on self-hosting production setup.
* Fix links.
* Add docs on migrating between instances.
* Apply suggestions from code review
* Add more clarifications.
* Fix typos. Thanks ChatGPT.

Co-authored-by: Kobie Botha <kobie@journeyapps.com>
1 parent df3fe74 commit a74717f

6 files changed: +168 -32 lines changed

mint.json

Lines changed: 7 additions & 2 deletions
@@ -358,9 +358,10 @@
        "pages": [
          "self-hosting/lifecycle-maintenance",
          "self-hosting/lifecycle-maintenance/securing-your-deployment",
-         "self-hosting/lifecycle-maintenance/server-specs",
          "self-hosting/lifecycle-maintenance/healthchecks",
-         "self-hosting/lifecycle-maintenance/telemetry"
+         "self-hosting/lifecycle-maintenance/telemetry",
+         "self-hosting/lifecycle-maintenance/migrating",
+         "self-hosting/lifecycle-maintenance/multiple-instances"
        ]
      },
      "self-hosting/enterprise",
@@ -575,6 +576,10 @@
      {
        "source": "/integration-guides/supabase",
        "destination": "/integration-guides/supabase-+-powersync"
+     },
+     {
+       "source": "/self-hosting/lifecycle-maintenance/server-specs",
+       "destination": "/self-hosting/lifecycle-maintenance"
      }
    ],
    "footerSocials": {

self-hosting/installation/powersync-service-setup.mdx

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ The PowerSync Service requires a storage backend for sync buckets. You can use e

### MongoDB Storage

-MongoDB requires at least one replica set node. A single node is fine for development/staging environments, but a 3-node replica set is recommended [for production](/self-hosting/lifecycle-maintenance/server-specs).
+MongoDB requires at least one replica set node. A single node is fine for development/staging environments, but a 3-node replica set is recommended [for production](/self-hosting/lifecycle-maintenance).

[MongoDB Atlas](https://www.mongodb.com/products/platform/atlas-database) enables replica sets by default for new clusters.

Lines changed: 115 additions & 5 deletions
@@ -1,12 +1,118 @@
---
title: "Lifecycle / Maintenance"
-description: "Notes for sysadmins"
+description: "Self-hosting setup and maintenance"
sidebarTitle: Overview
---

-## Migrations
+## Minimal Setup

-Migrations run automatically by default.
+A minimal "development" setup (e.g. for a staging or QA environment) is:

1. A single PowerSync "compute" container (API + replication) with 512MB memory and 1 vCPU.
2. A single MongoDB node in replica set mode, with 2GB memory and 1 vCPU (M10+ when using Atlas).
3. A load balancer for TLS.

This setup has no redundancy. If the replica set fails, you may need to recreate it from scratch, which will re-sync all clients.
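For illustration, a single-container setup along these lines could be sketched with Docker Compose. This is a hedged sketch, not an official configuration: the image tag, port, config path, and the assumption that the default command runs the combined API + replication process are placeholders to adapt to your environment.

```yaml
# Minimal development/staging sketch (not an official reference configuration)
services:
  powersync:
    image: journeyapps/powersync-service:latest   # pin a specific version in practice
    # Assumed: the default command runs the combined API + replication ("compute") process.
    volumes:
      - ./powersync.yaml:/config/powersync.yaml   # assumed config location
    ports:
      - "8080:8080"                                # assumed port; terminate TLS on the load balancer in front
    mem_limit: 512m
    cpus: 1.0

  mongo:
    image: mongo:7
    command: ["mongod", "--replSet", "rs0"]        # single-node replica set; fine for dev/staging only
    mem_limit: 2g
    cpus: 1.0
```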
## Production

For production, we recommend running a high-availability setup:

1. 1x PowerSync replication container, 1GB memory, 1 vCPU.
2. 2+ PowerSync API containers, 1GB memory and 1 vCPU each.
3. A 3-node MongoDB replica set, 2+GB memory each. Refer to the MongoDB documentation for deployment requirements. M10+ when using Atlas.
4. A load balancer with redundancy.
5. A daily compact job.

For scaling up, add one PowerSync API container per 100 connections. The MongoDB replica set should be scaled based on CPU and memory usage.

### Replication Container

The replication container handles replicating from the source database to PowerSync's bucket storage.

The replication process is run using the docker command `start -r sync`, for example `docker run powersync start -r sync`.

Only one process can replicate at a time. If multiple are running concurrently, you may see the error `[PSYNC_S1003] Sync rules have been locked by another process for replication`. If you use rolling deploys, it is normal to see this error for a short duration while multiple processes are running.

Memory and CPU usage of the replication container is primarily driven by write load on the source database. A good starting point is 1GB memory and 1 vCPU for the container, but this may be scaled down depending on the load patterns.

Set the environment variable `NODE_OPTIONS=--max-old-space-size=800` for 800MB, or to 80% of the total assigned memory if scaling up or down.
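For illustration, the replication container might look like this in Docker Compose. Only the `start -r sync` command and the `NODE_OPTIONS` value come from this page; the image tag and config path are assumptions.

```yaml
# Replication container sketch - run exactly one instance
services:
  powersync-replication:
    image: journeyapps/powersync-service:latest   # assumed; pin a version
    command: ["start", "-r", "sync"]              # replication-only process
    environment:
      NODE_OPTIONS: "--max-old-space-size=800"    # ~80% of the 1GB container memory
    volumes:
      - ./powersync.yaml:/config/powersync.yaml   # assumed config location
    mem_limit: 1g
    cpus: 1.0
```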
### API Containers

The API container handles streaming sync connections, as well as any other API calls.

The API process is run using the docker command `start -r api`, for example `docker run powersync start -r api`.

Each API container is limited to 200 concurrent connections, but we recommend targeting 100 concurrent connections or fewer per container. This may change as we implement additional performance optimizations.

Memory and CPU usage of API containers is driven by:

1. The number of concurrent connections.
2. The number of buckets per connection.
3. The amount of data synced to each connection.

A good starting point is 1GB memory and 1 vCPU per container, but this may be scaled up or down depending on the specific load patterns.

Set the environment variable `NODE_OPTIONS=--max-old-space-size=800` for 800MB, or to 80% of the total assigned memory if scaling up or down.
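An API container sketch differs from the replication sketch only in its command and in being safe to scale horizontally (same assumptions as above):

```yaml
# API container sketch - run two or more behind the load balancer
services:
  powersync-api:
    image: journeyapps/powersync-service:latest   # assumed; pin a version
    command: ["start", "-r", "api"]               # API / streaming sync process
    environment:
      NODE_OPTIONS: "--max-old-space-size=800"    # ~80% of the 1GB container memory
    volumes:
      - ./powersync.yaml:/config/powersync.yaml   # assumed config location
    mem_limit: 1g
    cpus: 1.0
# Scale out with e.g. `docker compose up --scale powersync-api=2`,
# targeting roughly 100 concurrent connections per container.
```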
### Compact Job

We recommend running a compact job daily as a cron job, or after any large maintenance jobs. For details, see the documentation on [Compacting Buckets](/usage/lifecycle-maintenance/compacting-buckets).

Run the compact job using the docker command `compact`, for example `docker run powersync compact`.

The compact job uses up to 1GB memory for compacting, if available. Set the environment variable `NODE_OPTIONS=--max-old-space-size=800` for 800MB, or set it to 80% of the total assigned memory if scaling up or down.
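If you deploy on Kubernetes, the daily compact job could be expressed as a CronJob. The image, schedule, and resource values below are assumptions; only the `compact` command and the `NODE_OPTIONS` value come from this page.

```yaml
# Daily compact job sketch (Kubernetes CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: powersync-compact
spec:
  schedule: "0 2 * * *"                 # run once a day, off-peak
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: compact
              image: journeyapps/powersync-service:latest  # assumed; pin a version
              args: ["compact"]
              env:
                - name: NODE_OPTIONS
                  value: "--max-old-space-size=800"        # ~80% of the 1Gi limit
              resources:
                limits:
                  memory: 1Gi
```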
### Load Balancer

A load balancer is required in front of the API containers to provide TLS support and load balancing. Most cloud providers have built-in options for load balancing, such as ALB on AWS.

It is currently required to host the API container on a dedicated subdomain; we do not support running it on the same subdomain as another service.

For self-hosting, [nginx](https://nginx.org/en/) is always a good option. A basic nginx configuration could look like this:

```nginx
server {
    listen 443 ssl;
    server_name powersync.example.org;

    # SSL configuration here

    # Reverse proxy settings
    location / {
        proxy_pass http://powersync_server_ip:powersync_port; # Replace with your powersync details
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Disable proxy response buffering.
        # This is not relevant for websocket connections, but is important when using
        # HTTP streaming connections (configured in the PowerSync client SDK).
        proxy_buffering off;
    }
}
```

When using nginx as a Kubernetes ingress, set the proxy buffering option as an annotation on the ingress:

```yaml
nginx.ingress.kubernetes.io/proxy-buffering: "off"
```
### Health Checks

If the load balancer supports health checks, it may be configured to poll the API container at `/probes/liveness`. This endpoint is expected to return a 200 response when the container is healthy. See [Healthchecks](./lifecycle-maintenance/healthchecks) for details.
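On Kubernetes, for example, the same check could be expressed as a container-level probe; the port and timing values are assumptions:

```yaml
# Liveness probe sketch for an API container
livenessProbe:
  httpGet:
    path: /probes/liveness
    port: 8080            # assumed API port
  periodSeconds: 30
  failureThreshold: 3
```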
### Migrations

Occasionally, new versions of the PowerSync Service image may require migrations on the underlying storage database. Migrations are also required the first time the service starts up against a new storage database.

By default, migrations are run as part of the replication and API containers. In some cases, a migration may add significant delay to container startup.

To avoid this startup delay, the migrations may be run as a separate job on each update, before replacing the rest of the containers. To run the migrations, use the docker command `migrate up`, for example `docker run powersync migrate up` (see the sketch after the config snippet below).

In this case, disable automatic migrations in the config:

```yaml
# powersync.yaml
migrations:
  # ...
  # When set to true, migrations must be triggered manually by modifying the container `command`.
  disable_auto_migration: true
```

-MongoDB locks ensure migrations are executed exactly once, even when multiple containers start simultaneously.
+Note that if you disable automatic migrations and do not run the migration job manually, the service may run with an outdated storage schema version. This may lead to unexpected and potentially difficult-to-debug errors in the service.
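As a sketch of the separate migration step, a Kubernetes Job such as the following could be run to completion before rolling out new containers; the image and naming are assumptions:

```yaml
# Migration job sketch - run before replacing the other containers
apiVersion: batch/v1
kind: Job
metadata:
  name: powersync-migrate
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: journeyapps/powersync-service:latest  # assumed; use the exact version being deployed
          args: ["migrate", "up"]
```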
## Backups

We recommend using Git to back up your configuration files.

-The sync bucket storage database doesn't require backups as it can be easily reconstructed.
+None of the containers use any local storage, so no backups are required there.
+
+The sync bucket storage database may be backed up following the recommendations for the storage database system. This is not a strong requirement, since this data can be recovered by re-replicating from the source database.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
---
title: "Migrating between instances"
description: "Migrating users between PowerSync instances"
---

## Overview

In some cases, you may want to migrate users between PowerSync instances. This may be between cloud and self-hosted instances, or simply to change the endpoint.

If the PowerSync instances use the same source database and have the same basic configuration and sync rules, you can migrate users by changing the endpoint to the new instance.

To make this process easier, we recommend using an API to retrieve the PowerSync endpoint, instead of hardcoding the endpoint in the client application. If you're using custom authentication, this can be done in the same API call as getting the authentication token.
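As a hedged illustration, the response of such an API might bundle the endpoint with the token, so switching instances only requires a server-side change; the field names and URL below are hypothetical:

```yaml
# Hypothetical response shape from your auth/token API
token: "<PowerSync JWT>"
powersync_url: "https://powersync.example.org"   # change this value to move users to a new instance
```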
There should be no downtime for users when switching between endpoints. The client will have to re-sync all data, but this happens automatically, and the client switches atomically between the two. The main effect visible to users is a delay in syncing new data while the client re-syncs. All data remains available to read on the client for the entire process.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
---
title: "Multiple PowerSync Instances"
description: "Scaling using multiple instances"
---

## Overview

<Warning>
Multiple instances are not required in most cases. See the [Overview](/self-hosting/lifecycle-maintenance) for details on standard horizontal scaling setups.
</Warning>

When exceeding a couple of thousand concurrent connections, the standard PowerSync setup may not scale sufficiently to handle the load. In this case, we recommend that you [contact us](/resources/contact-us) to discuss the options. A basic overview of scaling with multiple PowerSync instances follows below.

Each PowerSync "instance" is a single endpoint (URL), backed by:
1. One replication container.
2. Multiple API containers, scaling horizontally.
3. One bucket storage database.

This setup is described in the [Overview](/self-hosting/lifecycle-maintenance).

To scale further, multiple copies of this setup can be run against the same source database.

## Mapping users to PowerSync endpoints

Since each PowerSync instance maintains its own copy of the bucket data, the exact list of operations and associated checksums will differ between them. This means the same client must connect to the same endpoint every time; otherwise it will have to re-sync all of its data each time it switches. Multiple PowerSync instances cannot be load-balanced behind the same subdomain.

To ensure the same user always connects to the same endpoint, we recommend:
1. Doing an API lookup from the client application to get the PowerSync endpoint, rather than hardcoding it in the application.
2. Either storing the endpoint associated with each user, or computing it automatically using a hash function on the user ID, e.g. `hash(user_id) % n` where `n` is the number of instances (see the sketch below).
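As a hedged illustration of the second option, the backend issuing tokens could keep a static list of instance endpoints and select one per user with the hash; the names and URLs below are hypothetical:

```yaml
# Hypothetical instance list used by your auth/token API.
# Select index hash(user_id) % 3, so a given user always maps to the same instance.
powersync_instances:
  - https://powersync-0.example.org
  - https://powersync-1.example.org
  - https://powersync-2.example.org
```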

self-hosting/lifecycle-maintenance/server-specs.mdx

Lines changed: 0 additions & 24 deletions
This file was deleted.
