research: compatibility with citus #577
+1
Stream/Physical replication
Current situation
The main problem is at building the index, when index log is
but the backend hook
Implement
We need to write a custom WAL manager like this PR. These operations should be logged:
graph LR
A[ambuild<br>main] --> B[XLogBeginInsert<br>main]
B[XLogBeginInsert<br>main] -->|auto| C[Write custom WAL<br>main]
C[Write custom WAL<br>main] -->|publish| D[standby]
E[standby] -->|hook| F[decode & replay<br>standby]
F[decode & replay<br>standby] --> G[rpc.create<br>standby]
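The flow in the graph above can be pictured with a toy model. This is only a sketch under stated assumptions, not PostgreSQL's WAL API and not the PGVecto.rs implementation: the `Primary` class, the `apply_record` and `replay` helpers, and the `create`/`insert` record shapes are all hypothetical names chosen for illustration. It shows the general idea that the primary serializes each index operation into a log, and the standby decodes and replays the records in order to rebuild the same state.

```python
import json


def apply_record(indexes, record):
    """Apply one decoded record to an index store (used by primary and standby)."""
    if record["op"] == "create":
        indexes[record["index_id"]] = []
    elif record["op"] == "insert":
        indexes[record["index_id"]].append(record["vector"])
    else:
        raise ValueError(f"unknown op: {record['op']}")


class Primary:
    """Toy primary: every index operation is logged, then applied locally."""

    def __init__(self):
        self.wal = []      # stands in for the stream of custom WAL records
        self.indexes = {}  # index_id -> list of vectors

    def log_and_apply(self, record):
        # Serialize first, as a custom WAL record would be, then apply.
        self.wal.append(json.dumps(record))
        apply_record(self.indexes, record)


def replay(wal):
    """Toy standby: decode each record in order and rebuild the index state."""
    indexes = {}
    for raw in wal:
        apply_record(indexes, json.loads(raw))
    return indexes


primary = Primary()
primary.log_and_apply({"op": "create", "index_id": 17744})
primary.log_and_apply({"op": "insert", "index_id": 17744, "vector": [1.0, 2.0, 3.0]})

# The standby sees only the log, never the primary's in-memory state.
standby_indexes = replay(primary.wal)
print(standby_indexes == primary.indexes)
```

Sharing `apply_record` between both sides is the design point: if the standby replays exactly what the primary logged, the two index stores cannot diverge.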
ref:
Limitation
Citus
🎉 Citus is basically compatible with PGVecto.rs, based on the verification below.
graph LR
1[items] --> 2[items_102016]
1[items] --> 3[items_102017]
1[items] --> 4[items_102018]
1[items] --> 5[items_102019]
2[items_102016] --> 6[items_embedding_idx_102016]
3[items_102017] --> 7[items_embedding_idx_102017]
4[items_102018] --> 8[items_embedding_idx_102018]
5[items_102019] --> 9[items_embedding_idx_102019]
6[items_embedding_idx_102016] --> 10[items_embedding_idx]
7[items_embedding_idx_102017] --> 10[items_embedding_idx]
8[items_embedding_idx_102018] --> 10[items_embedding_idx]
9[items_embedding_idx_102019] --> 10[items_embedding_idx]
Checklist
postgres=# EXPLAIN SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 1;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=500.00..500.00 rows=1 width=12)
   -> Sort (cost=500.00..750.00 rows=100000 width=12)
         Sort Key: remote_scan.worker_column_2
         -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=12)
               Task Count: 4
               Tasks Shown: All
               -> Task
                     Node: host=localhost port=5432 dbname=postgres
                     -> Limit (cost=0.00..0.02 rows=1 width=12)
                           -> Index Scan using items_embedding_idx_102016 on items_102016 items (cost=0.00..502.74 rows=23979 width=12)
                                 Order By: (embedding <-> '[1, 2, 3]'::vector)
               -> Task
                     Node: host=localhost port=5432 dbname=postgres
                     -> Limit (cost=0.00..0.02 rows=1 width=12)
                           -> Index Scan using items_embedding_idx_102017 on items_102017 items (cost=0.00..546.92 rows=26074 width=12)
                                 Order By: (embedding <-> '[1, 2, 3]'::vector)
               -> Task
                     Node: host=localhost port=5432 dbname=postgres
                     -> Limit (cost=0.00..0.02 rows=1 width=12)
                           -> Index Scan using items_embedding_idx_102018 on items_102018 items (cost=0.00..462.53 rows=22042 width=12)
                                 Order By: (embedding <-> '[1, 2, 3]'::vector)
               -> Task
                     Node: host=localhost port=5432 dbname=postgres
                     -> Limit (cost=0.00..0.02 rows=1 width=12)
                           -> Index Scan using items_embedding_idx_102019 on items_102019 items (cost=0.00..584.81 rows=27905 width=12)
                                 Order By: (embedding <-> '[1, 2, 3]'::vector)
(26 rows)
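The plan above has a scatter-gather shape: each task runs an index scan with its own `Limit` on one shard, and the coordinator then sorts the returned rows and applies the final `Limit`. A minimal sketch of why that merge gives the right top-k, assuming exact per-shard search (a real vector index may be approximate), round-robin sharding instead of Citus hash distribution, and squared L2 distance as a stand-in for the `<->` operator. The function names are hypothetical.

```python
import heapq
import random


def l2_squared(a, b):
    """Squared Euclidean distance; ordering matches plain L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shard_top_k(rows, query, k):
    """Worker task: the k nearest (distance, id) pairs from one shard."""
    return heapq.nsmallest(k, ((l2_squared(vec, query), row_id) for row_id, vec in rows))


def coordinator_top_k(shards, query, k):
    """Coordinator: gather k candidates per shard, sort, keep the best k."""
    candidates = [pair for rows in shards for pair in shard_top_k(rows, query, k)]
    return [row_id for _, row_id in sorted(candidates)[:k]]


random.seed(0)
rows = [(i, [random.random() for _ in range(3)]) for i in range(1000)]
shards = [rows[i::4] for i in range(4)]  # 4 shards, as in "Task Count: 4"
query = [1.0, 2.0, 3.0]

merged = coordinator_top_k(shards, query, k=5)

# Reference answer: a single scan over the unsharded table.
exact = [row_id for _, row_id in sorted((l2_squared(v, query), i) for i, v in rows)[:5]]
print(merged == exact)
```

Any row in the global top-k is also in the top-k of its own shard, so pushing the `LIMIT` down to every task loses nothing as long as each shard search is exact.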
root@7b2d6e048a03:/# cat /var/lib/postgresql/data/pg_vectors/startup/0 ; echo
{"indexes":[{"tenant_id":0,"cluster_id":7410314701760323622,"database_id":5,"index_id":17744},{"tenant_id":0,"cluster_id":7410314701760323622,"database_id":5,"index_id":17746},{"tenant_id":0,"cluster_id":7410314701760323622,"database_id":5,"index_id":17745},{"tenant_id":0,"cluster_id":7410314701760323622,"database_id":5,"index_id":17742},{"tenant_id":0,"cluster_id":7410314701760323622,"database_id":5,"index_id":17743}]}
root@7b2d6e048a03:/# cat /var/lib/postgresql/data/pg_vectors/indexes/0000000000000000000000000000000066d6b9f5b04760260000000500004552/sealed_segments/1/storage/len ; echo
27899
root@7b2d6e048a03:/# cat /var/lib/postgresql/data/pg_vectors/indexes/0000000000000000000000000000000066d6b9f5b04760260000000500004551/sealed_segments/1/storage/len ; echo
22061
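The directory names above appear to line up with the entries in the `startup/0` file. The sketch below is an inference from the values shown here, not a documented layout: it assumes the name is a hex string whose last 8 digits are `index_id`, the 8 before that `database_id`, the 16 before that `cluster_id`, and the remainder `tenant_id`. The helper name `decode_index_dir` is hypothetical.

```python
def decode_index_dir(name):
    """Split a pg_vectors index directory name into its apparent fields.

    Field widths are an assumption inferred from the observed values;
    fields are read from the right so the tenant prefix can be any length.
    """
    return {
        "tenant_id": int(name[:-32], 16),
        "cluster_id": int(name[-32:-16], 16),
        "database_id": int(name[-16:-8], 16),
        "index_id": int(name[-8:], 16),
    }


decoded = decode_index_dir(
    "0000000000000000000000000000000066d6b9f5b04760260000000500004552"
)
print(decoded)
```

Under that assumption, the directory ending in `00004552` decodes to `index_id` 17746 and the one ending in `00004551` to 17745, both of which are listed in the `startup/0` file, with the same `cluster_id` and `database_id`.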
postgres=# EXPLAIN SELECT id FROM items ORDER BY embedding <=> '[1,1,1]' LIMIT 100;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=3821.93..3822.18 rows=100 width=12)
   -> Sort (cost=3821.93..4071.93 rows=100000 width=12)
         Sort Key: remote_scan.worker_column_2
         -> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=12)
               Task Count: 4
               Tasks Shown: All
               -> Task
                     Node: host=172.18.0.1 port=5430 dbname=postgres
                     -> Limit (cost=0.00..2.10 rows=100 width=12)
                           -> Index Scan using items_embedding_idx_102028 on items_102028 items (cost=0.00..501.98 rows=23918 width=12)
                                 Order By: (embedding <=> '[1, 1, 1]'::vector)
               -> Task
                     Node: host=172.18.0.1 port=5431 dbname=postgres
                     -> Limit (cost=0.00..2.10 rows=100 width=12)
                           -> Index Scan using items_embedding_idx_102029 on items_102029 items (cost=0.00..545.11 rows=26009 width=12)
                                 Order By: (embedding <=> '[1, 1, 1]'::vector)
               -> Task
                     Node: host=172.18.0.1 port=5430 dbname=postgres
                     -> Limit (cost=0.00..2.10 rows=100 width=12)
                           -> Index Scan using items_embedding_idx_102030 on items_102030 items (cost=0.00..464.84 rows=22147 width=12)
                                 Order By: (embedding <=> '[1, 1, 1]'::vector)
               -> Task
                     Node: host=172.18.0.1 port=5431 dbname=postgres
                     -> Limit (cost=0.00..2.10 rows=100 width=12)
                           -> Index Scan using items_embedding_idx_102031 on items_102031 items (cost=0.00..585.08 rows=27926 width=12)
                                 Order By: (embedding <=> '[1, 1, 1]'::vector)
(26 rows)
Limitation
Without Citus, it should be:
Thanks for your research!
Research whether we could support Citus.