Append mode: Do not try to delete objects that can't exist in middle #2006

Merged
merged 1 commit into osm2pgsql-dev:master from check-max-id-before-delete
Jul 20, 2023

Conversation

joto
Collaborator

@joto joto commented Jul 17, 2023

When osm2pgsql runs in append mode, it deletes every object for which it receives a new version from the middle tables before adding that new version. For a typical diff, many of these deletes are unnecessary because the objects are new. With this commit the behaviour changes slightly: we first get the maximum id from each of the nodes/ways/relations middle tables. This operation is fast because the PostgreSQL max() function can use the btree index on those tables. Later, before deleting an object, we check its id against that maximum id. If the id is larger, the object can't be in the table and we skip the delete.

(Note that in theory we could use the fact that an object has version number 1 to figure out that it must be new. But this is much less robust than what we are doing here, for instance when the diff overlaps with the original import.)
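The check described above can be sketched roughly as follows. This is only an illustration, not the actual osm2pgsql code: `MiddleTable` is a hypothetical in-memory stand-in for one middle table, and `AppendSession`, `max_id_at_start` and `apply` are names made up for this example. The real implementation talks to PostgreSQL.

```python
from dataclasses import dataclass, field


@dataclass
class MiddleTable:
    """Hypothetical in-memory stand-in for one middle table (nodes, ways or relations)."""

    rows: dict = field(default_factory=dict)
    delete_calls: int = 0  # counts how many deletes were actually issued

    def max_id(self) -> int:
        # Stand-in for "SELECT max(id) FROM <table>"; 0 if the table is empty.
        return max(self.rows, default=0)

    def delete(self, osm_id: int) -> None:
        self.delete_calls += 1
        self.rows.pop(osm_id, None)


class AppendSession:
    """Illustrative append run: remember the maximum id once, then skip pointless deletes."""

    def __init__(self, table: MiddleTable) -> None:
        self.table = table
        # Read once at the start, before any object from the diff is processed.
        self.max_id_at_start = table.max_id()

    def apply(self, osm_id: int, data: str) -> None:
        # An id above the initial maximum can't have been in the table,
        # so only ids at or below it need a delete.
        if osm_id <= self.max_id_at_start:
            self.table.delete(osm_id)
        self.table.rows[osm_id] = data


table = MiddleTable(rows={1: "a", 5: "b"})
session = AppendSession(table)
session.apply(5, "b2")   # existing id: delete, then insert
session.apply(9, "new")  # id above the initial maximum: insert only
print(table.delete_calls)
```

Unlike a version-number check, this comparison relies only on what is in the table at the start of the run, so it still issues the delete when a diff re-delivers an object that the import already contains.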

The performance improvement is not measurable for small (minutely) diffs; for large diffs it is about 10%.

@joto joto force-pushed the check-max-id-before-delete branch from d276b6e to c885922 Compare July 17, 2023 09:12
@lonvia lonvia merged commit 4c35c03 into osm2pgsql-dev:master Jul 20, 2023
27 checks passed
@joto joto deleted the check-max-id-before-delete branch July 30, 2023 18:13