Append mode: Do not try to delete objects that can't exist in middle #2006
When osm2pgsql runs in append mode, it deletes every object for which it receives a new version from the middle tables before adding that new version. For a typical diff, many of these deletes are unnecessary because the objects are new. This commit changes the behaviour slightly: we first get the maximum id from each of the nodes, ways, and relations middle tables. This is fast because PostgreSQL can answer max() using the btree index on those tables. Later, before deleting an object, we compare its id with that maximum id. If the id is larger, the object can't be in the table and we skip the delete.
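To illustrate the check described above, here is a minimal, self-contained C++ sketch. It is not osm2pgsql's actual code: the class `middle_table_sketch` and its methods are hypothetical, and a `std::set` stands in for a middle table in PostgreSQL. Because the set keeps ids sorted, its largest id is cheap to read, much like max(id) answered from the btree index. The SQL in the comment only suggests what the real query might look like.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>

// Hypothetical in-memory stand-in for one middle table (nodes, ways or
// relations). Not osm2pgsql's real API.
class middle_table_sketch
{
public:
    void insert(std::int64_t id) { m_ids.insert(id); }

    // Called once before processing the diff.
    // Assumed real equivalent: SELECT max(id) FROM <middle table>
    void load_max_id()
    {
        // An empty table gets the smallest possible value, so every id is
        // "larger" and no delete is ever issued against it.
        m_max_id = m_ids.empty() ? std::numeric_limits<std::int64_t>::min()
                                 : *m_ids.rbegin();
    }

    // Run before writing the new version of an object.
    // Returns true if a delete was actually issued.
    bool delete_if_possible(std::int64_t id)
    {
        if (id > m_max_id) {
            // The id is above the table's maximum, so the object can't be
            // in the table: skip the delete.
            return false;
        }
        m_ids.erase(id);
        ++m_deletes_issued;
        return true;
    }

    std::size_t deletes_issued() const noexcept { return m_deletes_issued; }
    bool contains(std::int64_t id) const { return m_ids.count(id) > 0; }

private:
    std::set<std::int64_t> m_ids;
    std::int64_t m_max_id = std::numeric_limits<std::int64_t>::min();
    std::size_t m_deletes_issued = 0;
};
```

For example, with ids 10 and 20 in the table, a change to object 10 still issues a delete, while a new object 25 skips it entirely. Reading the maximum once up front keeps the per-object cost to a single integer comparison.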
(Note that in theory we could use the fact that an object has version number 1 to infer that it must be new. But this is much less robust than what we are doing here, for instance when the diff overlaps with the original import.)
The performance improvement is not measurable for small (minutely) diffs; for large diffs it is about 10%.