New ram middle #1461

Merged (2 commits) on Apr 29, 2021


Collaborator

@joto joto commented Apr 26, 2021

This implements a completely new ram (non-slim) middle that is compatible with the old one but much more memory-efficient. The first three commits build some infrastructure for it; the last one contains the new middle.

The new middle usually stores only node locations and way nodes, which are needed to build way and relation geometries. If the output uses two-stage processing, it can now tell the middle so, and the middle then also stores complete way objects (including tags and attributes). The new middle code has provisions for storing node and relation objects, too, but these are not used yet because two-stage processing does not need them yet.

When not using two-stage processing, the memory requirements are much smaller than with the old ram middle. As a rule of thumb, you'll need about 1 GB of memory plus 2.5 times the size of the PBF file. This makes it possible to import even continent-sized data on reasonably sized machines.
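The rule of thumb above can be turned into a quick estimate. This is only a sketch of the arithmetic: the function name `estimate_ram_middle_bytes` is hypothetical (it is not part of osm2pgsql), and treating "1 GB" as 1024³ bytes is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper, not part of osm2pgsql: applies the rule of thumb
// "about 1 GB plus 2.5 times the size of the PBF file".
// Assumption: 1 GB is taken as 1024^3 bytes.
constexpr std::uint64_t bytes_per_gb = 1024ULL * 1024ULL * 1024ULL;

constexpr std::uint64_t
estimate_ram_middle_bytes(std::uint64_t pbf_bytes) noexcept
{
    // 2.5 * size, written as size * 5 / 2 to stay in integer arithmetic.
    return bytes_per_gb + pbf_bytes * 5 / 2;
}
```

For example, under these assumptions a 4 GB PBF file would need roughly 1 + 2.5 × 4 = 11 GB when two-stage processing is not used.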

When using two-stage processing the memory requirements are larger than without it. Currently OSM objects are just stored in the libosmium internal format, which is meant for quick access, not for saving memory. There could be considerable space savings there with a better implementation, but considering that two-stage processing is seldom used and you can always fall back to the slim middle, improving this has been left for a later time.

Collaborator

@lonvia lonvia left a comment

Comments for first two commits. I'll do the rest later.

src/options.hpp Outdated
{
bool full_nodes = false;
bool full_ways = false;
bool full_relations = false;
Collaborator

Could you please add documentation here? The variable names are not self-explanatory.
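One way such documentation might look is sketched below. The struct name `middle_requirements_sketch`, the helper `needs_full_objects`, and the wording of the comments are assumptions based on the PR description (complete objects including tags and attributes are stored when two-stage processing asks for them); they are not the documentation that was eventually added.

```cpp
// Hypothetical sketch, not the actual osm2pgsql code: the struct name and
// all comments are assumptions derived from the PR description.
struct middle_requirements_sketch
{
    /// Store complete node objects (tags and attributes), not only the
    /// node locations.
    bool full_nodes = false;

    /// Store complete way objects (tags and attributes), not only the
    /// list of way node ids.
    bool full_ways = false;

    /// Store complete relation objects (tags and attributes).
    bool full_relations = false;
};

// Hypothetical helper: true if any complete objects have to be stored.
inline bool needs_full_objects(middle_requirements_sketch const &r) noexcept
{
    return r.full_nodes || r.full_ways || r.full_relations;
}
```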

index.reserve(block_size);
}

bool full() const noexcept { return index.size() == index.capacity(); }
Collaborator

Is this guaranteed to work, i.e. is vector allowed to pre-emptively enlarge its capacity before it is full?

Collaborator Author

https://en.cppreference.com/w/cpp/container/vector says that operations such as push_back() or resize() only invalidate iterators if the vector changed capacity. Documenting it this way only makes sense if we can know whether a vector changed capacity. So I think we are good here.
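The behaviour discussed here can be checked with a small stand-alone sketch. The class `block_sketch`, its `int` element type, and the helper `fill_without_reallocation` are hypothetical stand-ins, not the code from this PR. The sketch relies on the standard guarantee that after `reserve()` no reallocation takes place until an insertion would make the size exceed `capacity()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the block type under review; the name and the
// element type (int) are assumptions made for illustration.
class block_sketch
{
public:
    explicit block_sketch(std::size_t block_size)
    {
        m_index.reserve(block_size);
    }

    bool full() const noexcept
    {
        return m_index.size() == m_index.capacity();
    }

    // Precondition: !full(), so push_back() never has to reallocate.
    void add(int value) { m_index.push_back(value); }

    std::size_t size() const noexcept { return m_index.size(); }
    std::size_t capacity() const noexcept { return m_index.capacity(); }
    int const *data() const noexcept { return m_index.data(); }

private:
    std::vector<int> m_index;
};

// Fills the block until full() reports true. Returns true if neither the
// capacity nor the address of the underlying storage changed on the way.
inline bool fill_without_reallocation(block_sketch &block)
{
    std::size_t const capacity_before = block.capacity();
    int const *const storage_before = block.data();

    int value = 0;
    while (!block.full()) {
        block.add(value++);
    }

    return block.capacity() == capacity_before &&
           block.data() == storage_before &&
           block.size() == capacity_before;
}
```

Note that `reserve(n)` only guarantees a capacity of at least `n`, so a block written this way may end up holding more than `block_size` entries before `full()` returns true.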

@joto joto mentioned this pull request Apr 26, 2021
Collaborator Author

joto commented Apr 26, 2021

I have added a new PR #1464 which contains the first two commits of this one with added docs. Once that is through, I'll reissue this PR here.

This is a very memory-efficient storage which will be used for the new
ram middle.
Replaces the somewhat dated middle_ram_t with a completely new
implementation for importing small to medium sized files into a
non-updateable database. It works completely in memory, no data is
written to disk.

The following traits of OSM objects can be stored. All are optional:
- Node locations for building geometries of ways.
- Way node ids for building geometries of relations based on ways.
- Tags and attributes for nodes, ways, and/or relations for full
  2-stage-processing support.
- Attributes for untagged nodes.
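To illustrate the first two traits in the list above, here is a deliberately simplified toy middle. It is not the implementation from this PR: the names `toy_ram_middle`, `location`, and `osm_id` are hypothetical, and it uses `std::unordered_map` purely for readability, which is far less memory-efficient than the storage this PR introduces.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy illustration only. All names are hypothetical, and the hash maps are
// used for clarity, not for memory efficiency.
using osm_id = std::int64_t;

struct location
{
    double lon;
    double lat;
};

class toy_ram_middle
{
public:
    // Remember a node location (needed for building way geometries).
    void node(osm_id id, location loc) { m_node_locations[id] = loc; }

    // Remember the node ids of a way (needed for relation geometries).
    void way(osm_id id, std::vector<osm_id> node_ids)
    {
        m_way_nodes[id] = std::move(node_ids);
    }

    // Look up the locations of all nodes of a way. Nodes that were never
    // stored are skipped; an unknown way yields an empty result.
    std::vector<location> way_geometry(osm_id way_id) const
    {
        std::vector<location> result;
        auto const way_it = m_way_nodes.find(way_id);
        if (way_it == m_way_nodes.end()) {
            return result;
        }
        for (osm_id const node_id : way_it->second) {
            auto const node_it = m_node_locations.find(node_id);
            if (node_it != m_node_locations.end()) {
                result.push_back(node_it->second);
            }
        }
        return result;
    }

private:
    std::unordered_map<osm_id, location> m_node_locations;
    std::unordered_map<osm_id, std::vector<osm_id>> m_way_nodes;
};
```

Everything lives in memory and nothing is written to disk, which matches the description above, but tags and attributes for two-stage processing are left out of this sketch.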
Collaborator Author

joto commented Apr 26, 2021

New version of this PR with only the last two commits, slightly updated and rebased.

Collaborator

@lonvia lonvia left a comment

That's so much more readable.

@lonvia lonvia merged commit bf4d427 into osm2pgsql-dev:master Apr 29, 2021
@joto joto deleted the new-ram-middle branch April 29, 2021 08:41