New ram middle #1461
Conversation
Comments for first two commits. I'll do the rest later.
src/options.hpp
{
    bool full_nodes = false;
    bool full_ways = false;
    bool full_relations = false;
Could you please add documentation here. The variable names are not self-explanatory.
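A documented version of these flags might look like the following sketch. The struct name and the comments are assumptions about the intended semantics inferred from the PR description, not the author's actual documentation:

```cpp
// Hypothetical name for the options struct; the real struct in
// src/options.hpp may differ.
struct middle_ram_options
{
    // Store complete node objects (tags and attributes),
    // not just node locations.
    bool full_nodes = false;

    // Store complete way objects (tags and attributes),
    // not just the way node ids.
    bool full_ways = false;

    // Store complete relation objects (tags and attributes),
    // not just the relation member ids.
    bool full_relations = false;
};
```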
src/ordered-index.hpp
index.reserve(block_size);
}

bool full() const noexcept { return index.size() == index.capacity(); }
Is this guaranteed to work, i.e. is a vector allowed to pre-emptively enlarge its capacity before it is full?
https://en.cppreference.com/w/cpp/container/vector says iterators are only invalidated by, for example, push_back() or resize() if the vector changed capacity. Documenting it this way only makes sense if we can know whether a vector changed capacity. So I think we are good here.
I have added a new PR #1464 which contains the first two commits of this one with added docs. Once that is through, I'll reissue this PR here.
This is a very memory-efficient storage which will be used for the new ram middle.
Replaces the somewhat dated middle_ram_t by a completely new implementation for importing small to medium sized files into a non-updateable database. It works completely in memory; no data is written to disk. The following traits of OSM objects can be stored. All are optional:
- Node locations for building geometries of ways.
- Way node ids for building geometries of relations based on ways.
- Tags and attributes for nodes, ways, and/or relations for full 2-stage-processing support.
- Attributes for untagged nodes.
New version of this PR with only the last two commits, slightly updated and rebased.
That's so much more readable.
This implements a completely new ram (non-slim) middle that is compatible with the old one but much more memory-efficient. The first three commits build some infrastructure for it, the last contains the new middle.
The new middle usually stores only the node locations and way nodes which are needed to build way and relation geometries. If the output uses two-stage processing, it can now tell the middle, and complete way objects (including tags and attributes) are then stored as well. The new middle code has provisions for storing node and relation objects, too, but these are not used yet, because two-stage processing does not use them yet.
When not using two-stage processing, the memory requirements are much smaller than with the old ram middle. As a rule of thumb, you'll need about 1GB plus 2.5 times the size of the PBF file as memory. This makes it possible to import even continent-sized data on reasonably sized machines.
When using two-stage processing, the memory requirements are larger than without it. Currently OSM objects are simply stored in the libosmium internal format, which is designed for quick access, not for saving memory. There could be considerable space savings with a better implementation, but considering that two-stage processing is seldom used and you can always fall back to the slim middle, improving this has been left for a later time.