WIP: Add code to generalize data using different strategies #1830

joto · 2022-11-21T08:11:39Z

This commit adds the code to generalize polygons using two different strategies. In both cases polygons are merged and simplified on a tile-by-tile basis.

The "vector" strategy uses the commonly used approach where polygons are buffered, their union is calculated and then a reverse buffer applied, followed by a final buffer again.

The "raster" strategy does a similar thing but does it in raster space which is much faster. First the polygons are rendered into a raster, an open/close operation is called (which basically does the same thing as the buffering in vector space) and finally the resulting raster is vectorized again.

For the raster support this adds two new library dependency: CImg and potrace.

The functionality is accessed through a new command line program called osm2pgsql-gen. Call it with -h to get some usage information. This program is for testing only, eventually the functionality should be accessible from osm2pgsql itself (using the Lua config file for configuration).

Some additional information is in README-gen.md

pnorman · 2022-11-21T08:19:47Z

Is the main goal here to verify that the algorithms produce useful results before implementing in osm2pgsql?

joto · 2022-11-21T08:27:12Z

This is part of the generalization effort (https://osm2pgsql.org/generalization/).

For some context see

Is the main goal here to verify that the algorithms produce useful results before implementing in osm2pgsql?

Yes. And to see how performant it is. It doesn't help us if it can't be used in real-world scenarios.

mboeringa · 2022-11-27T13:00:26Z

@joto

Interesting reads those blog posts.

A couple of remarks though:

Unless you use ST_Buffer's 'num_seg_quarter_circle' setting in the vector option, you may in fact end up with far more vertices in the output than in the input polygons, and not be generalizing the data at all... Using 'num_seg_quarter_circle' with a low value like 2 or 3, should considerably speed up the (subsequent) processing as well.
It may be highly recommended to add an additional true generalization step using 'ST_Simplify/ST_SimplifyPreserveTopology/ST_SimplifyVW' after the buffer operation, for the same reason as listed above: truly reduce the number of vertices and processing requirements.
There is a high chance of producing invalid output with Polygon generalization. In fact, when I ran the QGIS's 'Check Validity' option two years ago or so against the generalized land and water polygons as created by the coastline processing you refer to and as available on https://osmdata.openstreetmap.de/data/, QGIS detected multiple errors. I don't know the current situation, I should re-check it, but from my own experience developing my own algorithms, it is easy to run into trouble with invalid Polygons (and it can be really hard to fix it and get 100% valid output). In my particular case, running 'ST_MakeValid' was not enough to deal with all issues (but ArcGIS is even more finicky in this respect due to the different Polygon validity framework, although any OGC valid one is also valid in the ESRI world according to https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/check-geometry.htm).
Anyway, since the goal of osm2pgsql was to always produce valid output, I would certainly recommend carefully reviewing any generalization results from the developed algorithms for issues, as invalid geometries can cause a range of issues for subsequent use.

mboeringa · 2022-11-29T07:19:07Z

A few more remarks:

I also recommend testing any generalization algorithm at a global scale. Small regional extracts may not reveal all issues, especially related to the anti-meridian, and how some PostGIS commands deal with that.
Mind the "fine print" with any of the 'ST_Simplify/ST_SimplifyPreserveTopology/ST_SimplifyVW' PostGIS commands as listed in the PostGIS documentation.
Output from the current, non-generalized, osm2pgsql processing pipeline, has shown to be very robust in both QGIS and ArcGIS for me, with fully valid readable output, even with the more advanced 'flex' processing.

joto · 2022-12-05T08:17:14Z

You are raising some valid points, @mboeringa. In fact the first version of the code in this PR would generate polygons that were all invalid! (The rings were not closed, which is fixed now.) I also added an optional ST_MakeValid() step to the raster-union strategy which seems to work well in my experiments (use -p make_valid to enable) and seems to not take too much time. But overall I am not too worried about validity. The generalized data is not that accurate anyway, that's the whole point of generalization, and small rendering errors will probably not matter. Still, it is better to have things valid. We'll need some more real-world experience with this to see how important that is and whether it is worth the extra effort.

Unfortunately the potrace library will sometimes generate invalid polygons. I have seen two cases: Sometimes it snips off corners of the tiles it renders, this can be mitigated by having larger margins around tiles, but that's certainly something to look out for. The other happens when rendering polygons touching in single points (think of the figure "8") which must be modelled as two polygons, not one. There is another problem with the potrace library which I couldn't figure out yet: The left side of all polygons is slightly to the left of where it should be, the right side is okay. Maybe some kind of rounding error. You can't usually see it in the results though, because the effect is small, but still somewhat annoying.

You also mentioned the num_seg_quarter_circle setting and the need for simplification. I have played around with this in a different context, but didn't do that in this project. Using fewer segments for the buffer makes processing noticably faster, so that is an important optimization, but I am focusing on the new raster-based approach which is even faster and it generates far smaller polygons (with less vertexes) to begin with, so I believe there is no need for the extra simplification step. But it would be interesting to try this out. There is a huge experimental space here and I fully expect that we'll tweak code and parameters for a long time to come to better results. Anybody who wants to play around with this, please go ahead and report your findings here.

pnorman · 2022-12-05T08:52:26Z

The generalized data is not that accurate anyway, that's the whole point of generalization, and small rendering errors will probably not matter. Still, it is better to have things valid

Having a guarantee of validity is very important since lots of PostGIS functions require validity.

pnorman · 2022-12-12T10:49:34Z

README-gen.md

+* `-w, --width=WIDTH`: Size of the imaged rendered when using the `raster-union`
+  strategy, not used in the `vector-union` strategy.
+* `-b, --buffer=BUFFER`: The amount by which the polygons will be buffered. For
+  the `raster-union` strategy this is in pixels, for the `vector-union` strategy


The documentation doesn't say what resolution is used per tile

Yeah, sorry, there is a lot the documentation doesn't spell out clearly. That's because this is still work-in-progress. You have to do a bit of experimenting if you want to try this.

pnorman

A few things I noticed while getting it running. Not a complete review.

There's a bunch of hard-coded 3857 assumptions. If we're okay with these, we need to document them.

pnorman · 2022-12-13T06:33:54Z

src/osm2pgsql-gen.cpp

+    connection->query(
+        PGRES_COMMAND_OK,
+        "PREPARE get_extent AS WITH ex AS "
+        "(SELECT ST_EstimatedExtent('{}', 'geom') AS e) "


This hard-codes the geometry column to geom

fixes in new version

pnorman · 2022-12-13T06:35:50Z

src/osm2pgsql-gen.cpp

+            xmax = 0;
+            ymax = 0;
+        } else {
+            if (x == 0 && y == 0 && xmax == 0 && ymax == 0) {


Zero shouldn't be treated as a special value, because you might just want to generate the 0/0 tile.

fixed in new version

pnorman · 2022-12-13T06:43:38Z

README-gen.md

+* `-b, --buffer=BUFFER`: The amount by which the polygons will be buffered. For
+  the `raster-union` strategy this is in pixels, for the `vector-union` strategy
+  this is in Mercator map units.
+* `-g, --group-by-column=COL`: Set this to the column describing the type of


How are multiple columns indicated?

That's not possible. If that's really something we need, we can add that later. But I am trying to keep the config to a minimum.

joto · 2022-12-13T19:53:37Z

New version documents that this only works for 3857. Also added a "make_valid" option to ensure results are valid polygons. Lots of bug fixes. And other stuff... I'll get around to documenting this eventually...

pnorman · 2022-12-14T06:25:43Z

README-gen.md

+called `type` with the type of the landcover (forest, grass, etc.).
+
+Create a table `landcover_z10` with (at least) columns `geom` and `type` as
+well as integer columns `x`, and `y`. Then call this:


An index on x, y is also needed

If you use osm2pgsql-gen on larger extracts or the planet you need to run it with -D on first import to suppress DELETEs, then built indexes on x, y and when you do updates run without the -D. The problem is that by default osm2pgsql-gen will DELETE all features in a tile in the output table before loading them. But when you start that for the first time, i.e. when the PREPARE is done, the table is empty, so even if there is an index PostgreSQL will decide not to use it. That's fine in the beginning, but as the table grows everything becomes slower and slow.

After the import you create the index and do an ANALYZE and from then on everything is fine when doing updates.

This is a bit tricky, but once we put this functionality into osm2pgsql itself, we know whether we are in import or append mode and can do the right thing automatically.

pnorman · 2022-12-16T15:49:54Z

I've been experimenting with the results now, and am having an error where the lower-left corner of each tile's polygon is being clipped away

I created the base table with

CREATE MATERIALIZED VIEW
osm_planet_trees
AS
SELECT
    osm_id,
    way,
    name,
    ref,
    way_area
FROM planet_osm_polygon
WHERE "natural" = 'wood' OR landuse = 'forest'

then ran
PGDATABASE=flex time ~/osm/osm2pgsql/build/osm2pgsql-gen -t planet_osm_trees -T trees_z10 -g way -G name -s raster-union -x0 -X1024 -y0 -Y1024 -z10 -m 0.125

The result in qgis looks like this:

joto · 2022-12-16T19:43:50Z

I've been experimenting with the results now, and am having an error where the lower-left corner of each tile's polygon is being clipped away

I believe that's already been fixed in the current version of this PR. Did you use that?

This commit adds the code to generalize various types of data using different strategies. The following strategies work on a tile-by-tile basis and operate on polygons: The "vector-union" strategy buffers and unionizes polygons using vector operations. The "raster-union" strategy does a similar thing but does it in raster space which is much faster. First the polygons are rendered into a raster, an open/close operation is called (which basically does the same thing as the buffering in vector space) and finally the resulting raster is vectorized again. The "builtup" strategy is intended to derive a layer of builtup areas from landuse=residential/industrial etc. as well as building cover and dense road networks. This still needs some work... Also a new "discrete-isolation" strategy which rates places based on some importance metric. (This is not tile-based.) The new "rivers" strategy finds important rivers, this is still very much work in progress. For the raster support this adds two new library dependency: CImg and potrace. The functionality is accessed through a new command line program called osm2pgsql-gen. It reads the same Lua config file that osm2pgsql reads. Call it with -h to get some usage information. This program is for testing only, eventually the functionality should be accessible from osm2pgsql itself. See also https://osm2pgsql.org/generalization/ .

joto · 2023-03-02T09:06:56Z

Closing in favour of #1942.

joto force-pushed the wip-gen-tile branch from 3416ed3 to eaa6fd0 Compare November 26, 2022 14:02

joto force-pushed the wip-gen-tile branch from eaa6fd0 to facfe18 Compare December 5, 2022 07:52

pnorman reviewed Dec 12, 2022

View reviewed changes

pnorman requested changes Dec 13, 2022

View reviewed changes

joto force-pushed the wip-gen-tile branch from facfe18 to 05e2a2b Compare December 13, 2022 19:50

pnorman reviewed Dec 14, 2022

View reviewed changes

joto force-pushed the wip-gen-tile branch 4 times, most recently from dd1c8f3 to 7891280 Compare December 22, 2022 15:30

This comment was marked as off-topic.

Sign in to view

joto changed the title ~~WIP: Add code to generalize polygons using two different strategies~~ WIP: Add code to generalize data using different strategies Dec 28, 2022

joto force-pushed the wip-gen-tile branch from 7891280 to aef6b3b Compare January 23, 2023 09:02

joto force-pushed the wip-gen-tile branch 3 times, most recently from e56cb0e to f7430aa Compare January 31, 2023 18:19

joto force-pushed the wip-gen-tile branch 5 times, most recently from 285255d to 0613222 Compare February 25, 2023 11:11

joto force-pushed the wip-gen-tile branch from 0613222 to 198a384 Compare February 26, 2023 10:19

joto force-pushed the wip-gen-tile branch from 198a384 to 6f45315 Compare February 27, 2023 20:51

joto closed this Mar 2, 2023

joto deleted the wip-gen-tile branch March 6, 2023 08:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Add code to generalize data using different strategies #1830

WIP: Add code to generalize data using different strategies #1830

joto commented Nov 21, 2022

pnorman commented Nov 21, 2022

joto commented Nov 21, 2022

mboeringa commented Nov 27, 2022 •

edited

Loading

mboeringa commented Nov 29, 2022

joto commented Dec 5, 2022

pnorman commented Dec 5, 2022

pnorman Dec 12, 2022

joto Dec 12, 2022

pnorman left a comment

pnorman Dec 13, 2022

joto Dec 13, 2022

pnorman Dec 13, 2022

joto Dec 13, 2022

pnorman Dec 13, 2022

joto Dec 13, 2022

joto commented Dec 13, 2022

pnorman Dec 14, 2022

joto Dec 14, 2022 •

edited

Loading

pnorman commented Dec 16, 2022

joto commented Dec 16, 2022

This comment was marked as off-topic.

joto commented Mar 2, 2023

WIP: Add code to generalize data using different strategies #1830

WIP: Add code to generalize data using different strategies #1830

Conversation

joto commented Nov 21, 2022

pnorman commented Nov 21, 2022

joto commented Nov 21, 2022

mboeringa commented Nov 27, 2022 • edited Loading

mboeringa commented Nov 29, 2022

joto commented Dec 5, 2022

pnorman commented Dec 5, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pnorman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joto commented Dec 13, 2022

Choose a reason for hiding this comment

joto Dec 14, 2022 • edited Loading

Choose a reason for hiding this comment

pnorman commented Dec 16, 2022

joto commented Dec 16, 2022

This comment was marked as off-topic.

joto commented Mar 2, 2023

mboeringa commented Nov 27, 2022 •

edited

Loading

joto Dec 14, 2022 •

edited

Loading