
other(data): Country map viz: geoboundary downloader + more maps #18778

Closed
wants to merge 4 commits into from

Yann-J (Contributor) commented Feb 17, 2022

While contributing some new maps to the legacy country map visualization in #18745, I noticed a few limitations that this PR intends to address:

  • Support for a new source of high-quality GeoJSON boundary files, the amazing geoBoundaries, via a new Python notebook. In particular, this source supports up to 4 levels of administrative divisions (the current source supports only 1).
  • Support for an attribution in the map display
  • A better naming convention for map names, e.g. capitalizing every word in names that contain spaces
  • A bunch of new maps from geoBoundaries that are relevant to my organization, with several levels of administrative subdivisions
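To make the downloader and naming bullets more concrete, here is a minimal sketch of two helpers such a notebook might contain. Both function names are hypothetical, and the URL pattern is an assumption based on the public geoBoundaries API documentation rather than something taken from this PR. The name helper capitalizes only the first letter of each space-separated word, so acronyms such as "SNNPR" are left intact.

```python
# Hypothetical helpers; the API URL pattern below is an assumption to verify
# against the geoBoundaries documentation.
GEOBOUNDARIES_API = "https://www.geoboundaries.org/api/current/gbOpen"


def geoboundaries_api_url(iso3: str, adm_level: int) -> str:
    """Build the metadata URL for one country (ISO 3166-1 alpha-3) and admin level."""
    return f"{GEOBOUNDARIES_API}/{iso3.upper()}/ADM{adm_level}/"


def map_display_name(name: str) -> str:
    """Capitalize the first letter of every space-separated word.

    Unlike str.title(), the rest of each word is left untouched, so an
    acronym like "SNNPR" does not become "Snnpr".
    """
    return " ".join(word[:1].upper() + word[1:] for word in name.split(" "))
```

Fetching that URL (for example with requests.get(url).json()) is assumed to return metadata whose gjDownloadURL field points at the GeoJSON file; check the field name against the geoBoundaries API docs before relying on it.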

SUMMARY

Technical / design choices and caveats:

  • The attribution comment displayed on the map is read from the attribution property at the top level of the GeoJSON FeatureCollection.
  • WARNING: Many administrative areas do not have an ISO code. The geoBoundaries data source's ID is used instead, but this requires users to know these IDs. Further improvements might be needed to allow using the area name instead, or to make the expected codes easier to discover.
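As an illustration of where the attribution lives, the sketch below stores it as a top-level member of the FeatureCollection and reads it back, mirroring what the chart does on load. RFC 7946 permits such extra ("foreign") members, which is presumably why other GeoJSON tools still open these files. The helper names and the example attribution string are hypothetical.

```python
import json


def add_attribution(feature_collection: dict, attribution: str) -> dict:
    """Return a copy of a FeatureCollection with a top-level "attribution" member."""
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    # RFC 7946 allows foreign members, so the file stays valid GeoJSON.
    return {**feature_collection, "attribution": attribution}


def read_attribution(geojson_text: str):
    """Return the top-level attribution string, or None if the file has none."""
    return json.loads(geojson_text).get("attribution")
```

Files without the member simply yield None, so maps from the previous source would keep rendering without an attribution line.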

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Sample map for Kenya Counties (notice the new attribution comment at the bottom):


Snapshot of all the added GeoJSON maps:

TESTING INSTRUCTIONS

Set up a data source with county/province-level data, including each area's ISO 3166-2 code, for instance by uploading this randomly generated sample CSV:

ISO,name,value
ET-AA,Addis Ababa,58.24388052899692
ET-AF,Afar,82.39096125651662
ET-AM,Amhara,5.823758668250489
ET-BE,Beneshangul Gumu,4.906103785510196
ET-DD,Dire Dawa,64.04758252062041
ET-GA,Gambela,71.45936608261624
ET-HA,Hareri,6.278288599268822
ET-OR,Oromia,60.625174539397065
ET-SN,SNNPR,93.6721133500065
ET-SO,Somali,59.03513300561172
ET-TI,Tigray,34.66035278225454
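If you want to regenerate a similar test file (for example with a fixed seed), a small script like the one below would do. It is a hypothetical sketch, not how the original file was produced; the [0, 100] value range is an assumption that happens to match the sample values above.

```python
import csv
import io
import random

# ISO 3166-2 codes and names copied from the sample CSV above.
ETHIOPIA_REGIONS = [
    ("ET-AA", "Addis Ababa"),
    ("ET-AF", "Afar"),
    ("ET-AM", "Amhara"),
    ("ET-BE", "Beneshangul Gumu"),
    ("ET-DD", "Dire Dawa"),
    ("ET-GA", "Gambela"),
    ("ET-HA", "Hareri"),
    ("ET-OR", "Oromia"),
    ("ET-SN", "SNNPR"),
    ("ET-SO", "Somali"),
    ("ET-TI", "Tigray"),
]


def random_sample_csv(regions, seed=None) -> str:
    """Return CSV text with an ISO,name,value header and one random value per region."""
    rng = random.Random(seed)  # seedable, so a test dataset can be reproduced
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ISO", "name", "value"])
    for code, name in regions:
        writer.writerow([code, name, rng.uniform(0, 100)])
    return buffer.getvalue()
```

Write the returned string to a .csv file and upload it as described above.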

Create a new Country Map chart with this dataset (in this example, for the 'Ethiopia Regions' country name).

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Yann-J changed the title from "Country map viz: geoboundary downloader" to "other:Country map viz: geoboundary downloader + more maps" on Feb 17, 2022
Yann-J changed the title from "other:Country map viz: geoboundary downloader + more maps" to "other(data): Country map viz: geoboundary downloader + more maps" on Feb 17, 2022
codecov bot commented Feb 17, 2022

Codecov Report

Merging #18778 (85c0b30) into master (9db6ed6) will increase coverage by 0.00%.
The diff coverage is 40.00%.

@@           Coverage Diff           @@
##           master   #18778   +/-   ##
=======================================
  Coverage   66.32%   66.32%           
=======================================
  Files        1620     1620           
  Lines       63087    63091    +4     
  Branches     6372     6372           
=======================================
+ Hits        41840    41844    +4     
- Misses      19590    19591    +1     
+ Partials     1657     1656    -1     
Flag Coverage Δ
javascript 51.26% <40.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
.../legacy-plugin-chart-country-map/src/CountryMap.js 0.00% <0.00%> (ø)
...s/legacy-plugin-chart-country-map/src/countries.ts 100.00% <100.00%> (ø)
.../explore/components/controls/TextControl/index.tsx 86.66% <0.00%> (+10.00%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9db6ed6...85c0b30.

villebro (Member) commented Feb 17, 2022

Thanks so much for this improvement, @Yann-J! I'll check some countries I'm familiar with and compare the quality against the current source. I know the current Finland map has a few errors, and in some places it lacks so much detail that it is unrecognizable compared to a real map, so this could be a great improvement. Also tagging @ktmud for a review, as he wrote the initial notebook.

Yann-J (Contributor Author) commented Feb 17, 2022

Thanks, and kudos to @ktmud as well for the original notebook. The new one is of course very heavily inspired by it, and I'm not a Python expert myself (and a total geopandas noob), so any feedback is appreciated.

There are two areas I'd like to draw attention to:

  • Regarding map quality: as with the original script, the source files are rather heavy and need to be down-sampled, and there is clearly room to tune the simplify parameters. The original algorithm used the (angular?) area of the bounding box as its criterion, which I felt might favor large countries, so I've used its square root instead (i.e. the angular tolerance should be roughly proportional to the country's angular diameter). This could be improved, but the maps I generated for Africa all seemed to come out well.
  • I initially hit a major hurdle when loading the GeoJSON files produced by geopandas. It took me many hours to figure out, because the files looked perfectly fine and loaded without issues in every GeoJSON viewer I tried, yet when D3 loaded them in the web UI it computed a crazy centroid (in fact the antipode of the expected one) and a bounding box covering the entire world, throwing off the projection entirely. It turns out that D3 (unlike geopandas, the geoBoundaries source files, or any GeoJSON viewer I tried) relies on the winding order of polygon rings to decide which side is the interior: d3-geo expects the exterior ring of a polygon smaller than a hemisphere to be clockwise, i.e. with the interior on the right-hand side as you walk along the ring. This is the opposite of the counterclockwise right-hand rule in RFC 7946. If the points are in the opposite order, D3 treats the interior of the polygon as the rest of the world rather than the intended shape. This is the reason for the somewhat ugly hack of loading the raw JSON and reversing the coordinate lists of each geometry. There may be a more elegant way to enforce this in geopandas/Fiona, but I couldn't figure it out.
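Both points can be sketched in plain Python on GeoJSON dictionaries. The tolerance helper scales with the square root of the bounding-box area as described; its default factor is a hypothetical placeholder, not a value from the notebook. The winding helpers are an alternative to reversing every ring unconditionally: they measure each ring's orientation with the planar shoelace formula and reverse only rings that do not already follow the d3-geo convention (exterior clockwise, holes counterclockwise). All function names are hypothetical, and the planar formula assumes 2D positions and rings that do not cross the antimeridian.

```python
import math


def simplify_tolerance(bounds, factor=0.005):
    """bounds = (minx, miny, maxx, maxy) in degrees; factor is a hypothetical default."""
    minx, miny, maxx, maxy = bounds
    # sqrt(area) tracks the shape's angular "diameter", so the tolerance grows
    # linearly with country size rather than quadratically.
    return factor * math.sqrt((maxx - minx) * (maxy - miny))


def signed_area(ring):
    """Shoelace formula on a closed ring of [lon, lat] positions (first == last).

    Positive means counterclockwise, negative means clockwise.
    """
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def orient_ring(ring, clockwise):
    # Reverse the ring only when its current orientation is the wrong one.
    is_clockwise = signed_area(ring) < 0
    return list(ring) if is_clockwise == clockwise else list(reversed(ring))


def orient_polygon_for_d3(rings):
    # d3-geo convention for polygons smaller than a hemisphere:
    # exterior ring clockwise, holes counterclockwise.
    exterior, *holes = rings
    return [orient_ring(exterior, clockwise=True)] + [
        orient_ring(hole, clockwise=False) for hole in holes
    ]


def orient_geometry_for_d3(geometry):
    kind = geometry["type"]
    if kind == "Polygon":
        return {**geometry, "coordinates": orient_polygon_for_d3(geometry["coordinates"])}
    if kind == "MultiPolygon":
        return {
            **geometry,
            "coordinates": [orient_polygon_for_d3(p) for p in geometry["coordinates"]],
        }
    return geometry  # points and lines have no interior to orient
```

On the geopandas side, Shapely's shapely.geometry.polygon.orient(polygon, sign=-1.0) returns a polygon with a clockwise exterior and counterclockwise holes, and GeoSeries.simplify(tolerance) performs the down-sampling; combining the two before export might be the more elegant route asked about above, though that has not been tested against this notebook.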

stale bot commented Apr 18, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

stale bot added the inactive (Inactive for >= 30 days) label on Apr 18, 2022
Yann-J (Contributor Author) commented Apr 20, 2022

Hello @villebro @ktmud, I hope this one is still on the table?

stale bot removed the inactive (Inactive for >= 30 days) label on Apr 20, 2022
villebro (Member) commented

Hi @Yann-J, yes, let me put this on my list of PRs that need a proper review (this requires setting aside some dedicated time for comprehensive review and testing).

ktmud (Member) commented Apr 20, 2022

Sorry that I missed this. I'll try to find some time to review it this week as well.

ktmud (Member) commented Apr 22, 2022

The new geo data source is a very nice addition. @Yann-J, can you also add a screenshot of what the attribution looks like in a dashboard when the chart is small?

Looking at the size of the new files, I'm wondering whether it is now time to think about a long-term maintenance model for geo-boundaries in Superset. While these new maps are relevant to your organization, they may never be used by 90% of other Superset users. We had a section in the Superset docs specifically for users who want to add their own maps, so that the Superset pip package does not bloat.

A potential solution is to publish these maps in a separate GitHub repo and add a superset command to download them to ./superset/static/assets/geoboundaries on demand. The list of available countries can be configured in config.py, with the default being the list of all available files in the geoboundaries directory.
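A minimal sketch of the configuration half of this idea, assuming maps are downloaded as <name>.geojson files into the directory named above. The setting GEOBOUNDARIES_COUNTRIES and the function available_maps are hypothetical names, not existing Superset options, and the download command itself is not shown.

```python
from pathlib import Path
from typing import List, Optional, Sequence

# Hypothetical config.py setting: None means "expose every downloaded map".
GEOBOUNDARIES_COUNTRIES: Optional[Sequence[str]] = None

# Download target suggested in the comment above (relative to the repo root).
GEOBOUNDARIES_DIR = Path("./superset/static/assets/geoboundaries")


def available_maps(directory=GEOBOUNDARIES_DIR, configured=None) -> List[str]:
    """Return map names (file stems) found in `directory`.

    If `configured` is given, only names listed there are returned; names that
    were configured but never downloaded are silently skipped.
    """
    downloaded = sorted(p.stem for p in Path(directory).glob("*.geojson"))
    if configured is None:
        return downloaded
    wanted = set(configured)
    return [name for name in downloaded if name in wanted]
```

Defaulting to None keeps the behavior described above (all available files are listed) while letting a deployment narrow the list without touching the package.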

Yann-J (Contributor Author) commented Apr 23, 2022

Indeed, I agree it may not be sustainable to host these large files in the core repo, especially since most shapes will likely not be relevant to most people; some mechanism to host and load them externally might be best.

Rather than a command, maybe a configuration in settings.py could also work.

I'm not sure I can confidently suggest a proper solution myself, as I'm not familiar enough with Superset's core architecture.

one-acre-fund closed this by deleting the head repository on Oct 13, 2022