_sf_: abysmal performance on Win Server 2022 #2239

Closed
bthierry-udm opened this issue Oct 2, 2023 · 15 comments

Comments

@bthierry-udm

I have sf 1.0-14 running in R 4.3.1 on a workstation running Windows Server 2022.
I recently moved a project from my iMac, which runs the exact same sf and R versions, to that workstation and noticed that many, if not all, sf functions run much slower than on the Mac. One of them is st_transform:

# Win server
> system.time(DA16 %>% head(100) %>% st_transform(4326))
   user  system elapsed 
   0.28    0.00    0.28 
> system.time(DA16 %>% head(1000) %>% st_transform(4326))
   user  system elapsed 
  29.10    0.00   29.12 
> system.time(DA16 %>% head(2000) %>% st_transform(4326))
   user  system elapsed 
  90.34    0.00   90.35 
# MacOS 
> system.time(DA16 %>% head(100) %>% st_transform(4326))
   user  system elapsed 
  0.040   0.008   0.071 
> system.time(DA16 %>% head(1000) %>% st_transform(4326))
   user  system elapsed 
  0.148   0.001   0.149 
> system.time(DA16 %>% head(2000) %>% st_transform(4326))
   user  system elapsed 
  0.380   0.006   0.389  

The Windows workstation has twice the RAM and an SSD, is not configured as a virtual server, and I was the only logged-in user, so resource limits are probably not the explanation here.

BTW, the same snippet run on a Win 10 laptop gives timings similar to the Mac. As a side note, running the same kind of operation in PostGIS shows no meaningful time difference between the Mac and Win Server.

Any idea why sf is so slow on Win Server compared to desktop OSes?
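To put the Windows numbers above in perspective, one rough check (an editorial illustration, not part of the original report) is the local log-log slope between consecutive subset sizes: a slope near 1 means elapsed time grows linearly with the number of features, a slope near 2 means it grows roughly quadratically. The sketch only reuses the elapsed times posted above; `loglog_slope()` is a hypothetical helper.

```r
# Elapsed times (seconds) copied from the timings above.
n_features  <- c(100, 1000, 2000)
elapsed_win <- c(0.28, 29.12, 90.35)
elapsed_mac <- c(0.071, 0.149, 0.389)

# Hypothetical helper: slope of log(time) against log(n) between
# consecutive subset sizes. ~1 = linear growth, ~2 = roughly quadratic.
loglog_slope <- function(n, t) {
  diff(log(t)) / diff(log(n))
}

round(loglog_slope(n_features, elapsed_win), 2)
round(loglog_slope(n_features, elapsed_mac), 2)
```

For the Windows Server timings this gives slopes of about 2.02 (100 to 1000 features) and 1.63 (1000 to 2000 features). The macOS timings are small enough that fixed overhead probably dominates, so their slopes say little.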

@rsbivand
Member

rsbivand commented Oct 2, 2023

Without a minimal reproducible example and access to the version of Windows you are complaining about, use of invectives is most unlikely to get you anywhere. Someone else has to be able to demonstrate this claim independently.

@bthierry-udm
Author

Yes, sorry. I was mostly looking for similar experiences from the community, as I'm clueless about what's going on with my setup.

I've attached a sample of the DA16 shapefile used by my script above. (The full dataset can be retrieved from the StatCan website.)

# Running on Win Server 2022
> library(sf)
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

> DA16 <- st_read('lda_000b16a_e_sample.shp')
Reading layer `lda_000b16a_e_sample' from data source 
  `I:\Benoit\TEMP\DA16_sample\lda_000b16a_e_sample.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 2000 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7609465 ymin: 1228651 xmax: 9015737 ymax: 2986430
Projected CRS: PCS_Lambert_Conformal_Conic

> system.time(DA16 %>% head(100) %>% st_transform(4326))
   user  system elapsed 
   0.28    0.00    0.29 

> system.time(DA16 %>% head(1000) %>% st_transform(4326))
   user  system elapsed 
  30.11    0.00   30.10 

> system.time(DA16 %>% head(2000) %>% st_transform(4326))
   user  system elapsed 
  85.56    0.02   85.60 

As for the system on which the script is running, I've included the details below:

Edition: Windows Server 2022 Standard
Version: 21H2
Installed on: 2023-01-27
OS build: 20348.1970

Device name: volvic
Full device name: volvic.XXX
Processor: AMD Ryzen Threadripper PRO 3945WX 12-Cores 4.00 GHz
Installed RAM: 96.0 GB (95.9 GB usable)
Device ID: XXX
Product ID: XXX
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display

I hope it helps fill in the gaps in my previous post. Thanks for any insight.

DA16_sample.zip

@rsbivand
Member

rsbivand commented Oct 2, 2023

No access to any such system. Please create the subset prior to st_transform and do not use pipes in the timing. Please check that all package versions agree.
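A minimal sketch of the timing setup requested above, assuming a data frame or sf object such as `DA16`: each row subset is built before `system.time()` starts and no pipe is involved, so only the call under test is measured. `time_on_subsets()` is a hypothetical helper, not part of sf.

```r
# Hypothetical helper: time f() on increasing row subsets of x.
# x can be a data.frame or an sf object (sf objects are data frames).
time_on_subsets <- function(f, x, sizes) {
  elapsed <- vapply(sizes, function(n) {
    sub <- x[seq_len(n), , drop = FALSE]   # subset created outside the timed call
    system.time(f(sub))[["elapsed"]]       # only f(sub) is timed
  }, numeric(1))
  names(elapsed) <- sizes
  elapsed
}
```

For example, `time_on_subsets(function(d) sf::st_transform(d, 4326), DA16, c(100, 1000, 2000))` would return the elapsed seconds for each subset size.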

@kadyb
Contributor

kadyb commented Oct 2, 2023

I can confirm this issue on Windows 10 with PROJ 9.2.0. In terra::project() it looks exactly the same.

system.time(st_transform(DA16[1:100, ], crs = "EPSG:4326"))
#> user  system elapsed 
#> 0.25    0.00    0.25

system.time(st_transform(DA16[1:1000, ], crs = "EPSG:4326"))
#>  user  system elapsed 
#> 22.46    0.00   22.48 
Session Info
    GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
"3.11.2"        "3.6.2"        "9.2.0"         "true"         "true"        "9.2.0" 

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-14    terra_1.7-46

loaded via a namespace (and not attached):
 [1] utf8_1.2.3         R6_2.5.1           codetools_0.2-19   tidyselect_1.2.0  
 [5] e1071_1.7-13       magrittr_2.0.3     glue_1.6.2         tibble_3.2.1      
 [9] KernSmooth_2.23-21 pkgconfig_2.0.3    generics_0.1.3     dplyr_1.1.3       
[13] lifecycle_1.0.3    classInt_0.4-10    cli_3.6.1          fansi_1.0.4       
[17] vctrs_0.6.3        grid_4.3.1         DBI_1.1.3          proxy_0.4-27      
[21] class_7.3-22       compiler_4.3.1     rstudioapi_0.15.0  tools_4.3.1       
[25] pillar_1.9.0       Rcpp_1.0.11        rlang_1.1.1        units_0.8-4    

@kadyb
Contributor

kadyb commented Oct 2, 2023

The same happens with ogr2ogr:

system.time(
  gdal_utils("vectortranslate",
             "lda_000b16a_e_sample.shp",
             "test.gpkg",
             options = c("-t_srs", "EPSG:4326",
                         "-limit", 1000))
)
#>  user  system elapsed 
#> 23.00    0.02   23.02

@bthierry-udm, what PROJ version do you have on macOS and Windows 10 (see sf::sf_extSoftVersion())?

@bthierry-udm
Author

The DA16 sample already contains the first 2000 records used for the third test.

Here is the result of the snippet without the pipes and subsetting:

> system.time(st_transform(DA16, 4326))
   user  system elapsed 
  85.36    0.03   85.37 

Below is the session info for the Win Server workstation:

          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.2"        "3.6.2"        "9.2.0"         "true"         "true"        "9.2.0" 

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-14

loaded via a namespace (and not attached):
 [1] utf8_1.2.3         R6_2.5.1           tidyselect_1.2.0   e1071_1.7-13       magrittr_2.0.3    
 [6] glue_1.6.2         tibble_3.2.1       KernSmooth_2.23-22 pkgconfig_2.0.3    generics_0.1.3    
[11] dplyr_1.1.3        lifecycle_1.0.3    classInt_0.4-10    cli_3.6.1          fansi_1.0.4       
[16] grid_4.3.1         vctrs_0.6.3        DBI_1.1.3          proxy_0.4-27       class_7.3-22      
[21] compiler_4.3.1     rstudioapi_0.15.0  tools_4.3.1        pillar_1.9.0       Rcpp_1.0.11       
[26] rlang_1.1.1        units_0.8-4     

And the MacOS one:

          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.0"        "3.5.3"        "9.1.0"         "true"         "true"        "9.1.0" 

R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-14

loaded via a namespace (and not attached):
 [1] utf8_1.2.3         R6_2.5.1           tidyselect_1.2.0   e1071_1.7-13       magrittr_2.0.3    
 [6] glue_1.6.2         tibble_3.2.1       KernSmooth_2.23-21 pkgconfig_2.0.3    generics_0.1.3    
[11] dplyr_1.1.3        lifecycle_1.0.3    classInt_0.4-9     cli_3.6.1          fansi_1.0.4       
[16] grid_4.3.1         vctrs_0.6.3        DBI_1.1.3          proxy_0.4-27       class_7.3-22      
[21] compiler_4.3.1     rstudioapi_0.15.0  tools_4.3.1        pillar_1.9.0       Rcpp_1.0.11       
[26] rlang_1.1.1        units_0.8-3       

The Win 10 laptop runs older versions (and does not show the slow processing seen on Win Server):

          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ
      "3.10.2"        "3.4.1"        "7.2.1"         "true"         "true"        "7.2.1"

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sf_1.0-12

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7         magrittr_2.0.1     units_0.7-2        tidyselect_1.1.1   R6_2.5.1           rlang_0.4.12       fansi_0.5.0
 [8] dplyr_1.0.7        tools_4.1.2        grid_4.1.2         KernSmooth_2.23-20 utf8_1.2.2         e1071_1.7-9        DBI_1.1.1
[15] ellipsis_0.3.2     class_7.3-19       assertthat_0.2.1   tibble_3.1.5       lifecycle_1.0.1    crayon_1.4.2       purrr_0.3.4
[22] vctrs_0.3.8        glue_1.6.2         proxy_0.4-26       compiler_4.1.2     pillar_1.6.4       generics_0.1.1     classInt_0.4-3
[29] pkgconfig_2.0.3

@kadyb After updating the Win 10 laptop to the latest R and sf versions, I can indeed reproduce the issue.

# Win 10, updated to R 4.3.1 and sf 1.0-14
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ
      "3.11.2"        "3.6.2"        "9.2.0"         "true"         "true"        "9.2.0"

@kadyb
Contributor

kadyb commented Oct 2, 2023

So the problem is probably with PROJ 9.2.0. Maybe this issue is related and already fixed. Unfortunately, to update PROJ, we have to wait for the new version of R on Windows.

@edzer
Member

edzer commented Oct 2, 2023

Could well be, as the provided dataset has datum NAD83.

@rsbivand
Member

rsbivand commented Oct 5, 2023

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
...
> library(sf) # 1.0.14
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
> getwd()
[1] "C:/Users/RB/work/slow_transform"
> DA16 <- st_read('lda_000b16a_e_sample.shp')
Reading layer `lda_000b16a_e_sample' from data source 
  `C:\Users\RB\work\slow_transform\lda_000b16a_e_sample.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 2000 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7609465 ymin: 1228651 xmax: 9015737 ymax: 2986430
Projected CRS: PCS_Lambert_Conformal_Conic
> system.time(st_transform(DA16[1:100,], 4326))
   user  system elapsed 
   0.09    0.04    0.14 
> 
> system.time(st_transform(DA16[1:1000,], 4326))
   user  system elapsed 
  43.74    0.11   52.64 
> system.time(st_transform(DA16, 4326))
   user  system elapsed 
 156.13    0.42  219.03 
> library(sf) # 1.0.12 and 1.0.14
Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
> DA16 <- st_read('lda_000b16a_e_sample.shp')
Reading layer `lda_000b16a_e_sample' from data source 
  `C:\Users\RB\work\slow_transform\lda_000b16a_e_sample.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 2000 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7609465 ymin: 1228651 xmax: 9015737 ymax: 2986430
Projected CRS: PCS_Lambert_Conformal_Conic
> system.time(st_transform(DA16[1:100,], 4326))
   user  system elapsed 
   0.00    0.03    0.09 
> system.time(st_transform(DA16[1:1000,], 4326))
   user  system elapsed 
   0.27    0.00    0.27 
> system.time(st_transform(DA16, 4326))
   user  system elapsed 
   0.74    0.00    0.73 
> 

@edzer
Member

edzer commented Oct 5, 2023

With PROJ 9.1.1:

sessionInfo()
# R version 4.3.1 (2023-06-16)

# ...

# loaded via a namespace (and not attached):
# [1] compiler_4.3.1
library(sf) # 1.0.14
# Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
DA16 <- st_read('DA16_sample')
# Reading layer `lda_000b16a_e_sample' from data source 
#   `/home/edzer/Downloads/DA16_sample' using driver `ESRI Shapefile'
# Simple feature collection with 2000 features and 22 fields
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: 7609465 ymin: 1228651 xmax: 9015737 ymax: 2986430
# Projected CRS: PCS_Lambert_Conformal_Conic
system.time(st_transform(DA16[1:100,], 4326))
#    user  system elapsed 
#   0.035   0.008   0.054 
system.time(st_transform(DA16[1:1000,], 4326))
#    user  system elapsed 
#   0.238   0.000   0.238 
system.time(st_transform(DA16, 4326))
#    user  system elapsed 
#   0.507   0.004   0.512 

@rsbivand
Member

rsbivand commented Oct 5, 2023

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora Linux 38 (Workstation Edition)
> library(sf) # 1.0.14
Linking to GEOS 3.12.0, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
> DA16 <- st_read('lda_000b16a_e_sample.shp')
Reading layer `lda_000b16a_e_sample' from data source 
  `/home/rsb/tmp/bigshape/DA16_sample/lda_000b16a_e_sample.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 2000 features and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7609465 ymin: 1228651 xmax: 9015737 ymax: 2986430
Projected CRS: PCS_Lambert_Conformal_Conic
> system.time(st_transform(DA16[1:100,], 4326))
   user  system elapsed 
  0.037   0.003   0.041 
> system.time(st_transform(DA16[1:1000,], 4326))
   user  system elapsed 
  0.169   0.000   0.169 
> system.time(st_transform(DA16, 4326))
   user  system elapsed 
  0.427   0.001   0.430 

So the analysis in #2239 (comment) was correct, and PROJ for binary package builds should be moved beyond 9.2.0 to avoid the regression. With regard to #2240 (comment), and given that R 4.4 is still a long way off, should we explore the consequences of updating MXE to PROJ 9.3.0 and GDAL 3.7.2?
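The 1000-feature timings posted in this thread can be lined up by PROJ version. This is a rough side-by-side rather than a controlled benchmark: the sessions differ in hardware, OS, and R, GDAL and GEOS versions. The numbers are copied from the comments above; `reports` and `by_proj` are illustrative names.

```r
# Elapsed seconds for st_transform() on the first 1000 features,
# as posted in this thread, with the PROJ version each session linked to.
reports <- data.frame(
  who = c("bthierry-udm (Win Server)", "bthierry-udm (Win Server, rerun)",
          "kadyb (Win 10)", "rsbivand (Win 10)", "bthierry-udm (macOS)",
          "rsbivand (Win 10, R 4.2.3)", "edzer", "rsbivand (Fedora)"),
  proj = c("9.2.0", "9.2.0", "9.2.0", "9.2.0",
           "9.1.0", "8.2.1", "9.1.1", "9.3.0"),
  elapsed = c(29.12, 30.10, 22.48, 52.64, 0.149, 0.27, 0.238, 0.169)
)

# Median elapsed time per PROJ version.
by_proj <- aggregate(elapsed ~ proj, data = reports, FUN = median)
by_proj
```

The median for the four PROJ 9.2.0 sessions is about 29.6 s, while every other PROJ version stays below 0.3 s.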

@olivroy
Contributor

olivroy commented Oct 13, 2023

I ran into this too!

It was almost impossible to understand what was going on or where it was coming from.

Now that I have found the source, I have added this to my code:

if (sf::sf_extSoftVersion()["PROJ"] == "9.2.0") {
    cli::cli_warn("It is not possible to switch crs to 4326 currently. On Windows, only possible with R 4.2 or R 4.4.")
}

@rsbivand
Member

Wrong: only transformations to NAD83 may be affected, taking more time than with PROJ versions before or after 9.2.0; the results remain correct. Please do not jump to conclusions.
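If a script needs to detect the affected setup, a check along these lines keeps the warning about speed rather than correctness. `is_slow_proj()` is a hypothetical helper; it only flags PROJ 9.2.0 because that is the only version for which slow transformations were reported in this thread. Comparing with `numeric_version()` rather than plain strings makes it easy to widen the check to a range later, should other 9.2.x releases turn out to be affected (not verified here).

```r
# Hypothetical helper: TRUE when the linked PROJ version is 9.2.0,
# the version with slow NAD83-related transformations in this thread.
is_slow_proj <- function(proj_version) {
  numeric_version(proj_version) == numeric_version("9.2.0")
}

# Typical use with sf (requires sf to be installed):
# if (is_slow_proj(sf::sf_extSoftVersion()[["PROJ"]])) {
#   warning("PROJ 9.2.0 detected: st_transform() involving NAD83 may be slow, ",
#           "but results should still be correct.")
# }
```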

@olivroy
Contributor

olivroy commented Oct 13, 2023

But I also use data from Statistics Canada, and this is the only format it provides. Because I was affected by #2104, I remember using st_transform(4326) + sf_simplify, and the result was wrong.

@olivroy
Contributor

olivroy commented Oct 16, 2023

Following an upgrade request for Rtools (https://bugs.r-project.org/show_bug.cgi?id=18614), PROJ is now at version 9.3.0 in Rtools 43: https://cran.r-project.org/bin/windows/Rtools/rtools43/news.html

5 participants