Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

soilgrids extract function for the CMS SDA #2903

Closed
wants to merge 14 commits into from

Conversation

serbinsh
Copy link
Member

@serbinsh serbinsh commented Jan 12, 2022

Description

Frist pass at the soilgrids extract function for the NASA CMS SDA

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of .zenodo.json
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@serbinsh
Copy link
Member Author

serbinsh commented Jan 12, 2022

We are providing here as a place to be reviewed by the CMS team and move forward with a full function. A few things to follow up on

  1. Does this belong in data land or data remote?
  2. What are all of the function arguments we want to have?
  3. We need to pass lat/longs from a bety site query to this function as a site list not just a lat/long (or both?)
  4. We need to set this function up to "loop over" sites after connecting to the virtual raster and not connect to the vrt for each lat/long. The data extract is slow so we want to extract all sites at once not in a loop
  5. @DongchenZ How do you want the output data formatted for integration into the SDA workflow? In the short term maybe we do this and in the long term create an intermediate format that then gets parsed to SDA format (list of lists?)? @mdietze?

I know I am forgetting some things here but wanted to get this up now so Cherry can move forward on completing this if the basic skeleton looks good to folks?

In addition here are some warnings we might want to look into

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In fn(par, ...) : NAs introduced by coercion
2: In fn(par, ...) : NAs introduced by coercion
3: In fn(par, ...) : NAs introduced by coercion
4: In fn(par, ...) : NAs introduced by coercion
5: In fn(par, ...) : NAs introduced by coercion
6: In fn(par, ...) : NAs introduced by coercion
7: In fn(par, ...) : NAs introduced by coercion
8: In fn(par, ...) : NAs introduced by coercion
9: In fn(par, ...) : NAs introduced by coercion
10: In fn(par, ...) : NAs introduced by coercion
11: In fn(par, ...) : NAs introduced by coercion
12: In fn(par, ...) : NAs introduced by coercion
13: In fn(par, ...) : NAs introduced by coercion
14: In fn(par, ...) : NAs introduced by coercion
15: In fn(par, ...) : NAs introduced by coercion
16: In fn(par, ...) : NAs introduced by coercion
17: In fn(par, ...) : NAs introduced by coercion
18: In fn(par, ...) : NAs introduced by coercion
19: In fn(par, ...) : NAs introduced by coercion
20: In fn(par, ...) : NAs introduced by coercion
21: In fn(par, ...) : NAs introduced by coercion
22: In fn(par, ...) : NAs introduced by coercion
23: In qgamma(qstat, theta[1], theta[2]) : NaNs produced

@serbinsh serbinsh self-assigned this Jan 12, 2022
@serbinsh serbinsh marked this pull request as draft January 12, 2022 21:00
@serbinsh serbinsh added the Status: Ready Pull request ready for merge, or issue ready to be closed label Feb 15, 2022
@serbinsh serbinsh changed the title init commit of a working soilgrids extract function for the CMS SDA - WIP soilgrids extract function for the CMS SDA Feb 15, 2022
@serbinsh serbinsh marked this pull request as ready for review February 15, 2022 15:09
@serbinsh
Copy link
Member Author

The main function is in the soilgrids.R file and is currently called "soilgrids.soc.extract"

An example of how the function works at this time:

rm(list = ls())
library(DBI)
library(PEcAn.data.remote)

# get site info
db <- 'betydb'
host_db <- 'modex.bnl.gov'
db_port <- '5432'
db_user <- 'bety'
db_password <- 'bety'

bety <- list(user='bety', password='bety', host='modex.bnl.gov',
             dbname='betydb', driver=RPostgres::Postgres(),write=FALSE)
con <- DBI::dbConnect(drv=bety$driver, dbname=bety$dbname, host=bety$host, password=bety$password, user=bety$user)
con


# test individual sites
suppressWarnings(site_qry <- glue::glue_sql("SELECT *, ST_X(ST_CENTROID(geometry)) AS lon,
                                              ST_Y(ST_CENTROID(geometry)) AS lat FROM sites WHERE id IN ({ids*})",
                                            ids = c("676","622","678","766","764"), .con = con))

suppressWarnings(qry_results.1 <- DBI::dbSendQuery(con,site_qry))
suppressWarnings(qry_results.2 <- DBI::dbFetch(qry_results.1))

qry_results.2
DBI::dbClearResult(qry_results.1)
dbDisconnect(con)

site_list <- qry_results.2
verbose <- TRUE
system.time(result_soc <- soilgrids.soc.extract(site_info=site_list, verbose=verbose))
result_soc

and the output looks like this

> result_soc
$Mean_soc
    site_id      lat        lon    0-5cm   5-15cm  15-30cm  30-60cm 60-100cm 100-200cm
622     622 46.24202  -89.34757 65.68854 37.25930 20.36566 7.826109 5.011070  3.126902
676     676 45.80593  -90.07961 49.61828 31.22689 16.61261 8.230412 2.395168  1.734718
678     678 45.94080  -90.27000 55.58001 32.35441 14.74488 6.919386 2.666095  1.773701
764     764 44.31570 -121.60800 74.78708 22.80451 12.64082 8.590361 3.230583  2.568780
766     766 44.43720 -121.56700 69.53522 31.53413 14.27996 7.598909 4.507278  2.793447

$Sdev_soc
    site_id      lat        lon     0-5cm   5-15cm   15-30cm  30-60cm 60-100cm 100-200cm
622     622 46.24202  -89.34757  58.89990 38.01215 13.210002 5.472139 1.294284 2.2844188
676     676 45.80593  -90.07961  45.24691 30.98490  8.879574 6.299148 1.049219 0.6686239
678     678 45.94080  -90.27000  52.04211 33.99952  9.816848 5.382589 1.407709 0.7605814
764     764 44.31570 -121.60800 130.80391 15.02392  9.033595 9.215060 2.353715 1.4176229
766     766 44.43720 -121.56700  77.75313 27.73879  9.968512 6.702437 3.756643 2.3921458

The present function works by creating internal virtual rasters for each soilgrids depth then extracts the values for each site in the provided site list.

@serbinsh serbinsh removed the Status: Ready Pull request ready for merge, or issue ready to be closed label Feb 16, 2022
@serbinsh serbinsh marked this pull request as draft February 16, 2022 16:30
@serbinsh
Copy link
Member Author

serbinsh commented Feb 16, 2022

Still needed: the ability to write out an Rdata file containing the results so that we dont have to run the extraction every time....duh

I will update

@serbinsh
Copy link
Member Author

Updated to output an Rdata file but have not yet set up the function to record the file/results in BETY as I am not yet sure how to manage the lack of a time element but where we do have the site ID. Do we try to capture that so that future CMS runs know where it already has SOC data? How do we record what sites are already in the SOC file? Seems complicated if we are doing runs that only share part of the same site list!

@serbinsh serbinsh marked this pull request as ready for review February 17, 2022 16:14
@serbinsh serbinsh added the Status: Ready Pull request ready for merge, or issue ready to be closed label Feb 17, 2022
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
modules/data.remote/R/soilgrids.R Show resolved Hide resolved
dat <- split(soc_fit, list(soc_fit$siteids, soc_fit$Depth))

#assume the soc profile follows gamma distribution best
cgamma <- function(theta, val, stat) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation of function args?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there any reason to have these function internal to this function rather than just part of this pecan package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving function out to a utils file to make these functions exposed in data.remote()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually these are pretty specific to the needs for calculating soil C by depth so I dont know if it makes sense to move these out into a general function as I wouldnt yet know why/what else we would use it for? To make it general we would need to pass the function the variable names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it may be useful for other soil C products other than SoilGrids

return(sum((pred - val) ^ 2))
}

fitQ <- function(x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed - moving out and exposing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as stated above I dont actually know if it makes sense to move these if these arent really used anywhere else?

@serbinsh serbinsh removed the Status: Ready Pull request ready for merge, or issue ready to be closed label Mar 18, 2022
@serbinsh
Copy link
Member Author

Updating output format too:

> output_data
    site_id      lat       lon    0-5cm   5-15cm  15-30cm  30-60cm 60-100cm 100-200cm stdv_0-5cm stdv_5-15cm stdv_15-30cm stdv_30-60cm stdv_60-100cm stdv_100-200cm
622     622 45.80593 -90.07961 49.61828 31.22689 16.61261 8.230412 2.395168  1.734718   45.24691    30.98490     8.879574     6.299148      1.049219      0.6686239
676     676 46.24202 -89.34757 65.68854 37.25930 20.36566 7.826109 5.011070  3.126902   58.89990    38.01215    13.210002     5.472139      1.294284      2.2844188
678     678 45.94080 -90.27000 55.58001 32.35441 14.74488 6.919386 2.666095  1.773701   52.04211    33.99952     9.816848     5.382589      1.407709      0.7605814

@serbinsh serbinsh marked this pull request as draft March 18, 2022 17:57
@mdietze
Copy link
Member

mdietze commented Mar 24, 2022

@serbinsh your comments mention a lot of fixes, but they don't seem to have been pushed into the PR yet.

@serbinsh
Copy link
Member Author

serbinsh commented Oct 7, 2022

This is now superseded by @Qianyuxuan new PR #3040

There are tests in here that at some point we may want to bring into data.land but have left those out for now as they are API tests we dont want to hold up checks if the API/URL is unreachable

@serbinsh serbinsh closed this Oct 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants