[Potential Bug]: Members being ingested with inconsistent time lengths/missing time chunks #174

Open
AbbySh opened this issue Jun 11, 2024 · 3 comments
Labels
bug Something isn't working

Comments

AbbySh commented Jun 11, 2024

Description

I am trying to load many members/models that satisfy the following search criteria:

cat = col.search(
    variable_id=['tos', 'sos', 'chl', 'mlotst', 'spco2'],
    table_id=['Omon'], # monthly ocean output only
    experiment_id=['ssp245','historical'],
    require_all_on=['source_id', 'member_id', 'grid_label'] # this ensures that results will have all variables and experiments available
)
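
For reference, a dataset dictionary like the ddict indexed below can then be built from that search result (a minimal sketch; col is assumed to have been opened with intake.open_esm_datastore, and the zarr_kwargs keyword is named xarray_open_kwargs in newer intake-esm versions):

# Load every matching zarr store into a dict of xarray Datasets;
# the exact key format depends on the catalog's aggregation settings.
ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'use_cftime': True})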

However, many of the ssp245 and historical time periods do not match each other. For instance,

ddict['CMIP.MPI-M.MPI-ESM1-2-LR.historical.r41i1p1f1.Omon.spco2.gn.none.r41i1p1f1.v20190815.gs://cmip6/cmip6-pgf-ingestion-test/zarr_stores/9103515710_1/CMIP6.CMIP.MPI-M.MPI-ESM1-2-LR.historical.r41i1p1f1.Omon.spco2.gn.v20190815.zarr']

shows an ingested historical member whose time series ends in 1969, whereas a few weeks ago every ingested historical member had data through 2014. Some ssp245 time periods appear to end in 2022, some in 2024, and some in 2100. Is there a way to ingest members only if they cover certain year ranges? In the case of the historical member listed above, I would want it filtered out, since its associated ssp245 data starts in 2015, meaning 1970-2014 are missing.
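
A quick way to see which members are affected is to print the time span of each loaded dataset (a sketch, assuming ddict holds the datasets from the search above with decoded time axes):

# Print the first and last year of every dataset to spot short runs.
for key, ds in ddict.items():
    print(f"{key}: {ds.time.dt.year[0].item()}-{ds.time.dt.year[-1].item()}")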

@AbbySh added the bug label Jun 11, 2024
@AbbySh changed the title from "[Potential Bug]: Members being ingested with inconsistent time lengths" to "[Potential Bug]: Members being ingested with inconsistent time lengths/missing time chunks" Jun 11, 2024
@jbusecke (Collaborator) commented:

Thanks for raising this issue @AbbySh.

We need to confirm whether these 'short' runs are short as submitted (in which case there is nothing we can do on our end), or whether there is a bug in the ingestion (which we can and should fix, and then rerun these particular stores).

I will try to get to this in the next few days, but for your purposes I recommend adding an additional check along the lines of:

filtered_datasets = []
for ds in all_your_datasets:
    ds = ds.sel(time=slice('2014', '2100'))  # Some members are run longer than 2100, and those should just be trimmed.
    # .dt.year works for both datetime64 and cftime time axes.
    if int(ds.time.dt.year[0]) == 2014 and int(ds.time.dt.year[-1]) == 2100:
        filtered_datasets.append(ds)
    else:
        print(f"{ds=} did not pass the time test")
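
Since the 2014-2100 window spans both experiments, this check is easiest to apply after joining each member's historical and ssp245 datasets along time, roughly like this (a sketch; hist_ds and ssp_ds are hypothetical names for one member's two datasets pulled out of the dictionary):

import xarray as xr

# Hypothetical: one member's historical and ssp245 datasets, joined along time.
combined = xr.concat([hist_ds, ssp_ds], dim='time')
# A member whose historical part stops in 1969 will now fail the check above,
# since 1970-2014 are missing from the combined time axis.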

AbbySh commented Jun 11, 2024

Another example is:

ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp245.r21i1p1f1.Omon.chl.gn.none.r21i1p1f1.v20200922.gs://cmip6/CMIP6_LEAP_legacy/a618127503-6099939925-3/CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp245.r21i1p1f1.Omon.chl.gn.v20200922.zarr

This store appears to cover only 2045 to 2054.
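
For what it's worth, a single store like this can be checked directly from its cloud path (a sketch; assumes gcsfs is installed and the bucket allows anonymous access):

import gcsfs
import xarray as xr

fs = gcsfs.GCSFileSystem(token='anon')
mapper = fs.get_mapper(
    "gs://cmip6/CMIP6_LEAP_legacy/a618127503-6099939925-3/"
    "CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp245.r21i1p1f1.Omon.chl.gn.v20200922.zarr"
)
ds = xr.open_zarr(mapper, consolidated=True, use_cftime=True)
print(ds.time[0].item(), ds.time[-1].item())  # first and last timestamps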

@jbusecke (Collaborator) commented:

Thanks @AbbySh, these are super helpful examples. I will see if I can make some progress here in the coming week.
