Tracking issue for finished migration to public GCS bucket. #82

Open
16 of 37 tasks
jbusecke opened this issue Jan 18, 2024 · 1 comment

Comments

@jbusecke
Collaborator

jbusecke commented Jan 18, 2024

Once we have moved the ingestion to the public GCS bucket, the instructions in the README should be updated and this issue closed.

Some steps to track the evolution of this milestone. We are getting VERY CLOSE.

  • Implement mover stage (Add mover/copy stage #67)

  • Test access to public bucket from mover stage (Change move bucket #87)

  • Refactor all database content to a single BigQuery table

  • Test and copy LEAP stores (one-time action, info -> 'leap-pangeo.cmip6_pgf_ingestion.leap_legacy')

  • Test PANGEO legacy stores (one-time action, info -> 'leap-pangeo.cmip6_pgf_ingession.pangeo_legacy')

  • Add all legacy entries to beta test table "leap-pangeo.testcmip6.cmip6_consolidated_manual_testing"

  • Set up revised JSON catalog files
    A beta version of the catalog is available here:

    import intake
    url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json" # Only stores that pass current tests
    # url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_noqc.json" # Only stores that fail current tests
    # url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_retracted.json" # Only stores that have been retracted by ESGF
    col = intake.open_esm_datastore(url)
    
  • Get some folks to beta test.

  • Rebase on leap-data-management-utils release (Use new release of leap-data-management-utils #85)

  • Implement consolidated coordinates (Test consolidates dim coords from PR #88)

  • Optional/Later: Consolidate coordinates of LEAP legacy stores?

  • Set up final table and repeat ingestion from legacy BigQuery tables.

    • Change all target prefixes to the final location
    • Ingest data from 'leap-pangeo.cmip6_pgf_ingestion.leap_legacy'
    • Ingest data from 'leap-pangeo.cmip6_pgf_ingession.pangeo_legacy'
    • Point the JSON catalog files to the proper CSV.
  • Delete and clean temp table/directories and populate final table from scratch.

    • Save/move reports
    • cmip6/cmip6-pgf-ingestion-test/catalogs/
    • cmip6/cmip6-pgf-ingestion-test/
    • cmip6/CMIP6_LEAP_legacy_test
    • Delete all the permanent stores on LEAP buckets
  • Remove retraction logic from https://github.com/pangeo-data/pangeo-cmip6-cloud/blob/master/create_filtered_catalog.py and document new location there.

  • Adopt new catalog in xMIP

  • Write a blog post about this

  • Dance in the streets 🕺

  • (Optional/Later) Implement regular retesting of all stores

    • Tune dask performance
    • Speed up testing with coiled?
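
For beta testers, here is a minimal sketch of how one might switch between the three beta catalog files listed above. The short names (`qc`, `noqc`, `retracted`) and the helpers `catalog_url` / `open_catalog` are hypothetical and only for illustration; the URLs are the ones from the list above and will change once the final location is set up.

```python
# Base prefix taken from the beta catalog URLs listed in this issue.
BASE = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog"

# Hypothetical short names for the three beta catalog files.
CATALOGS = {
    "qc": "catalog.json",                  # stores that pass current tests
    "noqc": "catalog_noqc.json",           # stores that fail current tests
    "retracted": "catalog_retracted.json", # stores retracted by ESGF
}


def catalog_url(kind: str = "qc") -> str:
    """Return the beta catalog URL for one of the short names above."""
    try:
        return f"{BASE}/{CATALOGS[kind]}"
    except KeyError:
        raise ValueError(
            f"unknown catalog kind {kind!r}; expected one of {sorted(CATALOGS)}"
        ) from None


def open_catalog(kind: str = "qc"):
    """Open the chosen catalog (needs intake-esm installed and network access)."""
    import intake  # imported lazily so catalog_url works without intake

    return intake.open_esm_datastore(catalog_url(kind))
```

With intake-esm installed, something like `open_catalog("qc").search(variable_id="tas")` should narrow the catalog down, assuming the beta CSV keeps the usual CMIP6 column names such as `variable_id`.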
@jbusecke
Collaborator Author

#67 made substantial progress towards this goal.
