Random vs 'deterministic' data_node selection #160

Open
jbusecke opened this issue May 11, 2024 · 0 comments
Labels
architecture question Further information is requested

Comments

@jbusecke
Collaborator

In the new async client I chose not to select the data_node (when several options are available) from a list of preferred nodes, and instead just take the first complete one.

My thinking was that it might be good to randomize the sources, in case something is wrong with a particular one of the preferred nodes in combination with a certain dataset. I still think that is a good choice overall, but what I noticed while running deployments for #72 is that, to no surprise, this redownloads all files (and in this case there are a LOT).


So this somewhat negates the advantage of a file cache. I think that pangeo-forge/pangeo-forge-recipes#713 will ultimately help with this while keeping the benefit of not always using the same data node, but for now I am thinking of re-implementing the node sorting?
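To make the trade-off concrete, here is a minimal sketch of what re-implemented node sorting could look like next to the random choice. Everything in it is an assumption for illustration: the names `PREFERRED_NODES`, `sort_nodes`, and `pick_node` are hypothetical and not the client's actual API, and the hostnames are placeholders rather than a vetted list of data nodes.

```python
import random

# Hypothetical preference order; placeholder hostnames, not a vetted list.
PREFERRED_NODES = ["node-a.example.org", "node-b.example.org"]


def sort_nodes(nodes, preferred=PREFERRED_NODES):
    """Order nodes so preferred ones come first, in preference order.

    Nodes not in the preferred list come last, sorted by name, so the
    result does not depend on the order of the input.
    """
    rank = {name: i for i, name in enumerate(preferred)}
    return sorted(nodes, key=lambda n: (rank.get(n, len(preferred)), n))


def pick_node(complete_nodes, deterministic=True, rng=None):
    """Pick one data_node from the nodes that hold a complete dataset."""
    if not complete_nodes:
        raise ValueError("no complete data_node available")
    if deterministic:
        # Same candidates in, same node out: file URLs stay stable
        # between runs, so a URL-keyed file cache can be reused.
        return sort_nodes(complete_nodes)[0]
    # Random choice spreads load and avoids always hitting one node,
    # but changes the URLs and therefore bypasses the cache.
    return (rng or random).choice(list(complete_nodes))
```

With `deterministic=True`, repeated deployments for the same dataset resolve to the same node as long as the set of complete nodes does not change, which is the property the file cache relies on.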

Let's see how https://console.cloud.google.com/dataflow/jobs/us-central1/2024-05-11_06_56_55-9660281334429566451;step=Creating%20CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514%7COpenURLWithFSSpec%7COpenWithXarray%7CPreprocessor%7CStoreToZarr%7CInjectAttrs%7CConsolidateDimensionCoordinates%7CConsolidateMetadata%7CCopy%7CLogging%20to%20bigquery%20%28non-QC%29%7CTestDataset%7CLogging%20to%20bigquery%20%28QC%29;graphView=0?project=leap-pangeo&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))&authuser=1 goes.

@jbusecke added the question and architecture labels May 21, 2024