How to leverage other data while gridding? (Enhancement) #96

Closed
Leon6j opened this issue Feb 14, 2022 · 6 comments

Comments

@Leon6j

Leon6j commented Feb 14, 2022

As another thread (#90) points out, one of the major weaknesses of the DIVA gridding is the poor handling of regions with no data. Bullseyes are often created.

I wonder if you could consider enhancing DIVAnd.jl so that it can automatically leverage values from another grid where there are no data and the associated gridding errors are too large?

Here is an example use case:

  • I have a 3-column observational data set for Variable 1 (Longitude, Latitude, Variable 1). Its coverage is not good everywhere in the global ocean, and there will certainly be some data-sparse regions. My goal is to grid the data onto a global grid.
  • I have a good satellite-data-based algorithm that allows me to calculate a Variable 1 value anywhere on the same global grid. It has a larger uncertainty than a real measurement, but it is far better than the values DIVAnd.jl extrapolates in data-sparse regions.
  • Is it possible to have DIVAnd.jl grid Variable 1 from the 3-column observational data, but fill in satellite-algorithm-derived values at grid points where there are no real data to use and the errors are therefore known to be large?

Many thanks for your consideration!

@jmbeckers
Member

So I assume your in situ data are near surface values also.

Technically you can use satellite data as if they were in situ data by adding them into the observational array using (lon, lat, val) of each pixel.

This will, however, give too much importance to the satellite data and will make cross-validation approaches more difficult (because of error correlations within the satellite data). To alleviate this and also account for the relative errors of the two data types, you can:
a) subsample the satellite data so that its coverage is more comparable to the in situ coverage
b) use a different epsilon2 for each data set. To do this, you need to create an array of epsilon2 in which each element holds the epsilon2 value of the corresponding data point.
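
For readers following along, here is a minimal sketch of options (a) and (b) in plain Julia. All values, grid sizes and names (`every`, `eps2_obs`, `eps2_sat`, the synthetic `sat` field) are hypothetical and only illustrate the bookkeeping. The combined `lon`, `lat`, `val` and `epsilon2` vectors would then be passed as the observation coordinates, observation values and `epsilon2` arguments of `DIVAndrun`, which accepts a vector with one `epsilon2` value per data point.

```julia
# Hypothetical in situ observations (lon, lat, value).
lon_obs = [10.0, 25.0, 40.0]
lat_obs = [-5.0, 12.0, 30.0]
val_obs = [1.2, 0.8, 1.5]

# Hypothetical gridded satellite product on a regular 1-degree grid.
lon_grid = 0.0:1.0:59.0
lat_grid = -30.0:1.0:29.0
sat = [1.0 + 0.01 * x + 0.005 * y for x in lon_grid, y in lat_grid]

# (a) Subsample: keep every 10th pixel in each direction (assumed spacing).
every = 10
ii = 1:every:length(lon_grid)
jj = 1:every:length(lat_grid)
lon_sat = vec([lon_grid[i] for i in ii, j in jj])
lat_sat = vec([lat_grid[j] for i in ii, j in jj])
val_sat = vec(sat[ii, jj])

# (b) One epsilon2 per data point: small for in situ, larger for satellite.
# The two values below are placeholders, not recommendations.
eps2_obs = 0.1
eps2_sat = 1.0

lon = vcat(lon_obs, lon_sat)
lat = vcat(lat_obs, lat_sat)
val = vcat(val_obs, val_sat)
epsilon2 = vcat(fill(eps2_obs, length(val_obs)),
                fill(eps2_sat, length(val_sat)))
```

The subsampling interval and the ratio between the two epsilon2 values control how strongly the satellite pixels pull the analysis, so both would need tuning for a real data set.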

@Leon6j
Author

Leon6j commented Feb 15, 2022

Many thanks for the reply!
Yes, my observational data is also surface data.

  • First of all, the satellite algorithm based data are already on the global grid, so there is no need to worry about gridding the satellite data.
  • Secondly, let's not worry about whether satellite is good enough for my research purpose. My satellite based values are actually really good. I just want to use observational data to enhance them where I can.
  • My question basically comes down to how to subsample satellite data and fill in those areas where DIVAnd does not have enough observational data and will create bulls eyes during the gridding?

@ctroupin
Member

I guess you can also use the satellite observations to create a background field: as you know, DIVA analyses are performed on anomalies with respect to a background or reference field, which is, in simple cases, a uniform field with a value equal to the average of all the observations. Here you can do this:

  1. Create a background field using the satellite data, with a long correlation length and a large noise-to-signal ratio.
  2. Extract the values of this background field at the locations of the in situ observations and subtract them from the observations to obtain anomalies.
  3. Perform the interpolation on the newly computed anomalies, with smaller values of L and epsilon2.

Doing so, you ensure that the solution, in regions where no in situ obs. are available, takes the value of the background field. And you don't need to sub-sample the satellite data.

@Leon6j
Author

Leon6j commented Feb 15, 2022

Many thanks for chiming in!

Not sure I fully understood #2 and #3. So you assume that when the anomalies are gridded, there won't be bulls eyes in regions where obs. data are not available? I doubt that is the case.

I'm thinking maybe I should do this instead:
Is there a way to find the indices of the grid points where the gridding errors are too high due to a lack of observational data? Can I do so reliably based on the CPME estimates? Once those indices are known, I can use them to enhance my gridded results, i.e., replace the gridded values at those grid points with satellite-based values.
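
A minimal sketch of that replacement step, in plain Julia. It assumes an error field `cpme` is already available on the analysis grid (DIVAnd provides `DIVAnd_cpme`, which takes the same leading arguments as `DIVAndrun`), and that `fi` and `fisat` are on that same grid. The arrays and the `threshold` value below are made up for illustration.

```julia
# Hypothetical 2x2 fields, all on the same grid.
fi    = [1.0 2.0; 3.0 4.0]   # DIVAnd analysis of the in situ data
fisat = [1.5 2.5; 3.5 4.5]   # satellite-based values
cpme  = [0.1 0.8; 0.3 0.9]   # relative error estimate of the analysis

# Assumed cut-off above which the analysis is considered unreliable.
threshold = 0.5

# Boolean mask and indices of the poorly constrained grid points.
poorly_constrained = cpme .> threshold
idx = findall(poorly_constrained)

# Take the satellite value where the error is too large, the analysis elsewhere.
merged = ifelse.(poorly_constrained, fisat, fi)
```

A hard switch like this can leave visible jumps along the edge of the masked area, since nothing forces `fi` and `fisat` to agree there.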

@ctroupin
Member

Yes, I think that using a background field obtained by gridding the satellite observations, then performing the analysis of the in situ observations with that background field can help avoid the bulls eyes.

I'm now checking the doc and the examples to see if there is an example of how to do it.

Concerning the use of CPME, that can be a possibility, though I'm not sure what is the best way to extract the error fields at the locations of the observations.

@jmbeckers
Member

If your satellite data are already on the same grid as the analysis you want to do, just work with anomalies with respect to your satellite data. That way you automatically will have satellite data in regions where you do not have observations.

Technically, to calculate the anomalies easily, you can first do a dummy analysis with your in situ data and recover the structure s from it:
fi, s = DIVAndrun(...)

and then use

newobs = DIVAnd_residualobs(s, fisat);

where fisat is the gridded satellite data (it needs to be exactly on the same grid as your analysis).

Then you do an analysis of the anomalies newobs and, at the end, add your fisat to that analysis.

Hope it makes sense
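
Put together, the workflow described in this comment could look like the sketch below. It is an illustration on a small toy domain, not a tested recipe: the grid, the synthetic `fisat` field, the three observations, `len` and `epsilon2` are all hypothetical, and a real global grid would need a proper metric (and longitude periodicity) instead of `DIVAnd_rectdom` in degrees. It also assumes that every observation lies inside the grid.

```julia
using DIVAnd

# Toy regular grid (hypothetical; replace with the real analysis grid).
lonr = 0.0:1.0:20.0
latr = 0.0:1.0:10.0
mask, (pm, pn), (xi, yi) = DIVAnd_rectdom(lonr, latr)

# Hypothetical gridded satellite field, on exactly the same grid.
fisat = @. 10.0 + 0.1 * xi + 0.2 * yi

# Hypothetical in situ observations (lon, lat, value).
lon = [2.5, 7.0, 15.5]
lat = [3.0, 8.5, 1.5]
val = [11.0, 12.9, 12.2]

len = (3.0, 3.0)   # correlation length, in grid units (degrees here)
epsilon2 = 0.5     # noise-to-signal ratio

# 1. Dummy analysis, only used to obtain the structure s.
_, s = DIVAndrun(mask, (pm, pn), (xi, yi), (lon, lat), val, len, epsilon2)

# 2. Observations minus the satellite field at the observation locations.
newobs = DIVAnd_residualobs(s, fisat)

# 3. Analysis of the anomalies.
fanom, _ = DIVAndrun(mask, (pm, pn), (xi, yi), (lon, lat), newobs, len, epsilon2)

# 4. Add the satellite field back to get the final field.
fi = fanom .+ fisat
```

Far from any observation the anomaly analysis relaxes towards zero, so `fi` falls back to `fisat` there, which is the behaviour asked for in the original question.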
