
Closure test L1 consistency in random noise generation #1695

Merged
merged 6 commits on May 17, 2023

Conversation

@comane (Member) commented Mar 17, 2023

The idea of this PR: currently (on master) the MULT-uncertainty part of the covmat used to compute the L1 noise in a closure test is built from the experimental central values, which do not exist within a closure test.
In this branch the MULT uncertainties are instead computed from the L0 central values, i.e. the theory predictions.
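
A minimal sketch of the change (names follow the validphys call quoted later in this thread; the exact way the L0 central values are threaded through, and central_values as the relevant CommonData attribute, are assumptions on my part):

    # Master rescales the MULT (fractional) uncertainties by the experimental
    # central values when building the covmat used to sample the L1 noise.
    # This PR rescales them by the Level 0 predictions instead, e.g. by
    # passing the L0 central values explicitly:
    l0_central_values = [cd.central_values for cd in level0_commondata_wc]

    covmat_for_l1_noise = dataset_inputs_covmat_from_systematics(
        level0_commondata_wc,
        dataset_input_list,
        use_weights_in_covmat=False,
        norm_threshold=None,
        _list_of_central_values=l0_central_values,
    )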

@RoyStegeman (Member) commented:

At some point I disconnected from the discussion last Wednesday, but could you explain why it has to be done this way? It would be useful if that could also be added as a comment in the code or to the docs, whichever is more appropriate.

@comane (Member Author) commented Mar 17, 2023

@scarlehoff, @andreab1997, @giovannidecrescenzo: in the code meeting of 15.03.2023 it was discussed that the needed modifications should be:

    covmat = dataset_inputs_covmat_from_systematics(
        commondata_wc,
        level0_commondata_wc,
        dataset_input_list,
        use_weights_in_covmat=False,
        norm_threshold=None,
        _list_of_central_values=None,
        _only_additive=sep_mult,
    )

    level1_data = make_replica(
        level0_commondata_wc, filterseed, covmat, sep_mult=sep_mult, genrep=True
    )

and adding sep_mult as an argument of make_level1_data.
However, I do not think this would be correct: with _only_additive=True, only the ADD types in the systype file are selected (see the cd.additive_errors method), which is only a subset of the actual points.
It is in fact also clear from running vp-setupfit that the modifications with respect to master in the central L1 values would be huge.
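
To make the ADD/MULT distinction concrete, here is a schematic sketch of how the two blocks enter the covmat (cd.additive_errors is mentioned above; cd.multiplicative_errors and the exact assembly are my assumptions, and the real validphys code is more involved):

    import numpy as np

    # ADD systypes carry absolute uncertainties; MULT systypes carry
    # percentages that only become absolute once rescaled by central values.
    add = cd.additive_errors.to_numpy()          # shape (ndata, n_add)
    mult = cd.multiplicative_errors.to_numpy()   # shape (ndata, n_mult), percent

    cov_add_only = add @ add.T                   # all that _only_additive=True keeps

    # The MULT block depends on which central values rescale it:
    # experimental on master, L0 in this PR.
    mult_abs = mult * central_values[:, np.newaxis] / 100.0
    cov_full = cov_add_only + mult_abs @ mult_abs.T

Dropping the MULT block altogether (as _only_additive=True would do) is therefore a much larger change than rescaling it by different central values.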

A Member replied:

That's surprising. The only difference should be whether the multiplicative error is added with the covmat or afterwards, and we said in the past that it didn't make a (big) difference. @andreab1997 maybe we got the combination of true/false wrong?

AFAIR, in the fit we have the true-true option for the sampling.

@comane comane requested a review from RoyStegeman March 17, 2023 18:25
@comane (Member Author) commented Mar 17, 2023

At some point I disconnected from the discussion last Wednesday, but could you explain why it has to be done this way?

I think the point is that the experimental central values are not known within a closure test.
Thus the multiplicative errors should be generated from the Level 0 central values in order to be more consistent.

@RoyStegeman (Member) commented:

That I find unconvincing without further explanation. In a closure test we aim to reproduce a known underlying law, so as L0 data we use predictions from an input PDF. Then we do the fits based on some loss function and see if we can recover the input PDF. Fine so far.

Then, we have a choice regarding the loss function, either we keep it as in the real case (thus multiplicative exp uncs determined using the exp central value) or we determine the uncs going into the loss function using the input pdf. I would like to keep the closure test as close to the real scenario as possible, so in principle I prefer to use the same covmat as in the real scenario.

If there is some reason why using the experimental covmat is wrong/inconsistent, we'll have to accept that we need to deviate from the real covmat and use the input pdf as you propose. My point is that right now I don't see where we go wrong if we use the usual exp covmat.

@comane (Member Author) commented Mar 17, 2023

Then, we have a choice regarding the loss function, either we keep it as in the real case (thus multiplicative exp uncs determined using the exp central value) or we determine the uncs going into the loss function using the input pdf. I would like to keep the closure test as close to the real scenario as possible, so in principle I prefer to use the same covmat as in the real scenario.

If there is some reason why using the experimental covmat is wrong/inconsistent, we'll have to accept that we need to deviate from the real covmat and use the input pdf as you propose. My point is that right now I don't see where we go wrong if we use the usual exp covmat.

I see your point; however, I think the real (experimental) covariance matrix is never actually used when fitting in a closure test.
This is because (correct me if I am wrong) the covmat used when fitting, i.e. the one used to construct the chi2 as well as the one used to add noise to the Level 1 data for each data replica, is generated from the Level 1 data written to the runcard_name/filter folder (after vp-setupfit), and there the central values are not the experimental ones but the Level 0 ones augmented by some noise. So the question is from which covariance matrix we sample the noise that we add to the Level 0 data in order to get the Level 1 data.
In any case, I think the difference is really small.

@scarlehoff (Member) commented:

@RoyStegeman this is only for the creation of the data and should not affect the fit. This change should make the level 1 data closer to what is done in the fit when we generate a replica.

@RoyStegeman (Member) commented:

This change should make the level 1 data closer to what is done in the fit when we generate a replica.

Good point, somehow I forgot about that for a second. Nevertheless, I'm not so interested in revisiting this discussion: you've talked about it for a long time and seem to have come to some sort of agreement, which is why I asked for the reasoning behind the choice to be documented (which might be useful, since you apparently already remember something different from Mark after two days).

@scarlehoff (Member) commented Mar 17, 2023

(which might be useful since you apparently already remember something different from Mark after two days)

I believe you two were just talking about different things (or were thinking about different things).

In any case, the difference between the two methods should be negligible. Having the separate multiplicative flag in the runcard will help check whether the difference is truly negligible.

@alecandido (Member) commented:

I see your point, however, I think that the real (experimental) covariance matrix is never really used when fitting in a closure test.

@comane: re the real covmat I agree with you, but, of course, the additive part of it is exactly that.

Then, we have a choice regarding the loss function, either we keep it as in the real case (thus multiplicative exp uncs determined using the exp central value) or we determine the uncs going into the loss function using the input pdf. I would like to keep the closure test as close to the real scenario as possible, so in principle I prefer to use the same covmat as in the real scenario.

@RoyStegeman about the concept itself I'd agree with @comane: the exp central values should never enter a closure test, not even in the covmat.
The idea is that you want to test the methodology in a controlled scenario, i.e.:

  • I make up data for the central values of which I know the underlying law (the PDF I used for L0 generation)
  • and I try to reconstruct it as if I were doing the usual fit

But the usual fit prescription is:

  1. take the central value
  2. take the additive and multiplicative uncertainties as given by the experimentalists (and this we decided not to fake, apart from the closure tests with inconsistent data project)
  3. construct the covmat with the central values and exp uncertainties
  4. initialize
  5. start training, with the loss computed using the covmat

But in a closure test step 1. is replaced by the fake central data, and the experimental input should consistently be limited to everything but the central values. Otherwise we would be leaking information about the exp central values, to which the closure test should be completely blind (even if only through the covmat).

However, I believe this is the only argument in favor of not using the actual exp covmat, and it really depends on how you decide to factorize the experimental input, i.e. what you consider to be independent (in a statistical sense).
If the two independent inputs were the central values and the covmat, then you would have been right, but in that case I'd say the experimentalists would quote those values. If instead the independent inputs are the central values and the add+mult uncertainties, then you should replace the central values at this level with the closure test ones.
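
Schematically (my notation, not validphys's), the central values D enter the experimental covmat only through the multiplicative block:

    C_ij = sum_{a in ADD} sigma_i^(a) sigma_j^(a)
         + ( sum_{m in MULT} delta_i^(m) delta_j^(m) ) D_i D_j

where the sigmas are the absolute additive uncertainties and the deltas the fractional multiplicative ones. Replacing D = exp central values with D = L0 predictions, as this PR does, changes only the second term.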

@RoyStegeman (Member) commented:

I don't agree that the closure test should be completely blind to the exp central values purely for the sake of it. I do agree that the point of the closure test is to have a perfectly controlled/self-consistent scenario. If the closure test needs to be completely blind to the exp central values to achieve this, fine, but that needs to be explained (which so far has not been done to my satisfaction; if that's just me, then please write down the argument in the docs).

If you say we should use the L1 data for the mult uncs, e.g. because that is consistent with the L1 central data we want to fit while using the exp central values for the mult uncs is not, or because we need the same covmat for sampling and fitting, that would be a perfectly sensible reason to me. Though those reasons would also require explaining why they are more important than sampling the fluctuations from the same covmat as in a real fit.

@alecandido (Member) commented:

I don't agree that the closure test should be completely blind to the exp central values purely for the sake of it.

It's not for the sake of it; it is because the exp central values are a specific instance of the possible central values, and if you keep correlating to them in all your tests you can't prove that your methodology is good enough independently of them, because it might only be good for some feature of that specific instance.
In a sense, it depends on the goal we attribute to the closure tests.

So, let's put it like this: if we had the choice of picking a PDF that reproduces exactly the known central values at L0, should we use that one? Or should we pick a random one anyhow?

If you say we should use the L1 data for the mult uncs. e.g. because that is consistent with the L1 central data we want to fit while using the exp central values for the mult uncs is not, or because we need the same covmat for sampling and fitting, that would be a perfectly sensible reason to me.

I don't see exactly how to spell out the consistency; it is a rather vague requirement.
About the second example (we need the same covmat for sampling and fitting): that, in a sense, we would need, but at L2. The way the NNPDF methodology is spelled out, it only allows having the same covmat for sampling and loss (up to t0, of course).
At L1 generation there is no comparison with the actual fit.

Though also those reasons would require explaining why they are more important than sampling the fluctuations from the same covmat as in a real fit.

That's the point, and it is the same one I mentioned above: I believe you want a test that is as independent as possible from the actual fit, to prove your methodology to be good enough in a large enough space.
But maybe I'm misinterpreting what people claimed about closure tests (even though I heard many times during the hopscotch saga that the absence of overfitting was to be proven by closure tests).

@comane (Member Author) commented Mar 21, 2023

@RoyStegeman, @scarlehoff, @alecandido,
to recap, the idea of the modification introduced by this PR:

A consistent Closure Test should be defined as:

  1. Generate L0 data: L0 = G[f] (f is the underlying law, G ~ FK table)
  2. Generate L1 data: L1 = L0 + eta, eta ~ N(0,C)
  3. Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,C)

The above closure test is consistent since both eta and eps_k are sampled from N(0,C), i.e. using the same covariance matrix.

What is actually done in the code (master branch) right now:

  1. Generate L0 data: L0 = G[f]
  2. Generate L1 data: L1 = L0 + eta, eta ~ N(0,Cexp)
  3. Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,CL1)

where CL1 is a covariance matrix whose 'additive part' is exactly the same as that of Cexp (and therefore consistent); its 'multiplicative part', however, has been generated from the L1 central values.
Therefore, in general, Cexp != CL1.

What is done in this PR:

  1. Generate L0 data: L0 = G[f]
  2. Generate L1 data: L1 = L0 + eta, eta ~ N(0,CL0)
  3. Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,CL1)

where CL0 is a covariance matrix whose 'additive part' is exactly the same as that of Cexp (and therefore consistent); its 'multiplicative part', however, has been generated from the L0 central values.
Note that in this case one still has CL0 != CL1; however, we can expect the 'multiplicative entries' of CL0 to resemble those of CL1 more closely (since those of Cexp could deviate from those of CL1 by a significant amount if there is no good data-theory agreement).
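
In code, the recipe above reads roughly as follows (a numpy sketch with placeholder objects fktable, f, C_L0, C_L1 and n_rep; not the validphys implementation):

    import numpy as np

    rng = np.random.default_rng(filterseed)

    # Level 0: pure theory predictions from the fake PDF f
    L0 = fktable @ f                                          # schematic G[f]

    # Level 1: a single fixed noise draw; this PR builds C_L0 with the MULT
    # errors rescaled by L0 (master rescales them by the experimental values)
    L1 = L0 + rng.multivariate_normal(np.zeros(len(L0)), C_L0)

    # Level 2: one independent draw per replica, all centered on the same L1
    L2 = [L1 + rng.multivariate_normal(np.zeros(len(L1)), C_L1)
          for _ in range(n_rep)]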

@comane (Member Author) commented Mar 21, 2023

please write down the argument in the docs

Hi @RoyStegeman, thank you for your comments.
By writing in the docs, do you mean adding something to https://docs.nnpdf.science, or simply commenting the make_level1_data() function that has been modified in this PR?

@RoyStegeman (Member) commented Mar 21, 2023

In my first comment I said "whichever is more appropriate", though since the argument contains some subtleties that cannot be summarized in one or two sentences, I would suggest adding it to the actual documentation at https://docs.nnpdf.science/. These docs are built from the docs folder in this repo.

It may still be useful to add "what" the function does as a small comment in the code, but to me the docs seem like a more appropriate place to answer "why" it's done the way it is. There are no strict rules though, so just do whatever you think is best; people will have to review this before it gets merged anyway, so it could even be changed later.

@alecandido (Member) commented:

By writing in the docs, do you mean to add something to docs.nnpdf.science
or simply to comment the make_level1_data() function that has been modified in this PR?

Just to mention it: in principle even the docstring of make_level1_data() is part of the docs on https://docs.nnpdf.science, even if it is not reachable from the TOC (but it is discoverable by searching).
https://docs.nnpdf.science/modules/validphys/validphys.html?highlight=all_preds#validphys.pseudodata.make_level1_data

@comane (Member Author) commented Apr 19, 2023

Report for Rbv computed on a multiclosure test:

https://vp.nnpdf.science/CCq46px1QAygO0h3HlPULQ==/#example-report

Note: Rbv is not evaluated on test data but on the same data used in training

@comane (Member Author) commented Apr 24, 2023

@RoyStegeman , @Zaharid , @scarlehoff
Multiclosure report for Rbv

https://vp.nnpdf.science/7sXVqI6BRgyFFRFzDRazuA==/

The training and test datasets are the same as the ones used in the NNPDF4.0 closure tests; see Tab. 6.2 of the NNPDF4.0 paper.
The results seem to be consistent with those of M.W.

@scarlehoff (Member) commented:

Am I reading this correctly? If this is consistent with the closure tests of MW, that would mean we've had bugs in either the closure tests or the replica generation for the last two years… (well, we know this is the case because we have solved many; what I mean is that the difference with the MW tests was due to these bugs)

@RoyStegeman (Member) commented:

Thanks for these results. What changed between these last two reports? Only the in-sample and out-of-sample datasets, or also changes in the code?

@comane (Member Author) commented Apr 25, 2023

Only the in-sample and out-of-sample datasets, or also changes in the code?

Hi @RoyStegeman, the only thing that changed is the in-sample and out-of-sample datasets.

Apparently Rbv strongly depends on the test (out-of-sample) datasets; the ones I used for these fits are the same as those used in the NNPDF4.0 paper.
From the report below (which is the same but with a table per process) it is clear that the JETS and DIJETS processes have a strong impact on Rbv.

https://vp.nnpdf.science/7sXVqI6BRgyFFRFzDRazuA==/

@RoyStegeman (Member) commented:

Thanks. Having a quick look at the input PDFs, am I correct that the settings resulting in Rbv=0.8 did not have any out-of-sample datasets? Instead, it looks like it's a fit to the full NNPDF4.0 dataset, and Rbv is calculated on this full dataset. Do you know if this is what Samuele has been doing?

Just to make sure - for the report with Rbv=1.0, did you use the exact same settings as MW?

@comane (Member Author) commented Apr 25, 2023

am I correct that the settings resulting in Rbv=0.8 did not have any out-of-sample datasets? Instead, it looks like it's a fit to the full NNPDF4.0 dataset, and Rbv is calculated on this full dataset.

Yes, you are correct.

Do you know if this is what Samuele has been doing?

No, I think that is not what Samuele was doing. Samuele had some out-of-sample datasets, however (@giovannidecrescenzo correct me if I am wrong), these were not the same as the ones used by MW.

Just to make sure - for the report with Rbv=1.0, did you use the exact same settings as MW?

Same fakepdf and same datasets (which are the NNPDF3.1 datasets); some of the other parameters, like the smallx and largex initialisation, are slightly different.

This is the MW fit I am referring to: 210326-mw-001

@alecandido (Member) commented:

No, I think that is not what Samuele was doing. Samuele had some out-of-sample datasets, however (@giovannidecrescenzo correct me if I am wrong), these were not the same as the ones used by MW.

This I can confirm; Stefano told us last week that Samuele had to pick a different choice of in/out-of-sample datasets, because the main project was to introduce the inconsistencies, and Michael's split was not perfectly suitable for that purpose.

However, we mentioned the option of using the folds of K-folding (and possibly drawing folds at random), in order to split more evenly and to exploit future improvements in fold creation.

(Resolved inline review comments on validphys2/src/validphys/filters.py and validphys2/src/validphys/pseudodata.py)
@comane (Member Author) commented May 11, 2023

Hi @scarlehoff, thank you for the review!
Just wondering about the @pytest.mark.linux line in test/test_rebuilddata.py: I had to comment it out in order to rebuild the regressions/test_filter_rebuild_closure_data.csv file. So I guess the test was passing on macOS because it was skipped and hence not actually being tested.

@scarlehoff (Member) commented:

So I guess that the test was passing for macOS because it was not actually being tested.

Yes, indeed. Let's hope the data rebuilt on macOS is bit-by-bit the same as on Linux, otherwise the test will fail just the same.
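
For context, a custom linux marker like this is typically wired up in a conftest.py along these lines (a sketch of the general pytest mechanism, not necessarily the repo's actual conftest):

    # conftest.py: skip tests marked @pytest.mark.linux on other platforms
    import sys
    import pytest

    def pytest_collection_modifyitems(config, items):
        if sys.platform.startswith("linux"):
            return
        skip = pytest.mark.skip(reason="test only runs on Linux")
        for item in items:
            if "linux" in item.keywords:
                item.add_marker(skip)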

@comane (Member Author) commented May 15, 2023

Hi @scarlehoff, not sure why this test is not passing. It looks like an assert error on erf_tr in n3fit/tests/test_fit.py.
However, the test passes if I comment out the @pytest.mark.linux line and run it on my Mac.
Also, I would expect it to be independent of the above modifications, since quickcard.yaml does not have a fakepdf.

@comane comane force-pushed the using_L0_cv_for_MULT_generation_in_CT branch from 905742f to 7ce3d51 Compare May 15, 2023 07:28
@comane comane requested a review from scarlehoff May 15, 2023 11:21
@scarlehoff (Member) commented:

@giovannidecrescenzo @andreab1997

just a heads up that I'm merging this, so if you were using the latest master for the closure tests then there will be modifications to the numerical values you obtain.

@scarlehoff scarlehoff merged commit 0584835 into master May 17, 2023
@scarlehoff scarlehoff deleted the using_L0_cv_for_MULT_generation_in_CT branch May 17, 2023 11:21