Closure test L1 consistency in random noise generation #1695

Merged: 6 commits, May 17, 2023
10 changes: 6 additions & 4 deletions validphys2/src/validphys/filters.py
@@ -98,19 +98,19 @@ def export_mask(path, mask):
np.savetxt(path, mask, fmt='%d')


def filter_closure_data(filter_path, data, fakepdf, fakenoise, filterseed):
def filter_closure_data(filter_path, data, fakepdf, fakenoise, filterseed, sep_mult):
"""Filter closure data. In addition to cutting data points, the data is
generated from an underlying ``fakepdf``, applying a shift to the data
if ``fakenoise`` is ``True``, which emulates the experimental central values
being shifted away from the underlying law.

"""
log.info('Filtering closure-test data.')
return _filter_closure_data(filter_path, data, fakepdf, fakenoise, filterseed)
return _filter_closure_data(filter_path, data, fakepdf, fakenoise, filterseed, sep_mult)


def filter_closure_data_by_experiment(
filter_path, experiments_data, fakepdf, fakenoise, filterseed, experiments_index
filter_path, experiments_data, fakepdf, fakenoise, filterseed, experiments_index, sep_mult
):
"""
Like :py:func:`filter_closure_data` except filters data by experiment.
@@ -129,7 +129,7 @@ def filter_closure_data_by_experiment(
]
res.append(
_filter_closure_data(
filter_path, exp, fakepdf, fakenoise, filterseed, experiment_index
filter_path, exp, fakepdf, fakenoise, filterseed, experiment_index, sep_mult
)
)

@@ -182,6 +182,7 @@ def _filter_real_data(filter_path, data):

def _filter_closure_data(
filter_path, data, fakepdf, fakenoise, filterseed, experiments_index
, sep_mult
):
"""
This function is accessed within a closure test only, that is, the fakedata
@@ -248,6 +249,7 @@ def _filter_closure_data(
closure_data,
filterseed,
experiments_index,
sep_mult
)

#====== write commondata and systype files ======#
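
To make the fakenoise switch described in the docstring above concrete, here is a purely illustrative toy sketch in plain NumPy. It is not the validphys implementation and all numbers are invented; it only mirrors the logic described there: either the Level 0 predictions are kept unchanged, or they are shifted by one draw of experimental noise.

import numpy as np

rng = np.random.default_rng(1234)          # plays the role of filterseed

level0 = np.array([0.52, 1.10, 0.87])      # toy central values computed from the underlying "fake" PDF
covmat = np.diag([0.02, 0.03, 0.01]) ** 2  # toy experimental covariance matrix

fakenoise = True
if fakenoise:
    # Level 1 closure data: shift the Level 0 values away from the underlying law
    closure_data = level0 + rng.multivariate_normal(np.zeros(level0.size), covmat)
else:
    # Level 0 closure test: keep the exact predictions of the fake PDF
    closure_data = level0
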
44 changes: 30 additions & 14 deletions validphys2/src/validphys/pseudodata.py
@comane (Member, Author) commented on Mar 17, 2023:

@scarlehoff, @andreab1997, @giovannidecrescenzo In the code meeting of 15.03.2023 it was discussed that the needed modifications should be:

covmat = dataset_inputs_covmat_from_systematics(
    commondata_wc,
    level0_commondata_wc,
    dataset_input_list,
    use_weights_in_covmat=False,
    norm_threshold=None,
    _list_of_central_values=None,
    _only_additive=sep_mult,
)

level1_data = make_replica(
    level0_commondata_wc, filterseed, covmat, sep_mult=sep_mult, genrep=True
)

and adding sep_mult as an argument of make_level1_data.
However, I do not think that this would be correct: when _only_additive=True, only the ADD types in the systype file are selected (see the cd.additive_errors method), which is only a subset of the actual uncertainties.
It is in fact also clear, by running a vp-setupfit, that the changes with respect to master in the central L1 values would be huge.

Reply (Member):

That's surprising. The only difference should be whether the multiplicative error is added with the covmat or afterwards, and we said in the past that it didn't make a (big) difference. @andreab1997 maybe we got the combination of true/false wrong?

AFAIR, in the fit we have the true-true option for the sampling.
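
To make the point under discussion concrete, here is a self-contained toy sketch in plain NumPy (it is not the validphys code, and the numbers are invented) of the two sampling strategies being compared: drawing the noise from a covariance matrix that already contains the multiplicative entries, versus drawing only the additive part from the covariance matrix and applying the multiplicative systematics afterwards as separate factors, which is what the sep_mult route is meant to do.

import numpy as np

rng = np.random.default_rng(42)

central = np.array([1.0, 2.0, 3.0])       # toy central values
add_unc = np.array([0.05, 0.08, 0.04])    # additive uncertainties (absolute)
mult_unc = np.array([0.02, 0.01, 0.03])   # multiplicative uncertainties (relative)

# Strategy 1: everything inside one covariance matrix (multiplicative entries
# converted to absolute values using the central values), one Gaussian draw.
cov_full = np.diag(add_unc**2 + (mult_unc * central) ** 2)
l1_combined = central + rng.multivariate_normal(np.zeros_like(central), cov_full)

# Strategy 2 (the sep_mult-like route): additive-only covariance matrix for the
# Gaussian draw, multiplicative noise applied afterwards as relative factors.
cov_add = np.diag(add_unc**2)
shift = rng.multivariate_normal(np.zeros_like(central), cov_add)
mult_factor = 1.0 + mult_unc * rng.standard_normal(central.size)
l1_separate = (central + shift) * mult_factor

# For small uncertainties the two prescriptions agree to first order, which is
# the sense in which the difference was said not to be (very) big.
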

@@ -287,13 +287,32 @@ def level0_commondata_wc(data, fakepdf):


def make_level1_data(
data, level0_commondata_wc, filterseed, experiments_index
data, level0_commondata_wc, filterseed, experiments_index, sep_mult
):
"""
Given a list of level0 commondata instances, return the same list
with central values replaced by level1 data


Given a list of Level 0 commondata instances, return the
same list with central values replaced by Level 1 data.

Level 1 data is generated using validphys.make_replica.
The covariance matrix, from which the stochastic Level 1
noise is sampled, is built from Level 0 commondata
instances (level0_commondata_wc). This, in particular,
means that the multiplicative systematics are generated
from the Level 0 central values.

Note that the covariance matrix used to generate Level 2
pseudodata is consistent with the one used at Level 1
up to corrections of the order eta * eps, where eta and
eps are defined as shown below:

Generate L1 data: L1 = L0 + eta, eta ~ N(0,CL0)
Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,CL1)

where CL0 and CL1 mean that the multiplicative entries
have been constructed from Level 0 and Level 1 central
values respectively.


Parameters
----------

@@ -304,9 +323,10 @@ def make_level1_data(
all datasets within one experiment. The central value is replaced
by Level 0 fake data. Cuts already applied.

filterseed: int
filterseed : int
random seed used for the generation of Level 1 data

experiments_index : pandas.MultiIndex

Returns
-------
@@ -325,25 +345,21 @@ def make_level1_data(
>>> l1_cd
[CommonData(setname='NMC', ndata=204, commondataproc='DIS_NCE', nkin=3, nsys=16)]
"""
# =============== generate experimental covariance matrix ===============#

dataset_input_list = list(data.dsinputs)

commondata_wc = data.load_commondata_instance()

covmat = dataset_inputs_covmat_from_systematics(
commondata_wc,
level0_commondata_wc,
dataset_input_list,
use_weights_in_covmat=False,
norm_threshold=None,
_list_of_central_values=None,
_only_additive=False,
_only_additive=sep_mult,
)

# ================== generation of pseudo data ======================#
# = generate pseudo data starting from theory predictions
# ================== generation of Level1 data ======================#
level1_data = make_replica(
level0_commondata_wc, filterseed, covmat, sep_mult=False, genrep=True
level0_commondata_wc, filterseed, covmat, sep_mult=sep_mult, genrep=True
)

indexed_level1_data = indexed_make_replica(experiments_index, level1_data)
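
The sampling scheme spelled out in the new make_level1_data docstring can be summarised with a short self-contained sketch in plain NumPy (not the actual validphys code; the covariance matrix here is a toy diagonal one whose multiplicative part is rescaled by the relevant central values):

import numpy as np

rng = np.random.default_rng(0)

L0 = np.array([1.0, 2.0, 3.0])            # Level 0 central values from the fake PDF
add_unc = np.array([0.05, 0.08, 0.04])    # additive uncertainties (absolute)
mult_unc = np.array([0.02, 0.01, 0.03])   # multiplicative uncertainties (relative)

def covmat(central):
    # Toy covariance matrix whose multiplicative entries scale with the central values
    return np.diag(add_unc**2 + (mult_unc * central) ** 2)

# Level 1: one draw of noise around the underlying law, eta ~ N(0, CL0)
C_L0 = covmat(L0)
L1 = L0 + rng.multivariate_normal(np.zeros_like(L0), C_L0)

# Level 2: replicas fluctuated around the Level 1 data, eps_k ~ N(0, CL1)
C_L1 = covmat(L1)
L2 = L1 + rng.multivariate_normal(np.zeros_like(L1), C_L1, size=100)

# C_L1 differs from C_L0 only through the Level 1 central values, i.e. by terms
# of order eta, so the Level 2 noise differs from an N(0, CL0) draw by terms of
# order eta * eps, matching the consistency statement in the docstring.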