
Merge sans2d and zoom code into isis module #77

Merged (36 commits into main) Feb 15, 2024

Conversation

nvaytet (Member, Author) commented Feb 9, 2024

This unifies the sans2d and zoom code into the isis module.

I also used mantidio to make new files for sans2d. In the process, I dropped the dummy tof coordinate from the data, which had just a single (not very useful) bin. So I also made new files for zoom.

Fixes #48, because the new files all have the gravity coordinate added. Gravity is needed to know which direction is up/down in order to apply beam center corrections perpendicular to the beam.

Edit: Converting to draft because there is a small difference in the results. We are now masking the second detector panel, whereas before it was sliced out. Maybe that makes a small difference to the beam center finder?
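The slicing-vs-masking distinction can be sketched as follows (a toy NumPy illustration, not the actual workflow code; the panel sizes and values are made up):

```python
import numpy as np

# Hypothetical sketch: two detector panels flattened into one pixel axis
# of 6 pixels; "panel 2" is pixels 3..5.
counts = np.array([4.0, 5.0, 6.0, 100.0, 200.0, 300.0])

# Old approach: slice the second panel out entirely.
sliced = counts[:3]

# New approach: keep all pixels but mask panel 2.
mask = np.array([False, False, False, True, True, True])
masked = np.ma.masked_array(counts, mask=mask)

# Reductions that respect the mask give identical results...
assert sliced.sum() == masked.sum()
# ...but the masked array still carries all 6 pixels, so any step that
# uses the raw pixel count (e.g. a variance broadcast factor) can differ.
assert sliced.size == 3 and masked.size == 6
```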

@@ -51,6 +52,16 @@ def get_monitor_data(
return RawMonitor[RunType, MonitorType](out)


def calibrate_monitor_position(
nvaytet (Member, Author):

I don't like the fact that I needed to add this. Should loki.get_monitor_data just return CalibratedMonitor instead of RawMonitor?

SimonHeybrock (Member):

Yes, if we know that the raw files contain calibrated positions then we should directly return that.

nvaytet (Member, Author):

I guess it depends what we mean by "calibrated". Does it mean that the positions have been modified to be correct, or does it simply mean that the positions are correct?

If the former, then we can't really say that this would return CalibratedMonitor because the positions have not been modified, they are "raw".

nvaytet (Member, Author):

I changed to return CalibratedMonitor directly.

nvaytet marked this pull request as draft February 10, 2024 09:39
@@ -246,8 +246,14 @@
"params[WavelengthBins] = sc.linspace(\n",
" 'wavelength', start=2.0, stop=16.0, num=141, unit='angstrom'\n",
")\n",
"params[FileList[TransmissionRun[SampleRun]]] = params[FileList[SampleRun]]\n",
"params[FileList[EmptyBeamRun]] = ['SANS2D00063091.hdf5']\n",
"params[isis.Filename[TransmissionRun[SampleRun]]] = params[isis.Filename[SampleRun]]\n",
SimonHeybrock (Member):

See #46; it seems the fix was forgotten here?

Comment on lines 66 to 67
if dim not in da.coords:
da = da.bin({dim: 1})
SimonHeybrock (Member):

Why do we have to bin twice? That can be costly.

Edit: I see now that it was moved here from conversions.py, but my question still stands.

nvaytet (Member, Author):

I think it was a very lazy way of getting bin bounds that encompass all the events.
I have now changed it to compute the limits by hand when needed.
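The "limits by hand" approach can be sketched like this (a toy NumPy illustration with made-up data, not the actual implementation):

```python
import numpy as np

# Hypothetical sketch: instead of an extra binning step whose only purpose
# is to obtain bin bounds encompassing all events, take the min/max of the
# event coordinate directly.
events = np.array([0.7, 2.3, 5.1, 3.9])

lo, hi = events.min(), events.max()
edges = np.array([lo, hi])  # one bin spanning all events

# Every event falls inside the single encompassing bin.
counts, _ = np.histogram(events, bins=edges)
assert counts[0] == len(events)
```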

Comment on lines 34 to 37
def to_path(filename: FilenameType, path: DataFolder) -> FilePath[FilenameType]:
    return f'{path}/{filename}'


SimonHeybrock (Member):

Why was this removed? This is needed for generic instrument-independent (and pooch-independent) setup. My thought was that this would actually move to the top level eventually, i.e., not remain in the isis module.

nvaytet (Member, Author):

I put this back in when I made a common data.py file. See if that works.

Comment on lines 76 to 84
def load_run(filename: FilePath[Filename[RunType]]) -> LoadedFileContents[RunType]:
return LoadedFileContents[RunType](sc.io.load_hdf5(filename))


def load_direct_beam(filename: FilePath[DirectBeamFilename]) -> DirectBeam:
return DirectBeam(sc.io.load_hdf5(filename))


providers = (read_xml_detector_masking, load_run, load_direct_beam)
SimonHeybrock (Member):

I don't feel these belong here, since they only work with some special files that we made for the docs. isis.io should remain clear of that, i.e., contain only functionality that works on "regular" files.

SimonHeybrock (Member):

I would avoid naming this isis.zoom. It contains special data for docs/testing, not just a general workflow for Zoom. I think we should very clearly separate the two. Maybe isis.zoom_data?

nvaytet (Member, Author):

Should I then just make 2 sub-folders for sans2d and zoom?

SimonHeybrock (Member):

Why?

nvaytet (Member, Author):

Because I thought it would be a little strange to have a zoom_data.py, and a sans2d.py file that contains both things to do with (pooch) data and masking. So I wanted to make:

sans2d/data.py
sans2d/masking.py
zoom/data.py

I guess we could also just make

sans2d_data.py
sans2d_masking.py
zoom_data.py

but I thought the separation would be clearer with subfolders.

nvaytet (Member, Author):

Alternatively, I could make a common data.py file for both zoom and sans2d, and then just have a sans2d.py file that contains masking providers (where future sans2d-specific providers could then be added).

SimonHeybrock (Member):

Why would you have sans2d contain both things? We should treat the two in the same manner and split the files by purpose.

SimonHeybrock (Member):

I thought masking from mask files is ISIS-specific, not instrument-specific, so I would leave that as isis.masking?

),
)
else:
new_bins = np.union1d(edges.values, da.coords[dim].values)
SimonHeybrock (Member):

What if the units do not match?

nvaytet (Member, Author) commented Feb 12, 2024:

I think it's actually ok in this particular case, because below, we do

new_bins = sc.array(dims=[dim], values=new_bins, unit=edges.unit)
out = da.bin({dim: new_bins})

So the bad unit would be caught by da.bin (I think?).

However, it's not so obvious to someone reading the code.
I can add an explicit check if you like?

SimonHeybrock (Member):

I think we should add a conversion (or check) of the mask edges to the correct unit close to the beginning of the function, where also other checks are performed. Otherwise we get a cryptic error from sc.bin here, or worse, someone will modify the code and the coincidental unit check will suddenly disappear.
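The kind of early check being suggested might look like this (a hypothetical, self-contained sketch; the toy conversion table and the `edges_to_unit` helper are made up for illustration, whereas the real code would use scipp's unit handling):

```python
# Hypothetical sketch of an early unit check: convert the mask edges to the
# data coordinate's unit up front, so a mismatch raises a clear error
# instead of a cryptic one deep inside the binning call.
UNIT_SCALE = {('mm', 'm'): 1e-3, ('m', 'm'): 1.0}  # toy conversion table

def edges_to_unit(values, edges_unit, coord_unit):
    try:
        scale = UNIT_SCALE[(edges_unit, coord_unit)]
    except KeyError:
        raise ValueError(
            f'mask edges have unit {edges_unit!r}, '
            f'incompatible with coordinate unit {coord_unit!r}'
        )
    return [v * scale for v in values]

assert edges_to_unit([1.0, 2.0], 'mm', 'm') == [0.001, 0.002]
```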

nvaytet (Member, Author) commented Feb 12, 2024

So I've narrowed down the difference in results between the old and new implementations of the SANS2D workflow to the values of the variances.

The data values of the final I(Q) are identical.

For the variances, if I use the default pipeline, I get larger variances (visible at high Q) in the new implementation.
[Screenshot 2024-02-12 16-42-45]

If I use the mode which drops the variances when broadcasting, the curves seem to overlap.
[Screenshot 2024-02-12 16-43-37]

I think it comes from the fact that in the new implementation we keep both detector panels (and mask the second panel), while in the old one that panel was sliced out when loading the data.
If we are broadcasting variances, I think we are broadcasting to the number of pixels without looking at the masks, so the variances could still end up being larger even though the pixels are masked?

Interestingly, looking directly at the variances of the final result (sc.variances(iofq)) shows that even with the UncertaintyBroadcastMode.drop mode, the variances in the new implementation are still larger than in the old one.
[Screenshot 2024-02-12 16-46-36]

Is this because there are still some reduction operations that occur over all the pixels?

nvaytet marked this pull request as ready for review February 12, 2024 16:28
SimonHeybrock (Member):
> I think if we are broadcasting variances, we are broadcasting to the number of pixels, not looking at the masks, so the variances could still end up being larger, even if we have masked the pixels?

That may be it. We should probably take into account masks in the upper-bound estimate, i.e., exclude masked pixels from the scale factor? Can you try if you observe the same difference in event-mode?

> Is this because there are still some reduction operations that occur over all the pixels?

I can't think of anything. The solid angle should ignore masking. Are you sure the same number of non-masked pixels is used? Is there some duplication?
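The mask-aware scale factor being proposed can be illustrated with a minimal sketch (hypothetical numbers, not the actual broadcast implementation):

```python
import numpy as np

# Hypothetical sketch of the upper-bound variance broadcast: when a value
# with a variance is broadcast over the pixel dimension, the upper-bound
# estimate scales the variance by the number of pixels it is broadcast to.
# Counting all pixels ignores the mask; counting only unmasked pixels
# gives a tighter (smaller) factor.
mask = np.array([False, False, False, True, True, True])

scale_all = mask.size                # ignores the mask: factor 6
scale_unmasked = int((~mask).sum())  # mask-aware: factor 3
assert scale_unmasked < scale_all
```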

nvaytet marked this pull request as draft February 14, 2024 15:11
nvaytet (Member, Author) commented Feb 14, 2024

It seems I can no longer reproduce the differences above if I set params[UncertaintyBroadcastMode] = UncertaintyBroadcastMode.drop

I set up the params and then did

providers = (
    sans.providers + isis.providers + isis.data.providers + isis.sans2d.providers
)
providers = list(providers + (
    sans.transmission_from_background_run,
    sans.transmission_from_sample_run,
))

pipeline_new = sciline.Pipeline(providers=providers, params=params)

pipeline_old = sciline.Pipeline(providers=providers, params=params)
pipeline_old.insert(isis.general.get_detector_data_sliced)

where

def get_detector_data_sliced(
    dg: LoadedFileContents[RunType],
) -> RawData[RunType]:
    return RawData[RunType](dg['data']['spectrum', :61440].copy())

Then, when computing IofQ

t = BackgroundSubtractedIofQ
old = pipeline_old.compute(t)
new = pipeline_new.compute(t)

both results have identical variances.

I must have messed up in my notebook above.

nvaytet (Member, Author) commented Feb 14, 2024

> Can you try if you observe the same difference in event-mode?

Results are the same in dense and event modes.

nvaytet marked this pull request as ready for review February 14, 2024 16:04
nvaytet (Member, Author) commented Feb 15, 2024

> We should probably take into account masks in the upper-bound estimate, i.e., exclude masked pixels from the scale factor?

Should we do that in a separate PR?

SimonHeybrock (Member) commented Feb 15, 2024

> > We should probably take into account masks in the upper-bound estimate, i.e., exclude masked pixels from the scale factor?
>
> Should we do that in a separate PR?

Yes (see #89), but I am not sure I follow what you said above. Is the issue resolved otherwise, or is there still an unknown source of difference?

nvaytet (Member, Author) commented Feb 15, 2024

> Is the issue resolved otherwise, or is there still an unknown source of difference?

I will check one last time today, but I think there is no difference if we drop the variances. I think I must have messed up in my notebook when I first reported the issue. I will also check with a quick fix that taking the masks into account in the broadcast of variances is also ok.

nvaytet (Member, Author) commented Feb 15, 2024

I found the bug in my notebook: I was checking sc.variances(new.hist() - old.hist()) instead of sc.variances(new.hist()) - sc.variances(old.hist()). The latter is zero everywhere.
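The bug in miniature: for independent values, the variance of a difference is the sum of the variances, so the first expression is nonzero even when the two variance arrays agree exactly. A small NumPy illustration (made-up numbers):

```python
import numpy as np

# Two identical variance arrays, as in the old and new workflow results.
var_new = np.array([1.0, 2.0, 3.0])
var_old = np.array([1.0, 2.0, 3.0])

# What was checked by mistake: the variance of the difference.
# Error propagation for a subtraction ADDS the variances.
var_of_diff = var_new + var_old

# What should have been checked: the difference of the variances.
diff_of_var = var_new - var_old

assert np.all(diff_of_var == 0.0)  # variances actually agree
assert np.all(var_of_diff != 0.0)  # but the mistaken check is nonzero
```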

nvaytet (Member, Author) commented Feb 15, 2024

I have now also verified that taking the masks into account when doing the variance broadcast yields the same results as before.

nvaytet merged commit 53caad4 into main Feb 15, 2024 — 3 checks passed
nvaytet deleted the merge-sans2d-zoom branch February 15, 2024 09:43
Successfully merging this pull request may close these issues.

Gravity vector is needed even if gravity is not used to compute two_theta