
Add the option to slice out straws/tubes/layers/pixels #57

Closed · wants to merge 3 commits

Conversation

nvaytet (Member) commented Jan 29, 2024

A request from users: while playing around with the direct-beam iterations, it would be nice to be able to run the workflow on just a single layer, tube, etc., to speed up development.

This PR adds the option to slice out the data at the correct point in the workflow, in a way that retains identical results.
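For illustration, a minimal sketch of the kind of slicing referred to here, assuming the detector data is a scipp DataArray with logical dimensions such as 'layer', 'tube', and 'pixel' (the actual dimension names in the workflow may differ):

import scipp as sc

# Dummy detector-shaped data standing in for the masked workflow data.
data = sc.DataArray(sc.ones(dims=['layer', 'tube', 'pixel'], shape=[4, 32, 512]))

single_layer = data['layer', 2]   # select one layer, keeping 'tube' and 'pixel'
some_tubes = data['tube', 0:8]    # select a contiguous range of tubes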

Comment on lines 36 to +37

  def solid_angle(
-     data: CalibratedMaskedData[RunType],
+     data: SlicedMaskedData[RunType],
Member:

I am a bit unhappy about this change. I think the previous version, which said "we compute the solid angle based on calibrated data", clearly expressed the behavior and the intent of the pipeline. As it is now, it could be anything. If we later change the order of slicing and calibration, this will be completely broken.

Member Author:

Would simply changing the name to SlicedCalibratedMaskedData help or is that not what you mean?
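For concreteness, a sketch of what such a renamed domain type could look like, assuming the sciline Scope pattern used for run-generic types; the actual definition in the package may differ:

from typing import TypeVar

import sciline
import scipp as sc

RunType = TypeVar('RunType')

class SlicedCalibratedMaskedData(
    sciline.Scope[RunType, sc.DataArray], sc.DataArray
):
    """Calibrated, masked detector data restricted to the requested slice."""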

Member:

That would help partially. I still wonder, though, whether this correctly expresses the intent. The solid angle could be computed before or after slicing, so is this choice due to cost?

@@ -200,7 +201,7 @@ def calibrate_positions(
  # for RawData, MaskedData, ... no reason to restrict necessarily.
  # Would we be fine with just choosing one option, or will this get in the way for users?
  def detector_to_wavelength(
-     detector: CalibratedMaskedData[RunType],
+     detector: SlicedMaskedData[RunType],
Member:

See above: we are losing the information that this is calibrated.

Could the slicing be combined with masking, in terms of the graph structure, to avoid this?

Member Author:

I think the slicing has to be done after the masking: we need all the detector pixels to run the beam center finder, and the masking needs to be done before running the beam center finder.

If we run the beam center finder on a single layer we will get a different offset.
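To make the ordering constraint concrete, here is a hypothetical provider sketch in which slicing consumes data that is already calibrated and masked, so the beam center finder always sees the full detector. DetectorSlice, the import paths, and the slicing parameter are assumptions and may not match the PR's actual implementation:

from typing import NewType, Tuple

# Assumed to live in the package's domain-type module; paths may differ.
from esssans.types import CalibratedMaskedData, RunType
from esssans.types import SlicedMaskedData  # the type added by this PR

DetectorSlice = NewType('DetectorSlice', Tuple[str, int])  # e.g. ('layer', 2)

def slice_detector(
    data: CalibratedMaskedData[RunType],
    selection: DetectorSlice,
) -> SlicedMaskedData[RunType]:
    # Positional slicing along the requested logical dimension keeps the
    # calibration and masks of the selected layer/tube/straw intact.
    dim, index = selection
    return SlicedMaskedData[RunType](data[dim, index])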

Member:

Is there actually a compute cost reason for changing this in the workflow? Or could we consider, e.g., returning reduced data per pixel (or in event mode)? Maybe that would take too much memory?

Member Author:

As mentioned in the PR description, it was because of performance. When Judith asked for this, I said she could just inspect the results at the end for a single layer or tube, but she said it would be nice if it were faster when playing around with parameters.

I am open to alternatives, though.

Member (@SimonHeybrock), Jan 30, 2024:

Hmm, isn't loading the files generally the slowest part? If we slice in the workflow, it will mean the complete files are loaded N times if we want to look at N tubes?

Edit: If this is mainly about the direct-beam iteration I suppose file-load is not the performance bottleneck. Would it be possible to load files, make a selection, and call the direct-beam workflow on that?

Member Author:

You can always cache parts of the workflow by setting an intermediate result on the pipeline, e.g.:

res = pipeline.compute(CalibratedMaskedData[SampleRun])
pipeline[CalibratedMaskedData[SampleRun]] = res

But I am not sure whether this is dangerous and should not be recommended to users?

> If this is mainly about the direct-beam iteration I suppose file-load is not the performance bottleneck.

Note that we are calling the pipeline inside a loop (and in fact we are computing it twice inside the loop: once for the full wavelength range and once for the wavelength bands). I think that means we are loading all the files on every iteration.
We should probably have better caching inside the iteration loop?

Presumably we can cache the solid angle, the numerator, the transmission fraction and the monitors, and only evolve the direct beam?
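A rough sketch of what such caching inside the direct-beam iteration could look like, using sciline's mechanism of pinning a computed value on the pipeline. The domain-type names (SolidAngle, TransmissionFraction, DirectBeam, IofQ, SampleRun) and the update step are placeholders and may not match the actual workflow:

# Pin expensive intermediates once so they are not recomputed (nor the files
# reloaded) on every iteration of the direct-beam loop.
for key in (SolidAngle[SampleRun], TransmissionFraction[SampleRun]):
    pipeline[key] = pipeline.compute(key)

# direct_beam, n_iterations and update_direct_beam are placeholders for the
# actual iteration state and update step.
for _ in range(n_iterations):
    pipeline[DirectBeam] = direct_beam          # current direct-beam estimate
    iofq = pipeline.compute(IofQ[SampleRun])    # only the un-pinned part re-runs
    direct_beam = update_direct_beam(iofq)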

Member Author:

I'm guessing a similar thing can be done to speed up the beam_center_finder_from_iofq?

Member (@SimonHeybrock), Jan 30, 2024:

I am just trying that, but it only helps a bit (3 seconds per iteration). There is another source of bad performance in the wavelength-bands pipeline; I am trying to track that down now.

In other words: we should check whether this request from the scientists is actually an XY problem (this solution was suggested because performance is bad).

Member Author:

My guess is that the merge_spectra function is the bottleneck: https://github.com/scipp/esssans/blob/main/src/esssans/i_of_q.py#L238

Member:

Yes, you are correct. But it is not the event-mode one, just the histogram mode. I have some ideas for how to improve it and will try to look into that tomorrow.

@SimonHeybrock mentioned this pull request Jan 31, 2024

Member (@SimonHeybrock): I think this has been superseded; please reopen if I am wrong.
