-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UX: Improve readability for ISIS workflows #136
Conversation
docs/user-guide/isis/sans2d.ipynb
Outdated
"workflow[DirectBeamFilename] = 'DIRECT_SANS2D_REAR_34327_4m_8mm_16Feb16.dat.h5'\n", | ||
"workflow[Filename[SampleRun]] = 'SANS2D00063114.nxs.h5'\n", | ||
"workflow[Filename[BackgroundRun]] = 'SANS2D00063159.nxs.h5'\n", | ||
"workflow[Filename[EmptyBeamRun]] = 'SANS2D00063091.nxs.h5'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced by this approach. It is nice that get_sans2d_tutorial_data_folder
places all files in the same folder and that we can then simply list the file names. But this pretends that the mechanism is flexible while in reality, the filenames are determined by the get_*_data_folder
functions. So there is no need to list them separately.
If you want to use this as an example for users, this doesn't quite work either. Users would typically (I think) not provide a folder with all their files and list Filename
s. Instead, they would, if they use local files directly, list FilePath
s.
With SciCat, it would also not be possible to list file names after downloading because the names are required for the download itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to use this as an example for users, this doesn't quite work either. Users would typically (I think) not provide a folder with all their files and list Filenames. Instead, they would, if they use local files directly, list FilePaths.
Yes, this was intended as an example for users. But this mechanism is what we have previously implemented, using the DataFolder
mechanism. They can just set the to '.'
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With SciCat, it would also not be possible to list file names after downloading because the names are required for the download itself.
Isn't this a different purpose?
- For SciCat download, the user needs to list which files they need.
- For the workflow, they need to specify which file has which meaning (sample, background, ...).
So it is not clear to me if that duplication can be avoided? Even though one could imagine that download all files from an experiment, such that there is no need to list them individually for download?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was intended as an example for users. But this mechanism is what we have previously implemented, using the DataFolder mechanism. They can just set the to '.', no?
Only if file paths are specified relative to CWD. And then it still goes through the path resolution Filename
-> FilePath
. I think it is cleanest to specify a FilePath
directly to avoid any potential issues with having a bad resolver.
Isn't this a different purpose?
* For SciCat download, the user needs to list which files they need. * For the workflow, they need to specify which file has which meaning (sample, background, ...).
So it is not clear to me if that duplication can be avoided? Even though one could imagine that download all files from an experiment, such that there is no need to list them individually for download?
The datasets should, at least partially, indicate the meaning of a file. E.g., a sample dataset should link to a background dataset, explicitly stating that this is a background. See https://github.com/scipp/essreduce/blob/5f23eaca66727f2507eb0d89896a947d106029d0/src/ess/reduce/scicat.py#L29 as proposed in scipp/essreduce#18.
And conceptually, we also have a (Filename
, DatasetId
) -> FilePath
resolver. So it is pretty much the same as in the pooch case except that it needs an additional input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conclusion from discussion:
We are considering to remove DataFolder
and FilePath
. These were previously added to "hide" differences between local files and files obtained via Pooch or Scicat. However, they are confusing to users, and complicate the visualized task graph.
Instead, use something like:
workflow[Filename[SampleRun]] = download_tutorial_sample_data() # returns path after pooch download
workflow[Filename[SampleRun]] = download_scicat_dataset(12345)
There are variations to this where a SciCat helper finds, e.g., background runs related to the specified sample run, such that those can be inserted into the workflow automatically.
There are some related questions around other meta data in the SciCat dataset that may be needed by the workflow. There are several solutions to this, but none of them appears to interfere with the decision to remove DataFolder
and FilePath
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nvaytet Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's great if we can remove DataFolder
and FilePath
.
I was wondering if the above could still be a bit confusing. If you read it twice, or are used to it, then it makes sense that download_tutorial_sample_data()
would download the file somewhere, and then return the path to that file.
But the first time I read it, putting myself in the mind of a scientist/user, I sort of expected download_tutorial_sample_data()
to return the actual data, not the Filename
.
Can a slighty different (but not too lengthy) name help?
def Sans2dTutorialWorkflow() -> sciline.Pipeline: | ||
""" | ||
Create Sans2d tutorial workflow. | ||
|
||
Equivalent to :func:`Sans2dWorkflow`, but with loaders for tutorial data instead | ||
of Mantid-based loaders. | ||
""" | ||
workflow = Sans2dWorkflow() | ||
workflow.insert(load_tutorial_run) | ||
workflow.insert(load_tutorial_direct_beam) | ||
return workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it really make sense to hide the insertions in this function? It is convenient for the tutorials but I see 2 issues with it:
- It increases the number of functions that a user needs to know about.
- It is unclear what is different about this tutorial workflow compared to the regular one. Users may assume that there are more fundamental differences than just how files are loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why does it increase the number? It decreases it from 2 (
load_*
) to 1 (Sans2dTutorialWorkflow
). - Why is it unclear? It is in the docstring as well as mentioned in the notebook (even though that could be extended).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why does it increase the number? It decreases it from 2 (load_*) to 1 (Sans2dTutorialWorkflow).
ok
- Why is it unclear? It is in the docstring as well as mentioned in the notebook (even though that could be extended).
The explanation in the notebook is not completely clear. And many users won't find the docstring. I would like the explanation to be clear about the fact that the workflows are identical except for how they read files and that the files are entirely equivalent and just use a different format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed to add an extra sentence in notebook to address this.
docs/user-guide/isis/sans2d.ipynb
Outdated
"We define the reduction parameters, with keys and types given by aliases or types defined in `ess.sans.types`:" | ||
"We begin by creating the Sans2d workflow object (this is a [sciline.Pipeline](https://scipp.github.io/sciline/generated/classes/sciline.Pipeline.html) which can be consulted for advanced usage).\n", | ||
"The Sans2d workflow uses Mantid to load files.\n", | ||
"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:" | |
"This tutorial comes with files that do not require Mantid, so we use a slightly modified workflow that does not require Mantid:" |
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"workflow.insert(isis.data.transmission_from_background_run)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add comments explaining a little more what they are doing? e.g. Use the background run as the transmission run for background, instead of a separate file.
etc...
"id": "0977a3dc-eb5f-4067-80c5-cc19137cc915", | ||
"metadata": {}, | ||
"source": [ | ||
"## Loading local files\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I still think scientists would like to have an example using local files spelled out, as it may not be completely obvious from the explanation about DataFolder
above? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I think we will remove DataFolder
and FilePath
, as it is not needed any more and too confusing.
docs/user-guide/isis/zoom.ipynb
Outdated
"\n", | ||
"We begin by creating the Zoom workflow object (this is a [sciline.Pipeline](https://scipp.github.io/sciline/generated/classes/sciline.Pipeline.html) which can be consulted for advanced usage).\n", | ||
"The Zoom workflow uses Mantid to load files.\n", | ||
"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:" | |
"This tutorial comes with files that do not require Mantid, so we use a slightly modified workflow that does not require Mantid:" |
"]" | ||
"workflow = isis.zoom.ZoomTutorialWorkflow()\n", | ||
"# For real data use:\n", | ||
"# workflow = isis.zoom.ZoomWorkflow()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Zoom
need to be repeated in the name of the workflow? It's already in the zoom
module?
How about naming it isis.zoom.SANSWorkflow()
, to handle cases here some instruments can do different kinds of workflow (e.g. dream can do powder and sans)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see nothing wrong with the repetition, maybe we will find it useful in the name. Also, otherwise you have multiple identically named functions or classes in different submodules of the same package.
"params[CorrectForGravity] = True\n", | ||
"params[UncertaintyBroadcastMode] = UncertaintyBroadcastMode.upper_bound\n", | ||
"params[ReturnEvents] = False" | ||
"workflow.insert(isis.data.transmission_from_background_run)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments as suggested above for the sans2d notebook?
src/ess/isissans/data.py
Outdated
'SANS2D00063159.nxs.h5', | ||
): | ||
path = get_path(filename) | ||
return DataFolder(str(Path(path).parent)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just returning the path of the last file? Maybe that was intentional? Should we check the paths are all the same for all files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change as discussed above this would become irrelevant.
I think I address all comments now. |
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# left-right layout works better for this graph\n", | ||
"pipeline.visualize(BackgroundSubtractedIofQ, graph_attr={'rankdir': 'LR'})" | ||
"workflow[DirectBeamFilename] = isis.data.sans2d_tutorial_direct_beam()\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not entirely sure about this change. I think it may be making the notebook less clear.
Our initial users will be ISs and IDSs, and I think they liked to be able to see the name of the file, because it's usually a file they gave us, and they can then know exactly what is being loaded (even if the file names were not the best).
In addition, it won't necessarily be clear to them what to do if they just want to load a local disk file.
It is also a bit annoying for us that we have to write a function for each file.
Could we make something in-between, like an object or a function that returns file paths? e.g.
finder = TutorialFileFinder()
workflow[DirectBeamFilename] = finder('DIRECT_SANS2D_REAR_34327_4m_8mm_16Feb16.dat')
workflow[Filename[SampleRun]] = finder('SANS2D00063114.nxs')
...
You could then have a ScicatFileFinder
or something with a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make something in-between, like an object or a function that returns file paths?
That worked out badly so far:
- We had to have special filename mappings (adding suffix, special handling of zip files)
- It is confusing for users, since they may thing that they can use the
finder
for their own files.
In addition, it won't necessarily be clear to them what to do if they just want to load a local disk file.
I would argue against that. It think it is much clearer now than before, and having finder
would imho just add more confusion.
It is also a bit annoying for us that we have to write a function for each file.
I think this is actually a positive change. It avoid hard-coding filenames in notebooks and unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since they may thing that they can use the finder for their own files.
True
It avoid hard-coding filenames in notebooks and unit tests.
I agree that it means if we want to change the file used in tests and the notebooks, we only have one place to change.
However, does this mean we should avoid putting file numbers in the name of the function then? e.g. loki_tutorial_sample_run_60339
as this would mean if we change that file we have to go and change multiple places in the codebase?
If so, we may struggle to come up with names for all these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this would mean if we change that file we have to go and change multiple places in the codebase?
A modern IDE can easily rename such functions (easier than finding all occurrences of a filename)?
I chose to include the run numbers in the Loki case (and only there), since the scientists are actually working with those files, and there are multiple. We can name them sample_run_1
and sample_run_2
if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really mind either way.
Maybe it's not really that we need to change multiple places, but it would prevent having a function named loki_tutorial_sample_run_60339()
whose contents get edited and it now returns the file 60777
if one forgets to rename the function.
But once again, I'm ok with either solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it then.
"workflow[Filename[EmptyBeamRun]] = isis.data.zoom_tutorial_empty_beam_run()\n", | ||
"workflow[isis.SampleOffset] = sc.vector([0.0, 0.0, 0.11], unit='m')\n", | ||
"workflow[isis.DetectorBankOffset] = sc.vector([0.0, 0.0, 0.5], unit='m')\n", | ||
"masks = isis.data.zoom_tutorial_mask_filenames()\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reason as above, I'm not sure I like hiding the file names.
) | ||
|
||
|
||
def loki_tutorial_run_60392() -> Filename[TransmissionRun[BackgroundRun]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def loki_tutorial_run_60392() -> Filename[TransmissionRun[BackgroundRun]]: | |
def loki_tutorial_background_transmission_run() -> Filename[TransmissionRun[BackgroundRun]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about that... it is actually also being used as empty beam run, so I left the name unspecific. Should it be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can just also have a loki_tutorial_empty_beam_run()
function that returns the same file name, which could then be changed to return something else in the future if we need that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the user-visible code is "lying", since it will be invisible that the same run is used. I can change it if you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine, you can leave it as is.
Implements changes as suggested for guideline addition in scipp/essreduce#22.
This goes one step further and also improves the currently very confusing way of switching between tutorial data from pooch and "local" data.