UX: Improve readability for ISIS workflows #136

SimonHeybrock · 2024-05-01T08:12:28Z

Implements changes as suggested for guideline addition in scipp/essreduce#22.

This goes one step further and also improves the currently very confusing way of switching between tutorial data from pooch and "local" data.

jl-wynen · 2024-05-02T07:55:55Z

docs/user-guide/isis/sans2d.ipynb

+    "workflow[DirectBeamFilename] = 'DIRECT_SANS2D_REAR_34327_4m_8mm_16Feb16.dat.h5'\n",
+    "workflow[Filename[SampleRun]] = 'SANS2D00063114.nxs.h5'\n",
+    "workflow[Filename[BackgroundRun]] = 'SANS2D00063159.nxs.h5'\n",
+    "workflow[Filename[EmptyBeamRun]] = 'SANS2D00063091.nxs.h5'"


I'm not convinced by this approach. It is nice that get_sans2d_tutorial_data_folder places all files in the same folder and that we can then simply list the file names. But this pretends that the mechanism is flexible while in reality, the filenames are determined by the get_*_data_folder functions. So there is no need to list them separately.

If you want to use this as an example for users, this doesn't quite work either. Users would typically (I think) not provide a folder with all their files and list Filenames. Instead, they would, if they use local files directly, list FilePaths.

With SciCat, it would also not be possible to list file names after downloading because the names are required for the download itself.

If you want to use this as an example for users, this doesn't quite work either. Users would typically (I think) not provide a folder with all their files and list Filenames. Instead, they would, if they use local files directly, list FilePaths.

Yes, this was intended as an example for users. But this mechanism is what we have previously implemented, using the DataFolder mechanism. They can just set the to '.', no?

With SciCat, it would also not be possible to list file names after downloading because the names are required for the download itself.

Isn't this a different purpose?

For SciCat download, the user needs to list which files they need.

For the workflow, they need to specify which file has which meaning (sample, background, ...).

So it is not clear to me if that duplication can be avoided? Even though one could imagine that download all files from an experiment, such that there is no need to list them individually for download?

Yes, this was intended as an example for users. But this mechanism is what we have previously implemented, using the DataFolder mechanism. They can just set the to '.', no?

Only if file paths are specified relative to CWD. And then it still goes through the path resolution Filename -> FilePath. I think it is cleanest to specify a FilePath directly to avoid any potential issues with having a bad resolver.

Isn't this a different purpose?

* For SciCat download, the user needs to list which files they need. * For the workflow, they need to specify which file has which meaning (sample, background, ...).

So it is not clear to me if that duplication can be avoided? Even though one could imagine that download all files from an experiment, such that there is no need to list them individually for download?

The datasets should, at least partially, indicate the meaning of a file. E.g., a sample dataset should link to a background dataset, explicitly stating that this is a background. See https://github.com/scipp/essreduce/blob/5f23eaca66727f2507eb0d89896a947d106029d0/src/ess/reduce/scicat.py#L29 as proposed in scipp/essreduce#18.

And conceptually, we also have a (Filename, DatasetId) -> FilePath resolver. So it is pretty much the same as in the pooch case except that it needs an additional input.

Conclusion from discussion:

We are considering to remove DataFolder and FilePath. These were previously added to "hide" differences between local files and files obtained via Pooch or Scicat. However, they are confusing to users, and complicate the visualized task graph.
Instead, use something like:

workflow[Filename[SampleRun]] = download_tutorial_sample_data() # returns path after pooch download workflow[Filename[SampleRun]] = download_scicat_dataset(12345)

There are variations to this where a SciCat helper finds, e.g., background runs related to the specified sample run, such that those can be inserted into the workflow automatically.

There are some related questions around other meta data in the SciCat dataset that may be needed by the workflow. There are several solutions to this, but none of them appears to interfere with the decision to remove DataFolder and FilePath.

@nvaytet Thoughts?

I think it's great if we can remove DataFolder and FilePath.
I was wondering if the above could still be a bit confusing. If you read it twice, or are used to it, then it makes sense that download_tutorial_sample_data() would download the file somewhere, and then return the path to that file.
But the first time I read it, putting myself in the mind of a scientist/user, I sort of expected download_tutorial_sample_data() to return the actual data, not the Filename.
Can a slighty different (but not too lengthy) name help?

jl-wynen · 2024-05-02T07:58:47Z

src/ess/isissans/sans2d.py

+def Sans2dTutorialWorkflow() -> sciline.Pipeline:
+    """
+    Create Sans2d tutorial workflow.
+
+    Equivalent to :func:`Sans2dWorkflow`, but with loaders for tutorial data instead
+    of Mantid-based loaders.
+    """
+    workflow = Sans2dWorkflow()
+    workflow.insert(load_tutorial_run)
+    workflow.insert(load_tutorial_direct_beam)
+    return workflow


Does it really make sense to hide the insertions in this function? It is convenient for the tutorials but I see 2 issues with it:

It increases the number of functions that a user needs to know about.

It is unclear what is different about this tutorial workflow compared to the regular one. Users may assume that there are more fundamental differences than just how files are loaded.

Why does it increase the number? It decreases it from 2 (load_*) to 1 (Sans2dTutorialWorkflow).

Why is it unclear? It is in the docstring as well as mentioned in the notebook (even though that could be extended).

Why does it increase the number? It decreases it from 2 (load_*) to 1 (Sans2dTutorialWorkflow).

ok

Why is it unclear? It is in the docstring as well as mentioned in the notebook (even though that could be extended).

The explanation in the notebook is not completely clear. And many users won't find the docstring. I would like the explanation to be clear about the fact that the workflows are identical except for how they read files and that the files are entirely equivalent and just use a different format.

Agreed to add an extra sentence in notebook to address this.

nvaytet · 2024-05-02T10:53:24Z

docs/user-guide/isis/sans2d.ipynb

-    "We define the reduction parameters, with keys and types given by aliases or types defined in `ess.sans.types`:"
+    "We begin by creating the Sans2d workflow object (this is a [sciline.Pipeline](https://scipp.github.io/sciline/generated/classes/sciline.Pipeline.html) which can be consulted for advanced usage).\n",
+    "The Sans2d workflow uses Mantid to load files.\n",
+    "This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:"


Suggested change

"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:"

"This tutorial comes with files that do not require Mantid, so we use a slightly modified workflow that does not require Mantid:"

nvaytet · 2024-05-02T11:13:28Z

docs/user-guide/isis/sans2d.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workflow.insert(isis.data.transmission_from_background_run)\n",


Can we add comments explaining a little more what they are doing? e.g. Use the background run as the transmission run for background, instead of a separate file.
etc...

nvaytet · 2024-05-02T11:20:44Z

docs/user-guide/isis/sans2d.ipynb

-   "id": "0977a3dc-eb5f-4067-80c5-cc19137cc915",
-   "metadata": {},
-   "source": [
-    "## Loading local files\n",


Hmm I still think scientists would like to have an example using local files spelled out, as it may not be completely obvious from the explanation about DataFolder above? What do you think?

See above, I think we will remove DataFolder and FilePath, as it is not needed any more and too confusing.

nvaytet · 2024-05-02T11:21:29Z

docs/user-guide/isis/zoom.ipynb

+    "\n",
+    "We begin by creating the Zoom workflow object (this is a [sciline.Pipeline](https://scipp.github.io/sciline/generated/classes/sciline.Pipeline.html) which can be consulted for advanced usage).\n",
+    "The Zoom workflow uses Mantid to load files.\n",
+    "This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:"


Suggested change

"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:"

"This tutorial comes with files that do not require Mantid, so we use a slightly modified workflow that does not require Mantid:"

nvaytet · 2024-05-02T11:23:13Z

docs/user-guide/isis/zoom.ipynb

-    "]"
+    "workflow = isis.zoom.ZoomTutorialWorkflow()\n",
+    "# For real data use:\n",
+    "# workflow = isis.zoom.ZoomWorkflow()"


Does Zoom need to be repeated in the name of the workflow? It's already in the zoom module?
How about naming it isis.zoom.SANSWorkflow(), to handle cases here some instruments can do different kinds of workflow (e.g. dream can do powder and sans)?

I see nothing wrong with the repetition, maybe we will find it useful in the name. Also, otherwise you have multiple identically named functions or classes in different submodules of the same package.

nvaytet · 2024-05-02T11:23:53Z

docs/user-guide/isis/zoom.ipynb

-    "params[CorrectForGravity] = True\n",
-    "params[UncertaintyBroadcastMode] = UncertaintyBroadcastMode.upper_bound\n",
-    "params[ReturnEvents] = False"
+    "workflow.insert(isis.data.transmission_from_background_run)\n",


Add comments as suggested above for the sans2d notebook?

nvaytet · 2024-05-02T12:18:04Z

src/ess/isissans/data.py

+        'SANS2D00063159.nxs.h5',
+    ):
+        path = get_path(filename)
+    return DataFolder(str(Path(path).parent))


Is this just returning the path of the last file? Maybe that was intentional? Should we check the paths are all the same for all files?

If we change as discussed above this would become irrelevant.

SimonHeybrock · 2024-05-07T08:03:32Z

I think I address all comments now. FilePath and DataFolder are gone. Ready for another review!

nvaytet · 2024-05-07T09:40:39Z

docs/user-guide/isis/sans2d.ipynb

   "metadata": {},
   "outputs": [],
   "source": [
-    "# left-right layout works better for this graph\n",
-    "pipeline.visualize(BackgroundSubtractedIofQ, graph_attr={'rankdir': 'LR'})"
+    "workflow[DirectBeamFilename] = isis.data.sans2d_tutorial_direct_beam()\n",


I'm still not entirely sure about this change. I think it may be making the notebook less clear.
Our initial users will be ISs and IDSs, and I think they liked to be able to see the name of the file, because it's usually a file they gave us, and they can then know exactly what is being loaded (even if the file names were not the best).
In addition, it won't necessarily be clear to them what to do if they just want to load a local disk file.

It is also a bit annoying for us that we have to write a function for each file.

Could we make something in-between, like an object or a function that returns file paths? e.g.

finder = TutorialFileFinder() workflow[DirectBeamFilename] = finder('DIRECT_SANS2D_REAR_34327_4m_8mm_16Feb16.dat') workflow[Filename[SampleRun]] = finder('SANS2D00063114.nxs') ...

You could then have a ScicatFileFinder or something with a better name?

Could we make something in-between, like an object or a function that returns file paths?

That worked out badly so far:

We had to have special filename mappings (adding suffix, special handling of zip files)

It is confusing for users, since they may thing that they can use the finder for their own files.

In addition, it won't necessarily be clear to them what to do if they just want to load a local disk file.

I would argue against that. It think it is much clearer now than before, and having finder would imho just add more confusion.

It is also a bit annoying for us that we have to write a function for each file.

I think this is actually a positive change. It avoid hard-coding filenames in notebooks and unit tests.

since they may thing that they can use the finder for their own files.

True

It avoid hard-coding filenames in notebooks and unit tests.

I agree that it means if we want to change the file used in tests and the notebooks, we only have one place to change.
However, does this mean we should avoid putting file numbers in the name of the function then? e.g. loki_tutorial_sample_run_60339 as this would mean if we change that file we have to go and change multiple places in the codebase?
If so, we may struggle to come up with names for all these functions?

as this would mean if we change that file we have to go and change multiple places in the codebase?

A modern IDE can easily rename such functions (easier than finding all occurrences of a filename)?

I chose to include the run numbers in the Loki case (and only there), since the scientists are actually working with those files, and there are multiple. We can name them sample_run_1 and sample_run_2 if you prefer.

I don't really mind either way.
Maybe it's not really that we need to change multiple places, but it would prevent having a function named loki_tutorial_sample_run_60339() whose contents get edited and it now returns the file 60777 if one forgets to rename the function.
But once again, I'm ok with either solution.

I'll leave it then.

nvaytet · 2024-05-07T09:42:57Z

docs/user-guide/isis/zoom.ipynb

+    "workflow[Filename[EmptyBeamRun]] = isis.data.zoom_tutorial_empty_beam_run()\n",
+    "workflow[isis.SampleOffset] = sc.vector([0.0, 0.0, 0.11], unit='m')\n",
+    "workflow[isis.DetectorBankOffset] = sc.vector([0.0, 0.0, 0.5], unit='m')\n",
+    "masks = isis.data.zoom_tutorial_mask_filenames()\n",


For the same reason as above, I'm not sure I like hiding the file names.

nvaytet · 2024-05-07T10:46:02Z

src/ess/loki/data.py

+    )
+
+
+def loki_tutorial_run_60392() -> Filename[TransmissionRun[BackgroundRun]]:


Suggested change

def loki_tutorial_run_60392() -> Filename[TransmissionRun[BackgroundRun]]:

def loki_tutorial_background_transmission_run() -> Filename[TransmissionRun[BackgroundRun]]:

I was wondering about that... it is actually also being used as empty beam run, so I left the name unspecific. Should it be?

Maybe we can just also have a loki_tutorial_empty_beam_run() function that returns the same file name, which could then be changed to return something else in the future if we need that?

Then the user-visible code is "lying", since it will be invisible that the same run is used. I can change it if you prefer?

It's fine, you can leave it as is.

SimonHeybrock added 7 commits May 1, 2024 05:21

Begin improve notebook readability

5da28c8

Cleaner tutorial setup

244b2b2

Add Sans2dTutorialWorkflow

5b9dc84

Fix Sans2d tests

02a8fc5

Update Zoom notebook

3e21632

Add docstrings

b7b0dd9

Update Zoom tests

5ce77de

SimonHeybrock requested a review from nvaytet May 1, 2024 08:12

SimonHeybrock mentioned this pull request May 1, 2024

feat: add provider for downloading datasets from scicat scipp/essreduce#18

Closed

SimonHeybrock added 3 commits May 1, 2024 10:18

Add missing file

d638815

Update beam-center finder notebook

a082870

Fix __all__

3d0962a

SimonHeybrock mentioned this pull request May 1, 2024

Add guideline on conventional pattern for code-based user interface scipp/essreduce#22

Merged

jl-wynen reviewed May 2, 2024

View reviewed changes

Clarify (absence) of workflow differences

757d506

nvaytet reviewed May 2, 2024

View reviewed changes

nvaytet mentioned this pull request May 2, 2024

Refactor for next Sciline (Optional, param tables, and more) #135

Merged

SimonHeybrock added 9 commits May 7, 2024 05:42

Remove FilePath and DataFolder

9675adf

Update loki

866f82a

Hack to fix mapping test

69f4c9b

Update final tests

cb785bd

Update beam center docs

6c72dd4

Add missing unzip option

f793132

Fix grammar etc

18c62ae

Fix pooch zip handling

698fdc4

Add LokiAtLarmorWorkflow

b01cfa3

Merge branch 'main' into ux-readability

445128c

nvaytet reviewed May 7, 2024

View reviewed changes

nvaytet approved these changes May 7, 2024

View reviewed changes

SimonHeybrock merged commit 53f868a into main May 7, 2024
3 checks passed

SimonHeybrock deleted the ux-readability branch May 7, 2024 11:09

nvaytet mentioned this pull request Jun 6, 2024

Add tests for Workflow objects #145

Closed

SimonHeybrock mentioned this pull request Jun 12, 2024

Workflow guidelines in amor reduction scipp/essreflectometry#48

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UX: Improve readability for ISIS workflows #136

UX: Improve readability for ISIS workflows #136

SimonHeybrock commented May 1, 2024

jl-wynen May 2, 2024

SimonHeybrock May 2, 2024

SimonHeybrock May 2, 2024

jl-wynen May 2, 2024

SimonHeybrock May 2, 2024 •

edited

Loading

SimonHeybrock May 2, 2024

nvaytet May 2, 2024 •

edited

Loading

jl-wynen May 2, 2024

SimonHeybrock May 2, 2024 •

edited

Loading

jl-wynen May 2, 2024

SimonHeybrock May 2, 2024

nvaytet May 2, 2024

nvaytet May 2, 2024

nvaytet May 2, 2024

SimonHeybrock May 2, 2024

nvaytet May 2, 2024

nvaytet May 2, 2024

SimonHeybrock May 2, 2024

nvaytet May 2, 2024

nvaytet May 2, 2024

SimonHeybrock May 2, 2024

SimonHeybrock commented May 7, 2024

nvaytet May 7, 2024

SimonHeybrock May 7, 2024

nvaytet May 7, 2024

SimonHeybrock May 7, 2024

nvaytet May 7, 2024

SimonHeybrock May 7, 2024

nvaytet May 7, 2024

nvaytet May 7, 2024

SimonHeybrock May 7, 2024

nvaytet May 7, 2024 •

edited

Loading

SimonHeybrock May 7, 2024

nvaytet May 7, 2024

	"This tutorial comes with files that do not require Mantid, so we use slightly modified a workflow that does not require Mantid:"
	"This tutorial comes with files that do not require Mantid, so we use a slightly modified workflow that does not require Mantid:"

		)


		def loki_tutorial_run_60392() -> Filename[TransmissionRun[BackgroundRun]]:

	def loki_tutorial_run_60392() -> Filename[TransmissionRun[BackgroundRun]]:
	def loki_tutorial_background_transmission_run() -> Filename[TransmissionRun[BackgroundRun]]:

UX: Improve readability for ISIS workflows #136

UX: Improve readability for ISIS workflows #136

Conversation

SimonHeybrock commented May 1, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SimonHeybrock May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Conclusion from discussion:

Choose a reason for hiding this comment

nvaytet May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SimonHeybrock May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SimonHeybrock commented May 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nvaytet May 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SimonHeybrock May 2, 2024 •

edited

Loading

nvaytet May 2, 2024 •

edited

Loading

SimonHeybrock May 2, 2024 •

edited

Loading

nvaytet May 7, 2024 •

edited

Loading