fix: separate path and filename providers #25

jokasimr · 2024-02-02T10:07:11Z

The old version did not allow loading local files into the workflow; everything had to go through pooch.
With this change, the user can directly specify either a local file path or the name of a file in the pooch data repository.

jl-wynen · 2024-02-02T12:32:08Z

src/essreflectometry/types.py

The idea isn't bad. But I think I would stick with Filename as the type for the actual loader and be more explicit about the argument to pooch. E.g. PoochFilename or BuiltinFilename.

Also, looking ahead, we will need something similar for SciCat. But there, a simple name is not enough. We need at least a (DatasetID, SciCatFilename) tuple. Not sure if this should affect the current PR, but it might inform the way forward.

I'll change the names. Good point about SciCat, but I don't think that affects the current PR since (I imagine) it would just be a question of adding a SciCatFilePath(id: SciCatFileIdentifier[RunType]) -> Filepath[RunType] provider that downloads the data somewhere and returns the path where it was stored.

Possibly, but that can be difficult. You would need a way to map the RunType to a dataset id. And you need to know which file of the dataset you want.
But this is a discussion for another day

@SimonHeybrock had a different way of doing this for the Zoom data. It's about using different providers depending on whether your files are local or not.
You have to select which provider to use in the notebook.
See for example this local file provider compared to the remote file provider.

We should probably all agree on the same approach and use it everywhere.

Isn't this essentially the same approach but with different names?

I thought this is using different types, and the other approach is to use different providers but the same types?

Possibly, but that can be difficult. You would need a way to map the RunType to a dataset id. And you need to know which file of the dataset you want.
But this is a discussion for another day

Couldn't that mapping be provided by the user just like filenames are provided now? Like this:

pipeline[ScicatFileIdentifier[SampleRun]] = (dataset_id, 'sample_run.nxs')

# ... then we'd have a provider like this somewhere:
def download_scicat_data(identifier: ScicatFileIdentifier[RunType]) -> Filename[RunType]:
    '''Downloads the dataset and puts it somewhere on disk, returns the path'''

I have to amend my original comment. In ESSsans, we use Filename for some basic, user provided filename and FilePath as the actual path on disk. So pooch converts a Filename into a FilePath.

A SciCat provider would convert something like your ScicatFileIdentifier to a FilePath.

This is also listed in scipp/essreduce#7.
But for now that issue lists current usage, not a guideline. So if you don't like this naming, please comment!

jl-wynen · 2024-02-02T13:02:55Z

src/essreflectometry/amor/__init__.py

@@ -29,6 +29,7 @@
        conversions.providers,
        resolution.providers,
        beamline.providers,
+        data.providers,


Can you remove this? If we insert this provider by default, then every time the user wants to provide the file name in a different way, they first have to remove the pooch provider.

Parameters override providers in Sciline so that is not a problem. See example below:

import sciline as sl

def test(f: float) -> int:
    return int(f)

sl.Pipeline([test], params={int: 1})

Similarly when replacing providers:

pl = sl.Pipeline([test], params={float: 1.0})
pl[int] = 2
pl.compute(int)

The above does not raise AmbiguousProviderError.
Or did I misunderstand you?

I agree with @jl-wynen, this should be removed. The providers we use for the docs are not useful for normal users. If they forget to insert data (or a data provider), we want an error, not something that silently loads some unknown data and processes it. This could result in someone wasting hours debugging.

It makes sense to exclude providers we only use in docs from the default provider list. I'll remove it.
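
The two points in this thread can be illustrated without Sciline. The sketch below is a minimal stand-in resolver, not Sciline's implementation; `resolve`, `load`, `default_filename`, and the `Filename` and `Data` classes are hypothetical names introduced only for illustration. It shows that an explicit parameter takes precedence over a provider, and that leaving the docs-only provider out of the defaults turns a forgotten filename into an error instead of a silent load of sample data.

```python
from typing import Any, Callable, get_type_hints


class Filename(str):
    """Stand-in for the workflow's filename type (hypothetical)."""


class Data(str):
    """Stand-in for loaded data (hypothetical)."""


def default_filename() -> Filename:
    # Docs-only provider: silently picks a bundled sample file.
    return Filename("sample_from_docs.nxs")


def load(filename: Filename) -> Data:
    # Pretend to load; a real workflow would read the file here.
    return Data(f"data from {filename}")


def resolve(
    target: type,
    providers: list[Callable[..., Any]],
    params: dict[type, Any],
) -> Any:
    """Return a value of type `target`, preferring params over providers."""
    if target in params:
        return params[target]
    for provider in providers:
        hints = get_type_hints(provider)
        if hints.get("return") is target:
            # Recursively resolve every argument of the matching provider.
            args = {
                name: resolve(tp, providers, params)
                for name, tp in hints.items()
                if name != "return"
            }
            return provider(**args)
    raise LookupError(f"No parameter or provider for {target.__name__}")


# With the docs provider in the defaults, a forgotten filename goes unnoticed.
with_default = resolve(Data, [load, default_filename], params={})

# An explicit parameter still wins over the provider.
explicit = resolve(
    Data, [load, default_filename], params={Filename: Filename("my_run.nxs")}
)

# Without the docs provider, forgetting the filename is an error.
try:
    resolve(Data, [load], params={})
    missing_raises = False
except LookupError:
    missing_raises = True
```

In this sketch both behaviours are compatible: overriding works either way, but only the second provider list reports the missing input.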

YooSunYoung · 2024-02-02T14:59:08Z

docs/examples/amor.ipynb

    "    SampleRotation[Reference]: sc.scalar(0.8389, unit='deg'),\n",
-    "    Filename[Reference]: \"reference.nxs\",\n",
+    "    PoochFilename[Reference]: \"reference.nxs\",\n",
    "}"
   ]
  },


Should we also have a short cell here that explains how to load a local file by setting Filename[Run] in the params?

SimonHeybrock

Please have a look at how I solved a similar problem in scipp/esssans#50. I'm not sure it is perfect, but the idea is that it would allow for adding things such as SciCat, which could convert a filename (or run ID) into a file path.

jokasimr · 2024-02-05T08:41:19Z

@jl-wynen raised a similar concern earlier about future scicat file loading. I don't understand why the solution in this PR would hinder adding a SciCatDataset provider later, and I've described above how I would propose to solve that, but maybe there's something I'm missing, can you explain the problem?

SimonHeybrock · 2024-02-05T11:30:06Z

The problem is that we have to avoid using a slightly different solution in every project, i.e., we should coordinate.

jokasimr · 2024-02-06T08:10:00Z

Absolutely agree.

I was looking through the PR you mentioned for Zoom and I think the solutions are similar but with some differences.
Roughly, the types in the Zoom PR correspond to the types in this PR like this:

Zoom         <-> Reflectometry
Filename     <-> PoochFilename
FilePath     <-> Filename
DataFolder   <-> (does not exist)
FilenameType <-> (does not exist)

Is there a specific reason to split the file path into folder and name? In my experience that is a source of errors. For example, if you have files with the same name in different folders, it is easy to accidentally select the file in the wrong folder.
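
A small sketch of that failure mode, using only the standard library. The folder and file names are made up for illustration. Two experiments contain a file with the same name; when folder and name are configured separately, a stale folder silently selects the wrong file, whereas a single full path carries the folder with it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two different experiments that both contain a file called "run.nxs".
    for folder, content in (("experiment_a", "A"), ("experiment_b", "B")):
        (root / folder).mkdir()
        (root / folder / "run.nxs").write_text(content)

    # Split representation: the folder is configured in one place ...
    data_folder = root / "experiment_a"
    # ... and the name in another. The user means the file from experiment_b,
    # but the name alone cannot express that.
    filename = "run.nxs"
    split_result = (data_folder / filename).read_text()

    # Single full path: the choice of folder travels with the name.
    file_path = root / "experiment_b" / "run.nxs"
    full_result = file_path.read_text()
```

Here `split_result` comes from the wrong experiment without any error being raised, which is the kind of mistake described above.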

I also think it seems generally more complicated to add four types instead of two. The way I see it we need N+1 types, 1 for the file path from where we are actually going to read the file, and N more for the N different file sources (Pooch, SciCat, etc).
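
The N+1 idea can be sketched as follows, with N = 2 sources. This is an illustration under stated assumptions, not the code of either project: the types are plain `typing.NewType` stand-ins rather than Sciline scopes, the "+1" type is called `FilePath` here (it is `Filename` in this PR), and the lookup tables and functions `pooch_path`, `scicat_path`, and `load` are hypothetical.

```python
from typing import NewType

# The "+1" type: the path the loader actually reads from.
FilePath = NewType("FilePath", str)

# One type per file source (N = 2 here). Both are stand-ins for illustration.
PoochFilename = NewType("PoochFilename", str)
SciCatFileIdentifier = NewType("SciCatFileIdentifier", tuple)  # (dataset id, file name)

# Hypothetical lookup tables standing in for the pooch cache and a SciCat download.
_POOCH_CACHE = {"reference.nxs": "/cache/pooch/reference.nxs"}
_SCICAT_DOWNLOADS = {
    ("dataset-1", "sample_run.nxs"): "/downloads/dataset-1/sample_run.nxs",
}


def pooch_path(name: PoochFilename) -> FilePath:
    """Source 1: resolve a name in the bundled data registry to a path."""
    return FilePath(_POOCH_CACHE[name])


def scicat_path(identifier: SciCatFileIdentifier) -> FilePath:
    """Source 2: resolve a (dataset id, file name) pair to a downloaded path."""
    return FilePath(_SCICAT_DOWNLOADS[identifier])


def load(path: FilePath) -> str:
    """The loader only knows about FilePath, not where the file came from."""
    return f"loaded {path}"


# A local file needs no provider at all: the user supplies FilePath directly.
local = load(FilePath("/home/user/my_run.nxs"))
from_pooch = load(pooch_path(PoochFilename("reference.nxs")))
from_scicat = load(
    scicat_path(SciCatFileIdentifier(("dataset-1", "sample_run.nxs")))
)
```

Adding another source in this layout means adding one type and one provider; the loader stays unchanged.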

This is why I'm reluctant to change this to mirror the Zoom PR immediately. But if you still think that's the best way to do it, I'll do that; it is very good if file loading is identical everywhere.

jl-wynen · 2024-03-01T12:23:20Z

src/essreflectometry/amor/data.py

@@ -23,11 +26,14 @@ def _make_pooch():
 _pooch = _make_pooch()


-def get_path(name: str) -> str:
+def getpath(name: PoochFilename[Run]) -> FilePath[Run]:


In ESSsans, these functions have this signature:

def get_path(filename: FilenameType) -> FilePath[FilenameType]:

with

FilenameType = TypeVar('FilenameType', bound=str)

class FilePath(sciline.Scope[FilenameType, str], str): ...

I don't know why this is parametrised. @nvaytet Can you clarify?

Because FilePath can be used for any file, not just Run files, but also mask files, or the file containing the direct beam function. Does that answer your question?
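
A minimal sketch of why the parametrisation helps, assuming plain `typing.Generic` as a stand-in for `sciline.Scope` so that it runs without Sciline. `RunFilename`, `MaskFilename`, `load_run`, `load_mask`, and the `/data/` folder are hypothetical names for illustration only.

```python
from typing import Generic, TypeVar

FilenameType = TypeVar("FilenameType", bound=str)


class FilePath(str, Generic[FilenameType]):
    """Path on disk, tagged with the kind of filename it was resolved from."""


class RunFilename(str):
    """Name of a run file (hypothetical)."""


class MaskFilename(str):
    """Name of a mask file (hypothetical)."""


def get_path(filename: FilenameType) -> FilePath[FilenameType]:
    """One generic provider covers every kind of file."""
    # A real provider would ask pooch for the path; a fixed folder is assumed here.
    return FilePath(f"/data/{filename}")


def load_run(path: FilePath[RunFilename]) -> str:
    """Consumer that only accepts paths resolved from run filenames."""
    return f"run from {path}"


def load_mask(path: FilePath[MaskFilename]) -> str:
    """Consumer that only accepts paths resolved from mask filenames."""
    return f"mask from {path}"


run = load_run(get_path(RunFilename("sample.nxs")))
mask = load_mask(get_path(MaskFilename("mask.xml")))

# The two path types are distinct keys, so a pipeline can tell them apart.
distinct = FilePath[RunFilename] != FilePath[MaskFilename]
```

A single `get_path` serves run files, mask files, and any other file kind, while consumers can still ask for the specific path type they need.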

Yes, thanks!

So @jokasimr, can you change this to match the implementation in ESSsans except for the FilenameType?

I can change it, but I'm not sure what you mean by matching the implementation in ESSsans except for FilenameType?

Should FilePath not be parameterized by anything or should it be parameterized by only the RunType?

It's just about the names. Call the function get_path and the argument Filename. And keep the way it is parametrised.

jokasimr · 2024-03-25T15:18:17Z

Closing because this issue was addressed in #40

jokasimr requested a review from nvaytet February 2, 2024 10:07

jl-wynen reviewed Feb 2, 2024

jl-wynen requested changes Feb 2, 2024

YooSunYoung reviewed Feb 2, 2024

SimonHeybrock reviewed Feb 5, 2024

jl-wynen mentioned this pull request Feb 6, 2024

Metadata utilities scipp/scippneutron#473

Open

jokasimr requested a review from jl-wynen March 1, 2024 08:05

jokasimr force-pushed the fix-local-files branch from 8fbe33a to 1b52c89 Compare March 1, 2024 11:59

jl-wynen reviewed Mar 1, 2024

jokasimr force-pushed the fix-local-files branch 2 times, most recently from ea8ddc1 to fd86bf6 Compare March 11, 2024 09:15

jokasimr enabled auto-merge March 11, 2024 09:17

jokasimr requested a review from jl-wynen March 25, 2024 15:10

fix: separate path and filename providers

4032b0b

jokasimr force-pushed the fix-local-files branch from 219b7a5 to 4032b0b Compare March 25, 2024 15:16

Apply automatic formatting

52e6bce

jokasimr closed this Mar 25, 2024

auto-merge was automatically disabled March 25, 2024 15:18
Pull request was closed
