Datasources - part 4: documentation #3105

Merged
flo-dup merged 15 commits into main from nro/datasources_4_documentation
Sep 20, 2024

Conversation

rolnico
Member

@rolnico rolnico commented Jul 18, 2024

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines

What kind of change does this PR introduce?
Documentation update --> add documentation on datasources.

Does this PR introduce a breaking change or deprecate an API?

  • Yes
  • No

@rolnico rolnico changed the base branch from main to nro/datasources_3_new_features July 18, 2024 13:33
@rolnico rolnico self-assigned this Jul 18, 2024
@rolnico rolnico changed the title Datasources - part 4: documentation WIP: Datasources - part 4: documentation Jul 18, 2024
@flo-dup flo-dup force-pushed the nro/datasources_3_new_features branch from 47bf1a7 to 2a4ba27 Compare August 1, 2024 14:43
Base automatically changed from nro/datasources_3_new_features to main August 2, 2024 15:11
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
@flo-dup flo-dup force-pushed the nro/datasources_4_documentation branch from 8916995 to 86fd669 Compare August 9, 2024 15:00
Signed-off-by: Florian Dupuy <florian.dupuy@rte-france.com>
docs/index.md Outdated Show resolved Hide resolved

sonarcloud bot commented Aug 9, 2024

Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
@rolnico rolnico marked this pull request as ready for review September 9, 2024 12:40
@rolnico rolnico changed the title WIP: Datasources - part 4: documentation Datasources - part 4: documentation Sep 9, 2024
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
So-Fras and others added 3 commits September 18, 2024 11:38
Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem
during Network imports when using `Network.read()` methods.

For importers and exporters, datasources are used to access files corresponding to a single network
Contributor

Suggested change
For importers and exporters, datasources are used to access files corresponding to a single network
For importers and exporters, datasources are used to access files corresponding to a single network.

Note: this does not apply to compression extensions.

_**Example:**
For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm`._
Contributor

@jonenst jonenst Sep 19, 2024

Suggested change
For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm`._
For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm` and the compression extension `gz`._

Member Author

I prefer to remove the .gz in the filename since we are not yet talking about compression
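
For readers following this thread, here is a minimal sketch of the decomposition being discussed: the compression extension is stripped first, then the data extension, and what remains is one possible base name. `FileNameParts` and its list of compression extensions are hypothetical and are not part of PowSyBl. The sketch always picks the longest candidate base name, whereas the documentation notes that shorter prefixes such as `europe` are equally valid choices.

```java
import java.util.Set;

// Hypothetical helper, NOT part of PowSyBl: splits a file name into the parts discussed above.
final class FileNameParts {
    // Assumed set of compression extensions, for illustration only.
    private static final Set<String> COMPRESSION_EXTENSIONS = Set.of("gz", "bz2", "xz", "zst");

    final String baseName;
    final String dataExtension;        // empty when the name has no data extension
    final String compressionExtension; // empty when the file is not compressed

    private FileNameParts(String baseName, String dataExtension, String compressionExtension) {
        this.baseName = baseName;
        this.dataExtension = dataExtension;
        this.compressionExtension = compressionExtension;
    }

    static FileNameParts parse(String fileName) {
        String remaining = fileName;

        // Step 1: the compression extension, if any, is always the last one.
        String compression = "";
        int lastDot = remaining.lastIndexOf('.');
        if (lastDot >= 0 && COMPRESSION_EXTENSIONS.contains(remaining.substring(lastDot + 1))) {
            compression = remaining.substring(lastDot + 1);
            remaining = remaining.substring(0, lastDot);
        }

        // Step 2: the data extension is the last extension of what is left.
        String dataExtension = "";
        lastDot = remaining.lastIndexOf('.');
        if (lastDot >= 0) {
            dataExtension = remaining.substring(lastDot + 1);
            remaining = remaining.substring(0, lastDot);
        }

        // Step 3: the rest is the longest possible base name.
        return new FileNameParts(remaining, dataExtension, compression);
    }
}
```

Under these assumptions, `FileNameParts.parse("europe.west.xiidm.gz")` yields the base name `europe.west`, the data extension `xiidm` and the compression extension `gz`.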

Two classes implement the `DataSource` interface:
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system,
either directly or in an archive.
Contributor

@jonenst jonenst Sep 19, 2024

Suggested change
either directly or in an archive.
either directly (see below the DirectoryDataSource class and its children) or in an archive (see below the AbstractArchiveDataSource and its children).

`ZstdDirectoryDataSource`.

`DirectoryDataSource` integrates the notions of base name and data extension:
- The base name is used to access files that all start with the same String. For example, `network` would
Contributor

Suggested change
- The base name is used to access files that all start with the same String. For example, `network` would
- The base name is used to access files that all start with the same prefix. For example, `network` would

`(String suffix, String ext)` as parameters, you still have the possibility to use files that do not correspond to the
base name and data extension by using the methods with `(String filename)` as parameter, excluding the compression
extension if there is one.

Contributor

Document that `listNames` filters by the base name, contrary to `exists(String filename)`.

given as a parameter in the datasource constructor, the archive file name is even defined using the base name and the
data extension, as `<directory>/<basename>.<dataExtension>.<archiveExtension>.<compressionExtension>` with the
compression extension being optional depending on the archive format. For example `network.xiidm.zip` contains
`network.xiidm`.
Contributor

Document that `listNames` lists everything, without filtering by the base name.
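
To make the archive naming pattern quoted above concrete, here is a minimal sketch that assembles `<basename>.<dataExtension>.<archiveExtension>.<compressionExtension>`, skipping any part that is absent. `ArchiveNames` and `archiveFileName` are hypothetical names and not PowSyBl API, the directory part is left out, and treating a missing part as `null` or empty is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of PowSyBl: builds an archive file name from its parts.
final class ArchiveNames {
    private ArchiveNames() {
    }

    static String archiveFileName(String baseName, String dataExtension,
                                  String archiveExtension, String compressionExtension) {
        List<String> parts = new ArrayList<>();
        parts.add(baseName);
        // Optional parts are appended in the order given by the pattern; absent ones are skipped.
        for (String part : new String[] {dataExtension, archiveExtension, compressionExtension}) {
            if (part != null && !part.isEmpty()) {
                parts.add(part);
            }
        }
        return String.join(".", parts);
    }
}
```

With these assumptions, `archiveFileName("network", "xiidm", "zip", null)` gives `network.xiidm.zip`, which matches the example in the quoted text.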

datasource.exists("network.south") // Returns false: the file "network.south.gz" does not exist
datasource.exists("network.xiidm") // Returns true: the file "network.xiidm.gz" exists

// Check if some files exist in the datasource by using the `exists(String fileName)` method
Contributor

Suggested change
// Check if some files exist in the datasource by using the `exists(String fileName)` method
// Check if some files exist in the datasource by using the `exists(String suffix, String ext)` method

}

// List the files in the datasource
Set<String> files = datasource.listNames(".*"); // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt"
Contributor

Highlight that it filters out `toto.xiidm.gz` because of the base name filtering.


// List the files in the datasource
Set<String> files = datasource.listNames(".*"); // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt"
```
Contributor

More things to eventually document (in this PR or in another):

  • Use different datasources on the same directory to select different networks
  • `exists(filename)` on a file with a different base name returns true
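
The two behaviours raised in this review (base name filtering in `listNames`, no such filtering in `exists(String fileName)`) can be illustrated with a toy in-memory model. `ToyGzDirectoryDataSource` is hypothetical and is not the PowSyBl implementation: it only mimics what the comments above describe for a gzip-compressed directory, and the file names used in the usage note are made up.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy in-memory model, NOT the PowSyBl implementation: it only mimics the behaviour described in this review.
final class ToyGzDirectoryDataSource {
    private final Set<String> filesOnDisk; // names as stored in the directory, ".gz" included
    private final String baseName;

    ToyGzDirectoryDataSource(Set<String> filesOnDisk, String baseName) {
        this.filesOnDisk = filesOnDisk;
        this.baseName = baseName;
    }

    // exists(fileName): only the compression extension is appended; the base name is not checked.
    boolean exists(String fileName) {
        return filesOnDisk.contains(fileName + ".gz");
    }

    // listNames(regex): the compression extension is stripped, then only names that start
    // with the base name and fully match the regex are kept.
    Set<String> listNames(String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> names = new TreeSet<>();
        for (String file : filesOnDisk) {
            if (!file.endsWith(".gz")) {
                continue;
            }
            String name = file.substring(0, file.length() - ".gz".length());
            if (name.startsWith(baseName) && pattern.matcher(name).matches()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

For a directory holding `network.xiidm.gz`, `network.v3.xiidm.gz` and `toto.xiidm.gz` and a datasource built with the base name `network`, `listNames(".*")` in this model leaves out `toto.xiidm`, while `exists("toto.xiidm")` still returns true.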

Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
## Principles

Datasources are Java-objects used for I/O operations around PowSyBl.
It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem
Contributor

@jonenst jonenst Sep 19, 2024

Suggested change
It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem
A Datasource allows users to read and write files. It is for example used under the hood by Importers to access the filesystem

reading features.
It has two parameters:
- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for
the output file (while writing),
Contributor

Suggested change
the output file (while writing),
the output file name (while writing),

`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
reading features.
It has two parameters:
- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for
Contributor

Suggested change
- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for
- a base name, which is a prefix that can be used to consider only files with names starting with this prefix (while reading) or as a prefix for

Two classes implement the `DataSource` interface:
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system,
either (see below the DirectoryDataSource class and its children) or in an archive (see below the
Contributor

Suggested change
either (see below the DirectoryDataSource class and its children) or in an archive (see below the
either directly (see below the DirectoryDataSource class and its children) or in an archive (see below the

Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>
be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc.
- The data extension is the last extension of your main files, excluding the compression extension if they have one.
It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used
to disambiguate the files to use in the datasource, for example when you have files that differ only by the data
Contributor

This is a bit similar to the earlier sentence "(optionally) a data extension, mainly used to disambiguate identically named data of different type.", so maybe either remove it or make it a lot more specific: not "mainly used". Maybe something like:
"Just like you can create 2 different datasources selecting a different subset of files in a folder based on a different prefix (e.g. france.xiidm and europe.xiidm), you can use the data extension to select either france.xiidm or france.uct."


// Using a datasource with different parameters allows to use other files, even on the same directory
GzDirectoryDataSource totoDatasource = new GzDirectoryDataSource(testDir, "toto", "xiidm", observer);
oolean totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory
Contributor

@jonenst jonenst Sep 19, 2024

Suggested change
oolean totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory
totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory
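
As a rough illustration of why the `exists(null, "xiidm")` call above resolves to `toto.xiidm.gz`, the sketch below assumes the name is built as `<baseName><suffix>.<ext>` and that the compression extension is appended last. `SuffixExtNaming` is a hypothetical helper, not PowSyBl code, and the naming rule is inferred from the examples in this thread rather than taken from the library.

```java
import java.util.Set;

// Hypothetical sketch, NOT PowSyBl code: shows how (suffix, ext) could map to a file name.
final class SuffixExtNaming {
    private SuffixExtNaming() {
    }

    // Assumed rule: "<baseName><suffix>.<ext>"; a null suffix or a null/empty ext is left out.
    static String fileName(String baseName, String suffix, String ext) {
        String suffixPart = suffix == null ? "" : suffix;
        String extPart = (ext == null || ext.isEmpty()) ? "" : "." + ext;
        return baseName + suffixPart + extPart;
    }

    // Mimics exists(suffix, ext) for a gzip-compressed directory: ".gz" is appended last.
    static boolean exists(Set<String> directory, String baseName, String suffix, String ext) {
        return directory.contains(fileName(baseName, suffix, ext) + ".gz");
    }
}
```

In this model, two datasources over the same directory differ only by the base name and data extension passed in, so `exists(directory, "toto", null, "xiidm")` and `exists(directory, "network", null, "xiidm")` look up different files.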

Signed-off-by: Nicolas Rol <nicolas.rol@rte-france.com>

sonarcloud bot commented Sep 19, 2024

@flo-dup flo-dup merged commit cb7d811 into main Sep 20, 2024
7 checks passed
@flo-dup flo-dup deleted the nro/datasources_4_documentation branch September 20, 2024 12:19
4 participants