From c8985ba1b12ecf41153ca7ecc7c126af68d5184d Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 18 Jul 2024 15:31:13 +0200 Subject: [PATCH 01/12] start datasource doc Signed-off-by: Nicolas Rol --- .../datasource/ReadOnlyDataSource.java | 4 +- docs/data/datasources.md | 59 +++++++++++++++++++ docs/index.md | 1 + 3 files changed, 62 insertions(+), 2 deletions(-) create mode 100644 docs/data/datasources.md diff --git a/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java b/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java index 91f29c53f2a..ad36c793775 100644 --- a/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java +++ b/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java @@ -24,9 +24,9 @@ default String getDataExtension() { /** * Check if a file exists in the datasource. The file name will be constructed as: - * {@code .}

+ * {@code .} * @param suffix Suffix to add to the basename of the datasource - * @param ext Extension of the file (for example: .iidm, .xml, .txt, etc.) + * @param ext Extension of the file (for example: iidm, xml, txt, etc.) * @return true if the file exists, else false */ boolean exists(String suffix, String ext) throws IOException; diff --git a/docs/data/datasources.md b/docs/data/datasources.md new file mode 100644 index 00000000000..9ab0c0c43e6 --- /dev/null +++ b/docs/data/datasources.md @@ -0,0 +1,59 @@ +(datasources)= +# Datasources + +## Principles + +Datasources are Java-objects used to facilitate I/O operations around PowSyBl. +It allows users to read and write files + + +## Types of datasources + +Multiple types of datasources exist, depending on whether it shall be writable or not, the kind of storage used, +data location, data compression, etc. + + +(readonlydatasources)= +### ReadOnlyDataSource + +`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides +reading features. +It has two parameters: a base name (corresponding to the starting part of the files the user wants to consider in the +datasource) and a main extension (corresponding to the file extension to consider, compression aside). + +_**Example:** +For a file named `foo.bar.xiidm.gz`, the base name would be `foo.bar` or `foo` while the main extension would be `xiidm`._ + +The main methods `ReadOnlyDataSource` provides are: + +- `exists(String fileName)` and `exists(String suffix, String ext)` to check if a file exists in the datasource +- `newInputStream(String fileName)` and `newInputStream(String suffix, String ext)` to read a file from the datasource +- `listNames(String regex)` to list the files in the datasource whose names match the regex + +The methods with `String suffix, String ext` as parameters look for a file which name will be constructed as +`.`. 
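As a rough illustration of the naming rule used by the `(String suffix, String ext)` methods, the sketch below builds a file name from a base name, an optional suffix and an optional extension. `DataSourceNameSketch` and `buildFileName` are hypothetical names introduced only for this example, not part of PowSyBl, and treating `null` as "absent" is an assumption.

```java
public final class DataSourceNameSketch {

    private DataSourceNameSketch() {
    }

    // Hypothetical helper (not a PowSyBl API): appends the suffix to the base name,
    // then adds ".ext" when an extension is given.
    // Assumption: a null suffix or a null/empty extension is simply skipped.
    public static String buildFileName(String baseName, String suffix, String ext) {
        StringBuilder name = new StringBuilder(baseName);
        if (suffix != null) {
            name.append(suffix);
        }
        if (ext != null && !ext.isEmpty()) {
            name.append('.').append(ext);
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildFileName("foo", "_mapping", "csv")); // foo_mapping.csv
        System.out.println(buildFileName("foo", null, "xiidm"));     // foo.xiidm
    }
}
```

Note that the extension is passed without a leading dot (`xiidm`, not `.xiidm`), which matches the corrected Javadoc of `exists(String suffix, String ext)`.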
+ +The classes inheriting directly `ReadOnlyDataSource` are: +- `ResourceDataSource`: datasource based on a list of resources +- `ReadOnlyMemDataSource`: datasource where data is stored in a `Map` in memory +- `MultipleReadOnlyDataSource`: datasource grouping multiple user-defined datasources +- `GenericReadOnlyDataSource`: datasource built by creating new datasources of multiple types + +(writabledatasources)= +### DataSource + +The `DataSource` interface extends `ReadOnlyDataSource` by adding writing features through the methods +`newOutputStream(String fileName, boolean append)` and `newOutputStream(String suffix, String ext, boolean append)`. +Those methods allow the user to write in a new file (if `append==false`) or at the end of an existing one (if +`append==true`). + +This interface also provides two methods to create a datasource from a file path (`fromPath(Path file)`) or from a +directory and a file name (`fromPath(Path directory, String fileNameOrBaseName)`) + +Two classes implement the `DataSource` interface: +- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource` +- `AbstractFileSystemDataSource`: datasource based on files present in the file system, either directly or in an archive + +(directorydatasources)= +### DirectoryDataSource + diff --git a/docs/index.md b/docs/index.md index 82cbef6adea..308e4955b76 100644 --- a/docs/index.md +++ b/docs/index.md @@ -22,6 +22,7 @@ grid_model/index.md grid_features/index.md simulation/index data/timeseries +data/datasources user/index.md ``` From 4343f38e7e7b0fcaa0f424739985e6626265a5ea Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 1 Aug 2024 11:37:04 +0200 Subject: [PATCH 02/12] add doc Signed-off-by: Nicolas Rol --- docs/data/datasources.md | 95 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 92 insertions(+), 3 deletions(-) diff --git a/docs/data/datasources.md b/docs/data/datasources.md index 9ab0c0c43e6..648fc3ca627 100644 --- 
a/docs/data/datasources.md +++ b/docs/data/datasources.md @@ -14,7 +14,7 @@ data location, data compression, etc. (readonlydatasources)= -### ReadOnlyDataSource +### Read-Only DataSource `ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides reading features. @@ -55,5 +55,94 @@ Two classes implement the `DataSource` interface: - `AbstractFileSystemDataSource`: datasource based on files present in the file system, either directly or in an archive (directorydatasources)= -### DirectoryDataSource - +### Directory DataSource + +`DirectoryDataSource` are datasources based on files located in a specific directory directly in the file system. + +Files stored and used via this type of datasource may be all compressed or not at all. Compression formats available are +defined in the class `CompressionFormat`. As of today, the following single-file compressions are available: +BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a corresponding datasource class inheriting +`DirectoryDataSource`: `Bzip2DirectoryDataSource`, `GzDirectoryDataSource`, `XZDirectoryDataSource`, +`ZstdDirectoryDataSource`. + +`DirectoryDataSource` integrates the notions of base name and data extension: +- The base name is used to facilitate the access to files that all start with the same String. For example, `foo` would +be a good base name if your files are `foo.xiidm`, `foo_bar.xiidm`, `foo_mapping.csv`, etc. +- The data extension is the last extension of your files, excluding the compression extension if they have one. +It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used +to identify the files to use in the datasource, for example when importing networks using the Importers implemented in +powsybl. 
+ +Even if `DirectoryDataSource` integrates the notions of base name and data extension, you still have the possibility to +use files that do not correspond to the base name and data extension by directly providing their names, excluding the +compression extension. + +(archivedatasources)= +### Archive DataSource + +`AbstractArchiveDataSource` are datasources based on files located in a specific archive, in the file system. As of today, +two classes implements `AbstractArchiveDataSource`: `ZipArchiveDataSource` and `TarArchiveDataSource` + +While the files located in the archive **may not** be compressed, the archive file itself can be, depending on the +archive format: +- A Zip archive is also already compressed so the compression format for `ZipArchiveDataSource` is always ZIP. +- A Tar archive can be compressed by any compression format, excluding ZIP (since it would create a Zip archive containing +the Tar archive): BZIP2, GZIP, XZ or ZSTD. + +Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If not +given as a parameter in the datasource constructor, the archive file name is even defined using the base name and the +data extension, as `/...` with the +compression extension being optional depending on the archive format. 
+ + +## Example + +Let's consider a directory containing the following files: + +```java +/* + directory + ├── foo + ├── foo.bar + ├── foo.xiidm.gz + ├── foo.v3.xiidm.gz + ├── foo.gz + └── toto.xiidm.gz + */ +``` + +A datasource on this directory could be used this way: + +```java +// Creation of a directory datasource with compression +GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "foo", "xiidm", observer); + +// Check if some files exist in the datasource by using the `exists(String fileName)` method +// Since the datasource uses Gzip compression, ".gz" is added to the provided fileName parameter +datasource.exists("test.toto") // Returns false: the file "test.toto.gz" does not exist in the directory +datasource.exists("foo.bar") // Returns false: the file "foo.bar.gz" does not exist +datasource.exists("foo.xiidm") // Returns true: the file "foo.xiidm.gz" exists + +// Check if some files exist in the datasource by using the `exists(String fileName)` method +datasource.exists("_bar", "baz") // Returns false: the file "foo_bar.baz.gz" does not exist in the directory +datasource.exists(null, "xiidm") // Returns true: the file "foo.xiidm.gz" exists in the directory +datasource.exists(null, null) // Returns true: the file "foo.gz" exists in the directory + +// We can create some a new file "foo_test.txt.gz" and write "line1" inside +try (OutputStream os = dataSource.newOutputStream("_test", "txt", false)) { + os.write("line1".getBytes(StandardCharsets.UTF_8)); +} + +// Another line can be added to the same file by setting the `append` boolean parameter to true +try (OutputStream os = dataSource.newOutputStream("_test", "txt", true)) { + os.write("line2".getBytes(StandardCharsets.UTF_8)); +} + +// We can read the file +try (InputStream is = dataSource.newInputStream("_test", "txt")) { + System.out.println(ByteStreams.toByteArray(is)); // Displays "line1" then "line2" +} + +// List the files in the datasource +Set files = 
datasource.listNames(".*") // returns a set containing: "foo", "foo.bar", "foo.xiidm", "foo.v3.xiidm", "foo_test.txt" +``` \ No newline at end of file From a976fd7068bdbdf18afb04195de814a77c4c5ca5 Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 1 Aug 2024 11:39:35 +0200 Subject: [PATCH 03/12] add doc Signed-off-by: Nicolas Rol --- docs/data/datasources.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/data/datasources.md b/docs/data/datasources.md index 648fc3ca627..ea6543dbf95 100644 --- a/docs/data/datasources.md +++ b/docs/data/datasources.md @@ -87,7 +87,7 @@ While the files located in the archive **may not** be compressed, the archive fi archive format: - A Zip archive is also already compressed so the compression format for `ZipArchiveDataSource` is always ZIP. - A Tar archive can be compressed by any compression format, excluding ZIP (since it would create a Zip archive containing -the Tar archive): BZIP2, GZIP, XZ or ZSTD. +the Tar archive): BZIP2, GZIP, XZ or ZSTD. It can also not be compressed. Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If not given as a parameter in the datasource constructor, the archive file name is even defined using the base name and the From d45e7dbf996abf0f4db96344a14019af566d454e Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 1 Aug 2024 11:40:11 +0200 Subject: [PATCH 04/12] add doc Signed-off-by: Nicolas Rol --- docs/data/datasources.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/docs/data/datasources.md b/docs/data/datasources.md index ea6543dbf95..073f92639ab 100644 --- a/docs/data/datasources.md +++ b/docs/data/datasources.md @@ -99,16 +99,14 @@ compression extension being optional depending on the archive format. 
Let's consider a directory containing the following files: -```java -/* - directory - ├── foo - ├── foo.bar - ├── foo.xiidm.gz - ├── foo.v3.xiidm.gz - ├── foo.gz - └── toto.xiidm.gz - */ +``` +directory +├── foo +├── foo.bar +├── foo.xiidm.gz +├── foo.v3.xiidm.gz +├── foo.gz +└── toto.xiidm.gz ``` A datasource on this directory could be used this way: From 86fd669d51f6f346a62cb4c7cf134dd6c4779d73 Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 1 Aug 2024 11:41:46 +0200 Subject: [PATCH 05/12] add doc Signed-off-by: Nicolas Rol --- docs/data/datasources.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/data/datasources.md b/docs/data/datasources.md index 073f92639ab..92fd6dd2ade 100644 --- a/docs/data/datasources.md +++ b/docs/data/datasources.md @@ -4,7 +4,8 @@ ## Principles Datasources are Java-objects used to facilitate I/O operations around PowSyBl. -It allows users to read and write files +It allows users to read and write files. It is for example used by Importers during Network imports when using +`Network.read()` methods. ## Types of datasources From 404b6e54e3c9bbe4815722c170ba57141a5000f8 Mon Sep 17 00:00:00 2001 From: Florian Dupuy Date: Fri, 9 Aug 2024 17:07:58 +0200 Subject: [PATCH 06/12] Replace main with data, base name examples, bullet points added Signed-off-by: Florian Dupuy --- docs/data/datasources.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/data/datasources.md b/docs/data/datasources.md index 92fd6dd2ade..331897c0c00 100644 --- a/docs/data/datasources.md +++ b/docs/data/datasources.md @@ -19,11 +19,13 @@ data location, data compression, etc. `ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides reading features. 
-It has two parameters: a base name (corresponding to the starting part of the files the user wants to consider in the -datasource) and a main extension (corresponding to the file extension to consider, compression aside). +It has two parameters: +- a base name (corresponding to the starting part of the files the user wants to consider in the +datasource), +- a data extension (corresponding to the file extension to consider, compression aside). _**Example:** -For a file named `foo.bar.xiidm.gz`, the base name would be `foo.bar` or `foo` while the main extension would be `xiidm`._ +For a file named `foo.bar.xiidm.gz`, the base name could be `foo.bar` for instance (or `foo` or `foo.b` or ...), while the data extension would be `xiidm`._ The main methods `ReadOnlyDataSource` provides are: From 0578d395cc568b312da1af826d7957986450edae Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Mon, 9 Sep 2024 14:40:00 +0200 Subject: [PATCH 07/12] moved files Signed-off-by: Nicolas Rol --- docs/data/index.md | 11 +++++++++++ .../going_further}/datasources.md | 0 docs/grid_exchange_formats/going_further/index.md | 11 +++++++++++ docs/grid_exchange_formats/index.md | 1 + docs/index.md | 3 +-- 5 files changed, 24 insertions(+), 2 deletions(-) create mode 100644 docs/data/index.md rename docs/{data => grid_exchange_formats/going_further}/datasources.md (100%) create mode 100644 docs/grid_exchange_formats/going_further/index.md diff --git a/docs/data/index.md b/docs/data/index.md new file mode 100644 index 00000000000..318f622136c --- /dev/null +++ b/docs/data/index.md @@ -0,0 +1,11 @@ +# Data models + +In this section, you'll discover how data is modelled in PowSyBl. 
+ +```{toctree} +--- +maxdepth: 1 +--- +timeseries +``` + diff --git a/docs/data/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md similarity index 100% rename from docs/data/datasources.md rename to docs/grid_exchange_formats/going_further/datasources.md diff --git a/docs/grid_exchange_formats/going_further/index.md b/docs/grid_exchange_formats/going_further/index.md new file mode 100644 index 00000000000..13cf52f1722 --- /dev/null +++ b/docs/grid_exchange_formats/going_further/index.md @@ -0,0 +1,11 @@ +# Going further + +In this section, you'll discover some advanced features related to grid exchange formats and how to use them. + +```{toctree} +--- +maxdepth: 1 +--- +datasources +``` + diff --git a/docs/grid_exchange_formats/index.md b/docs/grid_exchange_formats/index.md index 37dbd4b5507..a0631ca72bf 100644 --- a/docs/grid_exchange_formats/index.md +++ b/docs/grid_exchange_formats/index.md @@ -33,4 +33,5 @@ ieee/ieee.md matpower/index.md psse/index.md ampl/index.md +going_further/index ``` diff --git a/docs/index.md b/docs/index.md index 308e4955b76..217b2293e8d 100644 --- a/docs/index.md +++ b/docs/index.md @@ -21,8 +21,7 @@ grid_exchange_formats/index grid_model/index.md grid_features/index.md simulation/index -data/timeseries -data/datasources +data/index user/index.md ``` From 422596893ce65c442290a10a58e8fb8497e66270 Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Fri, 13 Sep 2024 11:59:36 +0200 Subject: [PATCH 08/12] rewrite some parts Signed-off-by: Nicolas Rol --- .../going_further/datasources.md | 76 ++++++++++--------- 1 file changed, 40 insertions(+), 36 deletions(-) diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md index 331897c0c00..3c302ca538b 100644 --- a/docs/grid_exchange_formats/going_further/datasources.md +++ b/docs/grid_exchange_formats/going_further/datasources.md @@ -4,8 +4,8 @@ ## Principles Datasources are Java-objects used to facilitate 
I/O operations around PowSyBl. -It allows users to read and write files. It is for example used by Importers during Network imports when using -`Network.read()` methods. +It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem +during Network imports when using `Network.read()` methods. ## Types of datasources @@ -20,12 +20,13 @@ data location, data compression, etc. `ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides reading features. It has two parameters: -- a base name (corresponding to the starting part of the files the user wants to consider in the -datasource), -- a data extension (corresponding to the file extension to consider, compression aside). +- a base name (corresponding to the common prefix in the names of multiple related files the user wants to +consider in the datasource), +- (optionally) a data extension, mainly used to disambiguate identically named data of different type. +Note: this does not apply to compression extensions. _**Example:** -For a file named `foo.bar.xiidm.gz`, the base name could be `foo.bar` for instance (or `foo` or `foo.b` or ...), while the data extension would be `xiidm`._ +For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm`._ The main methods `ReadOnlyDataSource` provides are: @@ -37,10 +38,10 @@ The methods with `String suffix, String ext` as parameters look for a file which `.`. 
The classes inheriting directly `ReadOnlyDataSource` are: -- `ResourceDataSource`: datasource based on a list of resources +- `ResourceDataSource`: datasource based on a list of java classpath resources - `ReadOnlyMemDataSource`: datasource where data is stored in a `Map` in memory - `MultipleReadOnlyDataSource`: datasource grouping multiple user-defined datasources -- `GenericReadOnlyDataSource`: datasource built by creating new datasources of multiple types +- `GenericReadOnlyDataSource`: datasource used to read data from any known compressed format (writabledatasources)= ### DataSource @@ -50,12 +51,14 @@ The `DataSource` interface extends `ReadOnlyDataSource` by adding writing featur Those methods allow the user to write in a new file (if `append==false`) or at the end of an existing one (if `append==true`). -This interface also provides two methods to create a datasource from a file path (`fromPath(Path file)`) or from a -directory and a file name (`fromPath(Path directory, String fileNameOrBaseName)`) +This interface also provides two static convenience methods (`fromPath(Path file)` and +`fromPath(Path directory, String fileNameOrBaseName)`) for the different use cases like writing data to the local +filesystem, and ensuring that the target folder already exists. Two classes implement the `DataSource` interface: - `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource` -- `AbstractFileSystemDataSource`: datasource based on files present in the file system, either directly or in an archive +- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system, +either directly or in an archive. (directorydatasources)= ### Directory DataSource @@ -69,16 +72,17 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi `ZstdDirectoryDataSource`. 
`DirectoryDataSource` integrates the notions of base name and data extension: -- The base name is used to facilitate the access to files that all start with the same String. For example, `foo` would -be a good base name if your files are `foo.xiidm`, `foo_bar.xiidm`, `foo_mapping.csv`, etc. +- The base name is used to facilitate the access to files that all start with the same String. For example, `network` would +be a good base name if your files are `network.xiidm`, `network_reduced.xiidm`, `network_mapping.csv`, etc. - The data extension is the last extension of your files, excluding the compression extension if they have one. It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used -to identify the files to use in the datasource, for example when importing networks using the Importers implemented in -powsybl. +to disambiguate the files to use in the datasource, for example when you have files that differ only by the data +extension (e.g. `network.xiidm` and `network.xml` in the same folder representing two different networks). -Even if `DirectoryDataSource` integrates the notions of base name and data extension, you still have the possibility to -use files that do not correspond to the base name and data extension by directly providing their names, excluding the -compression extension. +Even if `DirectoryDataSource` integrates the notions of base name and data extension in the methods with +`(String suffix, String ext)` as parameters, you still have the possibility to use files that do not correspond to the +base name and data extension by using the methods with `(String filename)` as parameter, excluding the compression +extension if there is one. (archivedatasources)= ### Archive DataSource @@ -86,16 +90,16 @@ compression extension. `AbstractArchiveDataSource` are datasources based on files located in a specific archive, in the file system. 
As of today, two classes implements `AbstractArchiveDataSource`: `ZipArchiveDataSource` and `TarArchiveDataSource` -While the files located in the archive **may not** be compressed, the archive file itself can be, depending on the -archive format: +While the files located in the archive **have to be uncompressed**, the archive file itself can be compressed, depending +on the archive format: - A Zip archive is also already compressed so the compression format for `ZipArchiveDataSource` is always ZIP. -- A Tar archive can be compressed by any compression format, excluding ZIP (since it would create a Zip archive containing -the Tar archive): BZIP2, GZIP, XZ or ZSTD. It can also not be compressed. +- A Tar archive can be compressed with BZIP2, GZIP, XZ or ZSTD, or left uncompressed. Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If not given as a parameter in the datasource constructor, the archive file name is even defined using the base name and the data extension, as `/...` with the -compression extension being optional depending on the archive format. +compression extension being optional depending on the archive format. For example, `network.xiidm.zip` contains +`network.xiidm`. 
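To make the archive naming example concrete, here is a minimal sketch that uses only the JDK's `java.util.zip` classes rather than `ZipArchiveDataSource`. It writes an archive named `network.xiidm.zip` holding a single entry `network.xiidm` and reads the entry back. `ZipNamingSketch` and its methods are hypothetical names, and the entry content is a placeholder, not a real network file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public final class ZipNamingSketch {

    private ZipNamingSketch() {
    }

    // Writes one entry "<baseName>.<dataExtension>" into "<baseName>.<dataExtension>.zip".
    public static Path writeArchive(Path directory, String baseName, String dataExtension, String content)
            throws IOException {
        String entryName = baseName + "." + dataExtension;
        Path archive = directory.resolve(entryName + ".zip");
        try (OutputStream os = Files.newOutputStream(archive);
             ZipOutputStream zos = new ZipOutputStream(os)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return archive;
    }

    // Reads the content of a named entry from the archive as UTF-8 text.
    public static String readEntry(Path archive, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("Entry not found: " + entryName);
            }
            try (InputStream is = zip.getInputStream(entry)) {
                return new String(is.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    // Round trip in a temporary directory; returns "<archive file name>:<entry content>".
    public static String demo() {
        try {
            Path dir = Files.createTempDirectory("zip-sketch");
            Path archive = writeArchive(dir, "network", "xiidm", "placeholder");
            return archive.getFileName() + ":" + readEntry(archive, "network.xiidm");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // network.xiidm.zip:placeholder
    }
}
```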
## Example @@ -104,11 +108,11 @@ Let's consider a directory containing the following files: ``` directory -├── foo -├── foo.bar -├── foo.xiidm.gz -├── foo.v3.xiidm.gz -├── foo.gz +├── network +├── network.south +├── network.xiidm.gz +├── network.v3.xiidm.gz +├── network.gz └── toto.xiidm.gz ``` @@ -116,20 +120,20 @@ A datasource on this directory could be used this way: ```java // Creation of a directory datasource with compression -GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "foo", "xiidm", observer); +GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "network", "xiidm", observer); // Check if some files exist in the datasource by using the `exists(String fileName)` method // Since the datasource uses Gzip compression, ".gz" is added to the provided fileName parameter datasource.exists("test.toto") // Returns false: the file "test.toto.gz" does not exist in the directory -datasource.exists("foo.bar") // Returns false: the file "foo.bar.gz" does not exist -datasource.exists("foo.xiidm") // Returns true: the file "foo.xiidm.gz" exists +datasource.exists("network.south") // Returns false: the file "network.south.gz" does not exist +datasource.exists("network.xiidm") // Returns true: the file "network.xiidm.gz" exists // Check if some files exist in the datasource by using the `exists(String fileName)` method -datasource.exists("_bar", "baz") // Returns false: the file "foo_bar.baz.gz" does not exist in the directory -datasource.exists(null, "xiidm") // Returns true: the file "foo.xiidm.gz" exists in the directory -datasource.exists(null, null) // Returns true: the file "foo.gz" exists in the directory +datasource.exists("_south", "reduced") // Returns false: the file "network_south.reduced.gz" does not exist in the directory +datasource.exists(null, "xiidm") // Returns true: the file "network.xiidm.gz" exists in the directory +datasource.exists(null, null) // Returns true: the file "network.gz" exists in the directory -// 
We can create some a new file "foo_test.txt.gz" and write "line1" inside +// We can create some a new file "network_test.txt.gz" and write "line1" inside try (OutputStream os = dataSource.newOutputStream("_test", "txt", false)) { os.write("line1".getBytes(StandardCharsets.UTF_8)); } @@ -145,5 +149,5 @@ try (InputStream is = dataSource.newInputStream("_test", "txt")) { } // List the files in the datasource -Set files = datasource.listNames(".*") // returns a set containing: "foo", "foo.bar", "foo.xiidm", "foo.v3.xiidm", "foo_test.txt" +Set files = datasource.listNames(".*") // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt" ``` \ No newline at end of file From e821d6d1842bd6577ca179c5c797e609a2af3e37 Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 19 Sep 2024 09:56:40 +0200 Subject: [PATCH 09/12] update doc Signed-off-by: Nicolas Rol --- .../going_further/datasources.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md index 3c302ca538b..9d54752db07 100644 --- a/docs/grid_exchange_formats/going_further/datasources.md +++ b/docs/grid_exchange_formats/going_further/datasources.md @@ -7,6 +7,7 @@ Datasources are Java-objects used to facilitate I/O operations around PowSyBl. It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem during Network imports when using `Network.read()` methods. +For importers and exporters, datasources are used to access files corresponding to a single network ## Types of datasources @@ -52,8 +53,11 @@ Those methods allow the user to write in a new file (if `append==false`) or at t `append==true`). 
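The effect of the `append` flag can be sketched on a plain, uncompressed file with the JDK alone. This is only an analogy under stated assumptions: `AppendSketch.newOutputStream(Path, boolean)` is a hypothetical stand-in, not the PowSyBl method, and it assumes that `append == false` starts the file over while `append == true` writes at its end.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class AppendSketch {

    private AppendSketch() {
    }

    // Hypothetical stand-in for a newOutputStream(..., boolean append) method on a plain file.
    public static OutputStream newOutputStream(Path file, boolean append) throws IOException {
        if (append) {
            // Keep the existing content and write after it
            return Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        // Start from an empty file, creating it if needed
        return Files.newOutputStream(file, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    // Writes "line1" without append, then "line2" with append, and returns the file content.
    public static String demo() {
        try {
            Path file = Files.createTempFile("append-sketch", ".txt");
            try (OutputStream os = newOutputStream(file, false)) {
                os.write("line1\n".getBytes(StandardCharsets.UTF_8));
            }
            try (OutputStream os = newOutputStream(file, true)) {
                os.write("line2\n".getBytes(StandardCharsets.UTF_8));
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo()); // prints "line1" then "line2" on separate lines
    }
}
```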
This interface also provides two static convenience methods (`fromPath(Path file)` and -`fromPath(Path directory, String fileNameOrBaseName)`) for the different use cases like writing data to the local -filesystem, and ensuring that the target folder already exists. +`fromPath(Path directory, String fileNameOrBaseName)`) for the different use cases like reading data from the local +filesystem, and ensuring that the target file exists. These methods have their opposite in the class `Exporters` +named `createDataSource(Path file)` and used to write data on the local filesystem, while ensuring that the target file +given as parameter is not a directory. All those methods then make use of `DataSourceUtil.createDataSource` to build +the datasource. Two classes implement the `DataSource` interface: - `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource` @@ -72,9 +76,9 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi `ZstdDirectoryDataSource`. `DirectoryDataSource` integrates the notions of base name and data extension: -- The base name is used to facilitate the access to files that all start with the same String. For example, `network` would -be a good base name if your files are `network.xiidm`, `network_reduced.xiidm`, `network_mapping.csv`, etc. -- The data extension is the last extension of your files, excluding the compression extension if they have one. +- The base name is used to access files that all start with the same String. For example, `network` would +be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc. +- The data extension is the last extension of your main files, excluding the compression extension if they have one. It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. 
This extension is mainly used to disambiguate the files to use in the datasource, for example when you have files that differ only by the data extension (e.g. `network.xiidm` and `network.xml` in the same folder representing two different networks). @@ -109,9 +113,10 @@ Let's consider a directory containing the following files: ``` directory ├── network -├── network.south +├── network.south ├── network.xiidm.gz ├── network.v3.xiidm.gz +├── network_mapping.csv.gz ├── network.gz └── toto.xiidm.gz ``` @@ -131,7 +136,7 @@ datasource.exists("network.xiidm") // Returns true: the file "network.xiidm.gz" // Check if some files exist in the datasource by using the `exists(String fileName)` method datasource.exists("_south", "reduced") // Returns false: the file "network_south.reduced.gz" does not exist in the directory datasource.exists(null, "xiidm") // Returns true: the file "network.xiidm.gz" exists in the directory -datasource.exists(null, null) // Returns true: the file "network.gz" exists in the directory +datasource.exists("_mapping", "csv") // Returns true: the file "network_mapping.csv.gz" exists in the directory // We can create some a new file "network_test.txt.gz" and write "line1" inside try (OutputStream os = dataSource.newOutputStream("_test", "txt", false)) { From b537d99014135b99e1330cb671abda89dd6488bd Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 19 Sep 2024 14:01:03 +0200 Subject: [PATCH 10/12] add doc Signed-off-by: Nicolas Rol --- .../going_further/datasources.md | 46 +++++++++++++------ 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md index 9d54752db07..9378ca224a1 100644 --- a/docs/grid_exchange_formats/going_further/datasources.md +++ b/docs/grid_exchange_formats/going_further/datasources.md @@ -3,11 +3,11 @@ ## Principles -Datasources are Java-objects used to facilitate I/O operations around PowSyBl. 
+Datasources are Java-objects used for I/O operations around PowSyBl. It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem during Network imports when using `Network.read()` methods. -For importers and exporters, datasources are used to access files corresponding to a single network +For importers and exporters, datasources are used to access files corresponding to a single network. ## Types of datasources @@ -21,13 +21,14 @@ data location, data compression, etc. `ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides reading features. It has two parameters: -- a base name (corresponding to the common prefix in the names of multiple related files the user wants to -consider in the datasource), +- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for +the output file (while writing), - (optionally) a data extension, mainly used to disambiguate identically named data of different type. Note: this does not apply to compression extensions. _**Example:** -For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm`._ +For a file named `europe.west.xiidm`, the base name could be `europe.west` for instance (or `europe` or `europe.w` +or ...), while the data extension would be `xiidm`._ The main methods `ReadOnlyDataSource` provides are: @@ -62,7 +63,8 @@ the datasource. Two classes implement the `DataSource` interface: - `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource` - `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system, -either directly or in an archive. 
+either (see below the DirectoryDataSource class and its children) or in an archive (see below the +AbstractArchiveDataSource class and its children). (directorydatasources)= ### Directory DataSource @@ -76,7 +78,7 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi `ZstdDirectoryDataSource`. `DirectoryDataSource` integrates the notions of base name and data extension: -- The base name is used to access files that all start with the same String. For example, `network` would +- The base name is used to access files that all start with the same prefix. For example, `network` would be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc. - The data extension is the last extension of your main files, excluding the compression extension if they have one. It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used @@ -88,6 +90,9 @@ Even if `DirectoryDataSource` integrates the notions of base name and data exten base name and data extension by using the methods with `(String filename)` as parameter, excluding the compression extension if there is one. +In addition to filtering with the regex parameter, in directory datasources the method `listNames(String regex)` filters +filenames to only keep those starting with the basename. + (archivedatasources)= ### Archive DataSource @@ -105,6 +110,8 @@ data extension, as `/... 
files = datasource.listNames(".*") // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt" +Set<String> files = datasource.listNames(".*"); +// returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt", "network_mapping.csv.gz" +// The file "toto.xiidm.gz" is not listed due to the basename filtering + +// Using a datasource with different parameters allows to use other files, even on the same directory +GzDirectoryDataSource totoDatasource = new GzDirectoryDataSource(testDir, "toto", "xiidm", observer); +oolean totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory +Set<String> files = totoDatasource.listNames(".*"); +// returns a set containing: "toto.xiidm.gz" ``` \ No newline at end of file From 5b44ebfd0a6d13817878fd2f5ae6b49e4aeac605 Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 19 Sep 2024 15:09:47 +0200 Subject: [PATCH 11/12] fix doc Signed-off-by: Nicolas Rol --- docs/grid_exchange_formats/going_further/datasources.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md index 9378ca224a1..d6913fda97e 100644 --- a/docs/grid_exchange_formats/going_further/datasources.md +++ b/docs/grid_exchange_formats/going_further/datasources.md @@ -4,7 +4,7 @@ ## Principles Datasources are Java-objects used for I/O operations around PowSyBl. -It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem +Datasources allow users to read and write files. It is for example used under the hood by Importers to access the filesystem during Network imports when using `Network.read()` methods. For importers and exporters, datasources are used to access files corresponding to a single network. @@ -21,8 +21,8 @@ data location, data compression, etc. 
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides reading features. It has two parameters: -- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for -the output file (while writing), +- a base name, which is a prefix that can be used to consider only files with names starting with this prefix (while +reading) or as a prefix for the output file name (while writing), - (optionally) a data extension, mainly used to disambiguate identically named data of different type. Note: this does not apply to compression extensions. @@ -63,7 +63,7 @@ the datasource. Two classes implement the `DataSource` interface: - `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource` - `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system, -either (see below the DirectoryDataSource class and its children) or in an archive (see below the +either directly (see below the DirectoryDataSource class and its children) or in an archive (see below the AbstractArchiveDataSource class and its children). (directorydatasources)= From d40457179e85295886008358223ea9d5f6cc493a Mon Sep 17 00:00:00 2001 From: Nicolas Rol Date: Thu, 19 Sep 2024 16:35:56 +0200 Subject: [PATCH 12/12] fix doc Signed-off-by: Nicolas Rol --- .../going_further/datasources.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md index d6913fda97e..a88c3eeeafc 100644 --- a/docs/grid_exchange_formats/going_further/datasources.md +++ b/docs/grid_exchange_formats/going_further/datasources.md @@ -81,9 +81,10 @@ BZIP2, GZIP, XZ and ZSTD. 
Each one of those compression format has a correspondi - The base name is used to access files that all start with the same prefix. For example, `network` would be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc. - The data extension is the last extension of your main files, excluding the compression extension if they have one. -It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used -to disambiguate the files to use in the datasource, for example when you have files that differ only by the data -extension (e.g. `network.xiidm` and `network.xml` in the same folder representing two different networks). +It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is used +to disambiguate the files to use in the datasource: just like you can create two different datasources selecting a +different subset of files in a folder based on a different base name (e.g. `france.xiidm` and `europe.xiidm`), you can +use the data extension to select either `france.xiidm` or `france.uct`. Even if `DirectoryDataSource` integrates the notions of base name and data extension in the methods with `(String suffix, String ext)` as parameters, you still have the possibility to use files that do not correspond to the @@ -91,7 +92,7 @@ base name and data extension by using the methods with `(String filename)` as pa extension if there is one. In addition to filtering with the regex parameter, in directory datasources the method `listNames(String regex)` filters -filenames to only keep those starting with the basename. +filenames to only keep those starting with the base name. (archivedatasources)= ### Archive DataSource @@ -111,7 +112,7 @@ compression extension being optional depending on the archive format. For exampl `network.xiidm`. 
Unlike in directory datasources, in archive datasources the method `listNames(String regex)` filters -filenames only by the regex and not by the basename. +filenames only by the regex and not by the base name. ## Example @@ -164,11 +165,11 @@ try (InputStream is = dataSource.newInputStream("_test", "txt")) { // List the files in the datasource Set<String> files = datasource.listNames(".*"); // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt", "network_mapping.csv.gz" -// The file "toto.xiidm.gz" is not listed due to the basename filtering +// The file "toto.xiidm.gz" is not listed due to the base name filtering // Using a datasource with different parameters allows to use other files, even on the same directory GzDirectoryDataSource totoDatasource = new GzDirectoryDataSource(testDir, "toto", "xiidm", observer); -oolean totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory +totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory Set<String> files = totoDatasource.listNames(".*"); // returns a set containing: "toto.xiidm.gz" ``` \ No newline at end of file
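
The two `listNames(String regex)` behaviours documented in this series (base name prefix plus regex for directory datasources, regex only for archive datasources) and the file naming implied by calls such as `exists("_mapping", "csv")` can be sketched in plain Java. This is only an illustration under stated assumptions: it does not call PowSyBl, the class `BaseNameFilterSketch` and the helpers `fileName`, `listDirectoryStyle` and `listArchiveStyle` are hypothetical names, and compression extensions are deliberately left out.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch, not PowSyBl code: it only mimics the filtering rules described in the doc.
final class BaseNameFilterSketch {

    // Builds "<baseName><suffix>.<ext>", skipping the suffix or the extension when null.
    static String fileName(String baseName, String suffix, String ext) {
        return baseName + (suffix != null ? suffix : "") + (ext != null ? "." + ext : "");
    }

    // Directory-style listing: a name must start with the base name AND fully match the regex.
    static Set<String> listDirectoryStyle(List<String> names, String baseName, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> result = new TreeSet<>();
        for (String name : names) {
            if (name.startsWith(baseName) && pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    // Archive-style listing: only the regex is applied, the base name is ignored.
    static Set<String> listArchiveStyle(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> result = new TreeSet<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample content, loosely based on the directory used in the doc example
        List<String> names = List.of("network", "network.south", "network.xiidm",
                "network.v3.xiidm", "network_mapping.csv", "toto.xiidm");

        System.out.println(fileName("network", "_mapping", "csv"));
        // prints "network_mapping.csv"
        System.out.println(listDirectoryStyle(names, "network", ".*\\.xiidm"));
        // prints "[network.v3.xiidm, network.xiidm]"
        System.out.println(listArchiveStyle(names, ".*\\.xiidm"));
        // prints "[network.v3.xiidm, network.xiidm, toto.xiidm]"
    }
}
```

With the same regex, `toto.xiidm` only shows up in the archive-style result, which matches the note that `toto.xiidm.gz` is not listed by the `network` directory datasource because of the base name filtering.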