From c8985ba1b12ecf41153ca7ecc7c126af68d5184d Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 18 Jul 2024 15:31:13 +0200
Subject: [PATCH 01/12] start datasource doc
Signed-off-by: Nicolas Rol
---
.../datasource/ReadOnlyDataSource.java | 4 +-
docs/data/datasources.md | 59 +++++++++++++++++++
docs/index.md | 1 +
3 files changed, 62 insertions(+), 2 deletions(-)
create mode 100644 docs/data/datasources.md
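The naming rule documented by this patch for the `(String suffix, String ext)` methods, `<basename><suffix>.<ext>`, can be sketched in plain Java as below. This is only an illustration: `DataSourceFileNames` and `fileName` are hypothetical names that are not part of PowSyBl, and leaving out a `null` suffix or extension is an assumption made for the sketch.

```java
import java.util.Objects;

/** Hypothetical helper, not part of PowSyBl: mirrors the documented naming rule. */
final class DataSourceFileNames {

    private DataSourceFileNames() {
    }

    /** Builds {@code <basename><suffix>.<ext>}; a null suffix or ext is left out (assumption). */
    static String fileName(String baseName, String suffix, String ext) {
        Objects.requireNonNull(baseName, "baseName");
        StringBuilder name = new StringBuilder(baseName);
        if (suffix != null) {
            name.append(suffix);
        }
        if (ext != null && !ext.isEmpty()) {
            // The extension is given without its leading dot: "xiidm", not ".xiidm"
            name.append('.').append(ext);
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileName("network", "_mapping", "csv")); // network_mapping.csv
        System.out.println(fileName("network", null, "xiidm"));     // network.xiidm
        System.out.println(fileName("network", null, null));        // network
    }
}
```

Passing the extension without its dot keeps the separator in one place, which is the convention the corrected `@param ext` Javadoc describes.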
diff --git a/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java b/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java
index 91f29c53f2a..ad36c793775 100644
--- a/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java
+++ b/commons/src/main/java/com/powsybl/commons/datasource/ReadOnlyDataSource.java
@@ -24,9 +24,9 @@ default String getDataExtension() {
/**
* Check if a file exists in the datasource. The file name will be constructed as:
- * {@code .}
+ * {@code <basename><suffix>.<ext>}
* @param suffix Suffix to add to the basename of the datasource
- * @param ext Extension of the file (for example: .iidm, .xml, .txt, etc.)
+ * @param ext Extension of the file (for example: iidm, xml, txt, etc.)
* @return true if the file exists, else false
*/
boolean exists(String suffix, String ext) throws IOException;
diff --git a/docs/data/datasources.md b/docs/data/datasources.md
new file mode 100644
index 00000000000..9ab0c0c43e6
--- /dev/null
+++ b/docs/data/datasources.md
@@ -0,0 +1,59 @@
+(datasources)=
+# Datasources
+
+## Principles
+
+Datasources are Java objects used to facilitate I/O operations around PowSyBl.
+They allow users to read and write files.
+
+
+## Types of datasources
+
+Multiple types of datasources exist, depending on whether they must be writable, the kind of storage used,
+data location, data compression, etc.
+
+
+(readonlydatasources)=
+### ReadOnlyDataSource
+
+`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
+reading features.
+It has two parameters: a base name (corresponding to the starting part of the files the user wants to consider in the
+datasource) and a main extension (corresponding to the file extension to consider, compression aside).
+
+_**Example:**
+For a file named `foo.bar.xiidm.gz`, the base name would be `foo.bar` or `foo` while the main extension would be `xiidm`._
+
+The main methods `ReadOnlyDataSource` provides are:
+
+- `exists(String fileName)` and `exists(String suffix, String ext)` to check if a file exists in the datasource
+- `newInputStream(String fileName)` and `newInputStream(String suffix, String ext)` to read a file from the datasource
+- `listNames(String regex)` to list the files in the datasource whose names match the regex
+
+The methods taking `String suffix, String ext` as parameters look for a file whose name is constructed as
+`<basename><suffix>.<ext>`.
+
+The classes directly implementing `ReadOnlyDataSource` are:
+- `ResourceDataSource`: datasource based on a list of resources
+- `ReadOnlyMemDataSource`: datasource where data is stored in a `Map` in memory
+- `MultipleReadOnlyDataSource`: datasource grouping multiple user-defined datasources
+- `GenericReadOnlyDataSource`: datasource built by creating new datasources of multiple types
+
+(writabledatasources)=
+### DataSource
+
+The `DataSource` interface extends `ReadOnlyDataSource` by adding writing features through the methods
+`newOutputStream(String fileName, boolean append)` and `newOutputStream(String suffix, String ext, boolean append)`.
+Those methods allow the user to write in a new file (if `append==false`) or at the end of an existing one (if
+`append==true`).
+
+This interface also provides two methods to create a datasource from a file path (`fromPath(Path file)`) or from a
+directory and a file name (`fromPath(Path directory, String fileNameOrBaseName)`).
+
+Two classes implement the `DataSource` interface:
+- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
+- `AbstractFileSystemDataSource`: datasource based on files present in the file system, either directly or in an archive
+
+(directorydatasources)=
+### DirectoryDataSource
+
diff --git a/docs/index.md b/docs/index.md
index 82cbef6adea..308e4955b76 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -22,6 +22,7 @@ grid_model/index.md
grid_features/index.md
simulation/index
data/timeseries
+data/datasources
user/index.md
```
From 4343f38e7e7b0fcaa0f424739985e6626265a5ea Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 1 Aug 2024 11:37:04 +0200
Subject: [PATCH 02/12] add doc
Signed-off-by: Nicolas Rol
---
docs/data/datasources.md | 95 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 92 insertions(+), 3 deletions(-)
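The example added by this patch states that, for a Gzip directory datasource, ".gz" is appended to the file name given to `exists(String fileName)`. A minimal sketch of that lookup rule with plain `java.nio` is shown below. It is not the PowSyBl implementation: `GzLookupSketch`, `exists` and `createDemoDirectory` are hypothetical names, and the demo files are empty placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustration only: plain java.nio code, not the PowSyBl implementation. */
final class GzLookupSketch {

    private GzLookupSketch() {
    }

    /** Looks for {@code <fileName>.gz} in the directory, mirroring the ".gz is added" rule. */
    static boolean exists(Path directory, String fileName) {
        return Files.isRegularFile(directory.resolve(fileName + ".gz"));
    }

    /** Creates a temporary directory holding two empty placeholder files (hypothetical demo data). */
    static Path createDemoDirectory() {
        try {
            Path directory = Files.createTempDirectory("datasource-sketch");
            Files.createFile(directory.resolve("foo.bar"));      // stored without compression extension
            Files.createFile(directory.resolve("foo.xiidm.gz")); // stored with the ".gz" extension
            return directory;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path directory = createDemoDirectory();
        System.out.println(exists(directory, "foo.bar"));   // false: "foo.bar.gz" is absent
        System.out.println(exists(directory, "foo.xiidm")); // true: "foo.xiidm.gz" is present
    }
}
```

The caller never writes the compression extension; under this rule, a file stored uncompressed in the same directory is simply not seen by the lookup.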
diff --git a/docs/data/datasources.md b/docs/data/datasources.md
index 9ab0c0c43e6..648fc3ca627 100644
--- a/docs/data/datasources.md
+++ b/docs/data/datasources.md
@@ -14,7 +14,7 @@ data location, data compression, etc.
(readonlydatasources)=
-### ReadOnlyDataSource
+### Read-Only DataSource
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
reading features.
@@ -55,5 +55,94 @@ Two classes implement the `DataSource` interface:
- `AbstractFileSystemDataSource`: datasource based on files present in the file system, either directly or in an archive
(directorydatasources)=
-### DirectoryDataSource
-
+### Directory DataSource
+
+A `DirectoryDataSource` is a datasource based on files located directly in a specific directory of the file system.
+
+Files stored and used via this type of datasource are either all compressed or all uncompressed. The available compression
+formats are defined in the class `CompressionFormat`. As of today, the following single-file compressions are available:
+BZIP2, GZIP, XZ and ZSTD. Each of those compression formats has a corresponding datasource class inheriting from
+`DirectoryDataSource`: `Bzip2DirectoryDataSource`, `GzDirectoryDataSource`, `XZDirectoryDataSource`,
+`ZstdDirectoryDataSource`.
+
+`DirectoryDataSource` integrates the notions of base name and data extension:
+- The base name is used to facilitate the access to files that all start with the same String. For example, `foo` would
+be a good base name if your files are `foo.xiidm`, `foo_bar.xiidm`, `foo_mapping.csv`, etc.
+- The data extension is the last extension of your files, excluding the compression extension if they have one.
+It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used
+to identify the files to use in the datasource, for example when importing networks using the Importers implemented in
+powsybl.
+
+Even if `DirectoryDataSource` integrates the notions of base name and data extension, you still have the possibility to
+use files that do not correspond to the base name and data extension by directly providing their names, excluding the
+compression extension.
+
+(archivedatasources)=
+### Archive DataSource
+
+An `AbstractArchiveDataSource` is a datasource based on files located in a specific archive in the file system. As of today,
+two classes extend `AbstractArchiveDataSource`: `ZipArchiveDataSource` and `TarArchiveDataSource`.
+
+While the files located in the archive **may not** be compressed, the archive file itself can be, depending on the
+archive format:
+- A Zip archive is itself already compressed, so the compression format for `ZipArchiveDataSource` is always ZIP.
+- A Tar archive can be compressed by any compression format, excluding ZIP (since it would create a Zip archive containing
+the Tar archive): BZIP2, GZIP, XZ or ZSTD.
+
+Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If the
+archive file name is not given as a parameter in the datasource constructor, it is built from the base name and the
+data extension, as `<directory>/<basename>.<dataExtension>.<archiveExtension>.<compressionExtension>`, with the
+compression extension being optional depending on the archive format.
+
+
+## Example
+
+Let's consider a directory containing the following files:
+
+```java
+/*
+ directory
+ ├── foo
+ ├── foo.bar
+ ├── foo.xiidm.gz
+ ├── foo.v3.xiidm.gz
+ ├── foo.gz
+ └── toto.xiidm.gz
+ */
+```
+
+A datasource on this directory could be used this way:
+
+```java
+// Creation of a directory datasource with compression
+GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "foo", "xiidm", observer);
+
+// Check if some files exist in the datasource by using the `exists(String fileName)` method
+// Since the datasource uses Gzip compression, ".gz" is added to the provided fileName parameter
+datasource.exists("test.toto"); // Returns false: the file "test.toto.gz" does not exist in the directory
+datasource.exists("foo.bar"); // Returns false: the file "foo.bar.gz" does not exist
+datasource.exists("foo.xiidm"); // Returns true: the file "foo.xiidm.gz" exists
+
+// Check if some files exist in the datasource by using the `exists(String suffix, String ext)` method
+datasource.exists("_bar", "baz"); // Returns false: the file "foo_bar.baz.gz" does not exist in the directory
+datasource.exists(null, "xiidm"); // Returns true: the file "foo.xiidm.gz" exists in the directory
+datasource.exists(null, null); // Returns true: the file "foo.gz" exists in the directory
+
+// We can create a new file "foo_test.txt.gz" and write "line1" inside
+try (OutputStream os = datasource.newOutputStream("_test", "txt", false)) {
+ os.write("line1".getBytes(StandardCharsets.UTF_8));
+}
+
+// Another line can be added to the same file by setting the `append` boolean parameter to true
+try (OutputStream os = datasource.newOutputStream("_test", "txt", true)) {
+ os.write("line2".getBytes(StandardCharsets.UTF_8));
+}
+
+// We can read the file; the bytes are decoded before printing, since printing a byte[] only shows its reference
+try (InputStream is = datasource.newInputStream("_test", "txt")) {
+ System.out.println(new String(ByteStreams.toByteArray(is), StandardCharsets.UTF_8)); // Displays "line1" then "line2"
+}
+
+// List the files in the datasource
+Set<String> files = datasource.listNames(".*"); // returns a set containing: "foo", "foo.bar", "foo.xiidm", "foo.v3.xiidm", "foo_test.txt"
+```
\ No newline at end of file
From a976fd7068bdbdf18afb04195de814a77c4c5ca5 Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 1 Aug 2024 11:39:35 +0200
Subject: [PATCH 03/12] add doc
Signed-off-by: Nicolas Rol
---
docs/data/datasources.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/data/datasources.md b/docs/data/datasources.md
index 648fc3ca627..ea6543dbf95 100644
--- a/docs/data/datasources.md
+++ b/docs/data/datasources.md
@@ -87,7 +87,7 @@ While the files located in the archive **may not** be compressed, the archive fi
archive format:
- A Zip archive is itself already compressed, so the compression format for `ZipArchiveDataSource` is always ZIP.
- A Tar archive can be compressed by any compression format, excluding ZIP (since it would create a Zip archive containing
-the Tar archive): BZIP2, GZIP, XZ or ZSTD.
+the Tar archive): BZIP2, GZIP, XZ or ZSTD. It can also be left uncompressed.
Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If not
given as a parameter in the datasource constructor, the archive file name is even defined using the base name and the
From d45e7dbf996abf0f4db96344a14019af566d454e Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 1 Aug 2024 11:40:11 +0200
Subject: [PATCH 04/12] add doc
Signed-off-by: Nicolas Rol
---
docs/data/datasources.md | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/docs/data/datasources.md b/docs/data/datasources.md
index ea6543dbf95..073f92639ab 100644
--- a/docs/data/datasources.md
+++ b/docs/data/datasources.md
@@ -99,16 +99,14 @@ compression extension being optional depending on the archive format.
Let's consider a directory containing the following files:
-```java
-/*
- directory
- ├── foo
- ├── foo.bar
- ├── foo.xiidm.gz
- ├── foo.v3.xiidm.gz
- ├── foo.gz
- └── toto.xiidm.gz
- */
+```
+directory
+├── foo
+├── foo.bar
+├── foo.xiidm.gz
+├── foo.v3.xiidm.gz
+├── foo.gz
+└── toto.xiidm.gz
```
A datasource on this directory could be used this way:
From 86fd669d51f6f346a62cb4c7cf134dd6c4779d73 Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 1 Aug 2024 11:41:46 +0200
Subject: [PATCH 05/12] add doc
Signed-off-by: Nicolas Rol
---
docs/data/datasources.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/docs/data/datasources.md b/docs/data/datasources.md
index 073f92639ab..92fd6dd2ade 100644
--- a/docs/data/datasources.md
+++ b/docs/data/datasources.md
@@ -4,7 +4,8 @@
## Principles
Datasources are Java-objects used to facilitate I/O operations around PowSyBl.
-It allows users to read and write files
+They allow users to read and write files. For example, Importers use them during Network imports through the
+`Network.read()` methods.
## Types of datasources
From 404b6e54e3c9bbe4815722c170ba57141a5000f8 Mon Sep 17 00:00:00 2001
From: Florian Dupuy
Date: Fri, 9 Aug 2024 17:07:58 +0200
Subject: [PATCH 06/12] Replace main with data, base name examples, bullet
points added
Signed-off-by: Florian Dupuy
---
docs/data/datasources.md | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/docs/data/datasources.md b/docs/data/datasources.md
index 92fd6dd2ade..331897c0c00 100644
--- a/docs/data/datasources.md
+++ b/docs/data/datasources.md
@@ -19,11 +19,13 @@ data location, data compression, etc.
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
reading features.
-It has two parameters: a base name (corresponding to the starting part of the files the user wants to consider in the
-datasource) and a main extension (corresponding to the file extension to consider, compression aside).
+It has two parameters:
+- a base name (corresponding to the starting part of the files the user wants to consider in the
+datasource),
+- a data extension (corresponding to the file extension to consider, compression aside).
_**Example:**
-For a file named `foo.bar.xiidm.gz`, the base name would be `foo.bar` or `foo` while the main extension would be `xiidm`._
+For a file named `foo.bar.xiidm.gz`, the base name could be `foo.bar` for instance (or `foo` or `foo.b` or ...), while the data extension would be `xiidm`._
The main methods `ReadOnlyDataSource` provides are:
From 0578d395cc568b312da1af826d7957986450edae Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Mon, 9 Sep 2024 14:40:00 +0200
Subject: [PATCH 07/12] moved files
Signed-off-by: Nicolas Rol
---
docs/data/index.md | 11 +++++++++++
.../going_further}/datasources.md | 0
docs/grid_exchange_formats/going_further/index.md | 11 +++++++++++
docs/grid_exchange_formats/index.md | 1 +
docs/index.md | 3 +--
5 files changed, 24 insertions(+), 2 deletions(-)
create mode 100644 docs/data/index.md
rename docs/{data => grid_exchange_formats/going_further}/datasources.md (100%)
create mode 100644 docs/grid_exchange_formats/going_further/index.md
diff --git a/docs/data/index.md b/docs/data/index.md
new file mode 100644
index 00000000000..318f622136c
--- /dev/null
+++ b/docs/data/index.md
@@ -0,0 +1,11 @@
+# Data models
+
+In this section, you'll discover how data is modelled in PowSyBl.
+
+```{toctree}
+---
+maxdepth: 1
+---
+timeseries
+```
+
diff --git a/docs/data/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md
similarity index 100%
rename from docs/data/datasources.md
rename to docs/grid_exchange_formats/going_further/datasources.md
diff --git a/docs/grid_exchange_formats/going_further/index.md b/docs/grid_exchange_formats/going_further/index.md
new file mode 100644
index 00000000000..13cf52f1722
--- /dev/null
+++ b/docs/grid_exchange_formats/going_further/index.md
@@ -0,0 +1,11 @@
+# Going further
+
+In this section, you'll discover some advanced features related to grid exchange formats and how to use them.
+
+```{toctree}
+---
+maxdepth: 1
+---
+datasources
+```
+
diff --git a/docs/grid_exchange_formats/index.md b/docs/grid_exchange_formats/index.md
index 37dbd4b5507..a0631ca72bf 100644
--- a/docs/grid_exchange_formats/index.md
+++ b/docs/grid_exchange_formats/index.md
@@ -33,4 +33,5 @@ ieee/ieee.md
matpower/index.md
psse/index.md
ampl/index.md
+going_further/index
```
diff --git a/docs/index.md b/docs/index.md
index 308e4955b76..217b2293e8d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -21,8 +21,7 @@ grid_exchange_formats/index
grid_model/index.md
grid_features/index.md
simulation/index
-data/timeseries
-data/datasources
+data/index
user/index.md
```
From 422596893ce65c442290a10a58e8fb8497e66270 Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Fri, 13 Sep 2024 11:59:36 +0200
Subject: [PATCH 08/12] rewrite some parts
Signed-off-by: Nicolas Rol
---
.../going_further/datasources.md | 76 ++++++++++---------
1 file changed, 40 insertions(+), 36 deletions(-)
diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md
index 331897c0c00..3c302ca538b 100644
--- a/docs/grid_exchange_formats/going_further/datasources.md
+++ b/docs/grid_exchange_formats/going_further/datasources.md
@@ -4,8 +4,8 @@
## Principles
Datasources are Java-objects used to facilitate I/O operations around PowSyBl.
-It allows users to read and write files. It is for example used by Importers during Network imports when using
-`Network.read()` methods.
+They allow users to read and write files. For example, Importers use them under the hood to access the filesystem
+during Network imports through the `Network.read()` methods.
## Types of datasources
@@ -20,12 +20,13 @@ data location, data compression, etc.
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
reading features.
It has two parameters:
-- a base name (corresponding to the starting part of the files the user wants to consider in the
-datasource),
-- a data extension (corresponding to the file extension to consider, compression aside).
+- a base name (corresponding to the common prefix in the names of multiple related files the user wants to
+consider in the datasource),
+- (optionally) a data extension, mainly used to disambiguate identically named data of different types.
+Note: this does not apply to compression extensions.
_**Example:**
-For a file named `foo.bar.xiidm.gz`, the base name could be `foo.bar` for instance (or `foo` or `foo.b` or ...), while the data extension would be `xiidm`._
+For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm`._
The main methods `ReadOnlyDataSource` provides are:
@@ -37,10 +38,10 @@ The methods with `String suffix, String ext` as parameters look for a file which
`<basename><suffix>.<ext>`.
The classes directly implementing `ReadOnlyDataSource` are:
-- `ResourceDataSource`: datasource based on a list of resources
+- `ResourceDataSource`: datasource based on a list of java classpath resources
- `ReadOnlyMemDataSource`: datasource where data is stored in a `Map` in memory
- `MultipleReadOnlyDataSource`: datasource grouping multiple user-defined datasources
-- `GenericReadOnlyDataSource`: datasource built by creating new datasources of multiple types
+- `GenericReadOnlyDataSource`: datasource used to read data from any known compressed format
(writabledatasources)=
### DataSource
@@ -50,12 +51,14 @@ The `DataSource` interface extends `ReadOnlyDataSource` by adding writing featur
Those methods allow the user to write in a new file (if `append==false`) or at the end of an existing one (if
`append==true`).
-This interface also provides two methods to create a datasource from a file path (`fromPath(Path file)`) or from a
-directory and a file name (`fromPath(Path directory, String fileNameOrBaseName)`)
+This interface also provides two static convenience methods (`fromPath(Path file)` and
+`fromPath(Path directory, String fileNameOrBaseName)`) for the different use cases like writing data to the local
+filesystem, and ensuring that the target folder already exists.
Two classes implement the `DataSource` interface:
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
-- `AbstractFileSystemDataSource`: datasource based on files present in the file system, either directly or in an archive
+- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system,
+either directly or in an archive.
(directorydatasources)=
### Directory DataSource
@@ -69,16 +72,17 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi
`ZstdDirectoryDataSource`.
`DirectoryDataSource` integrates the notions of base name and data extension:
-- The base name is used to facilitate the access to files that all start with the same String. For example, `foo` would
-be a good base name if your files are `foo.xiidm`, `foo_bar.xiidm`, `foo_mapping.csv`, etc.
+- The base name is used to facilitate the access to files that all start with the same String. For example, `network` would
+be a good base name if your files are `network.xiidm`, `network_reduced.xiidm`, `network_mapping.csv`, etc.
- The data extension is the last extension of your files, excluding the compression extension if they have one.
It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used
-to identify the files to use in the datasource, for example when importing networks using the Importers implemented in
-powsybl.
+to disambiguate the files to use in the datasource, for example when you have files that differ only by the data
+extension (e.g. `network.xiidm` and `network.xml` in the same folder representing two different networks).
-Even if `DirectoryDataSource` integrates the notions of base name and data extension, you still have the possibility to
-use files that do not correspond to the base name and data extension by directly providing their names, excluding the
-compression extension.
+Even though `DirectoryDataSource` applies the base name and data extension in the methods taking
+`(String suffix, String ext)` as parameters, you can still access files that do not match the base name and
+data extension through the methods taking `(String fileName)` as parameter, giving the file name without its
+compression extension, if there is one.
(archivedatasources)=
### Archive DataSource
@@ -86,16 +90,16 @@ compression extension.
An `AbstractArchiveDataSource` is a datasource based on files located in a specific archive in the file system. As of today,
two classes extend `AbstractArchiveDataSource`: `ZipArchiveDataSource` and `TarArchiveDataSource`.
-While the files located in the archive **may not** be compressed, the archive file itself can be, depending on the
-archive format:
+While the files located in the archive **have to be uncompressed**, the archive file itself can be compressed, depending
+on the archive format:
- A Zip archive is itself already compressed, so the compression format for `ZipArchiveDataSource` is always ZIP.
-- A Tar archive can be compressed by any compression format, excluding ZIP (since it would create a Zip archive containing
-the Tar archive): BZIP2, GZIP, XZ or ZSTD. It can also not be compressed.
+- A Tar archive can be compressed with BZIP2, GZIP, XZ or ZSTD. It can also be left uncompressed.
Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If the
archive file name is not given as a parameter in the datasource constructor, it is built from the base name and the
data extension, as `<directory>/<basename>.<dataExtension>.<archiveExtension>.<compressionExtension>`, with the
-compression extension being optional depending on the archive format.
+compression extension being optional depending on the archive format. For example `network.xiidm.zip` contains
+`network.xiidm`.
## Example
@@ -104,11 +108,11 @@ Let's consider a directory containing the following files:
```
directory
-├── foo
-├── foo.bar
-├── foo.xiidm.gz
-├── foo.v3.xiidm.gz
-├── foo.gz
+├── network
+├── network.south
+├── network.xiidm.gz
+├── network.v3.xiidm.gz
+├── network.gz
└── toto.xiidm.gz
```
@@ -116,20 +120,20 @@ A datasource on this directory could be used this way:
```java
// Creation of a directory datasource with compression
-GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "foo", "xiidm", observer);
+GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "network", "xiidm", observer);
// Check if some files exist in the datasource by using the `exists(String fileName)` method
// Since the datasource uses Gzip compression, ".gz" is added to the provided fileName parameter
datasource.exists("test.toto"); // Returns false: the file "test.toto.gz" does not exist in the directory
-datasource.exists("foo.bar") // Returns false: the file "foo.bar.gz" does not exist
-datasource.exists("foo.xiidm") // Returns true: the file "foo.xiidm.gz" exists
+datasource.exists("network.south"); // Returns false: the file "network.south.gz" does not exist
+datasource.exists("network.xiidm"); // Returns true: the file "network.xiidm.gz" exists
// Check if some files exist in the datasource by using the `exists(String suffix, String ext)` method
-datasource.exists("_bar", "baz") // Returns false: the file "foo_bar.baz.gz" does not exist in the directory
-datasource.exists(null, "xiidm") // Returns true: the file "foo.xiidm.gz" exists in the directory
-datasource.exists(null, null) // Returns true: the file "foo.gz" exists in the directory
+datasource.exists("_south", "reduced"); // Returns false: the file "network_south.reduced.gz" does not exist in the directory
+datasource.exists(null, "xiidm"); // Returns true: the file "network.xiidm.gz" exists in the directory
+datasource.exists(null, null); // Returns true: the file "network.gz" exists in the directory
-// We can create some a new file "foo_test.txt.gz" and write "line1" inside
+// We can create a new file "network_test.txt.gz" and write "line1" inside
try (OutputStream os = datasource.newOutputStream("_test", "txt", false)) {
os.write("line1".getBytes(StandardCharsets.UTF_8));
}
@@ -145,5 +149,5 @@ try (InputStream is = dataSource.newInputStream("_test", "txt")) {
}
// List the files in the datasource
-Set files = datasource.listNames(".*") // returns a set containing: "foo", "foo.bar", "foo.xiidm", "foo.v3.xiidm", "foo_test.txt"
+Set<String> files = datasource.listNames(".*"); // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt"
```
\ No newline at end of file
From e821d6d1842bd6577ca179c5c797e609a2af3e37 Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 19 Sep 2024 09:56:40 +0200
Subject: [PATCH 09/12] update doc
Signed-off-by: Nicolas Rol
---
.../going_further/datasources.md | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
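This patch describes two preconditions: `fromPath` ensures that the target file exists before reading, and `Exporters.createDataSource(Path file)` ensures that the target is not a directory before writing. A rough sketch of such checks with plain `java.nio` follows. The class name, method names, exception type and messages are all hypothetical choices for the illustration and may differ from what PowSyBl actually does.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustration only: hypothetical precondition checks, not the PowSyBl code. */
final class PathChecksSketch {

    private PathChecksSketch() {
    }

    /** Read side: the file must already exist as a regular file. */
    static Path requireExistingFile(Path file) {
        if (!Files.isRegularFile(file)) {
            throw new IllegalArgumentException("File " + file + " does not exist or is not a regular file");
        }
        return file;
    }

    /** Write side: the target may be missing, but it must not be a directory. */
    static Path requireNotDirectory(Path file) {
        if (Files.isDirectory(file)) {
            throw new IllegalArgumentException("File " + file + " is a directory");
        }
        return file;
    }

    public static void main(String[] args) {
        Path tmpDir = Path.of(System.getProperty("java.io.tmpdir"));
        try {
            requireNotDirectory(tmpDir);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected as export target: " + e.getMessage());
        }
        try {
            requireExistingFile(tmpDir.resolve("no-such-file-for-datasource-sketch"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected as import source: " + e.getMessage());
        }
    }
}
```

The two checks are deliberately asymmetric: an import needs something to read, whereas an export only needs a target path that can become a file.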
diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md
index 3c302ca538b..9d54752db07 100644
--- a/docs/grid_exchange_formats/going_further/datasources.md
+++ b/docs/grid_exchange_formats/going_further/datasources.md
@@ -7,6 +7,7 @@ Datasources are Java-objects used to facilitate I/O operations around PowSyBl.
It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem
during Network imports when using `Network.read()` methods.
+For importers and exporters, datasources are used to access files corresponding to a single network
## Types of datasources
@@ -52,8 +53,11 @@ Those methods allow the user to write in a new file (if `append==false`) or at t
`append==true`).
This interface also provides two static convenience methods (`fromPath(Path file)` and
-`fromPath(Path directory, String fileNameOrBaseName)`) for the different use cases like writing data to the local
-filesystem, and ensuring that the target folder already exists.
+`fromPath(Path directory, String fileNameOrBaseName)`) for use cases such as reading data from the local
+filesystem, which ensure that the target file exists. Their counterpart for writing is the method
+`createDataSource(Path file)` of the class `Exporters`, which is used to write data to the local filesystem while ensuring
+that the target file given as parameter is not a directory. All those methods then use `DataSourceUtil.createDataSource`
+to build the datasource.
Two classes implement the `DataSource` interface:
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
@@ -72,9 +76,9 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi
`ZstdDirectoryDataSource`.
`DirectoryDataSource` integrates the notions of base name and data extension:
-- The base name is used to facilitate the access to files that all start with the same String. For example, `network` would
-be a good base name if your files are `network.xiidm`, `network_reduced.xiidm`, `network_mapping.csv`, etc.
-- The data extension is the last extension of your files, excluding the compression extension if they have one.
+- The base name is used to access files that all start with the same String. For example, `network` would
+be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc.
+- The data extension is the last extension of your main files, excluding the compression extension if they have one.
It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used
to disambiguate the files to use in the datasource, for example when you have files that differ only by the data
extension (e.g. `network.xiidm` and `network.xml` in the same folder representing two different networks).
@@ -109,9 +113,10 @@ Let's consider a directory containing the following files:
```
directory
├── network
-├── network.south
+├── network.south
├── network.xiidm.gz
├── network.v3.xiidm.gz
+├── network_mapping.csv.gz
├── network.gz
└── toto.xiidm.gz
```
@@ -131,7 +136,7 @@ datasource.exists("network.xiidm") // Returns true: the file "network.xiidm.gz"
// Check if some files exist in the datasource by using the `exists(String suffix, String ext)` method
datasource.exists("_south", "reduced"); // Returns false: the file "network_south.reduced.gz" does not exist in the directory
datasource.exists(null, "xiidm"); // Returns true: the file "network.xiidm.gz" exists in the directory
-datasource.exists(null, null) // Returns true: the file "network.gz" exists in the directory
+datasource.exists("_mapping", "csv"); // Returns true: the file "network_mapping.csv.gz" exists in the directory
// We can create a new file "network_test.txt.gz" and write "line1" inside
try (OutputStream os = datasource.newOutputStream("_test", "txt", false)) {
From b537d99014135b99e1330cb671abda89dd6488bd Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 19 Sep 2024 14:01:03 +0200
Subject: [PATCH 10/12] add doc
Signed-off-by: Nicolas Rol
---
.../going_further/datasources.md | 46 +++++++++++++------
1 file changed, 31 insertions(+), 15 deletions(-)
diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md
index 9d54752db07..9378ca224a1 100644
--- a/docs/grid_exchange_formats/going_further/datasources.md
+++ b/docs/grid_exchange_formats/going_further/datasources.md
@@ -3,11 +3,11 @@
## Principles
-Datasources are Java-objects used to facilitate I/O operations around PowSyBl.
+Datasources are Java objects used for I/O operations around PowSyBl.
They allow users to read and write files. For example, Importers use them under the hood to access the filesystem
during Network imports through the `Network.read()` methods.
-For importers and exporters, datasources are used to access files corresponding to a single network
+For importers and exporters, datasources are used to access files corresponding to a single network.
## Types of datasources
@@ -21,13 +21,14 @@ data location, data compression, etc.
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
reading features.
It has two parameters:
-- a base name (corresponding to the common prefix in the names of multiple related files the user wants to
-consider in the datasource),
+- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for
+the output file (while writing),
- (optionally) a data extension, mainly used to disambiguate identically named data of different type.
Note: this does not apply to compression extensions.
_**Example:**
-For a file named `europe.west.xiidm.gz`, the base name could be `europe.west` for instance (or `europe` or `europe.w` or ...), while the data extension would be `xiidm`._
+For a file named `europe.west.xiidm`, the base name could be `europe.west` for instance (or `europe` or `europe.w`
+or ...), while the data extension would be `xiidm`._
The main methods `ReadOnlyDataSource` provides are:
@@ -62,7 +63,8 @@ the datasource.
Two classes implement the `DataSource` interface:
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system,
-either directly or in an archive.
+either (see below the DirectoryDataSource class and its children) or in an archive (see below the
+AbstractArchiveDataSource class and its children).
(directorydatasources)=
### Directory DataSource
@@ -76,7 +78,7 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi
`ZstdDirectoryDataSource`.
`DirectoryDataSource` integrates the notions of base name and data extension:
-- The base name is used to access files that all start with the same String. For example, `network` would
+- The base name is used to access files that all start with the same prefix. For example, `network` would
be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc.
- The data extension is the last extension of your main files, excluding the compression extension if they have one.
It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used
@@ -88,6 +90,9 @@ Even if `DirectoryDataSource` integrates the notions of base name and data exten
base name and data extension by using the methods with `(String filename)` as parameter, excluding the compression
extension if there is one.
+In addition to filtering with the regex parameter, in directory datasources the method `listNames(String regex)` filters
+filenames to only keep those starting with the basename.
+
(archivedatasources)=
### Archive DataSource
@@ -105,6 +110,8 @@ data extension, as `/... files = datasource.listNames(".*") // returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt"
+Set<String> files = datasource.listNames(".*");
+// returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt", "network_mapping.csv.gz"
+// The file "toto.xiidm.gz" is not listed due to the basename filtering
+
+// Using a datasource with different parameters lets you use other files, even in the same directory
+GzDirectoryDataSource totoDatasource = new GzDirectoryDataSource(testDir, "toto", "xiidm", observer);
+oolean totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory
+Set<String> files = totoDatasource.listNames(".*");
+// returns a set containing: "toto.xiidm.gz"
```
\ No newline at end of file
From 5b44ebfd0a6d13817878fd2f5ae6b49e4aeac605 Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 19 Sep 2024 15:09:47 +0200
Subject: [PATCH 11/12] fix doc
Signed-off-by: Nicolas Rol
---
docs/grid_exchange_formats/going_further/datasources.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md
index 9378ca224a1..d6913fda97e 100644
--- a/docs/grid_exchange_formats/going_further/datasources.md
+++ b/docs/grid_exchange_formats/going_further/datasources.md
@@ -4,7 +4,7 @@
## Principles
Datasources are Java-objects used for I/O operations around PowSyBl.
-It allows users to read and write files. It is for example used under the hood by Importers to access the filesystem
+Datasources allow users to read and write files. For example, they are used under the hood by Importers to access the filesystem
during Network imports when using `Network.read()` methods.
For importers and exporters, datasources are used to access files corresponding to a single network.
@@ -21,8 +21,8 @@ data location, data compression, etc.
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides
reading features.
It has two parameters:
-- a base name, which is a prefix that can be used to consider only files with this prefix (while reading) or as a prefix for
-the output file (while writing),
+- a base name, which is a prefix that can be used to consider only files with names starting with this prefix (while
+reading) or as a prefix for the output file name (while writing),
- (optionally) a data extension, mainly used to disambiguate identically named data of different type.
Note: this does not apply to compression extensions.
@@ -63,7 +63,7 @@ the datasource.
Two classes implement the `DataSource` interface:
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource`
- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system,
-either (see below the DirectoryDataSource class and its children) or in an archive (see below the
+either directly (see below the DirectoryDataSource class and its children) or in an archive (see below the
AbstractArchiveDataSource class and its children).
(directorydatasources)=
From d40457179e85295886008358223ea9d5f6cc493a Mon Sep 17 00:00:00 2001
From: Nicolas Rol
Date: Thu, 19 Sep 2024 16:35:56 +0200
Subject: [PATCH 12/12] fix doc
Signed-off-by: Nicolas Rol
---
.../going_further/datasources.md | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/docs/grid_exchange_formats/going_further/datasources.md b/docs/grid_exchange_formats/going_further/datasources.md
index d6913fda97e..a88c3eeeafc 100644
--- a/docs/grid_exchange_formats/going_further/datasources.md
+++ b/docs/grid_exchange_formats/going_further/datasources.md
@@ -81,9 +81,10 @@ BZIP2, GZIP, XZ and ZSTD. Each one of those compression format has a correspondi
- The base name is used to access files that all start with the same prefix. For example, `network` would
be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc.
- The data extension is the last extension of your main files, excluding the compression extension if they have one.
-It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is mainly used
-to disambiguate the files to use in the datasource, for example when you have files that differ only by the data
-extension (e.g. `network.xiidm` and `network.xml` in the same folder representing two different networks).
+It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is used
+to disambiguate the files to use in the datasource: just like you can create two different datasources selecting a
+different subset of files in a folder based on a different base name (e.g. `france.xiidm` and `europe.xiidm`), you can
+use the data extension to select either `france.xiidm` or `france.uct`.
Even if `DirectoryDataSource` integrates the notions of base name and data extension in the methods with
`(String suffix, String ext)` as parameters, you still have the possibility to use files that do not correspond to the
@@ -91,7 +92,7 @@ base name and data extension by using the methods with `(String filename)` as pa
extension if there is one.
In addition to filtering with the regex parameter, in directory datasources the method `listNames(String regex)` filters
-filenames to only keep those starting with the basename.
+filenames to only keep those starting with the base name.
(archivedatasources)=
### Archive DataSource
@@ -111,7 +112,7 @@ compression extension being optional depending on the archive format. For exampl
`network.xiidm`.
Unlike in directory datasources, in archive datasources the method `listNames(String regex)` filters
-filenames only by the regex and not by the basename.
+filenames only by the regex and not by the base name.
## Example
@@ -164,11 +165,11 @@ try (InputStream is = dataSource.newInputStream("_test", "txt")) {
// List the files in the datasource
Set<String> files = datasource.listNames(".*");
// returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt", "network_mapping.csv.gz"
-// The file "toto.xiidm.gz" is not listed due to the basename filtering
+// The file "toto.xiidm.gz" is not listed due to the base name filtering
// Using a datasource with different parameters lets you use other files, even in the same directory
GzDirectoryDataSource totoDatasource = new GzDirectoryDataSource(testDir, "toto", "xiidm", observer);
-oolean totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory
+totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory
Set<String> files = totoDatasource.listNames(".*");
// returns a set containing: "toto.xiidm.gz"
```
\ No newline at end of file
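
Note on the `listNames(String regex)` behaviour documented in this series: the difference between directory and archive datasources can be illustrated with a small standalone sketch. It does not use or reproduce PowSyBl code. The class `BaseNameFilterSketch` and its two methods are hypothetical names, the sketch works on a plain list of names instead of a real directory or archive, and it assumes the regex must match the whole name.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch: this is NOT PowSyBl code. It only mimics the two
// filtering rules described in the documentation for listNames(String regex).
final class BaseNameFilterSketch {

    private BaseNameFilterSketch() {
    }

    // Directory-style listing: a name is kept only if it matches the regex
    // AND starts with the base name.
    static Set<String> listDirectoryNames(List<String> names, String baseName, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> kept = new TreeSet<>();
        for (String name : names) {
            // Assumption: the regex has to match the whole name.
            if (name.startsWith(baseName) && pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    // Archive-style listing: a name is kept if it matches the regex,
    // whatever the base name is.
    static Set<String> listArchiveNames(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> kept = new TreeSet<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("network", "network.xiidm", "network_mapping.csv", "toto.xiidm");
        // Base name filtering drops "toto.xiidm" here.
        System.out.println(listDirectoryNames(names, "network", ".*"));
        // No base name filtering: every name matching the regex is kept.
        System.out.println(listArchiveNames(names, ".*"));
    }
}
```

With the base name `network`, the directory-style call leaves out `toto.xiidm`, while the archive-style call keeps it. This mirrors the reason given in the example for `toto.xiidm.gz` not being listed by the `network` datasource.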