Datasources - part 4: documentation #3105
Merged
15 commits
c8985ba
start datasource doc
rolnico 4343f38
add doc
rolnico a976fd7
add doc
rolnico d45e7db
add doc
rolnico 86fd669
add doc
rolnico 404b6e5
Replace main with data, base name examples, bullet points added
flo-dup 0578d39
moved files
rolnico 4225968
rewrite some parts
rolnico 5a9523b
Merge branch 'main' into nro/datasources_4_documentation
So-Fras e821d6d
update doc
rolnico 03feb92
Merge remote-tracking branch 'origin/nro/datasources_4_documentation'…
rolnico b537d99
add doc
rolnico 5b44ebf
fix doc
rolnico d404571
fix doc
rolnico f4b432f
Merge branch 'main' into nro/datasources_4_documentation
flo-dup
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Data models | ||
|
||
In this section, you'll discover how data is modelled in PowSyBl. | ||
|
||
```{toctree} | ||
--- | ||
maxdepth: 1 | ||
--- | ||
timeseries | ||
``` | ||
|
175 changes: 175 additions & 0 deletions
docs/grid_exchange_formats/going_further/datasources.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,175 @@ | ||
(datasources)= | ||
# Datasources | ||
|
||
## Principles | ||
|
||
Datasources are Java objects used for I/O operations around PowSyBl. | ||
They allow users to read and write files. For example, Importers use them under the hood to access the filesystem | ||
during Network imports when using `Network.read()` methods. | ||
|
||
For importers and exporters, datasources are used to access files corresponding to a single network. | ||
|
||
## Types of datasources | ||
|
||
Multiple types of datasources exist, depending on whether they need to be writable, the kind of storage used, | ||
the data location, the data compression, etc. | ||
|
||
|
||
(readonlydatasources)= | ||
### Read-Only DataSource | ||
|
||
`ReadOnlyDataSource` is the most basic datasource interface available. As you can tell by the name, it only provides | ||
reading features. | ||
It has two parameters: | ||
- a base name, which is a prefix that can be used to consider only files with names starting with this prefix (while | ||
reading) or as a prefix for the output file name (while writing), | ||
- (optionally) a data extension, mainly used to disambiguate identically named data of different type. | ||
Note: this does not apply to compression extensions. | ||
|
||
_**Example:** | ||
For a file named `europe.west.xiidm`, the base name could be `europe.west` for instance (or `europe` or `europe.w` | ||
or ...), while the data extension would be `xiidm`._ | ||
|
||
The main methods `ReadOnlyDataSource` provides are: | ||
|
||
- `exists(String fileName)` and `exists(String suffix, String ext)` to check if a file exists in the datasource | ||
- `newInputStream(String fileName)` and `newInputStream(String suffix, String ext)` to read a file from the datasource | ||
- `listNames(String regex)` to list the files in the datasource whose names match the regex | ||
|
||
The methods with `String suffix, String ext` as parameters look for a file whose name is constructed as | ||
`<basename><suffix>.<ext>`. | ||
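The naming rule above can be sketched in plain Java. This is only an illustration: `FileNameSketch` and `buildFileName` are hypothetical names, not part of the PowSyBl API. Treating a `null` suffix as empty follows the `exists(null, "xiidm")` call shown in the example section; skipping the dot when the extension is missing is an assumption.

```java
import java.util.Objects;

// Hypothetical helper, NOT the PowSyBl implementation: it only illustrates
// the "<basename><suffix>.<ext>" naming rule described in the documentation.
final class FileNameSketch {

    static String buildFileName(String baseName, String suffix, String ext) {
        Objects.requireNonNull(baseName, "baseName is required");
        StringBuilder name = new StringBuilder(baseName);
        if (suffix != null) {
            name.append(suffix); // a null suffix is treated as "no suffix"
        }
        if (ext != null && !ext.isEmpty()) {
            name.append('.').append(ext); // assumption: no trailing dot without an extension
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildFileName("network", null, "xiidm"));
        System.out.println(buildFileName("network", "_mapping", "csv"));
    }
}
```

With the base name `network`, the two calls build the names `network.xiidm` and `network_mapping.csv`.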
|
||
The classes directly implementing `ReadOnlyDataSource` are: | ||
- `ResourceDataSource`: datasource based on a list of java classpath resources | ||
- `ReadOnlyMemDataSource`: datasource where data is stored in a `Map<filename, data as bytes>` in memory | ||
- `MultipleReadOnlyDataSource`: datasource grouping multiple user-defined datasources | ||
- `GenericReadOnlyDataSource`: datasource used to read data from any known compressed format | ||
|
||
(writabledatasources)= | ||
### DataSource | ||
|
||
The `DataSource` interface extends `ReadOnlyDataSource` by adding writing features through the methods | ||
`newOutputStream(String fileName, boolean append)` and `newOutputStream(String suffix, String ext, boolean append)`. | ||
Those methods allow the user to write in a new file (if `append==false`) or at the end of an existing one (if | ||
`append==true`). | ||
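The effect of the `append` flag can be illustrated with the standard library alone. The sketch below does not use PowSyBl: `AppendSketch`, `open` and `demo` are hypothetical names, and mapping `append == false` to "create or overwrite" is an assumption about the intended semantics.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Plain-JDK illustration of an "append" flag; not the PowSyBl implementation.
final class AppendSketch {

    // Opens the file for writing, either appending to it or starting from scratch.
    static OutputStream open(Path file, boolean append) throws IOException {
        return append
                ? Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
                : Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    // Writes one line without appending, then a second line with appending,
    // and returns the resulting file content.
    static String demo() {
        try {
            Path file = Files.createTempFile("append-sketch", ".txt");
            try (OutputStream os = open(file, false)) {
                os.write("line1\n".getBytes(StandardCharsets.UTF_8));
            }
            try (OutputStream os = open(file, true)) {
                os.write("line2\n".getBytes(StandardCharsets.UTF_8));
            }
            String content = Files.readString(file, StandardCharsets.UTF_8);
            Files.delete(file);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The second write lands after the first one, so the file ends up containing both lines.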
|
||
This interface also provides two static convenience methods, `fromPath(Path file)` and | ||
`fromPath(Path directory, String fileNameOrBaseName)`, for use cases such as reading data from the local | ||
filesystem while ensuring that the target file exists. Their counterpart in the class `Exporters` is | ||
`createDataSource(Path file)`, which is used to write data to the local filesystem while ensuring that the target file | ||
given as parameter is not a directory. All these methods rely on `DataSourceUtil.createDataSource` to build | ||
the datasource. | ||
|
||
Two classes implement the `DataSource` interface: | ||
- `MemDataSource`: extension of `ReadOnlyMemDataSource` implementing the writing features of `DataSource` | ||
- `AbstractFileSystemDataSource`: abstract class used to define datasources based on files present in the file system, | ||
either directly (see below the DirectoryDataSource class and its children) or in an archive (see below the | ||
AbstractArchiveDataSource class and its children). | ||
|
||
(directorydatasources)= | ||
### Directory DataSource | ||
|
||
A `DirectoryDataSource` is a datasource based on files located directly in a specific directory of the file system. | ||
|
||
Files stored and used via this type of datasource are either all compressed or all uncompressed. The available compression formats are | ||
defined in the class `CompressionFormat`. As of today, the following single-file compressions are available: | ||
BZIP2, GZIP, XZ and ZSTD. Each of these compression formats has a corresponding datasource class extending | ||
`DirectoryDataSource`: `Bzip2DirectoryDataSource`, `GzDirectoryDataSource`, `XZDirectoryDataSource`, | ||
`ZstdDirectoryDataSource`. | ||
|
||
`DirectoryDataSource` integrates the notions of base name and data extension: | ||
- The base name is used to access files that all start with the same prefix. For example, `network` would | ||
be a good base name if your files are `network.xiidm`, `network_mapping.csv`, etc. | ||
- The data extension is the last extension of your main files, excluding the compression extension if they have one. | ||
It usually corresponds to the data format extension: `csv`, `xml`, `json`, `xiidm`, etc. This extension is used | ||
to disambiguate the files to use in the datasource: just like you can create two different datasources selecting a | ||
different subset of files in a folder based on a different base name (e.g. `france.xiidm` and `europe.xiidm`), you can | ||
use the data extension to select either `france.xiidm` or `france.uct`. | ||
|
||
Even if `DirectoryDataSource` integrates the notions of base name and data extension in the methods with | ||
`(String suffix, String ext)` as parameters, you still have the possibility to use files that do not correspond to the | ||
base name and data extension by using the methods with `(String filename)` as parameter, excluding the compression | ||
extension if there is one. | ||
|
||
In addition to filtering with the regex parameter, in directory datasources the method `listNames(String regex)` filters | ||
filenames to only keep those starting with the base name. | ||
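The double filtering described above (base name prefix, then regex) can be sketched over a plain list of names. `ListNamesSketch` is a hypothetical class written for illustration, not the PowSyBl implementation, and it deliberately ignores compression extensions.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Illustration only: keeps the names that start with the base name AND match the regex.
final class ListNamesSketch {

    static Set<String> listNames(List<String> fileNames, String baseName, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> result = new TreeSet<>();
        for (String name : fileNames) {
            // Both conditions must hold: base name prefix and full regex match
            if (name.startsWith(baseName) && pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("network.xiidm", "network_mapping.csv", "toto.xiidm");
        System.out.println(listNames(names, "network", ".*"));
        System.out.println(listNames(names, "network", ".*\\.xiidm"));
    }
}
```

In this sketch, `toto.xiidm` is never returned for the base name `network`, even though it matches both regular expressions.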
|
||
(archivedatasources)= | ||
### Archive DataSource | ||
|
||
An `AbstractArchiveDataSource` is a datasource based on files located in a specific archive in the file system. As of today, | ||
two classes extend `AbstractArchiveDataSource`: `ZipArchiveDataSource` and `TarArchiveDataSource`. | ||
|
||
While the files located in the archive **have to be uncompressed**, the archive file itself can be compressed, depending | ||
on the archive format: | ||
- A Zip archive is already compressed, so the compression format for `ZipArchiveDataSource` is always ZIP. | ||
- A Tar archive can be compressed with BZIP2, GZIP, XZ or ZSTD. It can also be left uncompressed. | ||
|
||
Just like `DirectoryDataSource`, the archive datasources integrate the notions of base name and data extension. If not | ||
given as a parameter in the datasource constructor, the archive file name is even defined using the base name and the | ||
data extension, as `<directory>/<basename>.<dataExtension>.<archiveExtension>.<compressionExtension>` with the | ||
compression extension being optional depending on the archive format. For example `network.xiidm.zip` contains | ||
`network.xiidm`. | ||
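The default archive name pattern can be sketched as follows. `ArchiveNameSketch` is a hypothetical helper, not the PowSyBl code; the extension strings `tar` and `gz` used in the second call are assumptions chosen for illustration, and the directory part of the path is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of "<basename>.<dataExtension>.<archiveExtension>.<compressionExtension>",
// where missing (null or empty) parts are simply skipped.
final class ArchiveNameSketch {

    static String archiveFileName(String baseName, String dataExtension,
                                  String archiveExtension, String compressionExtension) {
        List<String> parts = new ArrayList<>();
        parts.add(baseName);
        for (String part : new String[] {dataExtension, archiveExtension, compressionExtension}) {
            if (part != null && !part.isEmpty()) {
                parts.add(part);
            }
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        // Zip archive: no separate compression extension
        System.out.println(archiveFileName("network", "xiidm", "zip", null));
        // Tar archive compressed with Gzip (extension strings assumed)
        System.out.println(archiveFileName("network", "xiidm", "tar", "gz"));
    }
}
```

The first call reproduces the `network.xiidm.zip` name given in the text.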
Review comment: document that `listNames` lists everything without filtering by the base name |
||
|
||
Unlike in directory datasources, in archive datasources the method `listNames(String regex)` filters | ||
filenames only by the regex and not by the base name. | ||
|
||
## Example | ||
|
||
Let's consider a directory containing the following files: | ||
|
||
``` | ||
directory | ||
├── network | ||
├── network.south | ||
├── network.xiidm.gz | ||
├── network.v3.xiidm.gz | ||
├── network_mapping.csv.gz | ||
├── network.gz | ||
└── toto.xiidm.gz | ||
``` | ||
|
||
A datasource on this directory could be used this way: | ||
|
||
```java | ||
// Creation of a directory datasource with compression | ||
GzDirectoryDataSource datasource = new GzDirectoryDataSource(testDir, "network", "xiidm", observer); | ||
|
||
// Check if some files exist in the datasource by using the `exists(String fileName)` method | ||
// Since the datasource uses Gzip compression, ".gz" is added to the provided fileName parameter | ||
datasource.exists("test.toto"); // Returns false: the file "test.toto.gz" does not exist in the directory | ||
datasource.exists("network.south"); // Returns false: the file "network.south.gz" does not exist | ||
datasource.exists("network.xiidm"); // Returns true: the file "network.xiidm.gz" exists | ||
datasource.exists("toto.xiidm"); // Returns true: the file "toto.xiidm.gz" exists | ||
|
||
// Check if some files exist in the datasource by using the `exists(String suffix, String ext)` method | ||
datasource.exists("_south", "reduced"); // Returns false: the file "network_south.reduced.gz" does not exist in the directory | ||
datasource.exists(null, "xiidm"); // Returns true: the file "network.xiidm.gz" exists in the directory | ||
datasource.exists("_mapping", "csv"); // Returns true: the file "network_mapping.csv.gz" exists in the directory | ||
|
||
// We can create a new file "network_test.txt.gz" and write "line1" inside | ||
try (OutputStream os = datasource.newOutputStream("_test", "txt", false)) { | ||
    os.write("line1".getBytes(StandardCharsets.UTF_8)); | ||
} | ||
|
||
// Another line can be added to the same file by setting the `append` boolean parameter to true | ||
try (OutputStream os = datasource.newOutputStream("_test", "txt", true)) { | ||
    os.write("line2".getBytes(StandardCharsets.UTF_8)); | ||
} | ||
|
||
// We can read the file | ||
try (InputStream is = datasource.newInputStream("_test", "txt")) { | ||
    // Decode the bytes before printing: printing the byte array itself would only show its reference | ||
    System.out.println(new String(ByteStreams.toByteArray(is), StandardCharsets.UTF_8)); // Displays "line1" followed by "line2" | ||
} | ||
|
||
// List the files in the datasource | ||
Set<String> files = datasource.listNames(".*"); | ||
// returns a set containing: "network", "network.south", "network.xiidm", "network.v3.xiidm", "network_test.txt", "network_mapping.csv" | ||
// The file "toto.xiidm.gz" is not listed due to the base name filtering | ||
|
||
// Using a datasource with different parameters makes it possible to use other files, even in the same directory | ||
GzDirectoryDataSource totoDatasource = new GzDirectoryDataSource(testDir, "toto", "xiidm", observer); | ||
totoDatasource.exists(null, "xiidm"); // Returns true: the file "toto.xiidm.gz" exists in the directory | ||
Set<String> totoFiles = totoDatasource.listNames(".*"); | ||
// returns a set containing: "toto.xiidm" | ||
``` | ||
Review comment: More things to eventually document (in this PR or in another):
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Going further | ||
|
||
In this section, you'll discover some advanced features related to grid exchange formats and how to use them. | ||
|
||
```{toctree} | ||
--- | ||
maxdepth: 1 | ||
--- | ||
datasources | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,4 +33,5 @@ ieee/ieee.md | |
matpower/index.md | ||
psse/index.md | ||
ampl/index.md | ||
going_further/index | ||
``` |
Review comment: document that `listNames` filters by the base name, contrary to `exists(String filename)`