Added CLI for export, plus documentation #31

Merged: 1 commit, Mar 15, 2022
114 changes: 114 additions & 0 deletions docs/intro/export.md
@@ -0,0 +1,114 @@
# Exporting a schema to schemasheets

## Use Case

Sometimes you might want to export an existing LinkML schema to schemasheets,
for example to migrate the source of all or part of a schema to sheet-based editing.

The `sheets2linkml` command converts schemasheet(s) to a LinkML schema.

The reverse operation, `linkml2sheets`, converts a LinkML schema to schemasheets.

## Status

__THIS COMMAND IS ONLY PARTIALLY IMPLEMENTED__: not all parts of the specification are handled yet.
However, you may still find it useful for "bootstrapping" schema sheets.

## Usage

Type

```bash
linkml2sheets --help
```

to see the complete help text.

Broadly there are two usage scenarios:

- when you have a single sheet
- when your schema is mapped to multiple sheets (e.g. enums and slots in different sheets)

In both cases you need two inputs:

1. A LinkML schema, specified in YAML
2. One or more schemasheets that serve as the specification:
   - these do not need to contain any data rows
   - they do need a header row and a column descriptor row for every column used
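The following minimal sketch writes a specification-only sheet with Python's standard `csv` module. The sheet has a header row and a column descriptor row, but no data rows. The column names are taken from the `schema2.tsv` example later on this page. `write_spec` is a hypothetical helper, not part of schemasheets. Two details are assumptions based on the schemasheets README examples: only the first descriptor cell carries the `>` prefix, and the sheet is tab-delimited.

```python
import csv
import io
from typing import Dict, TextIO


def write_spec(columns: Dict[str, str], stream: TextIO) -> None:
    """Write a specification-only schemasheet (hypothetical helper).

    `columns` maps each column header (row 1) to its descriptor (row 2).
    Assumption: only the first descriptor cell carries the '>' prefix.
    """
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    headers = list(columns.keys())
    descriptors = list(columns.values())
    descriptors[0] = "> " + descriptors[0]
    writer.writerow(headers)
    writer.writerow(descriptors)
    # No data rows: linkml2sheets fills these in from the schema.


buf = io.StringIO()
write_spec({"table": "class", "attribute": "slot", "required": "required",
            "multivalued": "multivalued", "description": "description"}, buf)
print(buf.getvalue(), end="")
```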

### Single-sheet usage

Here you pass a single TSV specification on the command line.

You can use the `--output` (`-o`) option to write the output to a single sheet file,
or omit it to write to stdout.
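In the single-sheet case, the choice between a file and stdout reduces to the small sketch below. `resolve_output` is a hypothetical helper that mirrors the corresponding branch of `export_schema` in `schemasheets/schema_exporter.py`. It is not part of the schemasheets API.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO, Union


def resolve_output(output: Optional[str]) -> Union[Path, TextIO]:
    """Hypothetical helper: use the -o/--output file if given, else stdout."""
    if output is not None:
        return Path(output)
    return sys.stdout


print(resolve_output("my_schema.tsv"))      # my_schema.tsv
print(resolve_output(None) is sys.stdout)   # True
```

Returning the stream itself, rather than a file name such as `"-"`, lets the exporter write to either target through the same `csv` writer.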

### Multi-sheet usage

Here you pass multiple TSV specifications on the command line.

You must use the `--output-directory` (`-d`) option to specify the directory
the files are written to. Each output file has the same name as its input specification.

So for example, if you had a folder:

```
sheets/
    enums.tsv
    slots.tsv
```

where:

- each TSV contains, at minimum, the column specifications (header and descriptor rows),
- you pass in `sheets/*.tsv` as input, and
- you pass `-d output` (or `--output-directory output`)

Then you will generate a folder:

```
output/
    enums.tsv
    slots.tsv
```

The headers will be the same as in the input TSVs,
but each file will also include "data" rows, one per matching
schema element.
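The sketch below shows how input specifications map to output files. It follows the `Path(output_directory) / Path(f).name` expression in `export_schema`. `plan_outputs` is a hypothetical name used only for illustration.

```python
from pathlib import Path
from typing import Dict, List


def plan_outputs(tsv_files: List[str], output_directory: str) -> Dict[str, Path]:
    """Map each input spec to its output path (hypothetical helper).

    The output keeps the input's file name but lives in output_directory.
    """
    return {f: Path(output_directory) / Path(f).name for f in tsv_files}


plan = plan_outputs(["sheets/enums.tsv", "sheets/slots.tsv"], "output")
for src, dest in plan.items():
    print(src, "->", dest.as_posix())
# sheets/enums.tsv -> output/enums.tsv
# sheets/slots.tsv -> output/slots.tsv
```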

The input and output directories can be identical. However, whenever an output
file already exists you need to pass `--overwrite` to overwrite it explicitly;
this guards against accidental overwrites.
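The overwrite guard can be sketched as follows. It mirrors the existence check in `export_schema`, which raises `PermissionError`. `check_overwrite` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def check_overwrite(outpath: Path, overwrite: bool) -> None:
    """Refuse to clobber an existing file unless overwrite is set (hypothetical helper)."""
    if outpath.exists() and not overwrite:
        raise PermissionError(f"Will not overwrite {outpath} unless --overwrite is set")


refused = False
with tempfile.TemporaryDirectory() as d:
    existing = Path(d) / "slots.tsv"
    existing.write_text("slot\n> slot\n", encoding="utf-8")
    check_overwrite(Path(d) / "enums.tsv", overwrite=False)  # new file: allowed
    check_overwrite(existing, overwrite=True)                # explicit opt-in: allowed
    try:
        check_overwrite(existing, overwrite=False)           # existing file: refused
    except PermissionError:
        refused = True
print(refused)  # True
```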

## Converting between two different schemasheet specs

schemasheets allows *custom* sheet formats that map to the LinkML standard.

You can use `sheets2linkml` and `linkml2sheets` together to convert between two sheet specifications.

For example, say that for schema1.tsv you use a spreadsheet with the following headers:

- record: `> class`
- field: `> slot`
- cardinality: `> cardinality`
- info: `> description`

and for schema2.tsv you have:

- table: `> class`
- attribute: `> slot`
- required: `> required`
- multivalued: `> multivalued`
- description: `> description`

(Here each list item is a column: the name before the colon is the header in the first row, and the part after the `>` is the column descriptor in the second row.)
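To make the two-row layout concrete, the sketch below pairs each header with its descriptor for the schema1.tsv columns. `read_column_descriptors` is a hypothetical helper for illustration; schemasheets has its own parser. Two things are assumptions: the descriptor row is marked by a leading `>` in its first cell, and the sheet is tab-delimited.

```python
import csv
import io
from typing import Dict, TextIO


def read_column_descriptors(stream: TextIO) -> Dict[str, str]:
    """Pair each header (row 1) with its descriptor (row 2). Hypothetical helper."""
    reader = csv.reader(stream, delimiter="\t")
    headers = next(reader)
    descriptors = next(reader)
    if not descriptors or not descriptors[0].startswith(">"):
        raise ValueError("second row is not a descriptor row")
    descriptors[0] = descriptors[0][1:].strip()  # drop the '>' marker
    return dict(zip(headers, descriptors))


schema1_spec = "record\tfield\tcardinality\tinfo\n> class\tslot\tcardinality\tdescription\n"
print(read_column_descriptors(io.StringIO(schema1_spec)))
# {'record': 'class', 'field': 'slot', 'cardinality': 'cardinality', 'info': 'description'}
```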

If you do:

```bash
sheets2linkml schema1.tsv > schema1.yaml
linkml2sheets -s schema1.yaml schema2.tsv > schema2_full.tsv
```

then this effectively maps schema1.tsv onto the format of schema2.tsv.
And you can swap the roles of the two sheets to go in the reverse direction.
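The round trip works because both sheets describe the same underlying slot properties in the intermediate LinkML YAML. As an illustration only, a `cardinality` value can be read as a combination of the `required` and `multivalued` columns. This is not the schemasheets implementation, and the UML-style notation (`0..1`, `1`, `0..*`, `1..*`) is an assumption; the values schemasheets actually accepts may differ.

```python
from typing import Tuple


def split_cardinality(card: str) -> Tuple[bool, bool]:
    """Return (required, multivalued) for a UML-style cardinality string.

    Illustrative mapping only; not taken from schemasheets.
    """
    card = card.strip()
    if ".." in card:
        lower, upper = card.split("..", 1)
    else:
        lower = upper = card
    required = lower not in ("0", "")
    multivalued = upper == "*" or (upper.isdigit() and int(upper) > 1)
    return required, multivalued


for c in ["0..1", "1", "0..*", "1..*"]:
    print(c, split_cardinality(c))
```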
1 change: 1 addition & 0 deletions pyproject.toml
@@ -28,4 +28,5 @@ build-backend = "poetry.core.masonry.api"

 [tool.poetry.scripts]
 sheets2linkml = "schemasheets.schemamaker:convert"
+linkml2sheets = "schemasheets.schema_exporter:export_schema"
 sheets2project = "schemasheets.sheets_to_project:multigen"
117 changes: 105 additions & 12 deletions schemasheets/schema_exporter.py
@@ -1,8 +1,11 @@
 import csv
 import logging
+import sys
 from dataclasses import dataclass, field
-from typing import Dict, Any, List, Optional
+from pathlib import Path
+from typing import Dict, Any, List, Optional, TextIO, Union

+import click
 from linkml_runtime.linkml_model import Element, SlotDefinition
 from linkml_runtime.utils.formatutils import underscore
 from linkml_runtime.utils.schemaview import SchemaView, ClassDefinition
@@ -23,7 +26,7 @@ class SchemaExporter:
     rows: List[ROW] = field(default_factory=lambda: [])

     def export(self, schemaview: SchemaView, specification: str = None,
-               to_file: str = None, table_config: TableConfig = None):
+               to_file: Union[str, Path, TextIO] = None, table_config: TableConfig = None):
         """
         Exports a schema to a schemasheets TSV

@@ -49,14 +52,17 @@ def export(self, schemaview: SchemaView, specification: str = None,
             for su in cls.slot_usage.values():
                 self.export_element(su, cls, schemaview, table_config)
         if to_file:
-            with open(to_file, 'w', encoding='utf-8') as stream:
-                writer = csv.DictWriter(
-                    stream,
-                    delimiter=self.delimiter,
-                    fieldnames=table_config.columns.keys())
-                writer.writeheader()
-                for row in self.rows:
-                    writer.writerow(row)
+            # only close the stream afterwards if it was opened here (not e.g. sys.stdout)
+            is_path = isinstance(to_file, (str, Path))
+            stream = open(to_file, 'w', encoding='utf-8', newline='') if is_path else to_file
+            writer = csv.DictWriter(
+                stream,
+                delimiter=self.delimiter,
+                fieldnames=table_config.columns.keys())
+            writer.writeheader()
+            writer.writerows(self.rows)
+            if is_path:
+                stream.close()


     def export_element(self, element: Element, parent: Optional[Element], schemaview: SchemaView, table_config: TableConfig):
@@ -106,13 +112,100 @@ def repl(v: str) -> Optional[str]:
                 elif parent_pk_col == col_name:
                     exported_row[col_name] = parent.name
                 else:
-                    print(f'TODO: {col_name} [{type(element).class_name}] // {col_config}')
+                    logging.info(f'TODO: {col_name} [{type(element).class_name}] // {col_config}')
             else:
-                print(f'IGNORING: {col_name} // {col_config}')
+                logging.info(f'IGNORING: {col_name} // {col_config}')
         self.export_row(exported_row)

     def export_row(self, row: ROW):
         self.rows.append(row)

+    def is_slot_redundant(self, slot: SlotDefinition, schemaview: SchemaView):
+        for c in schemaview.all_classes().values():
+            if slot.name in c.slots:
+                pass
+
+
+@click.command()
+@click.option('-o', '--output',
+              help="output file")
+@click.option("-d", "--output-directory",
+              help="folder in which to store resulting TSVs")
+@click.option("-s", "--schema",
+              required=True,
+              help="Path to the schema")
+@click.option("--overwrite/--no-overwrite",
+              default=False,
+              show_default=True,
+              help="If set, then overwrite existing schemasheet files if they exist")
+@click.option("--append-sheet/--no-append-sheet",
+              default=False,
+              show_default=True,
+              help="If set, then append to existing schemasheet files if they exist")
+@click.option("--unique-slots/--no-unique-slots",
+              default=False,
+              show_default=True,
+              help="All slots are treated as unique and top level and do not belong to the specified class")
+@click.option("-v", "--verbose", count=True)
+@click.argument('tsv_files', nargs=-1)
+def export_schema(tsv_files, output_directory, output: Optional[str], overwrite: bool, append_sheet: bool,
+                  schema, unique_slots: bool, verbose: int):
+    """
+    Convert LinkML schema to schemasheets
+
+    Convert a schema to a single sheet, writing to stdout:
+
+        linkml2sheets -s my_schema.yaml my_schema_spec.tsv > my_schema.tsv
+
+    As above, with explicit output:
+
+        linkml2sheets -s my_schema.yaml my_schema_spec.tsv -o my_schema.tsv
+
+    Convert schema to multisheets, writing output to a folder:
+
+        linkml2sheets -s my_schema.yaml specs/*.tsv -d output
+
+    Convert schema to multisheets, writing output in place:
+
+        linkml2sheets -s my_schema.yaml sheets/*.tsv -d sheets --overwrite
+
+    Convert schema to multisheets, appending output (not yet implemented):
+
+        linkml2sheets -s my_schema.yaml sheets/*.tsv -d sheets --append-sheet
+
+
+    """
+    if verbose >= 2:
+        logging.basicConfig(level=logging.DEBUG)
+    elif verbose == 1:
+        logging.basicConfig(level=logging.INFO)
+    else:
+        logging.basicConfig(level=logging.WARNING)
+    if output is not None and output_directory:
+        raise ValueError('Cannot combine output-directory and output options')
+    if output is not None and len(tsv_files) > 1:
+        raise ValueError('Cannot use output option with multiple sheets')
+    if append_sheet:
+        raise NotImplementedError('--append-sheet not yet implemented')
+    sv = SchemaView(schema)
+    for f in tsv_files:
+        exporter = SchemaExporter()  # fresh per sheet, so rows do not accumulate across sheets
+        if output_directory:
+            outpath: Union[Path, TextIO] = Path(output_directory) / Path(f).name
+        else:
+            if output is not None:
+                outpath = Path(output)
+            else:
+                outpath = sys.stdout
+        if isinstance(outpath, Path) and outpath.exists():
+            if overwrite:
+                logging.info(f'Overwriting: {outpath}')
+            else:
+                raise PermissionError(f'Will not overwrite {outpath} unless --overwrite is set')
+        exporter.export(sv, specification=f, to_file=outpath)
+
+
+if __name__ == '__main__':
+    export_schema()
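`export()` may receive either a path or an already-open stream such as `sys.stdout`, so it should close only the files it opened itself. One common way to express this is sketched below as a standalone function; `write_rows` is a hypothetical name, not part of schemasheets. The function uses `contextlib.nullcontext`, so a file opened from a path is closed by the `with` block even if writing fails. A caller-supplied stream is left open.

```python
import contextlib
import csv
import io
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, TextIO, Union


def write_rows(rows: Iterable[Dict[str, str]], fieldnames: List[str],
               to_file: Union[str, Path, TextIO], delimiter: str = "\t") -> None:
    """Write dict rows as a delimited sheet to a path or an open stream.

    Paths are opened and closed here; streams such as sys.stdout are
    written to but left open for the caller.
    """
    if isinstance(to_file, (str, Path)):
        cm = open(to_file, "w", encoding="utf-8", newline="")
    else:
        cm = contextlib.nullcontext(to_file)
    with cm as stream:
        writer = csv.DictWriter(stream, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)


rows = [{"class": "Person", "slot": "name"}, {"class": "Person", "slot": "age"}]

buf = io.StringIO()
write_rows(rows, ["class", "slot"], buf)
print(buf.closed)  # False: the caller-owned stream stays open

with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "slots.tsv"
    write_rows(rows, ["class", "slot"], out)
    text = out.read_text(encoding="utf-8")
print(text.splitlines()[1])  # prints the first data row: Person<TAB>name
```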