Summary of new features for named input/output. #1115

Closed
BoPeng opened this issue Dec 20, 2018 · 3 comments

BoPeng commented Dec 20, 2018

Persistent grouping of sos_targets

input: 'a.txt', 'b.txt', group_by=1

will be considered as equivalent to

input: sos_targets('a.txt', 'b.txt', group_by=1)

This creates a sos_targets object with two targets and two groups. The groups are accessible through the property groups, which is a list of sos_targets objects that have no subgroups of their own.

sos_targets will keep its grouping information when it is passed around. That is to say

  • step_input will have groups that are essentially _input for substeps.
  • step_output will contain _output from each substep as its groups.
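The grouping described above can be illustrated with a toy model in plain Python. This is only a sketch: `ToyTargets` is a hypothetical stand-in, not the actual `sos_targets` class, and it models nothing beyond splitting targets into consecutive groups.

```python
class ToyTargets:
    """Hypothetical stand-in for sos_targets: a flat target list plus groups."""

    def __init__(self, *targets, group_by=None):
        self.targets = list(targets)
        if group_by is None:
            # No grouping information recorded.
            self.groups = []
        else:
            # Split targets into consecutive chunks of size group_by;
            # each group is itself a ToyTargets with no subgroups.
            self.groups = [
                ToyTargets(*self.targets[k:k + group_by])
                for k in range(0, len(self.targets), group_by)
            ]


step_input = ToyTargets('a.txt', 'b.txt', group_by=1)
group_contents = [g.targets for g in step_input.groups]
print(group_contents)
```

In this model each entry of `step_input.groups` plays the role of `_input` for one substep.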

Keyword arguments in input and output

Keyword arguments can be used to specify the sources of targets.

input: name=targets
output: name=targets

Named input and output can be accessed by _input['name'] and _output['name'].

Implementation-wise,

input: name=targets

creates step_input as sos_targets(name=targets), which assigns sources of targets to name.

output_from(steps, **kwargs) to get output from other steps

output_from refers to the output of one or more steps. Each parameter can be a step name or a number; a number refers to a step in the same workflow (output_from(10) from step_20 is equivalent to output_from('step_10')).

input: output_from('step')
input: output_from(1)
input: output_from([1, 2])
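The name-or-number rule can be sketched as a small resolver. This is an illustration only: `resolve_step_names` and its `workflow` argument are hypothetical, and the sketch assumes that a numeric reference is completed with the name of the current workflow, as in the `step_20` example above.

```python
def resolve_step_names(ref, workflow):
    """Map an output_from-style reference to a list of step names (toy model)."""
    refs = ref if isinstance(ref, (list, tuple)) else [ref]
    # A number is taken to mean a step of the current workflow;
    # a string is taken to be a step name already.
    return [f'{workflow}_{r}' if isinstance(r, int) else r for r in refs]


from_number = resolve_step_names(10, workflow='step')
from_list = resolve_step_names([1, 2], workflow='step')
from_name = resolve_step_names('get_ref', workflow='step')
print(from_number, from_list, from_name)
```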

With named input and output, the syntax can be expanded to

input: ref=output_from('get_ref')['ref']

A special step name -1 as in

input: output_from(-1)

is reserved for the output of the previous step, and is only valid from a numerically indexed step.

Options group_by, paired_with, pattern, group_with, and for_each can be used to regroup or attach variables to the output. For example, group_by can be used to regroup the retrieved sos_targets,

input: output_from(10, group_by='all')

named_output('name', **kwargs) for data flow without step name

named_output('ref') in the following example refers to any step with ref in named output,

[A]
output: ref=targets

[B]
input: named_output('ref')

which has the same effect as output_from('A')['ref'] but does not require specifying the step name.

Similar to output_from, parameters group_by, paired_with, pattern, group_with, and for_each can be used to regroup or attach variables to the retrieved targets.
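The lookup behind named_output can be pictured as a search over the named outputs of all steps. The sketch below is a toy model: `find_named_output` and the `step_outputs` registry are hypothetical, and returning the first matching step is an assumption, since this summary does not say what happens when several steps provide the same name.

```python
def find_named_output(name, step_outputs):
    """Return (step, targets) for the first step whose named output has `name`."""
    for step, outputs in step_outputs.items():
        if name in outputs:
            return step, outputs[name]
    raise KeyError(f'no step provides a named output {name!r}')


# Hypothetical record of the named outputs declared by each step.
step_outputs = {'A': {'ref': ['ref.fa']}, 'C': {'bam': ['a.bam']}}
provider, targets = find_named_output('ref', step_outputs)
print(provider, targets)
```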

Merging of multiple sos_targets

Multiple sos_targets can be specified in the input statement, either explicitly with sos_targets, or implicitly with output_from, named_output. In this case, targets and groups from multiple sos_targets will be merged. sos_targets objects with different numbers of groups can be merged only if one of them has no group information or has a single group with all targets. In this case the group will be replicated for all groups before merging.

For example,

input: 'a.txt', 'b.txt', sos_targets('c.txt', 'd.txt', group_by=1)

will create a sos_targets with four targets 'a.txt', 'b.txt', 'c.txt', 'd.txt', and two groups

'a.txt', 'b.txt', 'c.txt'
'a.txt', 'b.txt', 'd.txt'
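The merging rule can be written down as a short function over plain lists. This is a sketch, not the SoS implementation: `merge_groups` is a hypothetical helper, and an object without group information is represented here as a single group holding all of its targets.

```python
def merge_groups(a, b):
    """Merge two group lists following the replication rule (toy model).

    Each argument is a list of groups; each group is a list of targets.
    """
    if len(a) == len(b):
        # Same number of groups: merge group by group.
        return [x + y for x, y in zip(a, b)]
    if len(a) == 1:
        # Replicate the single group of `a` for every group of `b`.
        return [a[0] + y for y in b]
    if len(b) == 1:
        # Replicate the single group of `b` for every group of `a`.
        return [x + b[0] for x in a]
    raise ValueError('cannot merge targets with different numbers of groups')


ungrouped = [['a.txt', 'b.txt']]      # 'a.txt', 'b.txt' with no grouping
grouped = [['c.txt'], ['d.txt']]      # sos_targets('c.txt', 'd.txt', group_by=1)
merged = merge_groups(ungrouped, grouped)
print(merged)
```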

The same rule applies to sos_targets created by output_from() or output_from(group_by). However, if a global group_by option is present, all individual groups will be overridden. That is to say,

input: 'a.txt', 'b.txt', output_from(10), group_by=1

will regroup all targets by 1, regardless of original grouping information from output_from(10).
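A global group_by can be thought of as flattening everything first and then regrouping. In the sketch below, `regroup` is a hypothetical helper and the two files standing in for the output of step 10 are made up for illustration.

```python
def regroup(targets, size):
    """Split a flat target list into consecutive groups of `size` (toy model)."""
    return [targets[k:k + size] for k in range(0, len(targets), size)]


# Hypothetical output of step 10, originally kept together as one group.
step_10_groups = [['c.txt', 'd.txt']]

# Flatten all sources, discarding the original grouping, then apply group_by=1.
flat = ['a.txt', 'b.txt'] + [t for group in step_10_groups for t in group]
regrouped = regroup(flat, 1)
print(regrouped)
```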

Setting and getting attributes of sos targets

New functions BaseTarget.set() and BaseTarget.get() are added.

A dictionary is now associated with each BaseTarget and can be accessed with the .set() and .get() functions, or as attributes of the target. .set() is usually called automatically by parameters paired_with and group_with, but can be used directly. With

a = file_target('a.txt')
a.set('name', 'a')

it is usually easier to use

a.name

instead of

a.get('name')

but a.get('name', default=None) returns a default value instead of raising an AttributeError if name does not exist, which can be safer in some cases.
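The difference between attribute access and .get() with a default can be reproduced with a small toy class. `ToyTarget` is hypothetical and only mimics the behavior described here; it is not `BaseTarget`, and what the real .get() does when no default is given is not modeled.

```python
class ToyTarget:
    """Hypothetical target that carries a dictionary of attributes."""

    def __init__(self, path):
        self.path = path
        self._dict = {}

    def set(self, key, value):
        self._dict[key] = value
        return self

    def get(self, key, default=None):
        # Never raises: falls back to the default for unknown keys.
        return self._dict.get(key, default)

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__['_dict'][key]
        except KeyError:
            raise AttributeError(key) from None


a = ToyTarget('a.txt')
a.set('name', 'a')
print(a.name, a.get('name'), a.get('missing', default=None))
```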

Changes to parameters paired_with, group_with and for_each

In addition to variables set to the global namespace, the paired values are written to _input as target or group properties. That is to say, with

sample = ['A',  'B']
files = ['a1', 'a2', 'a3', 'a4']
input: 'a1.txt', 'a2.txt', 'b1.txt', 'b2.txt', group_by=2, 
    paired_with='files', group_with='sample', for_each=dict(i=range(5))

you can access _sample, _files, and i both directly and as properties of _input:

_input[0]._files
_input._sample
_input.i

So that

sample = ['A',  'B']
files = ['a1', 'a2', 'a3', 'a4']
input: 'a1.txt', 'a2.txt', 'b1.txt', 'b2.txt', group_by=2, 
    paired_with='files', group_with='sample', for_each=dict(i=range(5))

print(f'_input={_input}, _files={_files}, _sample={_sample}, i={i}')
print(f'_input[0]._files={_input[0]._files}, _input._sample={_input._sample}, _input.i={_input.i}')

would produce:

_input=a1.txt a2.txt, _files=['a1', 'a2'], _sample=A, i=0
_input[0]._files=a1, _input._sample=A, _input.i=0
_input=b1.txt b2.txt, _files=['a3', 'a4'], _sample=B, i=0
_input[0]._files=a3, _input._sample=B, _input.i=0
...
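The way the three options combine into substeps can be sketched in plain Python. This is a toy enumeration, not SoS code: it assumes that for_each forms the outer loop and the input groups the inner loop, which matches the first lines of output shown above but is otherwise an assumption.

```python
from itertools import product

inputs = ['a1.txt', 'a2.txt', 'b1.txt', 'b2.txt']
files = ['a1', 'a2', 'a3', 'a4']
sample = ['A', 'B']
group_size = 2

# group_by=2 splits the inputs; paired_with splits `files` the same way,
# while group_with assigns one element of `sample` to each group.
groups = [inputs[k:k + group_size] for k in range(0, len(inputs), group_size)]
paired = [files[k:k + group_size] for k in range(0, len(files), group_size)]

substeps = []
for i, g in product(range(5), range(len(groups))):
    substeps.append({
        '_input': groups[g],
        '_files': paired[g],
        '_sample': sample[g],
        'i': i,
    })

for s in substeps[:2]:
    print(s)
```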

BoPeng commented Dec 20, 2018

Random thoughts on names:

output_from_step(1)
output_with_name('ref')

BoPeng commented Dec 21, 2018

3a4c41f has the first test case that fails due to incompatibility.

BoPeng commented Dec 22, 2018

An example for 'persistent' variables is

[10]
input: for_each=dict(i=range(5))
output: f'a_{i}.txt'
_output.touch()

[20]
print(i)

which produces

0
1
2
3
4

because i is attached to the _output of each substep, which becomes a group of step_output of step 10 and then of step_input of step 20. When the groups are unpacked, the group variables are populated into the step namespace.
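The chain described here can be traced with a toy model in which each group is a dictionary holding its targets and its variables. The data layout is hypothetical and chosen only to make the propagation visible.

```python
# Step 10: one substep per value of i; the output of each substep becomes
# one group of step_output, carrying the variable i along with it.
step_10_output_groups = [
    {'targets': [f'a_{i}.txt'], 'vars': {'i': i}} for i in range(5)
]

# Step 20: step_input is the step_output of step 10. Each group is unpacked
# into one substep, and its variables populate that substep's namespace.
printed = []
for group in step_10_output_groups:
    namespace = dict(group['vars'])
    printed.append(namespace['i'])

print(printed)
```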

BoPeng pushed a commit that referenced this issue Dec 22, 2018
BoPeng closed this as completed Jan 10, 2019