Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior of targets and groups #1114

Closed
BoPeng opened this issue Dec 20, 2018 · 33 comments
Closed

Behavior of targets and groups #1114

BoPeng opened this issue Dec 20, 2018 · 33 comments

Comments

@BoPeng
Copy link
Contributor

BoPeng commented Dec 20, 2018

We have

sos_targets.set_each(name, sequence)

to set variables to each target of the sos_targets, but each is not quite clear as what

sos_groups( ).set_each(name, sequence)

does because on the surface sos_groups() returns groups.

It would be better to do

sos_targets.set_to_targets(name, sequence)

and add

sos_groups( ).set_to_groups(name, sequence)

to make it clear what the variables are set to. In the latter case, set_to_groups can be an alternative method for group_with (#1113).

@gaow
Copy link
Member

gaow commented Dec 20, 2018

Sorry I was up very late last night due to preparation of a collaborator on-site visit today. I'm a bit lost now because sos_groups( ).set_each(name, sequence) is clear to me that it does it for each group.

In the latter case, set_to_groups can be an alternative method for group_with (#1113).

I thought set_each is the new group_with ...? Just trying to simplify the interface.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

I knew it can be confusing.

set_each

set variable to targets, not groups because there is no difference between

sos_targets().set_each

and

sos_groups().set_each

because both are just sos_targets objects. So the original set_each was designed for paired_with.

Now we have

set_to_targets

that assigns variables to each target (which then affects all targets in groups, thus in _input), and

set_to_groups

that assigns variables to each group so it affects only _input, not its children. So the former is for paired_with, the latter is for group_with.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

So the former is for paired_with, the latter is for group_with.

I see, that explains a lot for me. Basically set_to_groups happen after by, set_to_targets happen before by. But for other users, it is hard to digest by reading these method names.

Still thinking ...

This was referenced Dec 20, 2018
@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

set_to_targets can be applied to sos_groups(...) because set_to_targets will be applied to t1, t2, t3, and t4, and set_to_groups will be applied to groups t1 t2 and t3 t4, and yet there is the plain set that will be applied to the entire sos_targets.

t1 t2 t3 t4
_groups:
    t1 t2
    t3 t4

@gaow
Copy link
Member

gaow commented Dec 20, 2018

set_to_targets can be applied to sos_groups(...)

Yes I think I understand that.

and yet there is the plain set that will be applied to the entire sos_targets.

Maybe concrete example helps.

[1]
output: 1.txt, 2.txt 3.txt 4.txt

[2]
vars1 = [1,2,3,4]
input: output_from(-1).set('vars1', vars1)
print(_input, _input.get('vars1'))

gives

1.txt 2.txt 3.txt 4.txt [1,2,3,4]
[2]
vars1 = [1,2,3,4]
input: sos_groups(output_from(-1), by = 2).set('vars1', vars1)
print(_input, _input.get('vars1'))

gives

1.txt 2.txt [1,2,3,4]
3.txt 4.txt [1,2,3,4]
[2]
vars1 = [1,2,3,4]
input: sos_groups(output_from(-1), by = 2).set_to_target('vars1', vars1)
print(_input, _input.get('vars1'))

gives

1.txt 2.txt [1,2]
3.txt 4.txt [3,4]
[2]
vars1 = [1,2,3,4]
vars2 = [5,6]
input: sos_groups(output_from(-1), by = 2).set_to_target('vars1', vars1).set_to_group('vars2', vars2).set('vars3', vars2).set('vars4', vars1)
print(_input, _input.get('vars1'), ...)

gives

1.txt 2.txt [1,2] 5 [5,6]  [1,2,3,4]
3.txt 4.txt [3,4] 6 [5,6]  [1,2,3,4]

?

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

The behavior of set is unclear here because it is supposed to be set to step_input.

input: sos_groups(output_from(-1), by = 2).   # _input, correct groups
set_to_target('vars1', vars1).    # correct grouping 1, 2, and 3, 4
set_to_group('vars2', vars2).  # correct assignment of 5, 6 to groups (_group)
set('vars3', vars2).    # undefined yet
set('vars4', vars1) .   # undefined 

@gaow
Copy link
Member

gaow commented Dec 20, 2018

Okay so why not still call .set_to_target() by .paired_with(), and .set_to_group() by .group_with()? And .attach() to make whatever in there as a whole via _input()?

@gaow
Copy link
Member

gaow commented Dec 20, 2018

Also when you say "correct" in your latest thread you mean my understanding was "correct"?

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

Yes, but your code example was wrong (not real code) since _input.get('vars1') would not work. It is target level variable so _input[0].get('vars1') is needed (unless _input has only one element).

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

[1]
output: '1.txt', '2.txt', '3.txt', '4.txt'
_output.touch()


[2]
vars1 = [1,2,3,4]
vars2 = [5,6]
input: sos_groups(output_from(-1), by = 2).set_to_targets('vars1', vars1).set_to_groups('vars2', vars2).set('vars3', vars2).set('vars4', vars1)
print(_input, _input[0].get('vars1'), _input[1].get('vars1'), _input.get('vars2'), _input.get('vars3'), step_input.get('vars3'), step_input.get('vars4'))

yields

1.txt 2.txt 1 2 5 None None None
3.txt 4.txt 3 4 6 None None None

because step_input is used to create an internal sos_targets that produces _input. step_input was not updated with .set() so step_input.get('vars1') does not work. Easy to fix though.

BoPeng pushed a commit that referenced this issue Dec 20, 2018
@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

New output

1.txt 2.txt 1 2 5 None [5, 6] [1, 2, 3, 4]
3.txt 4.txt 3 4 6 None [5, 6] [1, 2, 3, 4]

_input.get('var3') does not work because it belongs to step_input.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

Great! But this way I do not see the value of set or attach, if we refer to them by the global variable name directly, right? I'm not convinced we need that, when I realize these attributes do not get carried on to the next steps.

Also again, why not using .paired_with() and .group_with()?

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

So previous with

input: 'a.txt', 'b.txt', paired_with={'name': ['A', 'B']}

we can use them as

for target, name in zip(_input, name):
    print(f'Sample {name}: {target}')

and now with

input: sos_targets('a.txt', 'b.txt').set_to_targets('name', ['A', 'B'])

we can use them as

for target in _input:
    print(f'Sample {target.get("name"]}: {target}')

As for naming, since we have set, set_to_targets and set_to_groups make it clear what the functions do, and paired_with is less clearer.

As for carrying information, nothing prevents you from doing

input: for_each={'trunk': range(5)}
output: file_target(xxx).set('trunk', trunk)

Then the step_output will be a collection of file_target with trunk information. What I said was that

  1. _output is a sos_target created by output and there is no way to set properties to _output (unless we introduce some output parameter).
  2. _output will be accumulated to create step_output so target level dictionary will be kept, but group level dictionary will be merged.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

Now, it is possible for us to keep _output as _groups of step_output. This will allow us to do something like

[1]
input: sos_groups('a.txt', 'b.txt', by=1)
output: _input.with_suffix('txt1')

[2]
output: _input.with_suffix('txt2')

where the default output_from(-1) will have groups already so there is no need to do sos_groups() again in [2]. This can be very useful, or very confusing in the same time.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

output: file_target(xxx).set('trunk', trunk)

Hmm then I can try to use it from the next step via some smart usage of step_input[_index]? This seems to have justified the existence of set. I can live with this hack. Yes I understand the limitations you have pointed out ... there is currently no happy solution, but it is better than no solution at all.

set, set_to_targets and set_to_groups make it clear what the functions do, and paired_with is less clearer.

Yes I think .group_with() is a clear name. .paired_with() is less so. But target is an SoS jargon that unless people read our documentation they will not get it from looking at it (against human-readable). So I'm still think of replacing them.

This can be very useful, or very confusing in the same time.

Sorry it is more confusing than useful in my view.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

Sorry it is more confusing than useful in my view.

Not so if we promote groups as the first class citizen and say we are passing groups of objects around... For example, when you have a "set" of train and test and result. You can do

[1]
input: sos_groups(raw_data) 
output: groups of train and set . # automatic in groups

[2]
input: # default, groups of train and set
output: groups of result . # automatic in groups

[3]
input: output_from([1,2]) # groups of train,set and result, automatically merged

@gaow
Copy link
Member

gaow commented Dec 20, 2018

Okay but what if I want to undo the grouping? ie merge all groups. Or to re-group?

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

output_from([1, 2], grouping='keep')?

@gaow
Copy link
Member

gaow commented Dec 20, 2018

hmm, maybe output_from([1, 2], keep_grouped = True) or grouped = True. And default is False?

@gaow gaow changed the title Rename sos_targets.set_each to set_to_targets Behavior of set_to_targets and set_to_groups Dec 20, 2018
@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

Something like that. The default can be True or False but False should be safer because we want to inherit groups in an explicit way, and because of backward compatibility.

If we are to do this, then the difference between sos_groups and sos_targets would be really confusing, and it is better just to say sos_targets have groups, and do something like

input: 'a.txt', 'b.txt', group_by=1

is the same as

input: sos_targets('a.txt', 'b.txt', group_by=1)

We have less a need to regroup so

input: output_from(1, grouped=True)

should use the original grouping, and then

input: output_from(1, group_by=2)

can regroup.

Then we can have multiple sos_targets and we can create a larger sos_targets by merging groups.

input: output_from(1, groups=True), output_from(2, groups=True)

@gaow gaow changed the title Behavior of set_to_targets and set_to_groups Behavior of targets and groups Dec 20, 2018
@gaow
Copy link
Member

gaow commented Dec 20, 2018

Sounds like a good idea! Also conceptually more consistent with the implementation. Do you see any limitations given all we have been discussing? Obviously it involves lots of typing, unless we change sos_targets to targets at least on the interface ...

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

We discourage the use of sos_targets explicitly in scripts, and I do not think we have used it in this way, so renaming should not be an issue.

Even with the new stuff, sos_targets is only needed when you want to do something like

input: sos_targets('a.txt', 'b.txt', group_by=1), 
    sos_targets('c.txt', 'd.txt', 'e.txt', 'f.ext', group_by=2)

because in most cases

input: `c.txt', 'd.txt', 'e.txt', 'f.ext', group_by=2

would be fine and is backward compatible.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

I think it makes sense that targets or something like that will be for more advanced usage of group_by and various set-like stuff, particularly when named input and output is supported.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

Things can be tricky for

input: 'a.txt', 'b.txt', output_from(1, grouped=True), group_by=1

We know how to handle

input: sos_targets('a.txt', 'b.txt',  group_by=1), output_from(1, grouped=True)

but in the case of

input: 'a.txt', 'b.txt', output_from(1, grouped=True), group_by=1

the whole thing should be

input: sos_targets('a.txt', 'b.txt', output_from(1, grouped=True), group_by=1)

and I suppose grouped=True should be overridden by group_by.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

I think for that corner case it is fair enough to throw an error for grouping ambiguity.

And just to double-check,

input: sos_targets('a.txt', 'b.txt',  group_by=1), output_from(1, grouped=True)

or, in general, multiple sos_targets each with group_by, should only be valid if the total length of all groups are the same, right?

@gaow
Copy link
Member

gaow commented Dec 20, 2018

Another point is that,

[1]
...

[2]
input: group_by = 1

is a lot easier than

[2]
input: output_from(-1, grouped=True)

Even

[2]
input: targets(group_by=1)

is easier to type than to keep the group.

So, do we really need this?

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

So grouped should be True by default? 😄

Keeping group is conceptually useful for complex cases like the train, test, result stuff, especially when re-grouping might mess up the original grouping. It also allows you to keep the meta information for each group, in this case the for_each stuff you asked.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

The only incompatible case that changing default to True would bring is

[1]
output: with goup

[2]
input: use default

because _input will be different. The other case when

[2]
input: group_by=1

would be unaffected since it will be considered to be regrouping.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

multiple sos_targets each with group_by, should only be valid if the total length of all groups are the same, right?

Right now we support N - N, 0-N and 1 - N merging. 0 means no group information, 1 means a single group.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

So grouped should be True by default? 

Cannot argue against it now ...

especially when re-grouping might mess up the original grouping.

Very much agreed.

The only incompatible case that changing default to True would bring is

i would personally not worry about compatibility for this change because I know I have only a few pipelines that will be impacted but not more than a few and I think I remember exactly what they are. I'm just still trying to think about possible bad effects we are not seeing now.

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

In any case I am happy to get rid of sos_groups.

@gaow
Copy link
Member

gaow commented Dec 20, 2018

I think our discussion is converging. Would you mind updating #1115 when you get a chance?

@BoPeng
Copy link
Contributor Author

BoPeng commented Dec 20, 2018

#1115 updated. You can check if everything look ok, except perhaps about the naming of things.

BoPeng pushed a commit that referenced this issue Dec 20, 2018
@BoPeng BoPeng closed this as completed Dec 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants