Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load correctly the delegated Targets objects hierarchy #1052

Merged
merged 7 commits into from
Jun 29, 2020

Conversation

sechkova
Copy link
Contributor

Fixes issue #: #1045

Description of the changes being introduced by the pull request:

  • Update load_repository() function to load the delegations metadata starting from 'targets' and traversing downwards the delegated roles in order to load correctly the delegations hierarchy.
  • Improve readability by extracting the loading of metadata_directory files in a separate function get_delegations_filenames() in a similar way in which top level metadata files are loaded by get_metadata_filenames().
  • Update test_load_repository.

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@trishankatdatadog
Copy link
Member

Thanks, Theodora!

I have just one concern: how does this handle the possibility that Alice and Bob may delegate to Charlie using different sets and thresholds of keys from each other?

@lukpueh
Copy link
Member

lukpueh commented Jun 11, 2020

💯, 🎉, 🥳, @sechkova!!

I have just one concern: how does this handle the possibility that Alice and Bob may delegate to Charlie using different sets and thresholds of keys from each other?

@trishankatdatadog, the current reference implementation already does not support such "promiscuous" delegations (see #589 and for a more general description of the issue #660).
Supporting that behavior is way out of the scope of #1045.

@trishankatdatadog
Copy link
Member

"promiscuous" delegations

Got it. Gods, I hate that term, let's find another name, please...

@joshuagl
Copy link
Member

"promiscuous" delegations

Got it. Gods, I hate that term, let's find another name, please...

Agreed. Promiscuous has negative connotations.

In theupdateframework/specification#96 Marina used the term multi-role delegations

@trishankatdatadog
Copy link
Member

In theupdateframework/specification#96 Marina used the term multi-role delegations

Thanks, Josh. I don't think the problem is specific to multi-role delegations. I believe the idea is that Alice and Bob, on entirely different parts of the graph, can delegate to Charlie using entirely different thresholds and keys. I don't know what's a good, succinct word for this, except to say something like "delegator-dependent keys for a delegatee" (rolls off the tongue, I know).

@lukpueh
Copy link
Member

lukpueh commented Jun 11, 2020

Yes, multi-role delegations as described in TAP 3 are something else. It's a one-to-many delegation where the delegated-to roles need to agree on a target.

Promiscuous delegations are many-to-many delegations.

@joshuagl
Copy link
Member

This looks good to me. There's some nice clean-ups of the modified code in here too, thanks @sechkova

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this great patch, @sechkova! This is just a partial review that touches repository_lib.py only. I'll follow up with the rest tomorrow.

Btw. I am aware that the things I flagged were already there before, but since they are in your diff, I'm asking you to fix them. :P

@@ -811,6 +811,59 @@ def import_ed25519_privatekey_from_file(filepath, password=None):
return private_key



def get_delegations_filenames(metadata_directory, consistent_snapshot,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: Would you mind calling the function something like get_delegated_role_metadata_filenames just so that it's clear that delegation != role? (see #660) :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or maybe get_non_top_level_metadata_filenames? 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

35f72a3 I renamed both get_*_metadata_filenames ... 🤷

continue

# Skip top-level roles, only interested in delegated roles now that the
# top-level roles have already been loaded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the comment, the "now that the top-level roles have already been loaded" part is not in the scope of this function anymore and given the function name it should be clear what's happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in here: da09a22

# Keep a store of metadata previously loaded metadata to prevent re-loading
# duplicate versions. Duplicate versions may occur with
# 'consistent_snapshot', where the same metadata may be available in
# multiples files (the different hash is included in each filename).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has a stray "metadata" (first line) and is a bit outdated, as consistent role metadata does not use a hash-prefix anymore but a version number (last line). A comment like this still might be helpful, but maybe rather above, where we define loaded_metadata and call sorted(..., reverse=True).
What do you think about something like this directly above the for loop?

# Iterate over role metadata files, sorted by their version-number prefix, with
# more recent versions first, and only add the most recent version of any
# (non top-level) metadata to the list of returned filenames. Note that there
# should only be one version of each file, if consistent_snapshot is False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also done in da09a22 , thanks for suggesting the text which I directly reused ;)
I still left one line comment for the lazy ones who won't scroll up to the start of the for loop.

continue

filenames[metadata_name] = metadata_path
loaded_metadata.append(metadata_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just

if metadata_name not in filenames:
  filenames[metadata_name] = metadata_path

Should be a lot faster (O(1) vs. O(n)), less code and less memory. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing, here it is da09a22

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the excellent work, @sechkova, you're quite the magician! :) Please consider improving the code comment as suggested inline (and my review comments from yesterday). Otherwise this LGTM!

# top-level targets: [('role1', 'targets'), ('role2', 'targets'), ... ]
roleinfo = tuf.roledb.get_roleinfo('targets', repository_name)
for role in roleinfo['delegations']['roles']:
delegations.append([role['name'], 'targets'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: The comment suggests delegations are list of tuples, but the code makes it a list of lists. Please change either one. IMO tuples are fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And tuples they are!

# Append the next level delegations to the list:
# the 'delegated' role becomes the 'delegating'
for delegation in metadata_object['delegations']['roles']:
delegations.append([delegation['name'], rolename])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is cool stuff! But I admit it took me a while to understand what's happening. You're adding delegations to the list of delegations as you iterate over them, thus traversing the tree of delegations. :)

Out of curiosity, is there a name for this pattern?

At any rate, we should add a big comment that describes it above the outer loop definition (L3123). And we should definitely point out that cycles in the delegation graph, will make this an infinite loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it a custom breadth-first search 😁
I tried to improve it after you pointed out the potential infinite loop.
The loaded delegations were still kept in the list of delegations anyway so the options to avoid repetitions seem to be:

  • if x is in list - O(n)
  • replace the list with a dictionary for O(1) lookup but dictionaries are guaranteed to be ordered only in Python 3.x so the traversal won't work
  • use a deque for traversing the delegations (keep only the next in line) + set for storing already loaded ones

If this seems unnecessary overhead since cycles in the delegations are not likely to happen, we can keep only the deque and get rid of the set + adding a comment.

Hopefully the new comments help to make things look less magical 🙂
(and this is the new commit c79ea55)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, I think it just became even more magical. 😆

Regarding the options:

Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.

The need for deque is not fully obvious to me. Are you doing this to save memory? If so, I suppose that's fine. Maybe you can leave a note about the motivation. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.

It could be better to warn during the creation of the delegation ... but I think currently there is no mechanism to detect that so ... better late than never?

@trishankatdatadog
Copy link
Member

Teodora is very good!

Update load_repository() function to load the delegations metadata
starting from 'targets' and traversing downwards the delegated
roles in order to load correctly the delegations hierarchy.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add a tests case checking if delegated Targets() objects
are loaded correctly.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Remove unnecessary list keeping track of loaded file names and
rewrite outdated comments.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
targets_object = targets_objects['targets']
targets_objects[metadata_name] = new_targets_object
targets_objects['targets'].add_delegated_role(rolename,
new_targets_object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic:
I find it extremely odd that the top-level Targets object does not make a difference between roles it delegates to directly and indirectly (via other roles).

# Append the next level delegations to the list:
# the 'delegated' role becomes the 'delegating'
for delegation in metadata_object['delegations']['roles']:
delegations.append([delegation['name'], rolename])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, I think it just became even more magical. 😆

Regarding the options:

Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.

The need for deque is not fully obvious to me. Are you doing this to save memory? If so, I suppose that's fine. Maybe you can leave a note about the motivation. :)

continue

# Skip top-level roles, only interested in delegated roles.
if metadata_name in ['root', 'snapshot', 'targets', 'timestamp']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this kind of pattern in enough places that I think it would be worthwhile to add a constant for the list of top-level role names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Maybe we can cherry-pick 7100dc3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I'd seen that somewhere already, and the same author too. 😄

# top-level roles have already been loaded.
if metadata_name in ['root', 'snapshot', 'targets', 'timestamp']:
# A deque used to keep the next delegation to be loaded
delegations = deque()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the use of a deque here, can you explain the rationale a bit?
AFAICT you're just using it as a FIFO, at which point a list with pop(0) would work just as well and would be a bit more familiar.

Copy link
Contributor Author

@sechkova sechkova Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering both of your comments @lukpueh @joshuagl :

Yes, @joshuagl got it right, the idea is to use a FIFO and continuously append the next delegations while removing(pop(0)/popleft()) the loaded one (unlike in my first implementation where the delegations were only appended and never removed from the list).

@joshuagl my reasoning about list vs deque as a FIFO

Using only a FIFO structure would be enough if there were no potential cycles in the graph, but since what we want to achieve is traverse the graph AND avoid loops, we need one ordered structure + one fast lookup structure, hence the deque+set (or one that can do both which would be the dict in Python 3.x).

And for a completeness of this discussion ...
On the other hand, up to now we don't have exact constraints in load_repository() regarding the graph traversal and we can as well use a list as a stack (LIFO) which will result in a depth-first-search type of traversal.

I'm open to suggestions about less magical code or more pythonic way of implementing this :)

Copy link
Member

@lukpueh lukpueh Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @joshuagl got it right, the idea is to use a FIFO and continuously append the next delegations while removing(pop(0)/popleft()) the loaded one (unlike in my first implementation where the delegations were only appended and never removed from the list).

Sure, but your first approach already implemented a FIFO queue, didn't it? The only difference that I saw is that the list grows in your first approach but does not in your second.

# Go from left to right, adding to the right
  ↓
| 0 | 
      ↓
| 0 | 1 |
          ↓
| 0 | 1 | 2 | 
# Pop from left, add to the right
  ↓
| 0 |
      ↓
    | 1 | 
          ↓
        | 2 |

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hence the deque+set (or one that can do both which would be the dict in Python 3.x).

Haven't checked, but would collections.OrderedDict have the desired properties cross-version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, up to now we don't have exact constraints in load_repository() regarding the graph traversal and we can as well use a list as a stack (LIFO) which will result in a depth-first-search type of traversal.

AFAIU, when loading the delegation graph into memory it really shouldn't matter if we do depth or breadth first. This is only important when verifying targets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hence the deque+set (or one that can do both which would be the dict in Python 3.x).

Haven't checked, but would collections.OrderedDict have the desired properties cross-version?

Probably it would do the job, I think I had overlooked it because it sounded outdated from the perspective of the newer Python versions ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO really any of the approaches discussed here (FIFO or LIFO with list + set, deque + set or OrderedDict) is absolutely fine, I think I would even have been okay with not guarding against infinite loops, as long as it is documented. :)

So what do you think about keeping it as it is and adding another comment about saving memory by using a deque + leftpop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • warning the user when a loop is encountered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • warning the user when a loop is encountered.

Is this related to the sentence above ("okay with not guarding against infinite loops") but appearing as a bullet? 😁
Otherwise, yep, it works for me :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, this was meant to be a + as in:

... what do you think about adding another comment ... plus warning the user when a loop is encountered?

Replace the list used for the delegations graph traversal with
a deque and use a set to store already loaded roles and avoid
loops in case of cycles in the graph.
Improve comments and readability.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Use the top-level targets object to reference already loaded
delegated targets instead of storing them in an additional
dictionary in load_repository().

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Rename repository_lib.get_metadata_filenames() and
get_delegations_filenames() to better match their
functionality and tuf terminology.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add TOP_LEVEL_ROLES as a global variable in roledb.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Copy link
Contributor Author

f1a6676 : Amended with an extended comment about the deque usage and a warning in case of a cycle in the graph
97eff9e : Discard the dictionary used for temporary storing new_targets_objects and reference them by the parent_targets_object instead
6ae3ea6 : cherry-picked 7100dc3

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 Thanks for addressing all the comments! :)

@lukpueh lukpueh merged commit 49031b4 into theupdateframework:develop Jun 29, 2020
@sechkova sechkova deleted the issue-1045 branch June 29, 2020 13:04
@di di mentioned this pull request Feb 1, 2022
52 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants