Load correctly the delegated Targets objects hierarchy #1052

sechkova · 2020-06-10T14:57:43Z

Fixes issue #: #1045

Description of the changes being introduced by the pull request:

Update load_repository() function to load the delegations metadata starting from 'targets' and traversing downwards the delegated roles in order to load correctly the delegations hierarchy.
Improve readability by extracting the loading of metadata_directory files in a separate function get_delegations_filenames() in a similar way in which top level metadata files are loaded by get_metadata_filenames().
Update test_load_repository.

Please verify and check that the pull request fulfills the following
requirements:

The code follows the Code Style Guidelines
Tests have been added for the bug fix or new feature
Docs have been added for the bug fix or new feature

trishankatdatadog · 2020-06-10T17:16:01Z

Thanks, Theodora!

I have just one concern: how does this handle the possibility that Alice and Bob may delegate to Charlie using different sets and thresholds of keys from each other?

lukpueh · 2020-06-11T07:42:53Z

💯, 🎉, 🥳, @sechkova!!

I have just one concern: how does this handle the possibility that Alice and Bob may delegate to Charlie using different sets and thresholds of keys from each other?

@trishankatdatadog, the current reference implementation already does not support such "promiscuous" delegations (see #589 and for a more general description of the issue #660).
Supporting that behavior is way out of the scope of #1045.

trishankatdatadog · 2020-06-11T14:46:31Z

"promiscuous" delegations

Got it. Gods, I hate that term, let's find another name, please...

joshuagl · 2020-06-11T14:49:49Z

"promiscuous" delegations

Got it. Gods, I hate that term, let's find another name, please...

Agreed. Promiscuous has negative connotations.

In theupdateframework/specification#96 Marina used the term multi-role delegations

trishankatdatadog · 2020-06-11T14:54:08Z

In theupdateframework/specification#96 Marina used the term multi-role delegations

Thanks, Josh. I don't think the problem is specific to multi-role delegations. I believe the idea is that Alice and Bob, on entirely different parts of the graph, can delegate to Charlie using entirely different thresholds and keys. I don't know what's a good, succinct word for this, except to say something like "delegator-dependent keys for a delegatee" (rolls off the tongue, I know).

lukpueh · 2020-06-11T14:58:48Z

Yes, multi-role delegations as described in TAP 3 are something else. It's a one-to-many delegation where the delegated-to roles need to agree on a target.

Promiscuous delegations are many-to-many delegations.

joshuagl · 2020-06-11T20:48:37Z

This looks good to me. There's some nice clean-ups of the modified code in here too, thanks @sechkova

lukpueh

Thanks again for this great patch, @sechkova! This is just a partial review that touches repository_lib.py only. I'll follow up with the rest tomorrow.

Btw. I am aware that the things I flagged were already there before, but since they are in your diff, I'm asking you to fix them. :P

lukpueh · 2020-06-17T12:18:07Z

tuf/repository_lib.py

@@ -811,6 +811,59 @@ def import_ed25519_privatekey_from_file(filepath, password=None):
  return private_key


+
+def get_delegations_filenames(metadata_directory, consistent_snapshot,


Minor nit: Would you mind calling the function something like get_delegated_role_metadata_filenames just so that it's clear that delegation != role? (see #660) :)

... or maybe get_non_top_level_metadata_filenames? 🤷

35f72a3 I renamed both get_*_metadata_filenames ... 🤷

lukpueh · 2020-06-17T12:21:04Z

tuf/repository_lib.py

+      continue
+
+    # Skip top-level roles, only interested in delegated roles now that the
+    # top-level roles have already been loaded.


I think you can remove the comment, the "now that the top-level roles have already been loaded" part is not in the scope of this function anymore and given the function name it should be clear what's happening.

Done in here: da09a22

lukpueh · 2020-06-17T12:35:49Z

tuf/repository_lib.py

+    # Keep a store of metadata previously loaded metadata to prevent re-loading
+    # duplicate versions.  Duplicate versions may occur with
+    # 'consistent_snapshot', where the same metadata may be available in
+    # multiples files (the different hash is included in each filename).


The comment has a stray "metadata" (first line) and is a bit outdated, as consistent role metadata does not use a hash-prefix anymore but a version number (last line). A comment like this still might be helpful, but maybe rather above, where we define loaded_metadata and call sorted(..., reverse=True).
What do you think about something like this directly above the for loop?

# Iterate over role metadata files, sorted by their version-number prefix, with # more recent versions first, and only add the most recent version of any # (non top-level) metadata to the list of returned filenames. Note that there # should only be one version of each file, if consistent_snapshot is False.

Also done in da09a22 , thanks for suggesting the text which I directly reused ;)
I still left one line comment for the lazy ones who won't scroll up to the start of the for loop.

lukpueh · 2020-06-17T13:10:21Z

tuf/repository_lib.py

+      continue
+
+    filenames[metadata_name] = metadata_path
+    loaded_metadata.append(metadata_name)


Why not just

if metadata_name not in filenames: filenames[metadata_name] = metadata_path

Should be a lot faster (O(1) vs. O(n)), less code and less memory. :)

Thanks for noticing, here it is da09a22

lukpueh

Thanks for the excellent work, @sechkova, you're quite the magician! :) Please consider improving the code comment as suggested inline (and my review comments from yesterday). Otherwise this LGTM!

lukpueh · 2020-06-18T11:00:46Z

tuf/repository_tool.py

+  # top-level targets: [('role1', 'targets'), ('role2', 'targets'), ... ]
+  roleinfo = tuf.roledb.get_roleinfo('targets', repository_name)
+  for role in roleinfo['delegations']['roles']:
+    delegations.append([role['name'], 'targets'])


Minor nit: The comment suggests delegations are list of tuples, but the code makes it a list of lists. Please change either one. IMO tuples are fine.

And tuples they are!

lukpueh · 2020-06-18T11:29:20Z

tuf/repository_tool.py

+    # Append the next level delegations to the list:
+    # the 'delegated' role becomes the 'delegating'
+    for delegation in metadata_object['delegations']['roles']:
+      delegations.append([delegation['name'], rolename])


Wow, this is cool stuff! But I admit it took me a while to understand what's happening. You're adding delegations to the list of delegations as you iterate over them, thus traversing the tree of delegations. :)

Out of curiosity, is there a name for this pattern?

At any rate, we should add a big comment that describes it above the outer loop definition (L3123). And we should definitely point out that cycles in the delegation graph, will make this an infinite loop.

Let's call it a custom breadth-first search 😁
I tried to improve it after you pointed out the potential infinite loop.
The loaded delegations were still kept in the list of delegations anyway so the options to avoid repetitions seem to be:

if x is in list - O(n)

replace the list with a dictionary for O(1) lookup but dictionaries are guaranteed to be ordered only in Python 3.x so the traversal won't work

use a deque for traversing the delegations (keep only the next in line) + set for storing already loaded ones

If this seems unnecessary overhead since cycles in the delegations are not likely to happen, we can keep only the deque and get rid of the set + adding a comment.

Hopefully the new comments help to make things look less magical 🙂
(and this is the new commit c79ea55)

Haha, I think it just became even more magical. 😆

Regarding the options:

Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.

The need for deque is not fully obvious to me. Are you doing this to save memory? If so, I suppose that's fine. Maybe you can leave a note about the motivation. :)

Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.

It could be better to warn during the creation of the delegation ... but I think currently there is no mechanism to detect that so ... better late than never?

trishankatdatadog · 2020-06-18T15:01:42Z

Teodora is very good!

Update load_repository() function to load the delegations metadata starting from 'targets' and traversing downwards the delegated roles in order to load correctly the delegations hierarchy. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Add a tests case checking if delegated Targets() objects are loaded correctly. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Remove unnecessary list keeping track of loaded file names and rewrite outdated comments. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

lukpueh · 2020-06-24T16:03:59Z

tuf/repository_tool.py

-    targets_object = targets_objects['targets']
-    targets_objects[metadata_name] = new_targets_object
+    targets_objects['targets'].add_delegated_role(rolename,
+        new_targets_object)


Off-topic:
I find it extremely odd that the top-level Targets object does not make a difference between roles it delegates to directly and indirectly (via other roles).

lukpueh · 2020-06-25T11:11:37Z

tuf/repository_tool.py

+    # Append the next level delegations to the list:
+    # the 'delegated' role becomes the 'delegating'
+    for delegation in metadata_object['delegations']['roles']:
+      delegations.append([delegation['name'], rolename])


Haha, I think it just became even more magical. 😆

Regarding the options:

Using a set as an infinite loop safe-guard seems like a good idea. And definitely better than using a list.
I think we would want to warn the caller if we encounter a loop, what do you think? To me, loops in the delegation graph don't seem desirable. But I may be missing something.

The need for deque is not fully obvious to me. Are you doing this to save memory? If so, I suppose that's fine. Maybe you can leave a note about the motivation. :)

joshuagl · 2020-06-25T11:23:07Z

tuf/repository_lib.py

+      continue
+
+    # Skip top-level roles, only interested in delegated roles.
+    if metadata_name in ['root', 'snapshot', 'targets', 'timestamp']:


We have this kind of pattern in enough places that I think it would be worthwhile to add a constant for the list of top-level role names.

Agreed. Maybe we can cherry-pick 7100dc3

I thought I'd seen that somewhere already, and the same author too. 😄

joshuagl · 2020-06-25T11:28:04Z

tuf/repository_tool.py

-    # top-level roles have already been loaded.
-    if metadata_name in ['root', 'snapshot', 'targets', 'timestamp']:
+  # A deque used to keep the next delegation to be loaded
+  delegations = deque()


I'm curious about the use of a deque here, can you explain the rationale a bit?
AFAICT you're just using it as a FIFO, at which point a list with pop(0) would work just as well and would be a bit more familiar.

Answering both of your comments @lukpueh @joshuagl :

Yes, @joshuagl got it right, the idea is to use a FIFO and continuously append the next delegations while removing(pop(0)/popleft()) the loaded one (unlike in my first implementation where the delegations were only appended and never removed from the list).

@joshuagl my reasoning about list vs deque as a FIFO

Using only a FIFO structure would be enough if there were no potential cycles in the graph, but since what we want to achieve is traverse the graph AND avoid loops, we need one ordered structure + one fast lookup structure, hence the deque+set (or one that can do both which would be the dict in Python 3.x).

And for a completeness of this discussion ...
On the other hand, up to now we don't have exact constraints in load_repository() regarding the graph traversal and we can as well use a list as a stack (LIFO) which will result in a depth-first-search type of traversal.

I'm open to suggestions about less magical code or more pythonic way of implementing this :)

Yes, @joshuagl got it right, the idea is to use a FIFO and continuously append the next delegations while removing(pop(0)/popleft()) the loaded one (unlike in my first implementation where the delegations were only appended and never removed from the list).

Sure, but your first approach already implemented a FIFO queue, didn't it? The only difference that I saw is that the list grows in your first approach but does not in your second.

# Go from left to right, adding to the right ↓ | 0 | ↓ | 0 | 1 | ↓ | 0 | 1 | 2 |

# Pop from left, add to the right ↓ | 0 | ↓ | 1 | ↓ | 2 |

hence the deque+set (or one that can do both which would be the dict in Python 3.x).

Haven't checked, but would collections.OrderedDict have the desired properties cross-version?

On the other hand, up to now we don't have exact constraints in load_repository() regarding the graph traversal and we can as well use a list as a stack (LIFO) which will result in a depth-first-search type of traversal.

AFAIU, when loading the delegation graph into memory it really shouldn't matter if we do depth or breadth first. This is only important when verifying targets.

hence the deque+set (or one that can do both which would be the dict in Python 3.x).

Haven't checked, but would collections.OrderedDict have the desired properties cross-version?

Probably it would do the job, I think I had overlooked it because it sounded outdated from the perspective of the newer Python versions ...

IMO really any of the approaches discussed here (FIFO or LIFO with list + set, deque + set or OrderedDict) is absolutely fine, I think I would even have been okay with not guarding against infinite loops, as long as it is documented. :)

So what do you think about keeping it as it is and adding another comment about saving memory by using a deque + leftpop?

warning the user when a loop is encountered.

warning the user when a loop is encountered.

Is this related to the sentence above ("okay with not guarding against infinite loops") but appearing as a bullet? 😁
Otherwise, yep, it works for me :)

haha, this was meant to be a + as in:

... what do you think about adding another comment ... plus warning the user when a loop is encountered?

Replace the list used for the delegations graph traversal with a deque and use a set to store already loaded roles and avoid loops in case of cycles in the graph. Improve comments and readability. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Use the top-level targets object to reference already loaded delegated targets instead of storing them in an additional dictionary in load_repository(). Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Rename repository_lib.get_metadata_filenames() and get_delegations_filenames() to better match their functionality and tuf terminology. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Add TOP_LEVEL_ROLES as a global variable in roledb. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

sechkova · 2020-06-29T11:14:52Z

f1a6676 : Amended with an extended comment about the deque usage and a warning in case of a cycle in the graph
97eff9e : Discard the dictionary used for temporary storing new_targets_objects and reference them by the parent_targets_object instead
6ae3ea6 : cherry-picked 7100dc3

lukpueh

🥇 Thanks for addressing all the comments! :)

lukpueh requested changes Jun 17, 2020

View reviewed changes

lukpueh reviewed Jun 18, 2020

View reviewed changes

sechkova added 3 commits June 24, 2020 13:52

Update test_load_repository

73bff87

Add a tests case checking if delegated Targets() objects are loaded correctly. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Improve get_delegations_filenames performance and readability

da09a22

Remove unnecessary list keeping track of loaded file names and rewrite outdated comments. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

sechkova force-pushed the issue-1045 branch from 3bfe50c to 35f72a3 Compare June 24, 2020 10:54

lukpueh reviewed Jun 25, 2020

View reviewed changes

joshuagl reviewed Jun 25, 2020

View reviewed changes

sechkova added 4 commits June 29, 2020 13:19

Reference loaded delegated targets objects by top-level targets

97eff9e

Use the top-level targets object to reference already loaded delegated targets instead of storing them in an additional dictionary in load_repository(). Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Rename get_*_metadata_filenames functions

a69208c

Rename repository_lib.get_metadata_filenames() and get_delegations_filenames() to better match their functionality and tuf terminology. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

Add TOP_LEVEL_ROLES as a global variable

6ae3ea6

Add TOP_LEVEL_ROLES as a global variable in roledb. Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>

sechkova force-pushed the issue-1045 branch from 35f72a3 to 6ae3ea6 Compare June 29, 2020 11:04

lukpueh approved these changes Jun 29, 2020

View reviewed changes

lukpueh merged commit 49031b4 into theupdateframework:develop Jun 29, 2020

sechkova deleted the issue-1045 branch June 29, 2020 13:04

joshuagl mentioned this pull request Jul 7, 2020

load_repository() does not load correctly Targets objects of delegated roles #1045

Closed

di mentioned this pull request Feb 1, 2022

Roadmap for PEP 458 pypi/warehouse#10672

Open

52 tasks

		@@ -811,6 +811,59 @@ def import_ed25519_privatekey_from_file(filepath, password=None):
		return private_key



		def get_delegations_filenames(metadata_directory, consistent_snapshot,

Load correctly the delegated Targets objects hierarchy #1052

Load correctly the delegated Targets objects hierarchy #1052

Conversation

sechkova commented Jun 10, 2020

trishankatdatadog commented Jun 10, 2020

lukpueh commented Jun 11, 2020

trishankatdatadog commented Jun 11, 2020

joshuagl commented Jun 11, 2020

trishankatdatadog commented Jun 11, 2020

lukpueh commented Jun 11, 2020

joshuagl commented Jun 11, 2020

lukpueh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lukpueh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trishankatdatadog commented Jun 18, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sechkova Jun 25, 2020 • edited Loading

Choose a reason for hiding this comment

lukpueh Jun 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sechkova commented Jun 29, 2020

lukpueh left a comment

Choose a reason for hiding this comment

sechkova Jun 25, 2020 •

edited

Loading

lukpueh Jun 25, 2020 •

edited

Loading