Providing Repo Include Feature #1479

kensipe · 2020-04-22T02:55:44Z

This provides the ability for one repo index to include another (or multiple). The code change is backward compatible... if the new field "includes:" is missing, it works as before.

The new index file might look like:

apiVersion: v1
entries:
  flink:
    - appVersion: 0.7.0
      name: flink
      operatorVersion: 0.3.0
      urls:
        - http://kudo.dev/flink
includes:
  - https://kudo-repository.storage.googleapis.com/

The "includes" field is a list of URLs... for testing, file locations are also possible.
It is designed and tested so that duplicates are ignored and the root / parent repo entries take precedence. The includes are recursive so repoA could include repoB which includes repoC.
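
To make the precedence rule concrete, here is a minimal sketch of a merge in which the parent's entries win and duplicates from the included index are skipped. The types and the `merge` helper are hypothetical simplifications, not the code in this PR, and it assumes that two entries count as duplicates when name, appVersion and operatorVersion all match.

```go
package main

// PackageVersion and IndexFile are simplified, hypothetical stand-ins for the
// repo index types; only the fields needed for the merge are modeled.
type PackageVersion struct {
	Name            string
	AppVersion      string
	OperatorVersion string
}

type IndexFile struct {
	Entries  map[string][]PackageVersion
	Includes []string
}

// sameVersion reports whether two entries describe the same operator version.
// Treating these three fields as the identity is an assumption of this sketch.
func sameVersion(a, b PackageVersion) bool {
	return a.Name == b.Name &&
		a.AppVersion == b.AppVersion &&
		a.OperatorVersion == b.OperatorVersion
}

// merge copies entries from included into parent. An entry that parent
// already has is skipped, so the parent always takes precedence.
// It returns the number of entries skipped as duplicates.
func merge(parent, included *IndexFile) int {
	if parent.Entries == nil {
		parent.Entries = map[string][]PackageVersion{}
	}
	skipped := 0
	for name, versions := range included.Entries {
		for _, v := range versions {
			dup := false
			for _, existing := range parent.Entries[name] {
				if sameVersion(existing, v) {
					dup = true
					break
				}
			}
			if dup {
				skipped++
				continue
			}
			parent.Entries[name] = append(parent.Entries[name], v)
		}
	}
	return skipped
}
```

Merging the community index into a private index that already carries its own flink 0.3.0 would leave that entry untouched and only add the versions the private index lacks.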

The value of this model is that a private repo user today is required to maintain all the versions of all the entries in the community repo in their private repo, which is a burden. This will allow a private repo to have one entry for their operator and reference the community repo (with all its updates and changes).

For the user searching or installing there is no perceived difference.

There are 2 things left to do (which are marked with todo in code):

Error handling... I am hoping that we will agree that errors of includes will be ignored / clogged ... if a connection is down we don't want the whole operation to not be useful... although this creates odd edge cases.
I want to pass a map to the recursive function and ignore (with clog messages) duplicate entries or urls which have already been processed... this is mainly to prevent infinite loops... but it will also be more efficient.
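
The second todo could look roughly like the sketch below: a visited map is threaded through the recursion so that a URL which was already processed is skipped, which is what stops a cycle such as A -> B -> C -> A. The `index` type, the `walk` function and the `fetch` callback are hypothetical names for illustration; the real download and clog calls are left out.

```go
package main

// index is a hypothetical, minimal index: only the includes list matters here.
type index struct {
	Includes []string
}

// walk visits url and everything it includes, depth first. The visited map
// records every URL already processed, so a repeated or cyclic include is
// skipped instead of being fetched again. fetch stands in for the real
// download and is an assumption of this sketch. order collects the URLs in
// the sequence they were actually processed.
func walk(url string, fetch func(string) (*index, error), visited map[string]bool, order *[]string) error {
	if visited[url] {
		// Already processed: skip it (this is where a log message would go).
		return nil
	}
	visited[url] = true

	idx, err := fetch(url)
	if err != nil {
		return err
	}
	*order = append(*order, url)

	for _, inc := range idx.Includes {
		if err := walk(inc, fetch, visited, order); err != nil {
			return err
		}
	}
	return nil
}
```

Marking the URL as visited before descending into its includes is what makes the cycle terminate; marking it afterwards would not.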

Signed-off-by: Ken Sipe <kensipe@gmail.com>

nfnt · 2020-04-22T08:22:30Z

Some high level comments based on the description, didn't look into the code yet:

1. I think that parent repo entries shouldn't take precedence. Repos including other repos should be able to override existing operators. This will allow private repo users to add or customize features of existing operators. It also makes it easier for operator developers to test-deploy possible changes.
2. Errors are an interesting topic. This should be closely related to the expectations that users of a repo that includes another repo should have: if we treat the include like a merge, then we should return an error if the included repo isn't found. Otherwise we would only take the local repo into account when installing or searching packages, and that could produce unexpected results.

kensipe · 2020-04-22T15:22:32Z

@nfnt thanks for jumping in and providing early feedback! I'm confused by your comment 1... the fact that the parent repo takes precedence provides the value the rest of the paragraph expresses. If repo A includes repo B and repo B has operator CC, repo A automatically appears to have CC... but repo A can also create its own CC and customize it, and then access to repo A for CC will provide its own operator... the operator CC from repo B will not be "merged" or "seen" by clients of repo A.

Regarding comment 2... yeah... I'm mixed about this... I'm leaning the same way. I'm also thinking we add a flag to ignore the errors... but perhaps that is premature

nfnt · 2020-04-22T15:58:28Z

Okay, your example is exactly what I meant. If repo A includes repo B and provides its custom version of an operator that's already in repo B, then the operator from repo A takes precedence.

ANeumann82

I like it so far.

Regarding the error handling - as we allow the top-level repo to overwrite operators, I think a failure to load one of the includes may be ok. There should be a warning, but I'm not sure it requires an error.

We should try to ensure that a failure to load any of the includes leads to a different operator version installed. So, if a repo overwrites a certain OV, then we always a) install that OV or b) error out.

The only situation where I can imagine that happening is:
Repo BB and CC have OperatorVersion X
AA includes BB and CC in this order.

Normally, a User would install BB.X. If the repository URL for BB can not be loaded, but CC can, the user would suddenly get CC.X instead of BB.X.

ANeumann82 · 2020-04-23T08:36:35Z

pkg/kudoctl/util/repo/repo_operator.go

+		nextIndex, err := c.downloadIndexFile(indexFile, iURL)
+		if err != nil {
+			return nil, err
+		}
+		if parent != nil {
+			c.Merge(parent, nextIndex)
+		} else {
+			c.Merge(indexFile, nextIndex)
+		}


I think this block works incorrectly for nested includes.

Assuming we have three repos that have nested includes: AA includes BB, BB includes CC.
BB and CC contain operator X.

The final index should contain BB.X, if I understand correctly.

With this code, we would:

downloadIndexFile(nil, urlAA) -> indexFile for AA has no operator X

downloadIndexFile(indexFileAA, urlBB)

downloadIndexFile(indexFileAA, urlCC)

c.Merge(indexFileAA, indexCC) -> indexFileAA now has indexCC.X operator

c.Merge(indexFileAA, indexBB) -> X operator is already in there and is skipped.

Either we:

c.Merge the current index file before we handle the includes, or

do not pass the parent IndexFile to downloadIndexFile, and merge the file returned from that call instead.

If AA -> BB and BB -> CC, and both BB and CC have X, I would expect:

downloadIndexFile(nil, urlAA) -> indexFile for AA has no operator X

downloadIndexFile(indexFileAA, urlBB)

downloadIndexFile(indexFileBB, urlCC)

c.Merge(indexFileBB, indexCC) -> indexFileBB would ignore X from CC (it has one)

c.Merge(indexFileAA, indexBB) -> X is merged into AA from BB

The case where

downloadIndexFile(indexFileAA, urlBB)

downloadIndexFile(indexFileAA, urlCC)

in your example would be true if AA included BB and CC.
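
The ordering discussed in this thread can be sketched as follows: each include is fully resolved (together with its own nested includes) before it is merged into its parent, so BB.X is already in place when CC is folded into BB, and AA then receives BB's X. The `Index` type, the `resolve` function and the in-memory `repos` map are hypothetical stand-ins for the real index and download code, and cycle protection is deliberately omitted here.

```go
package main

// Index is a hypothetical index keyed by operator name. The value records
// which repo the operator came from, so the winner is easy to see.
type Index struct {
	Operators map[string]string
	Includes  []string
}

// resolve loads the index at url, fully resolves each include (including its
// own nested includes) and only then merges the result into the current
// index. Existing operators are never overwritten, so the nearest repo wins.
// repos stands in for downloading and is an assumption of this sketch.
func resolve(url string, repos map[string]Index) Index {
	src := repos[url]
	out := Index{Operators: map[string]string{}}
	for name, origin := range src.Operators {
		out.Operators[name] = origin
	}
	for _, inc := range src.Includes {
		// The child is merged with its own includes before the parent sees it.
		child := resolve(inc, repos)
		for name, origin := range child.Operators {
			if _, exists := out.Operators[name]; !exists {
				out.Operators[name] = origin
			}
		}
	}
	return out
}
```

With AA including BB and BB including CC, resolving AA yields X from BB. With AA including BB and CC directly, in that order, the result for X is the same because BB is merged first.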

Signed-off-by: Ken Sipe <kensipe@gmail.com>

kensipe · 2020-04-23T15:46:30Z

@ANeumann82

Regarding the error handling - as we allow the top-level repo to overwrite operators, I think a failure to load one of the includes may be ok. There should be a warning, but I'm not sure it requires an error.

If we want warnings vs errors... I think that should be a separate unit of work... currently any error is reported as an error with the following exceptions: duplicate OV is clogged level 1, repeated url. It seems like we are good here.

We should try to ensure that a failure to load any of the includes leads to a different operator version installed. So, if a repo overwrites a certain OV, then we always a) install that OV or b) error out.

I don't understand what this means... I think the logic around duplicate OV seems correct. The parent or first include takes precedence.

The only situation where I can imagine that happening is:
Repo BB and CC have OperatorVersion X
AA includes BB and CC in this order.

Currently AA without X will include BB.X first... then the X from CC will be ignored. These are expected to be immutable versioned OVs... the same version should be the same thing, so it should be safe to ignore. If it is used in an unintended way... the user can always reorder the includes in the index to get the behavior they desire. Anything else would need to be a user feature request IMO.

Normally, a User would install BB.X. If the repository URL for BB can not be loaded, but CC can, the user would suddenly get CC.X instead of BB.X.

The current behavior is if any url is unreachable or fails... the search / install will fail. That seems good for this PR... anything else seems like it would need to be defined.

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>

ANeumann82

Agreed on the error/warning handling. lgtm.

gerred · 2020-04-23T16:36:58Z

Implementation looks good, what's the motivation for this particular implementation rather than a multi-repo approach (see: Linux distribution model, etc)? Or a fully qualified distribution model (see: Docker Hub, NPM). One is a public API change (adding in includes - it's backwards compatible now, but not backwards compatible to new implementations), the other is an internal feature enabling multiple repositories to be used at once.

Do we have a KEP or another conversation to point toward in adding this where we've considered that alternative?

kensipe · 2020-04-23T16:38:25Z

@gerred it was a need expressed by internal staff needing to have a private repo but also wanting full access to updates and changes in the community repo

kensipe · 2020-04-23T16:41:55Z

well done @ANeumann82

gerred · 2020-04-23T16:42:50Z

Awesome, that makes sense. Any public API changes have traditionally gone through a KEP, and this is a public API change. We should still consider the alternatives here that don't necessarily include that. Right now, this flips the relationship I've come to expect from installing packages in other environments (I install "my thing" from "a repo that resolves 'that thing'" given "multiple repos"). That approach is of course fraught with errors, and I think I'm onboard with this PR, but we should still be consistent and considerate of the possible ways we can do this before committing to a change that forces breaking changes in the future (not just to behavior, but API as well). I don't think it should take much time.

gerred · 2020-04-23T17:07:18Z

As discussed and recommended to me, I'm closing this PR and am happy to re-open it once the contributing guidelines are met. I'm really excited for this capability to land so any user can benefit from the whole KUDO library.

kensipe added 3 commits April 19, 2020 14:06

initial step to support multi-repo include

455d5ce

Signed-off-by: Ken Sipe <kensipe@gmail.com>

working version of merge index and include of index files with tests

ca5addf

Signed-off-by: Ken Sipe <kensipe@gmail.com>

adding godoc

7e3484b

Signed-off-by: Ken Sipe <kensipe@gmail.com>

kensipe requested review from alenkacz, gerred, nfnt and zen-dog as code owners April 22, 2020 02:55

kensipe closed this Apr 22, 2020

kensipe reopened this Apr 22, 2020

ANeumann82 requested changes Apr 23, 2020

ANeumann82 reviewed Apr 23, 2020

kensipe added 2 commits April 23, 2020 10:16

updated error handling

536c879

Signed-off-by: Ken Sipe <kensipe@gmail.com>

url tracking added

8637e6e

Signed-off-by: Ken Sipe <kensipe@gmail.com>

ANeumann82 added 2 commits April 23, 2020 18:22

Added test case for nested included repos

2335f22

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>

Fixed nested repo include

7a0537d

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>

ANeumann82 approved these changes Apr 23, 2020

kensipe added the release/highlight label Apr 23, 2020

gerred closed this Apr 23, 2020

kensipe reopened this Apr 23, 2020

kensipe added the do-not-merge/hold label Apr 23, 2020

kensipe changed the base branch from master to main June 24, 2020 00:40
