Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Providing Repo Include Feature #1479

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Providing Repo Include Feature #1479

wants to merge 7 commits into from

Conversation

kensipe
Copy link
Member

@kensipe kensipe commented Apr 22, 2020

This provides the ability for 1 repo index to include another (or multiple). This changes in code is backward compatible... if the new field "include:" is missing, it works fine.

The new index file might look like:

apiVersion: v1
entries:
  flink:
    - appVersion: 0.7.0
      name: flink
      operatorVersion: 0.3.0
      urls:
        - http://kudo.dev/flink
includes:
  - https://kudo-repository.storage.googleapis.com/

The "includes" is a list of urls... for testing file locations are also possible.
It is designed and tested so that duplicates are ignored and the root / parent repo entries take precedence. The includes are recursive so repoA could include repoB which includes repoC.

The value of this model is that a private repo user today is required to maintain all the versions of all the entries in the community repo in their private repo which is a burden. This will allow a private repo to have 1 entry of their operator and reference the community repo (with all it's updates and changes).

For the user searching or installing there is no perceived difference.

There are 2 things left todo (which are marked with todo in code.

  1. Error handling... I am hoping that we will agree that errors of includes will be ignored / clogged ... if a connection is down we don't want the whole operation to not be useful... although this creates odd edge cases.
  2. I want to pass a map to the recursive function and ignore (with clog messages) duplicate entries or urls which have already been processed... this is mainly to prevent infinite loops... but it will also be more efficient.

Signed-off-by: Ken Sipe <kensipe@gmail.com>
Signed-off-by: Ken Sipe <kensipe@gmail.com>
@nfnt
Copy link
Member

nfnt commented Apr 22, 2020

Some high level comments based on the description, didn't look into the code yet:

  1. I think that parent repo entries shouldn't take precedence. Repos including other repos should be able to override existing operators. This will allow private repo users to add or customize features of existing operators. Also this makes it easier for operator developers to test deploy possible changes.
  2. Errors are an interesting topic. This should be closely related to the expectations users a repo that includes another repo should have: If we treat the include like a merge, then we should return an error if the included repo isn't found. Because if we wouldn't, we would only take the local repo into account for installing or searching packages and that could result in unexpected results.

@kensipe
Copy link
Member Author

kensipe commented Apr 22, 2020

@nfnt thanks for jumping in and providing early feedback! I'm confused by your comment 1... the fact that the parent repo takes precedence provides the value the rest of the paragraph expresses. If repo A includes repo B... if repo B has operator CC, repo A would automatically appear like it has CC.. but also repo A can create it's own CC and custom it and now access to repo A for CC will provide its operator... the operator CC from repo B will not be "merged" or "seen" by clients to repo A.

Regarding comment 2... yeah... I'm mixed about this... I'm leaning the same way. I'm also thinking we add a flag to ignore the errors... but perhaps that is premature

@kensipe kensipe closed this Apr 22, 2020
@kensipe kensipe reopened this Apr 22, 2020
@nfnt
Copy link
Member

nfnt commented Apr 22, 2020

Okay, your example is exactly what I meant. If repo A includes repo B and provides its custom version of an operator that's already in repo B, then the operator from repo A takes precedence.

Copy link
Member

@ANeumann82 ANeumann82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it so far.

Regarding the error handling - as we allow the top-level repo to overwrite operators, I think a failure to load one of the includes may be ok. There should be a warning, but I'm not sure it requires an error.

We should try to ensure that a failure to load any of the includes leads to a different operator version installed. So, if a repo overwrites a certain OV, then we always a) install that OV or b) error out.

The only situation where I can imagine that happening is:
Repo BB and CC have OperatorVersion X
AA includes BB and CC in this order.

Normally, a User would install BB.X. If the repository URL for BB can not be loaded, but CC can, the user would suddenly get CC.X instead of BB.X.

Comment on lines 94 to 102
nextIndex, err := c.downloadIndexFile(indexFile, iURL)
if err != nil {
return nil, err
}
if parent != nil {
c.Merge(parent, nextIndex)
} else {
c.Merge(indexFile, nextIndex)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this block works incorrectly for nested includes.

Assuming we have three repos that have nested includes: AA includes BB, BB includes CC.
BB and CC contain operator X.

The final index should contain BB.X, if I understand correctly.

With this code, we would:

  • downloadIndexFile(nil, urlAA) -> indexFile for AA has no operator X
  • downloadIndexFile(indexFileAA, urlBB)
  • downloadIndexFile(indexFileAA, urlCC)
  • c.Merge(indexFileAA, indexCC) -> indexFileAA now has indexCC.X operator
  • c.Merge(indexFileAA, indexBB) -> X operator is already in there and is skipped.

Either we:

  • c.Merge the current index file before we handle the includes
  • Not pass the parent IndexFile to downloadIndexFile and merge the returned file from that call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF AA -> BB and BB -> CC and BB and CC have X I would expect:

downloadIndexFile(nil, urlAA) -> indexFile for AA has no operator X
downloadIndexFile(indexFileAA, urlBB)
downloadIndexFile(indexFileBB, urlCC)
c.Merge(indexFileBB, indexCC) -> indexFileBB would ignore X from CC (it has one)
c.Merge(indexFileAA, indexBB) -> X is merged into AA from BB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case where

downloadIndexFile(indexFileAA, urlBB)
downloadIndexFile(indexFileAA, urlCC)

in your example would be true if AA included BB and CC

Signed-off-by: Ken Sipe <kensipe@gmail.com>
Signed-off-by: Ken Sipe <kensipe@gmail.com>
@kensipe
Copy link
Member Author

kensipe commented Apr 23, 2020

@ANeumann82

Regarding the error handling - as we allow the top-level repo to overwrite operators, I think a failure to load one of the includes may be ok. There should be a warning, but I'm not sure it requires an error.

If we want warnings vs errors... I think that should be a separate unit of work... currently any error is reported as an error with the following exceptions: duplicate OV is clogged level 1, repeated url. It seems like we are good here.

We should try to ensure that a failure to load any of the includes leads to a different operator version installed. So, if a repo overwrites a certain OV, then we always a) install that OV or b) error out.

I don't understand what this means... I think the logic around duplicate OV seems correct. The parent or first include takes precedence.

The only situation where I can imagine that happening is:
Repo BB and CC have OperatorVersion X
AA includes BB and CC in this order.

Currently AA without X will include AA.X first... then the CC will be ignored. These are expected to be immutable versioned OVs... the same version should be the same thing is should be safe to ignore. IF it is used in an unintended way... the user can always reorder the includes in the index to get the behavior they desire. Anything else would need to be a user feature request IMO.

Normally, a User would install BB.X. If the repository URL for BB can not be loaded, but CC can, the user would suddenly get CC.X instead of BB.X.

The current behavior is if any url is unreachable or fails... the search / install will fail. That seems good for this PR... anything else seems like it would need to be defined.

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Copy link
Member

@ANeumann82 ANeumann82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the error/warning handling. lgtm.

@gerred
Copy link
Member

gerred commented Apr 23, 2020

Implementation looks good, what's the motivation for this particular implementation rather than a multi-repo approach (see: Linux distribution model, etc)? Or a fully qualified distribution model (see: Docker Hub, NPM). One is a public API change (adding in includes - it's backwards compatible now, but not backwards compatible to new implementations), the other is an internal feature enabling multiple repositories to be used at once.

Do we have a KEP or another conversation to point toward in adding this where we've considered that alternative?

@kensipe
Copy link
Member Author

kensipe commented Apr 23, 2020

@gerred it was a need expressed by internal staff needing to have a private repo but also wanting full access to updates and changes in the community repo

@kensipe
Copy link
Member Author

kensipe commented Apr 23, 2020

well done @ANeumann82

@gerred
Copy link
Member

gerred commented Apr 23, 2020

Awesome, that makes sense. Any public API changes have traditionally gone through a KEP, and this is a public API change. We should still consider the alternatives here that don't necessarily include that. Right now, this flips the relationship I've come to expect from installing packages in other environments (I install "my thing" from "a repo that resolves 'that thing'" given "multiple repos"). That approach is of course fraught with errors, and I think I'm onboard with this PR, but we should still be consistent and considerate of the possible ways we can do this before committing to a change that forces breaking changes in the future (not just to behavior, but API as well). I don't think it should take much time.

@kensipe kensipe added the release/highlight This PR is a highlight for the next release label Apr 23, 2020
@gerred
Copy link
Member

gerred commented Apr 23, 2020

As discussed and recommended to me, I'm closing this PR and am happy to re-open it once the contributing guidelines are met. I'm really excited for this capability to land so any user can benefit from the whole KUDO library.

@gerred gerred closed this Apr 23, 2020
@kensipe kensipe reopened this Apr 23, 2020
@kensipe kensipe changed the base branch from master to main June 24, 2020 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold release/highlight This PR is a highlight for the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants