Support anonymous reads and authenticated writes #381

Closed
joeljeske opened this issue Dec 2, 2020 · 17 comments

Comments

@joeljeske
Contributor

It appears that when using htpasswd, one can only enforce auth for both reads and writes, or not at all. I would like to enforce auth when writing, but allow anonymous reads.

@mostynb
Collaborator

mostynb commented Dec 2, 2020

The topic of allowing some clients read-only access has come up in the past, but I haven't had a concrete-enough use case to spend time investigating such a feature yet.

However, it might actually be possible to set something like this up with the httpproxy backend: use a primary bazel-remote instance with large storage and authentication, and a secondary bazel-remote instance with --num_uploaders 0 and without authentication that uses the primary instance as an HTTP proxy backend. I think bazel-remote currently lacks a command line flag to authenticate with a backend HTTP server, but perhaps you could use a config file and specify a username/password in the httpproxy server URL, like https://user:pass@hostname/whatever/.

@joeljeske
Contributor Author

joeljeske commented Dec 2, 2020

That is an interesting approach that does sound like it could work today. I hadn't thought of that, although I'm not sure I like it.

I think it would be a common setup where:

  • CI agents populate the remote cache, both reading from and writing to it. These agents are trusted to write to the cache.
  • Dev machines are set up to pull from the cache (in lieu of remote exec) for a speed increase, but they are not trusted to arbitrarily write to the cache (or at least to /ac/*).

What do you think of that setup?

Presumably, a flag could be introduced that only checks for auth when the http method is not GET and that would satisfy the requirement?

@mostynb
Collaborator

mostynb commented Dec 2, 2020

Maybe there's an even simpler proxy solution: run one bazel-remote instance with authentication, and set up a simple unauthenticated HTTP proxy that forwards GET and HEAD requests (and maybe also CAS PUT requests) to it and rejects everything else. This could be a starting point (add some checks in the handler): https://gist.github.com/JalfResi/6287706 - I think it would be worth investigating this modular setup a bit before considering adding this feature to bazel-remote.

Of course, you can probably also set up nginx or apache to do this kind of proxying, but they have a lot more configuration options that you would probably need to understand.

Some questions spring to mind, if we were to consider adding this as a feature:

  • Should unauthenticated clients be allowed to impact the LRU index when reading items?
  • Are there any existing clients that have a don't-write-to-the-cache mode? Or would the server need to pretend to accept uploads?
  • Should untrusted clients be allowed to upload CAS items? (I guess this is a question of how much you trust/distrust these clients.)
  • Why are your dev machines less reliable than your CI machines? Is it worth spending time improving the hermeticity of your build so that you can trust dev machines?

@joeljeske
Contributor Author

joeljeske commented Dec 2, 2020

Yes, I think an unauthenticated proxy could work fine; however, I still think this is a valid and potentially common use case.

> Should unauthenticated clients be allowed to impact the LRU index when reading items?

I would think they would impact the LRU index, as they are still being leveraged even if anonymously.

> Are there any existing clients that have a don't-write-to-the-cache mode? Or would the server need to pretend to accept uploads?

I am not aware of any clients like this. I think it would negatively affect performance, as the bandwidth would likely still be consumed. If one wanted this setup without a proper client, I would assume they would be willing to set up their own proxy like the one you are discussing to achieve this behavior.

> Should untrusted clients be allowed to upload CAS items? (I guess this is a question of how much you trust/distrust these clients.)

Personally, I think a CAS is a CAS and it should be fine for anyone to upload to it. However, a malicious client could cause useful cache items to be evicted by populating the cache with unused CAS items. It would probably be fine to allow this anyway. (Does bazel-remote validate blobs sent to /cas, or is the client expected to do so upon reading?)

> Why are your dev machines less reliable than your CI machines? Is it worth spending time improving the hermeticity of your build so that you can trust dev machines?

They are not less reliable. The concern is that entries written to /ac could lead the CI/CD servers to use maliciously uploaded blobs from the cache. If crafted properly, a CI build could use/execute any blob that a user might choose to craft. That is why I would protect the service user of the CI agents but still allow devs to read from the populated cache. I would love to be told that my fears are unfounded, as that would simplify things, but this is my current understanding.

@mostynb
Collaborator

mostynb commented Dec 2, 2020

> Yes, I think a unauthenticated proxy could work fine, however I still think this is a valid and potentially common use case.

It's definitely a valid use case. I have no idea how common it is.

I'm pretty focused on the compressed-blobs feature at the moment, but once that lands I could look into this feature (if you would like to work on it, contributions are welcome of course). In the meantime, if you try the proxy hacks and find one that works, we could add notes to the README file.

> Are there any existing clients that have a don't-write-to-the-cache mode? Or would the server need to pretend to accept uploads?
>
> I am not aware of any clients like this. I think it would negatively affect performance as the bandwidth would likely still be consumed. If one wanted this setup without a proper client, I would assume they would be willing to setup their own proxy like you are discussing to achieve this behavior.

I was mostly curious if there are any clients that have a read-only cache mode. It looks like bazel does, with --remote_upload_local_results=false. It would make sense to reject uploads in that case.

> Should untrusted clients be allowed to upload CAS items? (I guess this is a question of how much you trust/distrust these clients.)
>
> Personally, a CAS is a CAS and it should be fine to upload to by anyone, however, a malicious client could lead to the emptying of cache items by populating unused CAS items. But it would probably be fine to allow for this anyway. (Does Bazel-remote validate blobs sent to /cas? or is the client expected to do so upon reading?)

bazel-remote validates CAS items on upload. This is required by the REAPI spec, so other gRPC REAPI cache implementations should check this too.

> Why are your dev machines less reliable than your CI machines? Is it worth spending time improving the hermeticity of your build so that you can trust dev machines?
>
> They are not less reliable, however, the concern would be of the population of entries to /ac that could lead the CI/CD servers to use maliciously uploaded blobs to the cache. If crafted properly, a CI build could use/execute any random blob that a user might choose to craft. That is why I would protect the service user of the CI agents but still allow devs to read from the populated cache. I would love to be told that my fears are unfounded, as that would simplify things, but currently my understanding is this.

Ah, I was thinking about this more from a reliability point of view. Your understanding is correct: the action cache is certainly more vulnerable, since there's no way to check if the entries have been tampered with.

Can you describe your threat model a bit more? You have trusted CI/infra, and want untrusted clients to benefit from the cache but prevent them from injecting malicious files for CI? Do you care about untrusted clients injecting malicious files for other untrusted clients? Do you care about DoS attacks (either raw traffic, or trashing the LRU/cache efficiency)? What does your network layout look like?

Note that I do not consider bazel-remote to be security-hardened enough to expose it on the public internet, even with TLS and authentication.

@joeljeske
Contributor Author

The threat here is an employee/developer at an enterprise tricking a CI server into using a falsified blob as an artifact that could presumably be sent to production.

We want to allow blobs created by CI to be used by developers as a cache, but we do not want developers to be able to send PUTs to a cache and corrupt the cache.

The network itself would all be secured and behind firewalls/vpn and not exposed directly to the internet. TLS would be enabled and authentication enabled for the agents that write to the cache. I am not concerned about DoS attacks in this scenario.

@cheister

cheister commented Mar 2, 2021

I have the same setup and would be interested in a read-only option for developers and write option only from trusted servers as well. Would it work to have a configuration option to have an optional read-only port separate from the other ports?

@mostynb
Collaborator

mostynb commented Mar 2, 2021

I will probably start working on this soon.

Using a different address/port for read-write and readonly access is an option, but I think I would prefer a solution that doesn't require external configuration like firewalls/VLANs/etc.

@mostynb
Collaborator

mostynb commented Mar 14, 2021

I have started looking into this...

It seems fairly easy to support on a single port/address with basic authentication, but I'm unsure how to do this for mTLS authentication. @Mythra: any tips?

If it's not possible to support unauthenticated read-only access + authenticated read-write access on the same port, then maybe adding a separate read-only port is the way to go after all. (Silly me, @cheister's suggestion wouldn't require external firewall/VLAN configuration.)

@Mythra
Contributor

Mythra commented Mar 14, 2021

Hey! So if you wanted to tackle mTLS-authenticated but read-only access, I'd imagine you'd have a separate intermediate/root certificate you'd validate against.

In this case, from the code's point of view you'd have two "CA files". One CA file would validate write connections, and the other CA file would validate read-only connections. (These CA files work as they do now: they'll either be two separate root CAs or, much more likely in this case, both CA files will contain two certs, with the same root but a separate intermediate underneath.)

Implementation-wise though, I'd think you'd be forced to use a separate port for gRPC. I don't think it gives you enough information to say "I validated with this particular cert chain for this connection". Each port would be set up with its own unique CA cert pool. The write port would only accept connections signed by your first CA file, and the read port would accept connections signed by either CA file. Then, depending on the port you connected to, you would get either write access or read-only access.

@Mythra
Contributor

Mythra commented Mar 14, 2021

Oh silly me! You mentioned unauthenticated read-only. I don't think grpc-go gives you that level of fine-tuning today. I believe there is a permissive mode where it accepts both mTLS and non-mTLS connections, but I don't know of a way to find out whether the connection you're on actually used mTLS. (Note: I may just not be seeing it in the grpc-go docs; I haven't spent too much time with it.)

@mostynb
Collaborator

mostynb commented Mar 14, 2021

Thanks. I'm not seeing an easy solution with grpc-go either. It looks like gRPC methods can extract a Peer from the context and type-assert its auth info to a credentials.TLSInfo, but then I'm a bit unsure what to look at in there.

So I'm leaning towards just using additional addresses/ports to keep things simple.

@mostynb
Collaborator

mostynb commented Mar 14, 2021

https://jbrandhorst.com/post/grpc-auth/ has some details on how to achieve this with a single port/address.

@ulrfa
Contributor

ulrfa commented Mar 15, 2021

Separate ports could allow additional use cases such as:

  • HTTP for unauthenticated read-only access, but HTTPS for authenticated write access.
  • HTTP unauthenticated both for read-only access and write access, but limit exposure of the write access port via firewall.

@mostynb
Collaborator

mostynb commented Mar 17, 2021

> Separate ports could allow additional use cases such:
>
>   • HTTP for unauthenticated read-only access, but HTTPS for authenticated write access.
>   • HTTP unauthenticated both for read-only access and write access, but limit exposure of the write access port via firewall.

I'm not sure how that would look with regards to command line args/config file settings, but it sounds complicated.

mostynb added a commit to mostynb/bazel-remote that referenced this issue Mar 22, 2021
If authentication is enabled, this new flag allows readonly
access for unauthenticated clients. Technically, some of the
"readonly" API calls do modify the cache (by affecting the
LRU index, or by causing blobs to be downloaded from proxy
backends), but they do not add or modify ActionCache blobs
so cannot inject new data into the cache.

Implements buchgr#381.
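
For the gRPC side, the distinction drawn in the commit message could be sketched as an allowlist of read-only RPCs, with everything else requiring authentication. The method names below are the standard REAPI and ByteStream full method names; which calls the actual implementation treats as read-only is an assumption here, and `readOnlyMethods` and `needsAuth` are hypothetical names.

```go
package main

// readOnlyMethods lists RPCs that do not upload data to the cache. They may
// still touch the LRU index or trigger downloads from a proxy backend.
var readOnlyMethods = map[string]bool{
	"/build.bazel.remote.execution.v2.ActionCache/GetActionResult":                 true,
	"/build.bazel.remote.execution.v2.ContentAddressableStorage/FindMissingBlobs": true,
	"/build.bazel.remote.execution.v2.ContentAddressableStorage/BatchReadBlobs":   true,
	"/build.bazel.remote.execution.v2.ContentAddressableStorage/GetTree":          true,
	"/build.bazel.remote.execution.v2.Capabilities/GetCapabilities":               true,
	"/google.bytestream.ByteStream/Read":                                          true,
}

// needsAuth reports whether a gRPC call must be authenticated. Unknown
// methods are treated as writes, so the check fails closed.
func needsAuth(fullMethod string) bool {
	return !readOnlyMethods[fullMethod]
}
```

Using an allowlist rather than a list of write methods means that any RPC added later requires authentication until it is explicitly marked as read-only.
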
@mostynb
Collaborator

mostynb commented Mar 22, 2021

Here's a candidate PR, feedback welcome: #412

mostynb added further commits to mostynb/bazel-remote, with the same commit message, that referenced this issue on Apr 12, Apr 23, Apr 24 (twice), Apr 26, Apr 27, May 1, May 2 and May 5 (twice), 2021.

@mostynb
Collaborator

mostynb commented May 7, 2021

This feature is now available on the master branch, but not in a released version yet.

@mostynb mostynb closed this as completed May 7, 2021
5 participants