Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add management acl token and save token info to file #4126

Merged
merged 7 commits into from
Aug 22, 2022

Conversation

vli11
Copy link
Contributor

@vli11 vli11 commented Aug 16, 2022

If your build fails due to your commit message not passing the build checks, please review the guidelines here: https://github.com/edgexfoundry/edgex-go/blob/main/.github/Contributing.md

PR Checklist

Please check if your PR fulfills the following requirements:

  • I am not introducing a breaking change (if you are, flag in conventional commit message with BREAKING CHANGE: describing the break)
  • I am not introducing a new dependency (add notes below if you are)
  • I have added unit tests for the new feature or bug fix (if not, why?)
  • I have fully tested (add details below) this the new feature or bug fix (if not, why?)
  • I have opened a PR for the related docs change (if not, why?) later

Testing Instructions

New Dependency Instructions (If applicable)

Signed-off-by: Valina Li <valina.li@intel.com>
Signed-off-by: Valina Li <valina.li@intel.com>
Signed-off-by: Valina Li <valina.li@intel.com>
@vli11 vli11 changed the title 3158 feat: add management acl token and save token info to file Aug 16, 2022
@vli11 vli11 added the enhancement New feature or request label Aug 16, 2022
@vli11 vli11 self-assigned this Aug 16, 2022
Signed-off-by: Valina Li <valina.li@intel.com>
Signed-off-by: Valina Li <valina.li@intel.com>
Copy link
Member

@farshidtz farshidtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feature.

I tested the snap. A fresh installation works but an upgrade from an older version fails. I think it boils down to consul bootstrapper failure on an upgrade, preventing the deployment of the management token:
failed to re-setup registry ACL: on 2nd time or later, failed to re-set up agent token: failed to check the agent token persistence: failed to check agent ACL token persistence property with status code= 403: ACL not found

I released a snap from your code to try:

$ snap install edgexfoundry --channel=latest/beta
edgexfoundry (beta) 2.3.0-dev.44 from Canonical✓ installed

# upgrading from 2.3.0-dev.44 to a revision built for this PR
$ snap refresh edgexfoundry --channel=latest/edge/pr-4126
error: cannot perform the following tasks:
- Start snap "edgexfoundry" (3919) services (systemctl command [start snap.edgexfoundry.security-proxy-setup.service] failed with exit status 1: Job for snap.edgexfoundry.security-proxy-setup.service failed because the control process exited with error code.
See "systemctl status snap.edgexfoundry.security-proxy-setup.service" and "journalctl -xeu snap.edgexfoundry.security-proxy-setup.service" for details.
)

Logs (security-proxy-setup):

edgexfoundry.security-proxy-setup[77946]: level=INFO ts=2022-08-17T09:50:08.939690132Z app=security-proxy-setup source=service.go:503 msg="successful to set up route authentication for `core-data`"
edgexfoundry.security-proxy-setup[77946]: level=INFO ts=2022-08-17T09:50:08.94246337Z app=security-proxy-setup source=service.go:382 msg="proxy service for consul has been set up"
edgexfoundry.security-proxy-setup[77946]: level=INFO ts=2022-08-17T09:50:08.942541798Z app=security-proxy-setup source=service.go:161 msg="try to enable service plugin for core-consul"
edgexfoundry.security-proxy-setup[77946]: level=INFO ts=2022-08-17T09:50:09.085600475Z app=security-proxy-setup source=serviceplugin.go:115 msg="successful to get service pluginID of plugin request-transformer for service consul"
edgexfoundry.security-proxy-setup[77946]: level=INFO ts=2022-08-17T09:50:09.102504921Z app=security-proxy-setup source=serviceplugin.go:149 msg="successfully deleted service plugin for service consul"
edgexfoundry.security-proxy-setup[77946]: level=ERROR ts=2022-08-17T09:50:09.10256291Z app=security-proxy-setup source=service.go:163 msg="failed to enable service plugin for core-consul: cannot retrieve Consul access token: access token file /var/snap/edgexfoundry/3919/secrets/consul-acl-token/mgmt_token.json not found"
edgexfoundry.security-proxy-setup[77946]: level=ERROR ts=2022-08-17T09:50:09.102574051Z app=security-proxy-setup source=init.go:53 msg="cannot retrieve Consul access token: access token file /var/snap/edgexfoundry/3919/secrets/consul-acl-token/mgmt_token.json not found"
systemd[1]: snap.edgexfoundry.security-proxy-setup.service: Main process exited, code=exited, status=1/FAILURE

Logs (security-consul-bootstrapper):

systemd[1]: Starting Service for snap application edgexfoundry.security-consul-bootstrapper...
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.759673927Z app=security-bootstrapper source=config.go:391 msg="Loaded service configuration from /var/snap/edgexfoundry/3919/config/security-bootstrapper/res/configuration.toml"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.760007164Z app=security-bootstrapper source=variables.go:352 msg="Variables override of 'StageGate.Registry.ACL.SentinelFilePath' by environment variable: STAGEGATE_REGISTRY_ACL_SENTINELFILEPATH=/var/snap/edgexfoundry/3919/consul/config/consul_acl_done"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.760022312Z app=security-bootstrapper source=variables.go:352 msg="Variables override of 'StageGate.Registry.Host' by environment variable: STAGEGATE_REGISTRY_HOST=localhost"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.760027842Z app=security-bootstrapper source=variables.go:352 msg="Variables override of 'StageGate.Registry.ACL.ManagementTokenPath' by environment variable: STAGEGATE_REGISTRY_ACL_MANAGEMENTTOKENPATH=/var/snap/edgexfoundry/3919/secrets/consul-acl-token/mgmt_token.json"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.760032611Z app=security-bootstrapper source=variables.go:352 msg="Variables override of 'StageGate.Registry.ACL.SecretsAdminTokenPath' by environment variable: STAGEGATE_REGISTRY_ACL_SECRETSADMINTOKENPATH=/var/snap/edgexfoundry/3919/secrets/edgex-consul/admin/token.json"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.760084289Z app=security-bootstrapper source=config.go:551 msg="Using local configuration from file (4 envVars overrides applied)"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.760108324Z app=security-bootstrapper source=command.go:108 msg="Security bootstrapper running setupRegistryACL"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.761013848Z app=security-bootstrapper source=command.go:569 msg="leader [\"127.0.0.1:8300\"] is elected"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.761024778Z app=security-bootstrapper source=command.go:525 msg="found Consul leader to set up ACL"
edgexfoundry.security-consul-bootstrapper[77748]: level=INFO ts=2022-08-17T09:49:36.761111081Z app=security-bootstrapper source=command.go:498 msg="successfully reconstructed bootstrap ACL token from /tmp/edgex/secrets/consul-acl-token/bootstrap_token.json"
edgexfoundry.security-consul-bootstrapper[77748]: level=ERROR ts=2022-08-17T09:49:36.761664071Z app=security-bootstrapper source=init.go:90 msg="failed to re-setup registry ACL: on 2nd time or later, failed to re-set up agent token: failed to check the agent token persistence: failed to check agent ACL token persistence property with status code= 403: ACL not found"
edgexfoundry.security-consul-bootstrapper[77723]: Wed Aug 17 11:49:36 CEST 2022 failed to set up Consul ACL
systemd[1]: snap.edgexfoundry.security-consul-bootstrapper.service: Deactivated successfully.
systemd[1]: Finished Service for snap application edgexfoundry.security-consul-bootstrapper.

@jim-wang-intel jim-wang-intel modified the milestone: Levski Aug 17, 2022
@@ -21,7 +21,7 @@ LogLevel = "DEBUG"
SNIS = [""]
# RequestTimeout for proxy-setup http client caller
RequestTimeout = 10
AccessTokenFile = "/tmp/edgex/secrets/consul-acl-token/bootstrap_token.json"
AccessTokenFile = "/tmp/edgex/secrets/consul-acl-token/mgmt_token.json"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the backwards breaking change that @farshidtz talking about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close but not this one i think., it is the one in security-bootstrapper, we need bootstrap acl token reconstructed in the upgrading path. so we should keep that path in the configuration and environment as well...

Signed-off-by: Valina Li <valina.li@intel.com>
@vli11
Copy link
Contributor Author

vli11 commented Aug 17, 2022

@farshidtz I've updated the snapcraft to add back the bootstraptoken env var, can you give it a try again; can you also try the same step from main branch because the 403 statusCode from consul, I wonder if this issue is already there before this PR.

@vli11 vli11 requested a review from farshidtz August 17, 2022 18:32
@codecov-commenter
Copy link

Codecov Report

Merging #4126 (38beb3e) into main (2391845) will decrease coverage by 0.02%.
The diff coverage is 51.51%.

@@            Coverage Diff             @@
##             main    #4126      +/-   ##
==========================================
- Coverage   43.61%   43.58%   -0.03%     
==========================================
  Files         120      120              
  Lines       10630    10657      +27     
==========================================
+ Hits         4636     4645       +9     
- Misses       5607     5617      +10     
- Partials      387      395       +8     
Impacted Files Coverage Δ
...urity/bootstrapper/command/setupacl/aclpolicies.go 70.66% <ø> (ø)
.../security/bootstrapper/command/setupacl/command.go 71.16% <33.33%> (-0.71%) ⬇️
...ecurity/bootstrapper/command/setupacl/acltokens.go 70.00% <52.68%> (-3.52%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@farshidtz farshidtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change. I no longer get errors in consul-bootstrapper but proxy-setup still fails for the same reason:

edgexfoundry.security-proxy-setup[22474]: level=ERROR ts=2022-08-18T12:46:55.172278192Z app=security-proxy-setup source=service.go:163 msg="failed to enable service plugin for core-consul: cannot retrieve Consul access token: access token file /var/snap/edgexfoundry/3922/secrets/consul-acl-token/mgmt_token.json not found"
edgexfoundry.security-proxy-setup[22474]: level=ERROR ts=2022-08-18T12:46:55.17228788Z app=security-proxy-setup source=init.go:53 msg="cannot retrieve Consul access token: access token file /var/snap/edgexfoundry/3922/secrets/consul-acl-token/mgmt_token.json not found"

Following the logs, it looks like consul-bootstrapper doesn't get to generate the management token on upgrade. I believe this is because sentinelFile already exists and he function returns (command.go:131) before calling saveACLTokens (command.go:161) which saves the management token (command.go:256).

The last thing that consul-bootstrapper does is to log "setupRegistryACL successfully done" (command.go:129) and return:

...
edgexfoundry.security-consul-bootstrapper[19087]: level=INFO ts=2022-08-18T12:43:09.516111643Z app=security-bootstrapper source=aclroles.go:129 msg="successfully created a role [app-rules-engine] for secretstore"
edgexfoundry.security-consul-bootstrapper[19087]: level=INFO ts=2022-08-18T12:43:09.517119586Z app=security-bootstrapper source=aclroles.go:129 msg="successfully created a role [device-rfid-llrp] for secretstore"
edgexfoundry.security-consul-bootstrapper[19087]: level=INFO ts=2022-08-18T12:43:09.517140195Z app=security-bootstrapper source=command.go:129 msg="setupRegistryACL successfully done"
systemd[1]: snap.edgexfoundry.security-consul-bootstrapper.service: Deactivated successfully.
systemd[1]: Finished Service for snap application edgexfoundry.security-consul-bootstrapper.

The reSetup function called before return doesn't call saveACLTokens either.

@jim-wang-intel
Copy link
Contributor

Thanks for the change. I no longer get errors in consul-bootstrapper but proxy-setup still fails for the same reason:

edgexfoundry.security-proxy-setup[22474]: level=ERROR ts=2022-08-18T12:46:55.172278192Z app=security-proxy-setup source=service.go:163 msg="failed to enable service plugin for core-consul: cannot retrieve Consul access token: access token file /var/snap/edgexfoundry/3922/secrets/consul-acl-token/mgmt_token.json not found"
edgexfoundry.security-proxy-setup[22474]: level=ERROR ts=2022-08-18T12:46:55.17228788Z app=security-proxy-setup source=init.go:53 msg="cannot retrieve Consul access token: access token file /var/snap/edgexfoundry/3922/secrets/consul-acl-token/mgmt_token.json not found"

Following the logs, it looks like consul-bootstrapper doesn't get to generate the management token on upgrade. I believe this is because sentinelFile already exists and he function returns (command.go:131) before calling saveACLTokens (command.go:161) which saves the management token (command.go:256).

The last thing that consul-bootstrapper does is to log "setupRegistryACL successfully done" (command.go:129) and return:

...
edgexfoundry.security-consul-bootstrapper[19087]: level=INFO ts=2022-08-18T12:43:09.516111643Z app=security-bootstrapper source=aclroles.go:129 msg="successfully created a role [app-rules-engine] for secretstore"
edgexfoundry.security-consul-bootstrapper[19087]: level=INFO ts=2022-08-18T12:43:09.517119586Z app=security-bootstrapper source=aclroles.go:129 msg="successfully created a role [device-rfid-llrp] for secretstore"
edgexfoundry.security-consul-bootstrapper[19087]: level=INFO ts=2022-08-18T12:43:09.517140195Z app=security-bootstrapper source=command.go:129 msg="setupRegistryACL successfully done"
systemd[1]: snap.edgexfoundry.security-consul-bootstrapper.service: Deactivated successfully.
systemd[1]: Finished Service for snap application edgexfoundry.security-consul-bootstrapper.

The reSetup function called before return doesn't call saveACLTokens either.

Thanks @farshidtz for the detailed information. So it looks like the snap's upgrading route has some special case. The reason reSetup func called in the 2nd time or later that doesn't call saveACLTokens is because the token should already be saved from the 1st time and also Consul would fail if we call bootstrap ACL again.

So it seems to me that the right logic for reSetup is to also check whether management token is existing or not and create it and save it if not while skipping the bootstrapping the consul ACL again.

@farshidtz
Copy link
Member

So it looks like the snap's upgrading route has some special case. The reason reSetup func called in the 2nd time or later that doesn't call saveACLTokens is because the token should already be saved from the 1st time and also Consul would fail if we call bootstrap ACL again.

Is this different from how you'd upgrade a native or dockerized deployment? In case of an upgrade, the 1st run is from an older version that doesn't generate the newly introduced management token. Currently, the second and follow up runs don't generate it either. As you pointed out, the second and follow up runs need to save the management token if it doesn't exist:

So it seems to me that the right logic for reSetup is to also check whether management token is existing or not and create it and save it if not while skipping the bootstrapping the consul ACL again.

Signed-off-by: Valina Li <valina.li@intel.com>
@sonarcloud
Copy link

sonarcloud bot commented Aug 18, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

@vli11
Copy link
Contributor Author

vli11 commented Aug 18, 2022

@farshidtz thank you for testing and feedback! I have added create and save management acl token logic during reSetup now, can you verify it. Thanks!

Copy link
Member

@farshidtz farshidtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farshidtz thank you for testing and feedback! I have added create and save management acl token logic during reSetup now, can you verify it. Thanks!

Thanks for the change @vli11. I tested again and both a fresh install and upgrade work.

For the upgrade, I needed to merge the changes from #4125 into your branch to prevent another upgrade error. That PR is waiting for committer approval to be merged.

Copy link
Collaborator

@bnevis-i bnevis-i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a regression in the /consul Kong route. Discussing offline.

@bnevis-i
Copy link
Collaborator

LTGM. Requires simultaneous change to edgex-compose to avoid breaking /consul Kong route. Valina to link to associated PR.

@vli11
Copy link
Contributor Author

vli11 commented Aug 19, 2022

here is the edgex-compose PR related to this fix: edgexfoundry/edgex-compose#270

@vli11
Copy link
Contributor Author

vli11 commented Aug 19, 2022

I have verified the consul kong route in docker environment with edgex-compose PR(edgexfoundry/edgex-compose#270), @farshidtz can you help to verify consul kong route on SNAP environment? thank you!

@farshidtz
Copy link
Member

farshidtz commented Aug 22, 2022

I have verified the consul kong route in docker environment with edgex-compose PR(edgexfoundry/edgex-compose#270), @farshidtz can you help to verify consul kong route on SNAP environment? thank you!

I tried and could access Consul via Kong at /consul path. I could access core-data's kv entry (e.g. /consul/v1/kv/edgex/core/2.0/core-data/Service/Port) using either of bootstrap and management tokens. Is that expected?

I added a docs PR to use the management token in Snap's getting started: edgexfoundry/edgex-docs#841

@vli11
Copy link
Contributor Author

vli11 commented Aug 22, 2022

I have verified the consul kong route in docker environment with edgex-compose PR(edgexfoundry/edgex-compose#270), @farshidtz can you help to verify consul kong route on SNAP environment? thank you!

I tried and could access Consul via Kong at /consul path. I could access core-data's kv entry (e.g. /consul/v1/kv/edgex/core/2.0/core-data/Service/Port) using either of bootstrap and management tokens. Is that expected?

I added a docs PR to use the management token in Snap's getting started: edgexfoundry/edgex-docs#841

great! thank you!

@bnevis-i bnevis-i merged commit 96e03a3 into edgexfoundry:main Aug 22, 2022
@bnevis-i
Copy link
Collaborator

@farshidtz Thanks for all the great validation work.

@bnevis-i
Copy link
Collaborator

@farshidtz For reference, the token used by Kong can update any part of the KV store, but can't disclose any of the other tokens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request levski fall 2022 release security-services
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Secure Consul Ph. 3] Create a global Consul token (stored in a file) for use in remote configuration
6 participants