Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix security config race condition #1718

Merged
merged 48 commits into from
Nov 15, 2022
Merged

Fix security config race condition #1718

merged 48 commits into from
Nov 15, 2022

Conversation

sanders41
Copy link
Contributor

@sanders41 sanders41 commented Nov 8, 2022

Closes

Code Changes

  • Pre-load the config in the __init__.py
  • Load the security setting before loading the full model
  • init SecuritySettings directly if the section doesn't exist in the toml config file
  • Added _FUNCS.clear() to clear Pydantic's validator cache before loading
  • DO NOT include the security defaults in the fides.toml file

Steps to Confirm

  • Build the container nox -s "build(dev)"
  • docker run -it ethyca/fides:local fails on the missing config items
  • docker run -e FIDES__USER__ANALYTICS_OPT_OUT=True -e FIDES__SECURITY__APP_ENCRYPTION_KEY=kfkdkslaksldkfjdkslaksldkfjskald -e FIDES__SECURITY__OAUTH_ROOT_CLIENT_SECRET=rootclient -e FIDES__SECURITY__OAUTH_ROOT_CLIENT_ID=someclient ethyca/fides:local passes

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation Updated:
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md

Description Of Changes

Fixes the issue where the security config is not properly loading from environment variables.

The root cause of this is Pydantic was delaying the loading of nested models when using environment variables. Because of this if the nested value was accessed before it was loaded an error occurred. The fix was to pre-load the nested model and add it to the parent model at initialization when loading from environment variables.

In doing this pre-loading it turned up a new issue where in some environments (CI for example) Pydantic would through an error that validators were being reused, while in other environments (running locally for example) no error occurred. This seems to occur because of a conflict between server cache and Pydantic's cache. I did try removing our lru_cache to see if that was involved in the issue, but it made no difference. The solution turned out to be clearing Pydantic's cache before loading the model with _FUNCS.clear(). More context for this can be found here.

Other changes in this PR, bumping Pydantic and bringing fideslib code directly in, were done as attempts to solve this Pydantic issue before the cache clearing option was found. In the end they didn't make any difference, but we left here since this change is planned as part of #1572.

@sanders41
Copy link
Contributor Author

@ThomasLaPiana these tests are passing when I run them locally. Any chance the failures could be related to the recent CI updates?

@ThomasLaPiana
Copy link
Contributor

@sanders41 no, the latest main merge shows only 4 tests failing, all known-bad (Shopify errors + the Admin UI Cypress tests)

These are probably from this PR

@ThomasLaPiana
Copy link
Contributor

this seems like a pretty sprawling change, simple but touches a ton of code. I'm sure you've already thought about it, but maybe there's a way to add one or two checks somewhere instead of in every function that might need it? This will get tricky to maintain/revert/document

@sanders41
Copy link
Contributor Author

sanders41 commented Nov 9, 2022

I tried several other ways to do the checks, but mypy and/or loading was failing any other way. I don’t like the way I did this either, but I couldn’t come up with another way.

Troubleshooting the failures is also going to be difficult since they pass locally and the error is with the database config which didn’t change. The only difference I can think of right now is dev vs production builds, but don’t see why that would matter.

@sanders41
Copy link
Contributor Author

sanders41 commented Nov 9, 2022

I'm still struggling to reproduce this error locally. I tried building and testing with the production image and that didn't do it either.

I also tried running with no volumes mounted to be sure it wasn't something in the fides.toml making the difference.

@sanders41
Copy link
Contributor Author

Possibly related nazrulworld/fhir.resources#41. If so it could be a caching issue. It's the same scenario, but in the case of this issue it was with lambda and not github actions.

@NevilleS
Copy link
Contributor

Hey @sanders41, I think you can reproduce this pretty easily by doing the following:

docker run ethyca/fides:local

...and then start layering in minimal ENV variables, like:

docker run -e FIDES__USER__ANALYTICS_OPT_OUT=True -e FIDES__SECURITY__APP_ENCRYPTION_KEY=keygoeshere ethyca/fides:local

@sanders41
Copy link
Contributor Author

This is what I did and it was running without error locally. I got it working both locally and in ci now, but the docs build is failing still. running a test now where I hope I have that fixed also.

@ThomasLaPiana
Copy link
Contributor

im testing this in the plus PR and still seeing the security errors....still digging

@ThomasLaPiana
Copy link
Contributor

If there is a config file, it will fail to init from the environment variables for the security settings

@ThomasLaPiana
Copy link
Contributor

fixed the docs checks, next step is to fix the config tests (they need the required env vars now!)

@ThomasLaPiana
Copy link
Contributor

@NevilleS @sanders41 I want to call out that now, if a user does a pip install ethyca-fides, they will not be able to do anything! until they've set these values, and the failure will happen before we have a chance to tell them

@sanders41
Copy link
Contributor Author

Would deferring the loading of the config help with this? Adding the load to the __init__.py was one of the first things done in trouble shooting and it did help, but there have been a lot of other changes since then so we might be able to get away with removing it.

@ThomasLaPiana
Copy link
Contributor

@sanders41 I think we can cross check this PR and then merge it, and do that UX improvements in another PR as this one is getting big

@TheAndrewJackson
Copy link
Contributor

TheAndrewJackson commented Nov 14, 2022

I think everything is working on my end. It was able to load the config but failed when trying to connect to the db with the provided snippet

docker run -e FIDES__USER__ANALYTICS_OPT_OUT=True -e FIDES__SECURITY__APP_ENCRYPTION_KEY=kfkdkslaksldkfjdkslaksldkfjskald -e FIDES__SECURITY__OAUTH_ROOT_CLIENT_SECRET=rootclient -e FIDES__SECURITY__OAUTH_ROOT_CLIENT_ID=someclient ethyca/fides:local

I think that is expected because no db config is provided so it used default-db. Here is the error for reference

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "default-db" to address: Name or service not known

@ThomasLaPiana
Copy link
Contributor

I got this branch working with the plus PR, so I'm going to merge this. Will immediately open a follow-up ticket around the user experience

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants