Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Migration for Fideslang 2.0 #4030

Conversation

ThomasLaPiana
Copy link
Contributor

@ThomasLaPiana ThomasLaPiana commented Sep 6, 2023

Builds on top of #3933

Description Of Changes

For ease of PR reviewing, this PR is focused solely on writing and testing a migration for current users to Fideslang 2.0

The following are the objects that need updating:

  • Dataset (categories on fields, collections, and top level
  • Systems.PrivacyDeclarations (data_use, data_categories, shared_categories)
  • Systems.Ingress/Egress (data_categories)
  • Policy.Rules (data_uses, data_categories)
  • DataCategory/DataUse (fides_key, parent_key)
  • PrivacyNotice/PrivacyNoticeHistory/PrivacyNoticeTemplate
  • PrivacyRequest.consent_preferences
  • Consent.data_use
  • ConsentRequest.preferences

Code Changes

  • add a data migration
  • add a testing script for verification (scripts/verify_fideslang_2_data_migration.py)

Steps to Confirm

  • nox -s dev
  • nox -s shell
  • fides user login
  • fides db reset -y ; python scripts/verify_fideslang_2_data_migration.py --reload ; python scripts/verify_fideslang_2_data_migration.py --reload - the script will probably break on System push, so run it again to pass (seriously I'm not sure why, something with the order of how things are getting pushed I guess?)

Pre-Merge Checklist

@ThomasLaPiana ThomasLaPiana self-assigned this Sep 6, 2023
@cypress
Copy link

cypress bot commented Sep 6, 2023

Passing run #4100 ↗︎

0 4 0 0 Flakiness 0
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Details:

Merge e7eabcd into b6aecaa...
Project: fides Commit: e52664d927 ℹ️
Status: Passed Duration: 01:06 💡
Started: Sep 13, 2023 7:37 AM Ended: Sep 13, 2023 7:39 AM

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@ThomasLaPiana ThomasLaPiana mentioned this pull request Sep 6, 2023
17 tasks
@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.05% ⚠️

Comparison is base (b324612) 87.39% compared to head (e7eabcd) 87.35%.
Report is 51 commits behind head on ThomasLaPiana-update-fideslang-200.

Additional details and impacted files
@@                          Coverage Diff                           @@
##           ThomasLaPiana-update-fideslang-200    #4030      +/-   ##
======================================================================
- Coverage                               87.39%   87.35%   -0.05%     
======================================================================
  Files                                     320      319       -1     
  Lines                                   19602    19602              
  Branches                                 2512     2510       -2     
======================================================================
- Hits                                    17132    17124       -8     
- Misses                                   2033     2040       +7     
- Partials                                  437      438       +1     
Files Changed Coverage Δ
.../api/api/v1/endpoints/consent_request_endpoints.py 85.12% <ø> (ø)
.../api/api/v1/endpoints/privacy_request_endpoints.py 90.34% <ø> (ø)
src/fides/api/db/seed.py 89.20% <ø> (ø)
src/fides/api/models/privacy_request.py 96.30% <ø> (ø)
src/fides/api/schemas/privacy_request.py 100.00% <ø> (ø)
.../api/service/connectors/consent_email_connector.py 97.14% <ø> (ø)
...vice/saas_request/saas_request_override_factory.py 100.00% <ø> (ø)
src/fides/api/util/data_category.py 100.00% <ø> (ø)
.../fides/api/api/v1/endpoints/messaging_endpoints.py 92.94% <100.00%> (+0.12%) ⬆️
src/fides/api/models/sql_models.py 98.12% <100.00%> (+0.07%) ⬆️

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pattisdr
Copy link
Contributor

pattisdr commented Sep 6, 2023

Moving my last comment over from the previous PR so it doesn't get lost!


Thanks for your changes @ThomasLaPiana. The migration was crashing so I added some changes here: 9308569 to move this forward, but there's still more work to do. Let's all continue to test and figure out where the gaps are.

Here's some current TODO items I'm seeing after another round of testing:

  • I think this is the biggest potential issue. Migrate custom data uses and custom data categories that build off of the affected 2.0 uses and categories.

    • For example, say I have a custom data use, "user.search_history.work" that I am using on a PrivacyDeclaration. We migrate that to "user.behavior.search_history.work" on the Declaration. However, there is no "DataUse" that matches, and you get errors later when trying to update the System as a result.
  • New column PrivacyDeclaration.shared_categories going out in the current release needs to be migrated now that 2.19 is out. Resurfacing [DANGER] Update to Fideslang 2.0 #3933 (comment)

  • PrivacyNotice, PrivacyNoticeTemplate, and PrivacyNoticeHistory data_uses columns likely need to be migrated. None of our out of the box notices have data uses at risk, but customers could have added custom notices, or added data uses to existing notices.

@pattisdr
Copy link
Contributor

pattisdr commented Sep 6, 2023

@ThomasLaPiana @adamsachs in addition to the above, I wanted to discuss if we also need to migrate Consent.data_use and PrivacyRequest.consent_preferences (array of dictionaries, which possibly has a data_use)

This came up while I was hunting for additional places that use data_use that we are not accounting for in this migration. While we've been working on new-style consent this year, some customers are running our older version of consent, which stores saved consent preferences in the consent table: this has a data_use column. In addition, our default config.json file which populates the consent tab of the privacy center for old-style consent, has improve as a data use. It is possible then, users are saving preferences against improve, which has moved to functional. When we create a privacy request to propagate third party preferences, we cache the relevant consent_preferences on that privacy request so we have a snapshot of the preferences at that point in time, which is why an obsolete data use might be on both records.

We may not need to modify old records. But it seems like the most pressing thing would be to make sure whatever config.json customers are using was updated going forward to not have old data uses in it, which might just be customer outreach. We can talk about specifics internally.

@pattisdr
Copy link
Contributor

pattisdr commented Sep 7, 2023

  • Similarly consentrequest.preferences is a jsonb stores this as well, I think this is more for record-keeping for old-style consent. Similar to privacyrequest.consent_preferences.

@ThomasLaPiana
Copy link
Contributor Author

@pattisdr can you give me some guidance on how to get at those programmatically? I'm really struggling to use the APIs, as the privacy request api needs that information for creation but doesn't return it on a GET

I've updated it directly in the database for Consent.data_use, I'm assuming that is the right move there? How can we test that?

@pattisdr
Copy link
Contributor

pattisdr commented Sep 8, 2023

@ThomasLaPiana

can you give me some guidance on how to get at those programmatically? I'm really struggling to use the APIs, as the privacy request api needs that information for creation but doesn't return it on a GET

This isn't returned in the API, that was intentional! This is more of an internal detail that takes a snapshot of the preferences for the user at that point in time and propagates these to third party systems. Your testing script might just query the privacy request directly with sqlalchemy and verify that the consent_preferences resource has been updated.

I've updated it directly in the database for Consent.data_use, I'm assuming that is the right move there? How can we test that?

Yes, this seems right but I would expect everything to be updated directly, including the PrivacyRequest and ConsentRequest. I'm not sure what you mean by calling out this one location directly! Testing instructions are below. You can use the "Get previously-saved consent preferences" endpoint to check after a migration.

(I achieved this by updating Consent.data_use, which appears to populate this?)

I'm not sure what you mean. This is useful going forward but existing privacy requests still need to be migrated, if we're going to do that. Also, these tables are potentially large - worth pointing this out. This is something we shouldn't be migrating live -

Old Consent Testing Details

Create a Consent Request

curl -X 'POST' \
  'http://localhost:8080/api/v1/consent-request' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
   "email": "user@example.com"
}'
  • This creates a ConsentRequest object, which is largely used for identity verification and tracking a particular request
  • This returns a consent_request_id that you should use in the following request:

Set Consent Preferences

  • This creates a Consent object to save current preferences
  • This also creates a PrivacyRequest to propagate preferences to third party systems and stashes the preferences on this object as well, so we know which set of preferences this request is responsible for propagating. executable_options in the request body is important to make sure a privacy request is created.
  • Also, separately creates a snapshot of the preferences on the ConsentRequest object, because a PrivacyRequest is not necessarily created here, only if we have things to propagate.
curl -X 'PATCH' \
  'http://localhost:8080/api/v1/consent-request/con_bcec68e1-ae0f-494d-8be6-a07d296083bb/preferences' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "code": "1234",
  "consent": [
    {
      "data_use": "improve",
      "opt_in": true,
      "has_gpc_flag": false,
      "conflicts_with_gpc": false
    }
  ],
  "executable_options": [
    {
      "data_use": "improve",
      "executable": true
    }
  ]
}'

Get previously-saved consent preferences

  • Use your consent request id from earlier. The code can be anything when identify verification is turned off.
curl -X 'POST' \
  'http://localhost:8080/api/v1/consent-request/con_bcec68e1-ae0f-494d-8be6-a07d296083bb/verify' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "code": "1234"
}'

SQL Queries for double checking migration

View created privacy request

fides=# select id, status, consent_preferences from privacyrequest;
                    id                    |  status  |                                                     consent_preferences                                                     
------------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------
 pri_b926282f-9679-40d5-9d03-b239ddf6207b | complete | [{"opt_in": true, "data_use": "improve", "has_gpc_flag": false, "conflicts_with_gpc": false, "data_use_description": null}]
(1 row)

View updated consent request

fides=# select id, preferences, privacy_request_id from consentrequest;
                    id                    |                                                         preferences                                                         |            privacy_request_id            
------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+------------------------------------------
 con_03c926d1-6840-43ba-8757-d5f06590ddf9 |                                                                                                                             | 
 con_bcec68e1-ae0f-494d-8be6-a07d296083bb | [{"opt_in": true, "data_use": "improve", "has_gpc_flag": false, "conflicts_with_gpc": false, "data_use_description": null}] | pri_b926282f-9679-40d5-9d03-b239ddf6207b

View Consent resource

fides=# select id, data_use, opt_in from consent;
                    id                    | data_use | opt_in 
------------------------------------------+----------+--------
 con_2573d315-b6f8-42e2-9018-f18b4322b71c | improve  | t

@ThomasLaPiana
Copy link
Contributor Author

ThomasLaPiana commented Sep 11, 2023

@pattisdr amazing testing directions/context, thank you so much! I'll get these translated into the script and hopefully we'll be off to the races 🙂

Also, what did you mean by "migrating live"? I'm not sure there's any other way? It would happen upon version bump and restart, which means startup would take longer which means longer downtime

@ThomasLaPiana ThomasLaPiana marked this pull request as ready for review September 11, 2023 10:59
@ThomasLaPiana
Copy link
Contributor Author

using the steps to confirm, I got everything passing locally

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great @ThomasLaPiana, I reviewed and have been testing this evening. I appreciate the extra effort on the migration test script to build confidence here: I'll test a little more in the AM. None of the comments below affect the migration itself.

Also, what did you mean by "migrating live"? I'm not sure there's any other way?

I just meant the application isn't up and running, no one is writing or reading from the db while this migration is performed. Just double checking, given the potential size of consent/consent request/privacy request.

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back through the other PR, there's a comment about the RuleTarget table?

https://github.com/ethyca/fides/pull/3933/files#r1309338648

@pattisdr
Copy link
Contributor

pattisdr commented Sep 12, 2023

Working on a commit for RuleTarget migration

@pattisdr pattisdr force-pushed the ThomasLaPiana-fideslang-2-migration branch from ea631b5 to 5f6c4a7 Compare September 12, 2023 18:36
@pattisdr pattisdr force-pushed the ThomasLaPiana-fideslang-2-migration branch from 5f6c4a7 to fe360b2 Compare September 12, 2023 19:00
Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! take a look at my RuleTarget work please before merge! I added another small commit for backend tests while I was here too

@ThomasLaPiana
Copy link
Contributor Author

@pattisdr verified the changes! Thank you!

@ThomasLaPiana
Copy link
Contributor Author

failure is a flaky test, merging

@ThomasLaPiana ThomasLaPiana merged commit 3d3a67e into ThomasLaPiana-update-fideslang-200 Sep 13, 2023
35 of 36 checks passed
@ThomasLaPiana ThomasLaPiana deleted the ThomasLaPiana-fideslang-2-migration branch September 13, 2023 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants