-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution] Keep Endpoint policies up to date with license changes #83992
Conversation
4d4a524
to
e56e436
Compare
Pinging @elastic/endpoint-app-team (Feature:Endpoint) |
@elasticmachine merge upstream |
5933185
to
178752e
Compare
} from '../../../../common/license/policy_config'; | ||
import { licenseService } from '../../../lib/license/license'; | ||
|
||
export class PolicyWatcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PolicyWatcher
confused me a little because it did not mention that it's really a license watcher. Maybe LicenseWatcher
would be more appropriate?
(ps. I can see this being used by other things in the future that might not be Policy related)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(two hard problems joke here)
This is in server/endpoint/lib/policy
, and it's job is to watch license changes, in order to keep policy up to date. I'm open to better names. LicenseWatcher
is a possibility, but note that our other class, LicenseService
is copy-pasted from about 4 other plugins that have the same class. But some of them call that LicenseWatcher
. Wanted to prevent that confusion.
Also I do agree we may want other side-effects to license changes that may not be policy related. But those should create their own subscription via LicenseService
observable. Not use this. Observables are designed to have anyone subscribe who also needs it, not collectively boggarting off of one sub.
x-pack/plugins/security_solution/server/endpoint/lib/policy/license_watch.ts
Show resolved
Hide resolved
policyConfig, | ||
licenseService | ||
); | ||
this.policyService.update(this.soClient, policy.id, policy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.... Should you be throttling here? You could be generating several calls in parallel/rapid succession.
Also - you need to account for failures here for two reasons:
- to ensure that the remainder of the loop succeeds. You also don't want to cause the outer look (
do{} while()
) to throw, I don't think (not sure what that will do to the observable calling logic). - to account for update conflicts, which may be the case if an update is done right after you retrieve the list of items or by other Kibana instance that may also be doing this same thing. I think we said maybe you do 1 attempt after a failure and then just log it if it still fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an accepted practice for how to throttle? This is certainly could be firing off a lot of update calls.
If I wrap update
in a try/catch, do I now need to await update
to make sure it's actually caught? If I'm now awaiting, do I still need to throttle since they are no longer in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re:
Do we have an accepted practice for how to throttle?
I did a little of this in Fleet using Bluebird - see here
kibana/x-pack/plugins/fleet/server/routes/agent_policy/handlers.ts
Lines 55 to 65 in 95861a0
await bluebird.map( | |
items, | |
(agentPolicy: GetAgentPoliciesResponseItem) => | |
listAgents(soClient, { | |
showInactive: false, | |
perPage: 0, | |
page: 1, | |
kuery: `${AGENT_SAVED_OBJECT_TYPE}.policy_id:${agentPolicy.id}`, | |
}).then(({ total: agentTotal }) => (agentPolicy.agents = agentTotal)), | |
{ concurrency: 10 } | |
); |
Being that you are paging through, you might able to use a variation of that approach (chain each page to do concurrent updates, then await
at the end).
} | ||
} | ||
|
||
public async watch(license: ILicense) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At one point we had talked about possibly doing the "migration" slightly delayed from when the observable triggers in order to try and account for edge case where requests might be in flight that would override our changes here. Delaying the migration would allow for existing one to complete ++ any future ones would be auto-rejected by the API handles, so migrating should be able to account for all records.
Also - do you need to protect against the license changing while a migration is in flight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I was part of any of these discussions. We want to add a sleep here?
Do we expect other things changing at the same moment the license changes? What's this about migrations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re:
Do we expect other things changing at the same moment the license changes? What's this about migrations?
My understanding is that changing of license can happen while the system is running - there is downtime when it happens. It just happens
re:
What's this about migrations?
Sorry for the confusion. In my mind, what this service here is doing is a migration - thats what I mean.
178752e
to
b9ade22
Compare
b9ade22
to
6113014
Compare
} | ||
} | ||
|
||
public async watch(license: ILicense) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re:
Do we expect other things changing at the same moment the license changes? What's this about migrations?
My understanding is that changing of license can happen while the system is running - there is downtime when it happens. It just happens
re:
What's this about migrations?
Sorry for the confusion. In my mind, what this service here is doing is a migration - thats what I mean.
} catch (e) { | ||
// try again for transient issues | ||
try { | ||
await this.policyService.update(this.soClient, policy.id, policy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the problem encountered as a version conflict error, then you can't just "try" again with the same payload. You will need to retrieve the record again and redo updates, and then use that to update record for the update.
One way to test this with a running kibana:
above line 102, update policy.version
to something like 123
. That update should fail for the version missmatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given our most recent conversation, will you leave this "retry" here or remove it and address it in the subsequent issue?
I'm ok either way
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given our recent off-PR conversation, I'm 👍
Thanks @pzl
💚 Build SucceededMetrics [docs]Distributable file count
History
To update your PR or re-run it, just comment with: |
* master: (40 commits) fix: 🐛 don't add separator befor group on no main items (elastic#83166) [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323) [APM] Add APM agent config options (elastic#84678) Fixed a11y issue on rollup jobs table selection (elastic#84567) [Discover] Refactor getContextUrl to separate file (elastic#84503) [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654) [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749) [Lens] Migrate legacy es client and remove total hits as int (elastic#84340) Improve logging pipeline in @kbn/legacy-logging (elastic#84629) Catch @hapi/podium errors (elastic#84575) [Discover] Unskip date histogram test (elastic#84727) Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791) [Enterprise Search] Fix schema errors button (elastic#84842) [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589) [Maps] Always initialize routes on server-startup (elastic#84806) [Fleet] EPM support to handle uploaded file paths (elastic#84708) [Snapshot Restore] Fix initial policy form state (elastic#83928) Upgrade Node.js to version 14 (elastic#83425) [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992) [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006) ...
* master: (236 commits) fix: 🐛 don't add separator befor group on no main items (elastic#83166) [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323) [APM] Add APM agent config options (elastic#84678) Fixed a11y issue on rollup jobs table selection (elastic#84567) [Discover] Refactor getContextUrl to separate file (elastic#84503) [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654) [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749) [Lens] Migrate legacy es client and remove total hits as int (elastic#84340) Improve logging pipeline in @kbn/legacy-logging (elastic#84629) Catch @hapi/podium errors (elastic#84575) [Discover] Unskip date histogram test (elastic#84727) Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791) [Enterprise Search] Fix schema errors button (elastic#84842) [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589) [Maps] Always initialize routes on server-startup (elastic#84806) [Fleet] EPM support to handle uploaded file paths (elastic#84708) [Snapshot Restore] Fix initial policy form state (elastic#83928) Upgrade Node.js to version 14 (elastic#83425) [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992) [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006) ...
Summary
Upon license downgrade, any features enabled in endpoint policy that are no longer allowed by the change in license tier are reset and/or disabled
Depends on #83972(merged)Checklist
Delete any items that are not applicable to this PR.
For maintainers