Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistable state migrations #63358

Closed
stacey-gammon opened this issue Apr 13, 2020 · 14 comments
Closed

Persistable state migrations #63358

stacey-gammon opened this issue Apr 13, 2020 · 14 comments
Labels
discuss impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. NeededFor:Dashboard NeededFor:Security Project:TimeToVisualize Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@stacey-gammon
Copy link
Contributor

stacey-gammon commented Apr 13, 2020

The migration part of #62950

Background

Pluggable state from registry items can often end up stored inside saved objects, for instance:

  • Embeddables
  • Expression strings
  • Url generators
  • Alerts? (cc @bmcconaghy - has the alerting team thought about what to if an author of a custok alert needs to migrate data?)

We need a formal way for authors of registry implementations to add migrations for this data. Since they may not own the saved object that their implementation ends up in, we can't rely on saved object migrations as it exists today (not to mention data that ends up in URLs that may need to be migrated).

Migrating state shared among all implementations

Only developers that registered a new saved object type can write migrations for them, but we've made exceptions, for example when the security team needs to migrate saved object ids and add a "spaces" array to every saved object.

If a team internal to Elastic needs to migrate the base saved object shape, they can do so, but this is not pluggable - no third party developer outside Elastic would be able to do this. Because the saved object extension point doesn't support mutating saved objects registered by other authors, the security team will code directly to handle this (cc @legrego in case I am misrepresenting that).

Can we extend our existing saved object migration system to work off non-saved object state?

Maybe, but there are considerations.

Consideration 1: Officially supporting mutating state from other registry items.

Mutating the base saved object shape is a rare exception, but this may be much more common with certain registry items. For example, Embeddables and Dynamic Drilldowns.

Drilldowns are using the Enhancement Pattern to enhance the basic Embeddable type. A user implementing an Embeddable doesn't know about this extra data stored on its Embeddable, on input.enhancements.events. Only the Drilldown plugin does. The drilldown plugin is going to use another registry, so the drilldown plugin needs to check for any existing migrations that it needs to run - on every embeddable.

For example:

TodoEmbedableInput v1

 {
   title: 'My todo item',
   events: [{
     actionId: 'SayHelloAction',
     triggerId: 'ApplyFilterTrigger',
     serializedActionData: "{name: 'Mary'}"
   }]
  }

TodoEmbedableInput v2

 {
   title: 'My todo item',
   events: [{
     actionId: 'SayHelloAction',
     triggerId: 'ApplyFilterTrigger',
     serializedActionData: "{firstName: 'Mary', lastName: ''}"
   }]
  }

Consideration 2: When storing the version id with the data is not ideal

It's no problem throwing in an extra version number with saved object state, nor Embeddable state, but this is more awkward for expressions where the data that is being saved is a string.

Version 1 expression string:
sayHello name="Mary" | render as="Debug"

Version 2, expression string migrated:
sayHello firstName="Mary" | render as="Debug"

If someone copies this string from version 1, and then pastes it into version 2, should we support it? We can't unless we store the version somewhere.

I haven't fully convinced myself we can get away with storing the version per expression, vs needing to store it per expression function (the expression fn is the registry item, not the string, but maybe we could bump version numbers simultaneously for every function??)

The original idea: using ids as the version number

I originally thought we could get away with using the id of the registry item for the version number to avoid needing an additional piece of data, but we need to support the ability for at minimum, an elastic developer writing migrations for every embeddable, even ones they down own (to support the case of embeddables and events above). I don't think we can do that with just ids - the drilldown author can't create a new id for every registry implementation and bump the types (remember the developer that created drilldowns can't know every developer that registered a custom embeddable implementation - these plugins may be outside the kibana repo).

I think we will need to use version ids after all, but we will also need to figure out how to do this with expressions. Maybe an official "copy expression string" could inject version numbers or something.

tl;dr; a sufficient system will not be as simple as I was originally thinking, and how I first implemented with Url Generators.

We should also sync with the platform team and learn why there is no official support for plugin developers mutating shared state, and if there is a way to add support for that for the drilldown use case. cc @rudolf

cc @lukeelmers @ppisljar @timroes

Tagging both platform and app arch as we may want to see if we can re-use the same system, since we'll probably run into similar issues.

Related issues

@stacey-gammon stacey-gammon added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:AppArch labels Apr 13, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@rudolf
Copy link
Contributor

rudolf commented Apr 14, 2020

We should also sync with the platform team and learn why there is no official support for plugin developers mutating shared state, and if there is a way to add support for that for the drilldown use case. cc @rudolf

In NP we want to limit migrations to the "owner" of the type. If the embeddables plugin registers the embeddable type we don't want another plugin (elastic or third party) to be able to migrate it's data. I think it's important to maintain this ownership aspect. So if embeddables wants to allow other plugins to migrate the data it stores on their behalf, it should explicitly allow this and might want to add certain restrictions or safe guards. Providing isolation between plugins is a fundamental motivation of the New Platform, so I believe we need to apply it not only to runtime JS but also to our data.

I find it really hard to reason about this in an abstract way. Do we have an existing example in our code base or can we add examples to the enhancement pattern? It would help to see the code for the following:

  1. Which plugin owns the saved object type
  2. Which plugin would like to migrate the data stored in (1)

My understanding is that we have (a) a ToDoPlugin that persists data from an Embeddable, (b) a DrilldownsPlugin "enhances" the Embeddable and wants to migrate the data that it added as part of this enhancement. Because the ToDoPlugin stores the data on behalf of the DrilldownsPlugin via the Embeddable, there's no way for the ToDoPlugin to know that it needs to apply DrilldownsPlugin's migrations.

In this case it sounds like the DrilldownsPlugin needs to supply a migration function as part of it's enhancement of the Embeddable. If ToDoPlugin wants to persist an Embeddable, it needs ask the Embeddable plugin for all the migration functions that was registered. As the owner of the data the ToDoPLugin then provides all these migration functions as part of registering the save dobject type that persists this embeddable data.

Because this is dynamic, we will probably have to use observables since there's no direct dependency between ToDoPlugin and DrilldownsPlugin so there's no guarantee about the order in which DrilldownsPlugin and ToDoPLugin's setup functions get called.

@joshdover
Copy link
Contributor

In this case it sounds like the DrilldownsPlugin needs to supply a migration function as part of it's enhancement of the Embeddable. If ToDoPlugin wants to persist an Embeddable, it needs ask the Embeddable plugin for all the migration functions that was registered. As the owner of the data the ToDoPLugin then provides all these migration functions as part of registering the save dobject type that persists this embeddable data.

This lines up with the discussion we had previously about a similar (same?) problem and I think that solution still makes sense to me.

Another potential option is that Drilldowns could have it's own SO type that references the OSS one and contains its own enhanced data. This could be a bit cleaner since it allows the Drilldowns plugin to be disabled without breaking Kibana's ability to read the regular OSS state. Drilldowns would need to manage fetching these objects and migrating them when needed.

Like Rudolf mentioned, it's hard to reason about this abstractly without a concrete example, so it's possible I'm missing details about why this wouldn't work.

@stacey-gammon
Copy link
Contributor Author

Thanks for weighing in @rudolf and @joshdover. Both of you have asked for a more concrete example and I am working on one but it's not yet ready to share.

@rudolf re:

In NP we want to limit migrations to the "owner" of the type. If the embeddables plugin registers the embeddable type we don't want another plugin (elastic or third party) to be able to migrate it's data.

We have a concrete use case for a migration of a non-owner: the security team needs to migrate data on every saved object. Should this be a supported use case?

The security team is also following the enhancements pattern by registering a custom SavedObjectClient provider (I copied the pattern from this use case).

In this case it sounds like the DrilldownsPlugin needs to supply a migration function as part of it's enhancement of the Embeddable. If ToDoPlugin wants to persist an Embeddable, it needs ask the Embeddable plugin for all the migration functions that was registered. As the owner of the data the ToDoPlugin then provides all these migration functions as part of registering the save dobject type that persists this embeddable data.

Yes, this plus the link Josh highlighted above I think is a potential solution. But that solution does support migrations on shared state, which contradicts your earlier statement:

I think it's important to maintain this ownership aspect. So if embeddables wants to allow other plugins to migrate the data it stores on their behalf, it should explicitly allow this and might want to add certain restrictions or safe guards. Providing isolation between plugins is a fundamental motivation of the New Platform, so I believe we need to apply it not only to runtime JS but also to our data.

I made some modifications to the code linked to above:

  • migrations aren't registered via core.savedObject but via a new persistableState plugin
  • the migration is not "safely wrapped" - it should be applied to every embeddable type, not just the owner.
export class DrilldownPlugin {
  public setup(core: CoreSetup, { embeddables, persistableState }) {
   embeddables.setCustomFactoryProvider(getEnhancedFactoryProvider());
   persistableState.registerMigration('embeddables', (type, input, version) => {
    if (!input.enhancements.events) return;
    const migratedEvents = input.enhancements.events.map(
        event => persistableState.migrate(event.type, event.input, event.version);
     return {
       ...input,
      enhancements: {
        ...input.enhancements,
        events: migratedEvents
       }
     }
   };
  }
}

What are the risks in allowing migrations to be registered like this, that may migrate unowned data?

What would this system look like if we wanted embeddables to enforce restrictions and safeguards?

@rudolf
Copy link
Contributor

rudolf commented Apr 15, 2020

We have a concrete use case for a migration of a non-owner: the security team needs to migrate data on every saved object. Should this be a supported use case?

The security team is also following the enhancements pattern by registering a custom SavedObjectClient provider (I copied the pattern from this use case).

Yes, I think there's many similarities but it's not exactly the same. I'm not sure if there is any alternative, but for spaces, the decision was made that the basic building blocks for spaces need to be provided by core:

  • the concept of a "namespace" and isolating saved object operations to a given namespace
  • the ability to enhance the saved object client -- the spaces plugin uses this to remove the ability to specify an arbitrary namespace for saved object operations, forcing consumers of the saved object client to only operate in the namespace of the incoming request.

For sharing saved objects across multiple spaces, core's concept of a namespace needs to be migrated. So this should be thought of as Core migrating the fundamental data structure that underlies saved objects. This migration code will be written by the security team, but it will part of Core's codebase and owned by the platform team.

Rubber-ducking first-class support for enhanced state in Core I'm not sure if this is a viable solution, but it might be worth exploring what it would look like if persistable registeries or "enhanced state" was made a first-class concept in Core. Core could add a mapping to every saved object like `"enhancedState": {"drilldownEmbeddable": {"somekey": "some state value"}}` that acts a bit like a saved-object type. Plugins can register enhancedstate types like a `drilldownEmbeddable` and supply migration functions that will only run on the state stored in that enhancement. Any plugin can write state to it's saved object's `enhancedstate` properties and when reading the state, any registered migration function will be applied. This provides isolation and data ownership. Not sure of the downsides, it will probably add a lot of complexity since we're essentially adding a sub-type which needs to duplicate a lot of the behaviour that a top-level type already requires.

But that solution does support migrations on shared state, which contradicts your earlier statement

Migrations are still applied to shared state, but the data owner is the gatekeeper. So the plugin can catch and ignore exceptions from a buggy registrant that would otherwise make Kibana unavailable or in the future just this saved object unavailable. The plugin can also apply some filtering to prevent a registrant from accidentally deleting or altering data it shouldn't have access to (this is unlikely to be malicious code since there are easier attack vectors if you manage to get a user to install your server-side plugin).

What are the risks in allowing migrations to be registered like this, that may migrate unowned data?
What would this system look like if we wanted embeddables to enforce restrictions and safeguards?

Could you elaborate on your example, is the persistableState plugin also the data owner who registers the saved object type and migrations with Core?

@mshustov
Copy link
Contributor

I'm not sure if this is a viable solution, but it might be worth exploring what it would look like if persistable registeries or "enhanced state" was made a first-class concept in Core

I'm not familiar with the current implementation of the data persistence layer. So I have a couple of questions. The state-owner plugin defines how its state should be serialized on writing (to add this drilldownEmbeddable key) / migrated on reading (to access drilldownEmbeddable key and migrate the value)? And now we need a registry to map serialization/migration functions to the state?

@stacey-gammon
Copy link
Contributor Author

Meeting Notes:

  • Should we re-use the saved object client migrations (requiring it to be modified to support the enhancements pattern)? If so, should the platform team own this?

  • No persistable registry in core, core would just give the building blocks to the persitable

  • The building block core would expose is:

    • mapping saved object type -> migrations/references (like it does now).
  • One thought was using the registry id as the version but this won't support the enhancement use case.

  • Can we "guess" the version based on the data? This is risky.

  • Ideally there is a version property on all persistable state.

    • Maybe a fallback can be the "guess the version based on state" method to support both cases?
    • "Best practice" is to have a version property but not enforced initially?
  • Concerns: client side only vs server and client side?

    • Saved object migrations run on the server.
    • Will this system exist only on client side?
    • No, it needs to exist on server side as well (think about server side alerts storing information - they may never be instantiated on client side at all)
  • Is it time to formalize the concept of common registries?

  • Some functions have different implementations on client vs server. Some reuse the same functionality. (e.g. esaggs has different implementations).

  • In the case of migrations the client and server side implementation should be the same by requirement.

  • The client side registry could simply just query the API to get the server side implementation.

  • This could be fallback behavior?

  • Formal common contracts? Probably not going to happen short term. Maybe short term, we can throw an error if the registry items don't exist in both?

  • Performance concerns

  • e.g. every time an embeddable is deserialized, drillddowns enhancer needs to run all possible uiAction migrations.

  • Maybe could use bfetch to avoid a slew of calls and batch them up.

  • Simplest step:

  • assume implementations manually on both client and server, up to registrator to add migrations on both client and server

  • Next steps to explore

    • automatic client calls to server side implementations. Maybe not do this first because of performance concerns and ensure we have time to be confident this won't be an issue.
    • Formal common contracts/ enforcements, etc.
  • This system should not be as complex as saved object migrations system because it's not writing to storage, it's just a transformation/schema evolution library.

  • Owners

  • App Arch

  • We'll need a lot of testing for all the corner cases.

Action Items: @ppisljar & @lukeelmers - as you explore/work on this, can you formalize plan in persisted state (ba dum tisss)? Like a google doc or slides, or something that we can review to help think through how the solution solves all the complicated corner cases?

@rudolf
Copy link
Contributor

rudolf commented Apr 17, 2020

To add to stacey's notes. We agreed that this is a data / schema evolution problem that's similar to saved object migrations, but doesn't have to be tied directly to it. Since state can live in URL's or localStorage being able to evolve this state independent of saved objects would be more useful.

This system should follow the same API as saved object migrations, i.e. a migration consists of a semver version number and a transformation function pair. That would allow us to create saved object migrations which are composed of any data evolution migrations if we ever need to persist the data for being able to query on it. We don't currently know of any use cases that require persistence. Since this would require further work in Core to support, we'll only work on it once we found a use case that requires it.

@rudolf
Copy link
Contributor

rudolf commented Apr 17, 2020

Screenshot 2020-04-17 at 21 14 12

@stacey-gammon
Copy link
Contributor Author

Blocking #54837

@joshdover joshdover self-assigned this Jul 30, 2020
@stacey-gammon stacey-gammon added Dependency:TimeToVisualize impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Sep 10, 2020
@joshdover
Copy link
Contributor

Removing myself from this issue since I am not focusing on any new solutions here.

@joshdover joshdover removed their assignment Dec 2, 2020
@ppisljar
Copy link
Member

ppisljar commented Dec 3, 2020

should we close this ?

@rudolf
Copy link
Contributor

rudolf commented Jan 19, 2021

The last bit of work to enable this is described in #84907 and likely to be merged soon so I think we can close this discussion.

@rudolf rudolf closed this as completed Jan 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. NeededFor:Dashboard NeededFor:Security Project:TimeToVisualize Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

No branches or pull requests

7 participants