Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(GH-1934) Add support for selective sync of plugins to target nodes #1935

Merged

Conversation

hajee
Copy link
Contributor

@hajee hajee commented Jun 24, 2020

Implement feature request #1934

@hajee hajee requested a review from a team as a code owner June 24, 2020 15:13
@hajee hajee force-pushed the selective_module_sync branch 4 times, most recently from 05b0a1f to 0ad502e Compare June 24, 2020 15:29
@puppetcla
Copy link

CLA signed by all contributors.

@adreyer
Copy link
Contributor

adreyer commented Jun 24, 2020

I don't think there is a good argument for allowing this to be set per target. It makes more sense as a global setting that is used for all targets.

@hajee
Copy link
Contributor Author

hajee commented Jun 25, 2020

In our current use-case, we have multiple nodes with different roles and we run different bolt tasks on them. Each bolt task has a specific set of modules it needs.

Setting it globally means all nodes would get the same set of modules. Although that would probably help a bit, this implementation gives you maximum control.

end

private def libexec
@libexec ||= File.join(Gem::Specification.find_by_name('bolt').gem_dir, 'libexec')
end

def selected_module?(mod)
@selected_modules.any? { |m| mod =~ /#{m}/ }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@selected_modules.any? { |m| mod =~ /#{m}/ }
@selected_modules.include?(mod)

This will more accurately select the exact module, not ones that include the modulename. For example if I have module nginx and call selected_module?('nginx-config') this will return true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to have an array of regular expressions, to make it easy to select multiple modules. So I put it there explicitly. For example, the augueas provides can be added like this:

['augeasproviders_.*']

Personally I prefer this above specifying all modules individually.

@adreyer
Copy link
Contributor

adreyer commented Jun 29, 2020

I'm concerned that scoping this to nodes and trying to track exactly what is managed on each node is going to quickly become unmaintainable for many users and introduces new performance issues on the bolt controller around building multiple plugin bundles. This it works tolerably well in the case where you are applying roles to servers and can try to set which modules are needed for each role. But you're still left tracking the role assignment and module code separately from the plugin bundle mapping . What happens when a user wants to run a plan that goes outside of these bounds if a fact is simply not synced things may behave unexpectedly.

@hajee
Copy link
Contributor Author

hajee commented Jun 29, 2020

@adreyer I understand your concern regarding maintainability. On the other hand, I guess that goes for all features. Always use it moderate and wise. Although you could add this to the inventory, I think it is best used in the plan by calling

$target.set_config('selective_sync', ['stdlib'])

just before applying the puppet block. This way it keeps the selection close to the puppet code and keeps everything as neat as possible. Missing a module would be the same as missing a module in the Puppetfile. When the set_config is close to the puppet code, probably easy to spot and to fix.

I see this feature a bit like setting the puppet-agent feature. You allow the author of the plan, control if he/she wants to install an agent. If he/she tells bolt there is an agent and it is not there, you get an error.

I have been thinking about using a variable, but because it is functionally so close to a feature, I thought it best to use a config setting.

Regarding performance: I'm not sure what the impact is on large sets of nodes. If this is an issue, we could always optimize this.

@adreyer
Copy link
Contributor

adreyer commented Jun 29, 2020

I was imagining this being set in the inventory file. If the plan author is expected to control this it seems much more natural as a parameter to apply and apply_prep themselves. Otherwise the plan author has to worry about aligning with preexisting global inventory state and cleaning up to avoid polluting it.

@hajee
Copy link
Contributor Author

hajee commented Jun 29, 2020

Functionality that sounds like a great idea! I'll have a look at how we can implement this.

@adreyer
Copy link
Contributor

adreyer commented Jun 29, 2020

I think it's best if this feature starts off fairly simple and with the understanding that as a performance optimization it may be ignored/deprecated later. For that reason I think we should name it something like "required_modules" and have it be an explicit list of modules. While a regex may be convienent it seems primarily to be be an issue for augeus providers.

apply($targets, 'required_modules' => ['module_1', 'module_2']) { .... }

Limiting this further by target may be a further optimization to this but there isn't really a good syntax for that in the plan language. We have per target parameters for run_task now but that won't translate to apply because it would require a second block. This seems like it's primarily a problem when you are applying a largeish but largely unique catalog on each node. In that case the cost of syncing module code is much less important.

@hajee hajee force-pushed the selective_module_sync branch 2 times, most recently from d55ffca to e3eb677 Compare June 30, 2020 11:25
@hajee
Copy link
Contributor Author

hajee commented Jun 30, 2020

@adreyer @lucywyman The latest force commit contains the implementation as discussed

bolt-modules/boltlib/lib/puppet/functions/apply_prep.rb Outdated Show resolved Hide resolved
bolt-modules/boltlib/lib/puppet/functions/apply_prep.rb Outdated Show resolved Hide resolved
bolt-modules/boltlib/lib/puppet/functions/apply_prep.rb Outdated Show resolved Hide resolved
lib/bolt/applicator.rb Outdated Show resolved Hide resolved
@hajee hajee requested a review from beechtom July 2, 2020 09:17
lib/bolt/applicator.rb Outdated Show resolved Hide resolved
lib/bolt/applicator.rb Outdated Show resolved Hide resolved
lib/bolt/applicator.rb Outdated Show resolved Hide resolved
lib/bolt/applicator.rb Outdated Show resolved Hide resolved
lib/bolt/applicator.rb Outdated Show resolved Hide resolved
@puppetcla
Copy link

CLA signed by all contributors.

@lucywyman
Copy link
Contributor

I think this is just about ready. My only comment is that in order for this to show up in our changelog, your commit needs to have a message after the !feature flag:

!feature

* **Add selective plugin sync for apply and apply_prep** ([1934](https://github.com/puppetlabs/bolt/issues/1934))
  A new option `_required_modules` for `apply` and `apply_prep` functions will limit which module artifacts are uploaded to each target. This is useful for improving performance when applying manifests that don't require custom content from modules in environments that contain a lot of modules.

@hajee
Copy link
Contributor Author

hajee commented Jul 7, 2020

@lucywyman I can squash all commits and change the text. Is that what you want, or do you want to retain the editing history in the PR?

@lucywyman
Copy link
Contributor

lucywyman commented Jul 7, 2020

Squash and edit would be great! I'm realizing we should update our contributing document with all this info See https://github.com/puppetlabs/bolt/blob/master/CONTRIBUTING.md#pull-requests - thanks for bearing with us!

Copy link
Contributor

@nicklewis nicklewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added variables should be locals and not instance variables. Instance variables aren't meaningful for functions and we explicitly don't want it to be part of the shared Applicator instance for the apply() block case since you may run multiple apply() statements in a single plan and they can each have their own required modules. The way it's written now, those variables will just be overridden next time so it's not technically broken, but it makes the data lifecycle confusing to follow.

Other than that one detail, this looks great! Thank you for handling the implementation.

…et nodes

* **Added support for selective sync of plugins to target** ([puppetlabs#1934](puppetlabs#1934))

!feature
This feature allows a plan author to specificy what plugins need to be synced to the target node. You can do this by add an array of module names to the options parameter. Here is an example:

    apply($target, ‘required_modules => [‘stdlib’] ) {
      notice 'Hallo'
    }

The apply block here ony sync’s the stdlib module to the target. The same feature also
works on apply_prep functions. Here is an example:

    apply_prep($target, ‘required_modules => [‘stdlib’] )

When no required modules are specified, *ALL* plugins from *ALL* modules are synced.
@hajee
Copy link
Contributor Author

hajee commented Jul 7, 2020

@nicklewis I force pushed these changes

@lucywyman lucywyman merged commit 857d216 into puppetlabs:master Jul 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants