Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved config system in flytekit and improved FlyteRemote #857

Merged
merged 57 commits into from
Mar 8, 2022

Conversation

kumare3
Copy link
Contributor

@kumare3 kumare3 commented Feb 18, 2022

TL;DR

This PR substantially cleans up flytekit configuration, both in terms of the configuration settings themselves, and their usage from within the flytekit code base. Rarely used settings have been removed, remaining settings have been categorized together into classes that reflect usage, and global access has been reduced.

Flytekit Configuration Ecosystem

Where can configuration come from?

Both before and after this PR, configuration can come from a few different places,

  • Command line arguments. This is the ideal location for settings to go. (See pyflyte package --help for example.)
  • Environment variables. Users can specify these at compile time, but when your task is run, Flyte Propeller will also set configuration to ensure correct interaction with the platform.
  • A config file - an INI style configuration file. By default, flytekit will look for a file in two places
    1. First, a file named flytekit.config in the Python interpreter's starting directory
    2. A file in ~/.flyte/config in the home directory as detected by Python.

How is configuration used?

Configuration usage can roughly be bucketed into the following areas,

  • Compile-time settings - things like the default image, where to look for Flyte code, etc.
  • Platform settings - Where to find the Flyte backend (Admin DNS, whether to use SSL)
  • Run time (registration) settings - these are things like the K8s service account to use, a specific S3/GCS bucket to write off-loaded data (dataframes and files) to, notifications, labels & annotations, etc.
  • Data access settings - Is there a custom S3 endpoint in use? Backoff/retry behavior for accessing S3/GCS, key and password, etc.
  • Other settings - Statsd configuration, which is a run-time applicable setting but is not necessarily relevant to the Flyte platform.

FlyteRemote UX

This cleanup also improves the FlyteRemote experience. Following this PR, users are able to construct FlyteRemote objects more easily:

FlyteRemote(config=Config.auto())
FlyteRemote(config=Config.auto(config_file=....))
FlyteRemote(config=Config(....))

Changes to UX

Creating the Flyte client

For users that may be using the Flyte client directly (the RawSynchronousFlyteClient and SynchronousFlyteClient classes), we've updated the constructor to use the new configuration dataclasses.

Plain Usage

Just creating the client object with minimal parameters
Old

SynchronousFlyteClient(url="a.b.com", insecure=True)  # or
RawSynchronousFlyteClient(url="a.b.com", insecure=True) 

New

from flytekit.configuration import PlatformConfig
RawSynchronousFlyteClient(PlatformConfig(endpoint="a.b.com", insecure=True))  # or
SynchronousFlyteClient(PlatformConfig(endpoint="a.b.com", insecure=True))

Passing grpc credentials

If your Flyte backend relies on a custom certificate chain, you'd create the grpc credentials and pass through. With this release, users should instead pass in the file locations of the relevant certificates.

Old

cc: grpc.ChannelCredentials = get_my_credentials(...)
SynchronousFlyteClient(credentials=cc)  # or
FlyteRemote(grpc_credentials=cc)

New

SynchronousFlyteClient/FlyteRemote(private_key=your_private_key_bytes)

options and compression are also respected keyword arguments.

Default resources deleted

The configuration objects under configuration/resources.py have been removed because they were rarely used and caused more confusion than it solved. If you would like to specify resource requests/limits on a task, you can of course still do that in the @task() decorator. If not specified, the defaults will be determined by Admin. If you'd like to configure Admin's defaults , please refer to how-to-guide here.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Config object replacement

The series of files under flytekit/configuration have largely been deleted with this PR. Instead all the configuration is now embodied in the following dataclasses (some of which already existed).

  • PlatformConfig Roughly the entries in the platform.py file previously. Contains settings to talk to Flyte backend.
  • StatsConfig, SecretsConfig, S3Config, GCSConfig, DataConfig (which holds the S3/GCS config).
  • Ones that already existed are:EntrypointSettings, Image, ImageConfig, SerializationSettings

All these are wrapped in one parent class called flytekit.configuration.Config

  • flytekit.configuration.file.ConfigFile Wrapper object representing a config file. Config files are still the ini formatted files from before, though some settings have been deleted.

Settings that were only really used when registering launch plans have been moved to flytekit.remote.remote.Options. The remote package location makes more sense because these are effectively run-time settings used for registration, which flytectl has mostly taken over. FlyteRemote however still does on-demand registration.

  • For dynamic tasks, flytekit will now serialize the SerializationSettings at compile time (of the parent dynamic task) and store those bits as an environment variable. See the issue.

  • SerializationSettings, Image and ImageConfig moved from context_manager.py to configuration folder instead.

  • RawSynchronousFlyteClient scan kwargs for grpc channel credential arguments instead of taking a constructed credentials object.

Possible user impact changes

  • Old common configuration classes removed.
  • Use Parquet engine always when storing/reading schemas (this was always the default option so there should be no impact, no one was using the Fast Parquet engine as far as we know).

Tracking Issue

flyteorg/flyte#1359
flyteorg/flyte#2214

Follow-up issue

NA
OR
https://github.com/lyft/flyte/issues/

Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
kumare3 and others added 27 commits February 19, 2022 21:41
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Structured dataset

Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
* mostly nits, changing import styles, etc.

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* nits

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* nits

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* make fmt

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
#874)

* make fmt, core tests patched, remove additional context from ExecutionState, updated ImageConfig to not have Nones in the list

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* make raw

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
kumare3 and others added 13 commits March 4, 2022 11:32
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
* default

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* nit

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* nit

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* use os join

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
@kumare3 kumare3 changed the title [wip] improved config system in flytekit Improved config system in flytekit and improved FlyteRemote Mar 8, 2022
kumare3 and others added 9 commits March 7, 2022 21:21
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Ketan Umare <ketan.umare@gmail.com>
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
# Always retry auth errors.
if i == (max_retries - 1):
# Exit the loop and wrap the authentication error.
raise _user_exceptions.FlyteAuthenticationException(str(e))
cli_logger.error(f"Unauthenticated RPC error {e}, refreshing credentials and retrying\n")
refresh_handler_fn = _get_refresh_handler(creds_config.AUTH_MODE.get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we cannot pass the auth mode as an environment variable any longer. Was this intended?

# metadata will hold the value of the token to send to the various endpoints.
self._metadata = None

@classmethod
def with_root_certificate(cls, cfg: PlatformConfig, root_cert_file: str) -> RawSynchronousFlyteClient:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this being called?

Copy link
Contributor

@sonjaer sonjaer Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, now I see the SynchronousFlyteClient

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants