
Default value SparkHelper overrides config for spark.master #120

Closed

christopherfrieler opened this issue Jan 30, 2022 · 4 comments

@christopherfrieler

Currently, org.jetbrains.kotlinx.spark.api.withSpark(props, master, appName, logLevel, func) provides a default value for the master parameter. This overrides the value from the external Spark config provided with the spark-submit command.

In my case, I'm running Spark on YARN, so my submit command looks like ./spark-submit --master yarn .... But when I start the SparkSession in my code with

withSpark(
    props = mapOf("some_key" to "some_value"),
) {
    //... my Spark code
}

the default "local[*]" for the master parameter is used. This leads to a local Spark session on the application master (which kind of works, but does not make use of the executors provided by YARN), and after my job finishes YARN complains that I never started a Spark session, because it doesn't recognize the local one.

I think the value for master should, by default, be taken from the external config loaded by SparkConf(). As a workaround, I load the value myself:

withSpark(
    master = SparkConf().get("spark.master", "local[*]"),
    props = mapOf("some_key" to "some_value"),
) {
    //... my Spark code
}

This works, but it is not nice and it duplicates the default value "local[*]".
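The behavior proposed above can be sketched in plain Kotlin. Assuming spark-submit passes --master to the driver JVM as the spark.master system property (which SparkConf(loadDefaults = true) picks up), the helper could resolve the master like this. The function resolveMaster is hypothetical, for illustration only, and stands in for the default-value logic inside withSpark:

```kotlin
// Sketch (not the library's actual code): resolve the Spark master with the
// precedence an explicit argument > external config > local fallback.
fun resolveMaster(explicit: String? = null): String =
    explicit
        ?: System.getProperty("spark.master")  // set by spark-submit, e.g. "yarn"
        ?: "local[*]"                          // fall back to a local session

fun main() {
    // An explicit argument always wins.
    println(resolveMaster("local[2]"))

    // Simulate spark-submit --master yarn, which sets the JVM property
    // that SparkConf would read.
    System.setProperty("spark.master", "yarn")
    println(resolveMaster())
}
```

With this precedence, the "local[*]" default only applies when neither the caller nor the external config supplies a master, so the workaround above becomes unnecessary.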

@Jolanrensen
Collaborator

Thanks for letting us know!
How do you suggest we set the default value?
Also, maybe @asm0dey has an idea?

@asm0dey
Contributor

asm0dey commented Feb 16, 2022

@christopherfrieler thanks for the idea! @Jolanrensen it looks like Christopher provided us with a complete implementation; we should just change the default value :)

@Jolanrensen
Collaborator

Sure! Just wanted to check whether this is the best way to go, since they did call it a "workaround" and "not nice", haha. But creating a new SparkConf() does seem the best way to access system/JVM variables.

@Jolanrensen
Collaborator

Added!
