[jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark #4446

Merged (4 commits) on May 9, 2019
doc/jvm/xgboost4j_spark_tutorial.rst (1 addition, 1 deletion)
@@ -239,7 +239,7 @@ Early Stopping

Early stopping is a feature that prevents unnecessary training iterations. By specifying ``num_early_stopping_rounds`` or directly calling ``setNumEarlyStoppingRounds`` on an XGBoostClassifier or XGBoostRegressor, we can define the number of rounds the evaluation metric is allowed to drift away from the best iteration before training is stopped early.

-In addition to ``num_early_stopping_rounds``, you also need to define ``maximize_evaluation_metrics`` or call ``setMaximizeEvaluationMetrics`` to specify whether you want to maximize or minimize the metrics in training.
+When it comes to custom eval metrics, in addition to ``num_early_stopping_rounds``, you also need to define ``maximize_evaluation_metrics`` or call ``setMaximizeEvaluationMetrics`` to specify whether you want to maximize or minimize the metrics in training. For built-in eval metrics, XGBoost4J-Spark will automatically select the direction.

For example, suppose we want to maximize the evaluation metric (set ``maximize_evaluation_metrics`` to true), we set ``num_early_stopping_rounds`` to 5, and the evaluation metric of the 10th iteration is the best so far. If none of the following iterations produces an evaluation metric greater than the 10th iteration's (the best one), training would be stopped early at the 15th iteration.
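
As an illustration, here is a minimal sketch of wiring this up through the setter API (the ``Map``-based constructor, ``setNumRound`` and the ``"binary:logistic"`` objective are assumed from the wider XGBoost4J-Spark API and are not part of this change):

.. code-block:: scala

    // Sketch only: early stopping with a built-in metric. With this change,
    // maximize_evaluation_metrics may be left unset and the direction is
    // derived from the metric ("auc" is maximized automatically).
    val classifier = new XGBoostClassifier(Map(
        "objective" -> "binary:logistic",
        "eval_metric" -> "auc"
      ))
      .setNumRound(100)
      .setNumEarlyStoppingRounds(5)

    // With a custom eval metric, the direction must still be given explicitly:
    // classifier.setMaximizeEvaluationMetrics(true)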

@@ -24,6 +24,7 @@ import scala.util.Random

import ml.dmlc.xgboost4j.java.{IRabitTracker, Rabit, XGBoostError, RabitTracker => PyRabitTracker}
import ml.dmlc.xgboost4j.scala.rabit.RabitTracker
+import ml.dmlc.xgboost4j.scala.spark.params.LearningTaskParams
import ml.dmlc.xgboost4j.scala.{XGBoost => SXGBoost, _}
import ml.dmlc.xgboost4j.{LabeledPoint => XGBLabeledPoint}
import org.apache.commons.io.FileUtils
@@ -157,13 +158,21 @@ object XGBoost extends Serializable {
      try {
        val numEarlyStoppingRounds = params.get("num_early_stopping_rounds")
          .map(_.toString.toInt).getOrElse(0)
-        if (numEarlyStoppingRounds > 0) {
-          if (!params.contains("maximize_evaluation_metrics")) {
-            throw new IllegalArgumentException("maximize_evaluation_metrics has to be specified")
+        val overridedParams = if (numEarlyStoppingRounds > 0 &&
+            !params.contains("maximize_evaluation_metrics")) {
+          if (params.contains("custom_eval")) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when num_early_stopping_rounds is non-zero, eval_metric is a built-in metric, the custom_eval is still in params, just the value is null, so it will still throw an exception.
So the if condition should be if (params.contains("custom_eval") && params("custom_eval") != null) or remove custom_eval from params.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm....I think I missed some part of this PR

The reality is early stopping doesn’t work with custom metrics

throw new IllegalArgumentException("maximize_evaluation_metrics has to be "
+ "specified when custom_eval is set")
}
val eval_metric = params("eval_metric").toString
val maximize = LearningTaskParams.evalMetricsToMaximize contains eval_metric
logger.info("parameter \"maximize_evaluation_metrics\" is set to " + maximize)
params + ("maximize_evaluation_metrics" -> maximize)
} else {
params
}
val metrics = Array.tabulate(watches.size)(_ => Array.ofDim[Float](round))
val booster = SXGBoost.train(watches.toMap("train"), params, round,
val booster = SXGBoost.train(watches.toMap("train"), overridedParams, round,
watches.toMap, metrics, obj, eval,
earlyStoppingRound = numEarlyStoppingRounds, prevBooster)
Iterator(booster -> watches.toMap.keys.zip(metrics).toMap)
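
For reference, a minimal sketch of the null-safe guard suggested in the review thread above (hasCustomEval is just an illustrative local name; this is not part of the PR's diff):

  // Only require an explicit direction when a custom evaluation function is
  // actually supplied; params may still carry "custom_eval" with a null value
  // when a built-in metric is used.
  val hasCustomEval = params.contains("custom_eval") && params("custom_eval") != null
  if (hasCustomEval) {
    throw new IllegalArgumentException("maximize_evaluation_metrics has to be "
      + "specified when custom_eval is set")
  }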
@@ -111,6 +111,10 @@ private[spark] object LearningTaskParams {

  val supportedObjectiveType = HashSet("regression", "classification")

-  val supportedEvalMetrics = HashSet("rmse", "mae", "logloss", "error", "merror", "mlogloss",
-    "auc", "aucpr", "ndcg", "map", "gamma-deviance")
+  val evalMetricsToMaximize = HashSet("auc", "aucpr", "ndcg", "map")
+
+  val evalMetricsToMinimize = HashSet("rmse", "mae", "logloss", "error", "merror",
+    "mlogloss", "gamma-deviance")
+
+  val supportedEvalMetrics = evalMetricsToMaximize union evalMetricsToMinimize
}