[c++] Add Bagging by Query for Lambdarank #6623
Changes from all commits: 0618bb2, 185bdf6, 38fa4c2, 2fce147, 1f7f967, 9e2a322, 666c51e, 9e2c338, 3abbc11, 13fa0a3, 9264768, 481ab03, 7e51534, 8124999, cc6f688, 0993154, 8a9b356
```python
@@ -4509,3 +4509,26 @@ def test_quantized_training():
    quant_bst = lgb.train(bst_params, ds, num_boost_round=10)
    quant_rmse = np.sqrt(np.mean((quant_bst.predict(X) - y) ** 2))
    assert quant_rmse < rmse + 6.0


def test_bagging_by_query_in_lambdarank():
    rank_example_dir = Path(__file__).absolute().parents[2] / "examples" / "lambdarank"
    X_train, y_train = load_svmlight_file(str(rank_example_dir / "rank.train"))
    q_train = np.loadtxt(str(rank_example_dir / "rank.train.query"))
    X_test, y_test = load_svmlight_file(str(rank_example_dir / "rank.test"))
    q_test = np.loadtxt(str(rank_example_dir / "rank.test.query"))
    params = {"objective": "lambdarank", "verbose": -1, "metric": "ndcg", "ndcg_eval_at": [5]}
    lgb_train = lgb.Dataset(X_train, y_train, group=q_train, params=params)
    lgb_test = lgb.Dataset(X_test, y_test, group=q_test, params=params)
    gbm = lgb.train(params, lgb_train, num_boost_round=50, valid_sets=[lgb_test])
    ndcg_score = gbm.best_score["valid_0"]["ndcg@5"]

    params.update({"bagging_by_query": True, "bagging_fraction": 0.1, "bagging_freq": 1})
    gbm_bagging_by_query = lgb.train(params, lgb_train, num_boost_round=50, valid_sets=[lgb_test])
    ndcg_score_bagging_by_query = gbm_bagging_by_query.best_score["valid_0"]["ndcg@5"]

    params.update({"bagging_by_query": False, "bagging_fraction": 0.1, "bagging_freq": 1})
    gbm_no_bagging_by_query = lgb.train(params, lgb_train, num_boost_round=50, valid_sets=[lgb_test])
    ndcg_score_no_bagging_by_query = gbm_no_bagging_by_query.best_score["valid_0"]["ndcg@5"]
    assert ndcg_score_bagging_by_query >= ndcg_score - 0.1
    assert ndcg_score_no_bagging_by_query >= ndcg_score - 0.1
```
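As a rough illustration (a sketch, not LightGBM's internal sampler), the difference the test exercises is which unit gets sampled: row-level bagging can split a query's documents between in-bag and out-of-bag sets, while query-level bagging keeps every document of a selected query together, preserving the within-query pairs that lambdarank gradients are computed from. The group sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
group_sizes = np.array([3, 5, 2, 4, 6])   # documents per query (hypothetical)
n_rows = int(group_sizes.sum())           # 20 rows in total
bagging_fraction = 0.4

# Row-level bagging: rows are sampled independently, so a single query's
# documents can end up partly in-bag and partly out-of-bag.
row_mask = rng.random(n_rows) < bagging_fraction

# Query-level bagging: sample queries, then expand the decision to rows,
# so all documents of a selected query stay together.
query_mask = rng.random(len(group_sizes)) < bagging_fraction
by_query_mask = np.repeat(query_mask, group_sizes)
```

With `np.repeat`, the per-row mask is constant within each query group by construction, which is exactly the property row-level sampling does not guarantee.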
Comment on lines +4533 to +4534
Review comments:

- PR's description states that …
- Since I found the result can be random when the dataset is small. For example, on CPU, …
- In addition, we also see a significant improvement in training speed with …
- OK, I see. So this test is for something like "…"
@neNasko1 is this something that might be missing for CUDA support in #6586?
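One reviewer notes the result "can be random when the dataset is small". A back-of-the-envelope way to see why (all numbers hypothetical, and treating per-query selection as independent, which is only an approximation of a real bagging sampler): with `bagging_fraction = 0.1` and `bagging_freq = 1`, the number of queries landing in each bag is roughly Binomial, so its relative fluctuation is large when the query count is small:

```python
import math

n_queries = 200   # hypothetical small ranking set
p = 0.1           # the test's bagging_fraction

mean_in_bag = n_queries * p                      # expected queries per bag
std_in_bag = math.sqrt(n_queries * p * (1 - p))  # Binomial std. deviation
rel_noise = std_in_bag / mean_in_bag             # relative sampling noise
```

Here the bag averages 20 queries with a standard deviation of about 4.2, i.e. roughly 21% iteration-to-iteration noise in the effective training set, which is consistent with a loose tolerance like `ndcg_score - 0.1` in the assertions.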