Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow JVM-Package to access inplace predict method #9167

Merged
merged 24 commits into from
Sep 11, 2023

Conversation

yoquinjo
Copy link
Contributor

@yoquinjo yoquinjo commented May 16, 2023

Here at Sovrn we have been using a fork of xgboost that with custom changes that allow us to access the inplace_predict method from the JNI. We believe this would be of use to others and are looking for feedback/reviews to add this functionality to the main xgboost library.

Close #5951 .

StephanTLavavej and others added 2 commits May 16, 2023 13:18
Squashed commits:
[2aa500b] Try a different way of creating the proxy
[c92d2df] try only one path
[2f81ff3] dMatrix Handle null check and more statments
[2e6df6f] exit on entering ifs
[688fc73] More null checks
[3e37688] change error logging
[710dae5] missing semicolon
[489c337] Put checks back in, change check in c_api
[ad5972d] time to break the cpu predictor
[7dfa3c0] No more check, do it live
[abd64ff] dereference for logging
[eb4cd9a] See if we can log what the type for proxy is
[74c4376] maybe log something
[a367ca2] uncomment for test, pass dMatrix through Booster.java
[3e98f14] comment out for testing
[66f0d8c] Changes to tests
[737b293] some more changes to test
[b3796b7] a dMatrix for testing
[e995228] Convert long to handle
[66ca8c3] c h e c k s t y l e
[14f2243] Checkstyle failure
[163bfca] missing comma
[c869a14] First try at getting dMatrix logic into changes
Squashed commits:
[99b8f1f] Remove pom change
[7b80fef] Remove changes from cpu_predictor, fix rebase miss
[a531053] Cleaning up XGBoosterInplacePredict

Further de-newification

More clean-up
[de6e109] pom change git didn't pick up (+5 squashed commits)
Squashed commits:
[6182244] One missed pom
[ab5bc5e] Update all the poms, even the unused ones
[a96c3b9] More clean-up
[aea9cd3] Further de-newification
[1539100] Cleaning up XGBoosterInplacePredict
[a81361d] increment our version (+15 squashed commits)
Squashed commits:
[c4d6d52] missed param
[b343f89] Remove redundant line
[5609102] Remove need to pass DMatrix
[e31c53d] another missed logging message
[f6a2508] comment out some logging
[148e29e] undo static cast
[6372805] More logging
[a173763] change how proxy is cast in cpu predictor
[5a08ff8] proxy to stuff
[823c92f] Dense adapter changes
[9a3214b] change some variables and things
[d9dd5c4] uncomment stuff, more logging
[7968bca] Set dense data
[9e79c2e] Try some stuff with DMatrix
[2b8cca6] Print result of p_m.get()
@yoquinjo yoquinjo force-pushed the sovrn-inplace-predict-java branch from c77e146 to ee0a029 Compare May 16, 2023 21:23
src/c_api/c_api.cc Outdated Show resolved Hide resolved
Squashed commits:
[3e181c8] Remove missed comment
[93ed445] Replace single piped equals in inplace predict
[fc9b1d4] Remove ArrayComparator in java inplace predict test for junit assertion
@yoquinjo yoquinjo force-pushed the sovrn-inplace-predict-java branch from 9ecad16 to 5ce8d65 Compare May 18, 2023 15:19
@trivialfis trivialfis self-requested a review May 18, 2023 19:58
@yoquinjo yoquinjo requested a review from dotbg May 22, 2023 14:31
Copy link
Contributor

@dotbg dotbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just few cleanup comments from my side. I am not an expert in core functionality, I can only review JVM-related changes

out_dim, out_result);
}

XGB_DLL int XGBoosterInplacePredict(BoosterHandle handle,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this is the same function as XGBoosterPredictFromDense, can we use it instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've been working on moving us to XGBoosterPredictFromDense but a direct usage would require us to use something similar to what is used in the python code, but from reading it it looks like the arrays that are passed through are created using methods from numpy, which there doesn't seem to be any native support or connection for these types in Java. Should we be building a custom byte stream to send in here to have all the metadata or would something like com.google.gson.Gson being added to the imports for Java be acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I think one can build the JSON interface in xgboost.cpp using the xgboost:Json class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've tried a few different ways of building the JSON interface locally, but the best I've managed failed the type check in PredictFromDense, so incorrect conversion. Could you point me to what methods in the Json class I should be using for converting a Java float array to the expected format in the api? I'm inexperienced with C++ and could use additional guidance on that side of the pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the slow reply. I will look into your PR tomorrow and see if I can help.

if (predContribs) {
optionMask = 4;
}
DMatrix d_mat = new DMatrix(data, num_rows, num_features, missing);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, inplace_predict doesn't need the DMatrix, which is the primary motivation of developing this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay - I'm looking through the python code that calls predictFromDense to see how that works - should we be implementing and using proxy DMatrix on the Java side of things? I'll start looking at making that change if you can just let me know if that would be the correct approach.

Historically, we didn't have DMatrix here when we initially made our changes in 1.4 - this addition of the DMatrix here was due to changes in cpu_predictor.cc that now requires a DMatrix, we were originally using the parameter dmlc::any const &x in the old version of inplacePredict there and sending a null pointer for the DMatrix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing! We unified the interface under the proxy DMatrix class. Let me do some more investigation into the JVM implementation first. I think I can help with the C++ code.

@trivialfis
Copy link
Member

trivialfis commented Jun 23, 2023

TODOs for me:

  • Use array interface with ProxyDMatrix.
  • Look into GPU-based inputs.
  • Look into Scala implementation.

@yoquinjo
Copy link
Contributor Author

yoquinjo commented Aug 4, 2023

@trivialfis Hi, I wanted to check in and see if there's anything me or my team should be working on for this pull request at the moment

@trivialfis
Copy link
Member

Apologies for the long wait, I have been clearing up items for the 2.0 release. Will get to this once #9440 is sorted out.

@trivialfis
Copy link
Member

Please take a look at sovrn#9 .

@yoquinjo
Copy link
Contributor Author

Update from our end - I merged sovrn#9 into this branch. I built a new version for testing on our end, however we are still using models for one of projects is still using models built on 1.4, so I can't do a test in a live environment yet. We're also going to look into adding a test to verify thread safety.

@trivialfis
Copy link
Member

trivialfis commented Aug 22, 2023

I think xgboost can load old models. We are still testing models from 0.9 and 1.0: https://github.com/dmlc/xgboost/blob/master/tests/python/test_model_compatibility.py

The problem arises when you load a new model with an old xgboost.

@yoquinjo
Copy link
Contributor Author

I think xgboost can load old models. We are still testing models from 0.9 and 1.0: https://github.com/dmlc/xgboost/blob/master/tests/python/test_model_compatibility.py

The problem arises when you load a new model with an old xgboost.

We're failing to load the 1.4 model in our test environment, we have a 1.7 model training now to determine if its something on our end - upgrading that project is work we have in our backlog anyway, I'll let you know if we are seeing an actual compatibility issue

@yoquinjo
Copy link
Contributor Author

We're able to load the new model trained on a 1.7 version. The models we have problems loading were trained on 1.4.1

@trivialfis
Copy link
Member

Which model format are you using?

@yoquinjo
Copy link
Contributor Author

Which model format are you using?

We use a binary for the model

*
* @return predict Result matrix
*/
public float[][] inplace_predict(float[] data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbo4958 I just learned about the existence of BigDenseMatrix, do you think we should return the prediction in that?

@trivialfis
Copy link
Member

We use a binary for the model

Feel free to share (maybe in private), we can help debug the issue.

@yoquinjo
Copy link
Contributor Author

Feel free to share (maybe in private), we can help debug the issue.

Thank you, I haven't verified yet if it is the model failing, since the error is appearing to us at the level where it could be our model or our mapping. I'm not too concerned on troubleshooting this, as we have it in our backlog to upgrade these models anyway which fixes whatever is going on. If its helpful, when compatibility checks are run and failing on your end for that version I can provide a model file. Otherwise, it might be unneeded effort since we have a fix available when this PR is merged and we can move to an official version.

@ByteSizedJoe
Copy link
Contributor

@trivialfis

Hello! I am also from Sovrn and just wanted to check in and see if you had any specific thread-safety concerns that the test should address? I've been reading through some PRs and it sounds like the predict method not thread-safe at some point and so someone on the JVM side wrapped that method in synchronized in order to work around the non thread-safe C call. Here's the PR for the C thread-safety changes: #5853

It looks like a few Java folks want to drop the synchronized off the predict call and this is on the 2.0 road map: #7027

It looks like in that PR, there is a reference to the "JVM iterator" not being thread-safe which also seems to be the same concern that came up in one of the branches for this PR: sovrn#9. I'm not sure this is related to the inplace_predict that we're trying to merge since we're not using a dmatrix but rather a float[][].

The method that we're specifically looking to get merged is the inplace_predict which utilizes "XGBoosterPredictFromDense", I think from some comments I've read in some PRs that indicate that this method is for sure thread-safe.

So with that said, is there any specific thread-safety issues you're seeing/concerned with on the Java side? Thank you!

@trivialfis
Copy link
Member

So with that said, is there any specific thread-safety issues you're seeing/concerned with on the Java side? Thank you!

No concern with thread safety at the moment. As you have discussed, the mentioned issues are related to DMatrix construction instead of the predict function.

I will review the PR. @wbo4958 would be great if you could take a look as well.

namespace linalg = xgboost::linalg;
jfloat *data = jenv->GetFloatArrayElements(jdata, nullptr);
auto t_data = linalg::MakeTensorView(
xgboost::Context::kCpuId,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to rebase the PR, we made some changes to these utilities.

/**
* Handle base margin
*/
BoosterHandle proxy{nullptr};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help add a test for prediction with margin. With regression using reg:squarederror:

pred0 = booster.inplace_predict(X, margin)
pred1 = booster.inplace_predict(X) + margin
assert pred0 == pred1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trivialfis Hello! I wrote a test, but it appears to be failing. I think it may be related to this: #9536

Is the in-place prediction not working properly with margins at the moment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug happens when you are running prediction with base margin on GPU while the input is from the CPU or the other way around. I don't think it's relevant to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to push your tests, I can take a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! I pushed the new tests. Let me know if you have any questions on anything! Right now what I'm seeing with the failing test is that the prediction( x with margin) != prediction(x) + margin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! That did the trick. I'll post your changes from below shortly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Please merge/rebase the latest branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Let me know if there's anything else needed!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Let me know if there's anything else needed!

Fix CI errors, if there's any. ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the CI failed with some of the new C++ code you added from the sovrn#9 branch. I tried taking a look at it, but by no means am I a C++ expert. It looks like some sort of parameter type mismatch is occurring. Could you take a look?

…e tests: inplacePredictTest and inplacePredictMultiPredictTest. Extracted common logic of generating random data sets into generateRandomDataSet method. Added new test inplacePredictWithMarginTest which should test the utilization of a margin when performing predictions.
…ounds that we're training on is 10, and we want to use all rounds arbitrarily for this test.
…on_range as iterationRange, so it is clear. Set this iteration range to [0,0) which mimics the calls in inplace_predict methods without margin.
@trivialfis
Copy link
Member

trivialfis commented Aug 31, 2023

Pushing to github.com:sovrn/xgboost.git
ERROR: Permission to sovrn/xgboost.git denied to trivialfis.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Anyway:

diff --git a/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java b/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java
index 0d6020b00..e57368672 100644
--- a/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java
+++ b/jvm-packages/xgboost4j/src/test/java/ml/dmlc/xgboost4j/java/BoosterImplTest.java
@@ -31,7 +31,7 @@ import static org.junit.Assert.fail;
 
 /**
  * test cases for Booster Inplace Predict
- * 
+ *
  * @author hzx and Sovrn
  */
 public class BoosterImplTest {
@@ -229,43 +229,46 @@ public class BoosterImplTest {
     DMatrix testingMatrix = new DMatrix(testX, testRows, features, Float.NaN);
     testingMatrix.setLabel(testY);
 
-    //Set booster parameters
+    // Set booster parameters
     Map<String, Object> params = new HashMap<>();
     params.put("eta", 1.0);
-    params.put("max_depth",2);
-    params.put("silent", 1);
+    params.put("max_depth", 2);
     params.put("tree_method", "hist");
+    params.put("base_score", 0.0);
 
     Map<String, DMatrix> watches = new HashMap<>();
     watches.put("train", trainingMatrix);
     watches.put("test", testingMatrix);
 
-    //Train booster on training matrix.
+    // Train booster on training matrix.
     Booster booster = XGBoost.train(trainingMatrix, params, 10, watches, null, null);
 
-    //Create a margin
+    // Create a margin
     float[] margin = new float[testRows];
     Arrays.fill(margin, 0.5f);
 
-    //Define an iteration range to use all training iterations, this should match the without margin call
-    //which defines an iteration range of [0,0)
-    int[] iterationRange = new int[] {0, 0};
+    // Define an iteration range to use all training iterations, this should match
+    // the without margin call
+    // which defines an iteration range of [0,0)
+    int[] iterationRange = new int[] { 0, 0 };
 
     float[][] inplacePredictionsWithMargin = booster.inplace_predict(testX,
-                                                                      testRows,
-                                                                      features,
-                                                                      Float.NaN,
-                                                                      iterationRange,
-                                                                      Booster.PredictionType.kValue,
-                                                                      margin);
+        testRows,
+        features,
+        Float.NaN,
+        iterationRange,
+        Booster.PredictionType.kValue,
+        margin);
     float[][] inplacePredictionsWithoutMargin = booster.inplace_predict(testX, testRows, features, Float.NaN);
 
-    for(int i = 0; i < inplacePredictionsWithoutMargin.length; i++) {
-      for(int j = 0; j < inplacePredictionsWithoutMargin[i].length; j++) {
+    for (int i = 0; i < inplacePredictionsWithoutMargin.length; i++) {
+      for (int j = 0; j < inplacePredictionsWithoutMargin[i].length; j++) {
         inplacePredictionsWithoutMargin[i][j] += margin[j];
       }
     }
-    assertArrayEquals(inplacePredictionsWithMargin, inplacePredictionsWithoutMargin);
+    for (int i = 0; i < inplacePredictionsWithoutMargin.length; i++) {
+      assertArrayEquals(inplacePredictionsWithMargin[i], inplacePredictionsWithoutMargin[i], 1e-6f);
+    }
   }
 
   private float[] generateRandomDataSet(int size) {

…g the base_score as a parameter for the booster and asserting the values between the predict with margin and predict without margin are similar within a delta(since floating point calculations vary slighty)
@ByteSizedJoe
Copy link
Contributor

@trivialfis I took a stab at trying to fix the C++ problems. I was able to build successfully locally and pass tests. If you could review and if looks good, re-trigger the CI I would appreciate it! Thank you!

@trivialfis
Copy link
Member

Thank you for the work, let me look into it next week. U don't have access to my device at the moment

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you for the nice feature addition!

@ByteSizedJoe
Copy link
Contributor

Looks like the build kite multi gpu job failed. Took a look at the error and it appears that it's related to a connection being reset and not anything related to the PR. Let me know if you need me to do anything for this. Thank you!!

@trivialfis trivialfis merged commit d05ea58 into dmlc:master Sep 11, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inplace Predict Java API (xgboost4j)
7 participants