Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault while loading xgboost binary models #209

Closed
wants to merge 5 commits into from

Conversation

wphicks
Copy link
Contributor

@wphicks wphicks commented Sep 30, 2020

Closes #210

@wphicks
Copy link
Contributor Author

wphicks commented Sep 30, 2020

Backtrace:

#0  0x00007fffe7c4de9f in std::string::size (this=0x7fffffff6e68)
    at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3901
#1  std::string::append (this=0x7fffffff6cb0, __str=...)
    at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/build/build-cc-gcc-final/x86_64-conda_cos6-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:777
#2  0x00007fffcac0f16a in dmlc::io::LocalFileSystem::Open(dmlc::io::URI const&, char const*, bool) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/xgboost/lib/libxgboost.so
#3  0x00007fffe58ff7cb in dmlc::Stream::Create(char const*, char const*, bool) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#4  0x00007fffe58ba7b7 in treelite::frontend::LoadXGBoostModel(char const*, treelite::Model*) () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#5  0x00007fffe5855015 in TreeliteLoadXGBoostModel () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/site-packages/treelite/lib/libtreelite.so
#6  0x00007ffff65ba9dd in ffi_call_unix64 () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/lib-dynload/../../libffi.so.7
#7  0x00007ffff65ba067 in ffi_call_int () from /home/whicks/anaconda3/envs/treelite_test/lib/python3.7/lib-dynload/../../libffi.so.7
#8  0x00007ffff65d2517 in _call_function_pointer (argcount=2, resmem=0x7fffffff71e0, restype=<optimized out>, atypes=0x7fffffff71a0, avalues=0x7fffffff71c0, pProc=0x7fffe5854fb0 <TreeliteLoadXGBoostModel>, 
    flags=4353) at /usr/local/src/conda/python-3.7.9/Modules/_ctypes/callproc.c:829
#9  _ctypes_callproc (pProc=0x7fffe5854fb0 <TreeliteLoadXGBoostModel>, argtuple=<optimized out>, flags=4353, argtypes=<optimized out>, restype=0x555555b8c740, checker=0x0)
    at /usr/local/src/conda/python-3.7.9/Modules/_ctypes/callproc.c:1201
#10 0x00007ffff65d2f84 in PyCFuncPtr_call (self=<optimized out>, inargs=<optimized out>, kwds=0x0) at /usr/local/src/conda/python-3.7.9/Modules/_ctypes/_ctypes.c:4025
#11 0x00005555556c0ccb in _PyObject_FastCallKeywords (callable=0x7fffca3c3ae0, stack=0x7fffca69bba0, nargs=2, kwnames=0x0) at /tmp/build/80754af9/python_1598874792229/work/Objects/call.c:199

@hcho3
Copy link
Collaborator

hcho3 commented Sep 30, 2020

@wphicks I've seen this issue before. Treelite and XGBoost both link with dmlc-core library, and they currently use incompatible versions. Treelite should use commit 5df8305fe699d3b503d10c60a231ab0223142407 of dmlc-core, to match the version used by XGBoost.

@hcho3
Copy link
Collaborator

hcho3 commented Sep 30, 2020

Change this line to 5df8305fe699d3b503d10c60a231ab0223142407:

@hcho3
Copy link
Collaborator

hcho3 commented Sep 30, 2020

@wphicks Related issue: apache/tvm#4953. This was fixed by dmlc/xgboost#5590 by hiding all C++ symbols in libxgboost.so. It appears that the fix was not complete.

@wphicks
Copy link
Contributor Author

wphicks commented Sep 30, 2020

The fix for the underlying problem will be addressed by dmlc/xgboost#6188. I'll update this PR tomorrow with a workaround that avoids using dmlc's LocalFileSystem to access the file.

@hcho3
Copy link
Collaborator

hcho3 commented Sep 30, 2020

Great! Here are the list of files that uses dmlc::Stream::Create:

$ grep -R dmlc::Stream::Create * --color -I
c_api/c_api.cc:  std::unique_ptr<dmlc::Stream> fo(dmlc::Stream::Create(path, "w"));
compiler/ast_native.cc:        dmlc::Stream::Create(param.annotate_in.c_str(), "r"));
frontend/xgboost.cc:  std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(filename, "r"));
frontend/xgboost_json.cc:  std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(filename, "r"));
frontend/lightgbm.cc:  std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(filename, "r"));

They should all be replaced with the standard C++ library functions for file I/O.

@wphicks wphicks marked this pull request as ready for review October 1, 2020 18:21
@wphicks
Copy link
Contributor Author

wphicks commented Oct 1, 2020

This PR does not eliminate all uses of dmlc::Stream::Create since that would require changes to dmlc-core or more significant rewrites, but it eliminates the usage that causes #210 and which is preventing the merge of #202.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 6, 2020

Closing this out for now, since we're shifting over to xgboost 1.2.1rc1 to handle this instead. Will open a new PR with this approach.

@wphicks wphicks closed this Oct 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when loading model from XGBoost binary format
2 participants