-
Notifications
You must be signed in to change notification settings - Fork 538
Make BERT-GPU deploy compatible with MXNet 1.8 #1389
Conversation
Job PR-1389/1 is complete. |
scripts/bert/bertpass_gpu.cc
Outdated
@@ -30,6 +30,7 @@ | |||
#include <functional> | |||
#include "mxnet/lib_api.h" | |||
|
|||
#if MXNET_1_7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Node* node_expand_1_bias = g->addNode(base_name + "_expand_1_bias", "expand_dims"); | ||
Node* node_expand_2_bias = g->addNode(base_name + "_expand_2_bias", "expand_dims"); | ||
Node* node_bcst_like = g->addNode(base_name + "_broadcast_like", "broadcast_like"); | ||
Node* node_add_bias = g->addNode(base_name + "_add_bias", "elemwise_add"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its nice to see that from 1.7 --> 1.8 we were able to condense the number of lines of code users need to write from 16 down to 4! 👍
node_expand_1_bias->attrs["axis"]="0"; | ||
node_expand_1_bias->inputs.resize(1); | ||
node_expand_1_bias->inputs[0].node = node_ffn1_bias; | ||
node_expand_1_bias->inputs[0].entry = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to make the code a bit more succinct you could change:
node_expand_1_bias->inputs.resize(1);
node_expand_1_bias->inputs[0].node = node_ffn1_bias;
node_expand_1_bias->inputs[0].entry = 0;
to:
node_expand_1_bias->inputs.emplace_back(node_ffn1_bias, 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestions. the emplace_back is giving me some problems:
bertpass_gpu.cc:290:64: required from here
/usr/include/c++/7/ext/new_allocator.h:136:4: error: new initializer expression list treated as compound expression [-fpermissive]
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/7/ext/new_allocator.h:136:4: error: no matching function for call to ‘mxnet::ext::NodeEntry::NodeEntry(int)’
In file included from /gluon-nlp/scripts/bert/bertpass_gpu.cc:31:0:
/incubator-mxnet/include/mxnet/lib_api.h:549:8: note: candidate: mxnet::ext::NodeEntry::NodeEntry()
struct NodeEntry {
^~~~~~~~~
/incubator-mxnet/include/mxnet/lib_api.h:549:8: note: candidate expects 0 arguments, 1 provided
/incubator-mxnet/include/mxnet/lib_api.h:549:8: note: candidate: constexpr mxnet::ext::NodeEntry::NodeEntry(const mxnet::ext::NodeEntry&)
/incubator-mxnet/include/mxnet/lib_api.h:549:8: note: no known conversion for argument 1 from ‘int’ to ‘const mxnet::ext::NodeEntry&’
/incubator-mxnet/include/mxnet/lib_api.h:549:8: note: candidate: constexpr mxnet::ext::NodeEntry::NodeEntry(mxnet::ext::NodeEntry&&)
/incubator-mxnet/include/mxnet/lib_api.h:549:8: note: no known conversion for argument 1 from ‘int’ to ‘mxnet::ext::NodeEntry&&
scripts/bert/bertpass_gpu.cc
Outdated
REGISTER_PASS(custom_pass) | ||
.setBody(custom_pass); | ||
|
||
MXReturnValue initialize(int version) { | ||
if (version >= 10400) { | ||
printf("VERSION %i\n", version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to print the version twice?
@@ -321,17 +321,29 @@ def export(prefix): | |||
arg_array['data2'] = mx.nd.ones((test_batch_size, ), dtype='float32') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kh4L didnt you make a change in 1.8 that enables not having to set the inputs like this?
@MoisesHer is setting the shapes/types still not working when using shape_dict
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, If I set shapes/types, I am receiving a Segmentation fault: 11
as soon as I call sym.optimize_for
Job PR-1389/2 is complete. |
I am getting the following error, not sure why:
|
Job PR-1389/3 is complete. |
@szha Could we have someone help for the CI failure? |
Job PR-1389/4 is complete. |
Job PR-1389/5 is complete. |
Job PR-1389/6 is complete. |
Job PR-1389/7 is complete. |
Job PR-1389/9 is complete. |
Job PR-1389/10 is complete. |
@szha @TristonC |
I think the cc files are missed from the MANIFEST.in. As a result the cc files are not included in the wheel. cc @samskalicky @MoisesHer for now would it work if you include the cc file here to get unblocked first? |
thanks @szha , |
Yes, I think that would be the fastest way to get unblocked on this PR. |
im confused, we already had this PR apache/mxnet#19393 to add the file in the pip wheel. After this @MoisesHer was able to get this working (hence the PR). But why is it not working all of a sudden, did something else change? @MoisesHer are you using a diff pip wheel than before? Is that wheel built differently? |
Job PR-1389/11 is complete. |
@samskalicky I was able to make it work locally by cloning MXNet 1.8 and compiling, |
i added this line: As part of the apache/mxnet#19393 for this specific reason (after our offline discussion on Slack). I remember building the wheel and making sure the file was there. Maybe i built the wheel differently in my testing than the aws-mx one (or the ones on dist.mxnet.io). Either way, ill work with @szha on apache/mxnet#19850 and we'll backport as necessary |
@samskalicky the problem is that the files will only be included according to the manifest file. The if conditions in setup.py only copies the file in the location of packaging. |
@szha apart of the .cc file, the remaining issue is not related to this PR:
|
@MoisesHer I see. I think the CI needs an update to the python versions. Working on it. |
I finished updating the conda versions on all CI hosts (as well as the docker image). Will babysit this PR. |
Job PR-1389/12 is complete. |
Not sure why python3.5 is still picked up. I'm looking into the CI. |
Hi @szha, do you have any update? thanks |
We will switch the 0.x CI to github actions after #1531 |
ee1a1db
to
309351d
Compare
Codecov Report
@@ Coverage Diff @@
## v0.10.x #1389 +/- ##
===========================================
- Coverage 33.44% 33.43% -0.02%
===========================================
Files 155 155
Lines 15206 15213 +7
===========================================
Hits 5086 5086
- Misses 10120 10127 +7
|
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1389/309351d3983077bac905d9df4f4575594f1633b5/index.html |
@MoisesHer thanks for the change. Would you also port the change to v0.x? |
Description
Change custom graph pass implementation to make it compatible with MXNet 1.8
Solving issue #1388
Checklist
Essentials
Changes
cc @dmlc/gluon-nlp-team, @samskalicky, @Kh4L