Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Disable -DUSE_TVM_OP on GPU builds #18204

Merged
merged 1 commit into from
May 1, 2020
Merged

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Apr 30, 2020

Need to be fixed first before it can be tested on CI

#17886
#17840

@mxnet-bot
Copy link

Hey @leezu , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, sanity, clang, website, windows-cpu, centos-gpu, centos-cpu, unix-cpu, unix-gpu, edge, windows-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu leezu mentioned this pull request Apr 30, 2020
7 tasks
@marcoabreu
Copy link
Contributor

Is this issue caused by using the g4 instances or do we have and clue of why they saw suddenly broken?

@leezu
Copy link
Contributor Author

leezu commented Apr 30, 2020

I think this has been broken since it was introduced.
It's unclear why it triggers non-deterministic failure in some cases, but deterministic failures in other. I think that may be due to the error message being swallowed in some cases.

Also, CI hasn't switched to G4 yet

@leezu
Copy link
Contributor Author

leezu commented Apr 30, 2020

@mxnet-bot run ci [unix-gpu, windows-gpu, centos-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [windows-gpu, centos-gpu, unix-gpu]

@marcoabreu
Copy link
Contributor

So how did CI pass if it was broken? 🤔

@leezu
Copy link
Contributor Author

leezu commented Apr 30, 2020

So how did CI pass if it was broken? thinking

Those features were not tested by CI. For some other tests that do rely on the broken feature, they are disabled, for example

https://github.com/apache/incubator-mxnet/blob/1496c91871b9d81d6a18785bdc8a1c3450bedbca/tests/python/unittest/test_deferred_compute.py#L327-L329

But the functionality in #18083 triggers the broken parts of TVMOP, and at this stage I think we should disable TVMOP on GPU feature instead of changing the way #18083 uses the mx.np API

@leezu leezu merged commit 03fdfe0 into apache:master May 1, 2020
@leezu leezu deleted the disabletvmgpu branch May 1, 2020 00:13
@jinboci
Copy link
Contributor

jinboci commented Jun 11, 2020

@leezu Hi, I am fixing the issue #17840. Would reverting changes in this PR reenable all the TVMOp tests?

@jinboci jinboci mentioned this pull request Jun 12, 2020
7 tasks
@leezu
Copy link
Contributor Author

leezu commented Jun 13, 2020

Thank you @jinboci for fixing the issue! Reverting this PR will indeed re-enable the previous test setup, but there may be some conflicts during the revert due to subsequent changes to the CI. You can try it though.

Instead of reverting, my recommendation is to take a look at what tests this PR has removed and think about what the best strategy for testing the TVM_OP feature would be. For example, we can consider to test it only on the CentOS system tests or only on the Ubuntu system tests. But maybe this is not ideal. Do you have any recommendation about the best testing strategy?

Sorry for the late response.

@jinboci
Copy link
Contributor

jinboci commented Jun 13, 2020

@leezu Thank you very much for your suggestions. I don't have recommendations about testing strategy yet; however, I will take a look at the test code and see if there are better ways for testing. Thanks!

AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants