This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add GPU-optimization for split op #19131

Merged: 6 commits merged on Oct 11, 2020

Contributor

@MoisesHer commented Sep 13, 2020

Description

Optimization of split operator on GPU

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Added a specific split operator for GPU
  • The implementation includes an optimized CUDA kernel that detects whether the split is performed along the last axis or along another axis, and takes a different code path in each case.
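
The kernel itself is not shown in this conversation, so the following is only a minimal pure-Python reference of what a split does on a row-major buffer, assuming equal-size sections. The names `split_flat` and `contiguous_run_length` are hypothetical and not taken from the PR. The sketch shows that when splitting along the last axis, the contiguous run copied per output is only one section long, whereas for any other axis it is a longer block. That difference in access pattern is one plausible reason for having separate paths, and it is an assumption here rather than a statement from the PR.

```python
from math import prod


def contiguous_run_length(shape, axis, sections):
    """Number of consecutive input elements that land in the same output."""
    axis = axis % len(shape)
    trailing = prod(shape[axis + 1:])  # 1 when axis is the last axis
    return (shape[axis] // sections) * trailing


def split_flat(data, shape, axis, sections):
    """Split a row-major flat buffer into equal sections along `axis`.

    Returns one flat row-major buffer per output. Illustrative only.
    """
    axis = axis % len(shape)
    if shape[axis] % sections != 0:
        raise ValueError("axis size must be divisible by sections")
    leading = prod(shape[:axis])                    # dims before the split axis
    run = contiguous_run_length(shape, axis, sections)
    stride = run * sections                         # input elements per leading index
    outputs = [[] for _ in range(sections)]
    for lead in range(leading):
        base = lead * stride
        for s in range(sections):
            # Copy one contiguous run from the input into output s.
            outputs[s].extend(data[base + s * run: base + (s + 1) * run])
    return outputs


data = list(range(8))  # a (2, 4) tensor stored row-major
print(split_flat(data, (2, 4), axis=1, sections=2))  # last-axis split
print(split_flat(data, (2, 4), axis=0, sections=2))  # split along axis 0
```

For the `(2, 4)` example, the last-axis split copies runs of 2 elements, while the axis-0 split copies runs of 4.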

Performance for several scenarios can be found here:
https://docs.google.com/spreadsheets/d/1ksQcOetbs3MDAhT5pGaU3vKMoExQK-oVFqjXsol44eQ/edit?usp=sharing
When the last axis is smaller than 128, the new implementation generally performs worse than the original version, so those cases are redirected to the original implementation.
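
The fallback rule can be sketched as a small dispatch function. This is a hedged illustration, not the PR's code: `choose_split_impl` and the returned labels are hypothetical, and the sketch reads "last axis smaller than 128" as the size of the tensor's last dimension, which is an assumption about the author's wording.

```python
LAST_AXIS_THRESHOLD = 128  # threshold quoted in the PR description


def choose_split_impl(shape):
    """Pick an implementation label from the input shape.

    Assumption: the decision depends only on the size of the last dimension.
    """
    if shape[-1] < LAST_AXIS_THRESHOLD:
        return "original"   # new kernel is reported to be slower here
    return "optimized"      # new GPU-specific split path


print(choose_split_impl((32, 64)))    # original
print(choose_split_impl((32, 128)))   # optimized
```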

@mxnet-bot

Hey @MoisesHer, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, edge, windows-cpu, windows-gpu, unix-cpu, website, unix-gpu, clang, sanity, centos-cpu, centos-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-awaiting-testing label on Sep 13, 2020
tests/python/gpu/test_operator_gpu.py
@@ -2319,3 +2319,21 @@ def test_fp16_spmm():
out = mxsps.dot(inp, weight)
out_np = mx.nd.dot(inp, weight)
assert_almost_equal(out.asnumpy(), out_np, rtol=1e-3, atol=1e-5)

@with_seed()
@pytest.mark.serial
Member

No need to mark this as serial.

Member

I noticed that mark.serial is applied to all tests in this file, so we may as well keep mark.serial here.

Member

We run tests based on the tag, which has nothing to do with the file. The serial tag is only needed when a test invocation is long-running and consumes a lot of memory. Since parametrizing the input means that is no longer the case, the serial tag is not needed.

Please open a follow-up PR to finish the change.

@sxjscience added the Performance and Operator labels and removed the pr-work-in-progress label on Oct 11, 2020
@sxjscience merged commit 16eb89b into apache:master on Oct 11, 2020
chinakook pushed a commit to chinakook/mxnet that referenced this pull request Nov 17, 2020
* Add GPU-optimization for split op

* Complete operator

* unit-test: use parametrize

* fix lint

* fix lint

* fix lint

5 participants