
[SYCL][Fusion] Kernel Fusion support for CUDA backend #8747

Merged — 25 commits, May 8, 2023

Conversation

sommerlukas (Contributor):

Extend kernel fusion for the CUDA backend.

In contrast to the existing SPIR-V based backends, the default binary format for the CUDA backend (PTX or CUBIN) is not suitable as input for the kernel fusion JIT compiler.

This PR therefore extends the driver to additionally embed LLVM IR in the fat binary when the user specifies the -fsycl-embed-ir flag during compilation, taking the output of the sycl-post-link step for the CUDA backend.

The JIT compiler has been extended to handle LLVM IR as input format and PTX assembly as output format (including translation via the NVPTX backend). Target-specific parts of the fusion process have been refactored to TargetFusionInformation.

The connecting logic to the JIT compiler in the SYCL RT has been extended to produce valid PI device binaries for the CUDA backend/PI.

Heterogeneous ND ranges are not yet supported for the CUDA backend.
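A CUDA-targeting SYCL compilation that opts into fusion-ready fat binaries might then look like the following (a sketch only: the file names are placeholders, and the exact target-triple spelling should be checked against the driver documentation):

```sh
# Compile for the CUDA backend and additionally embed LLVM IR
# (the sycl-post-link output) into the fat binary, so the kernel
# fusion JIT compiler can consume it at runtime.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fsycl-embed-ir app.cpp -o app
```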

sommerlukas (Contributor, Author):

/verify with intel/llvm-test-suite#1683

sommerlukas (Contributor, Author):

@sergey-semenov: The changes to graph_builder.cpp and commands.hpp are necessary to avoid deletion of dependencies.

With fusion, the individual commands for each kernel are replaced by a single command for the fused kernel, and the original commands are deleted without execution. Without this modification, the destructor of the command would call cleanDepEventsThroughOneLevel, which would not only delete the dependency edges of the original command, but also its dependencies.

This would yield an incomplete dependency graph. Instead, the dependency edges of the original command are deleted before removal, but only when a command is removed without execution during fusion.

@sommerlukas sommerlukas requested a review from a team as a code owner April 5, 2023 14:35
Handle reqd_work_group_size and work_group_size_hint attributes.

Parse each input binary only once.

Groom the nvvm annotations for functions deleted before fusion.

Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
steffenlarsen (Contributor) left a comment:

Both the runtime and design doc changes LGTM. Only a couple of minor nits.

sycl/doc/design/KernelFusionJIT.md (outdated, resolved)
sycl/source/detail/scheduler/commands.hpp (resolved)
bader (Contributor) commented May 4, 2023:

@intel/dpcpp-clang-driver-reviewers, could you review the driver's changes, please?

mdtoguchi (Contributor) left a comment:

OK for driver

sommerlukas (Contributor, Author):

@intel/llvm-gatekeepers This is now approved, could someone please merge this?

@aelovikov-intel aelovikov-intel merged commit a93e59d into intel:sycl May 8, 2023
aelovikov-intel (Contributor):

I'm seeing

Unexpectedly Passed Tests (1):
  SYCL :: KernelFusion/device_info_descriptor.cpp

in CUDA pre-commit CI tasks on unrelated PRs. @sommerlukas would you take a look at it, please?

sommerlukas (Contributor, Author):

> I'm seeing
>
> Unexpectedly Passed Tests (1):
>   SYCL :: KernelFusion/device_info_descriptor.cpp
>
> in CUDA pre-commit CI tasks on unrelated PRs. @sommerlukas would you take a look at it, please?

@aelovikov-intel Is pre-commit CI running with the latest version of the e2e tests, specifically the device_info_descriptor test? Prior to this PR, the test was marked XFAIL for cuda, but this PR removed that marking (it is now XFAIL only on hip).

If the version of the e2e tests in pre-commit CI still expects XFAIL for cuda but the test now passes, that would explain the unexpected pass on CUDA.
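Expectations of this kind are controlled by LIT XFAIL directives inside the test file. A hypothetical before/after for the scenario described above (the actual directive lines in device_info_descriptor.cpp may differ):

```
// Before this PR: expected to fail on both backends.
// XFAIL: cuda, hip

// After this PR: expected to fail only on hip.
// XFAIL: hip
```

With the old directive still in place, a test that now passes on CUDA is reported by LIT as an "Unexpectedly Passed" (XPASS) failure.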

7 participants