Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable cuda #14

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

angry-crab
Copy link
Contributor

Resolves Enable cuda

BTW, I notice that USE_LLVM is set to OFF in config.cmake.patch. Is that intended?

Xinyu Wang added 2 commits January 18, 2023 13:37
Signed-off-by: Xinyu Wang <xinyu.wang@tier4.jp>
Signed-off-by: Xinyu Wang <xinyu.wang@tier4.jp>
@ambroise-arm
Copy link
Contributor

BTW, I notice that USE_LLVM is set to OFF in config.cmake.patch. Is that intended?

I'm not sure why Josh removed it in ee82c84. At the same time this patch is for the ROS1 version of the package, which to my knowledge is only used in a test non-default config of a single package of Autoware.ai. And no one is complaining, so probably not used.

Resolves autowarefoundation/modelzoo#85

Who/what is going to compile this package with the CUDA enablement? The CI? The user?

@angry-crab
Copy link
Contributor Author

angry-crab commented Jan 24, 2023

Who/what is going to compile this package with the CUDA enablement? The CI? The user?

The user, since this package provides the runtime lib.
I made it draft, and we don't want to merge this pr until all models could be compiled for cuda backend.

@angry-crab angry-crab marked this pull request as draft January 24, 2023 09:25
@ambroise-arm
Copy link
Contributor

The user, since this package provides the runtime lib.

Then I am confused on how it would all come together. Does your comment here autowarefoundation/modelzoo#85 (comment) about not having tvm_vendor in universe still stand? Would the user be expected to check out this repository manually?

I made it draft, and we don't want to merge this pr until all models could be compiled for cuda backend.

I don't see any problem with merging this PR now. Compiling the models for CUDA is independent of the runtime functionality provided by this package. And if the CUDA runtime doesn't work on some models, it will most likely be TVM that has to be fixed, not tvm_vendor.

@angry-crab
Copy link
Contributor Author

angry-crab commented Jan 24, 2023

Then I am confused on how it would all come together.

Sorry for the confusion. I think the way how it works currently is that tvm_vendor exists in the docker image anyway because it is released on rosdistro. When we build a docker image, ROS would try to install it.

Does your comment here autowarefoundation/modelzoo#85 (comment) about not having tvm_vendor in universe still stand?

Yes, my comment still stands. After merging this package, we will make a release to rosdistro. Then the package would be available after image built.

Would the user be expected to check out this repository manually?

No, user does not need to checkout this repo. ROS would do it during setup. But if anyone wants to use it before the rosdistro release, he/she has to build this repo manually.

@ambroise-arm
Copy link
Contributor

tvm_vendor exists in the docker image anyway because it is released on rosdistro

Precisely. I am not sure how the release process works, but I assume that it is automated, and that this process uses an environment that doesn't have cuda available.
But even if it did, there is still the problem of the portability of a cuda-enabled tvm_vendor. Here autowarefoundation/modelzoo#85 (comment) I don't think that you tested the problematic case I am thinking of. I've just tried the following:

  1. Use a cuda docker image to build tvm_runtime with cuda enabled (to replicate a release of tvm_vendor with cuda enabed)
  2. Copy the resulting install directory to an non-cuda autoware image (to replicate the distribution of that package to a non-cuda autoware environment)
  3. Try to run a model with non-cuda backend
  4. It fails: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory

Because of the tvm runtime shared object:

$ ldd install/tvm_vendor/lib/libtvm_runtime.so 
	linux-vdso.so.1 (0x00007ffd7cf2a000)
	libcudart.so.11.0 => not found
	libcuda.so.1 => not found
	libOpenCL.so.1 => /lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f3d0d58f000)
	libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f3d0d521000)
	libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f3d0b0d0000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3d0aea6000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3d0adbf000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3d0ad9f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3d0ab77000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f3d0d845000)
	libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f3d0a89c000)
	libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f3d0a852000)

@angry-crab
Copy link
Contributor Author

Copy the resulting install directory to an non-cuda autoware image (to replicate the distribution of that package to a non-cuda autoware environment)

I think the problem occurs here because the runtime was compiled with cuda at the first place. But if you check the Cmakelist of tvm_vendor, it will clone and compile tvm runtime in another environment. I believe rosdistro is not similar to debian packages such that the you may compile the source code in your local machine. But debian packages are compiled libs.

@ambroise-arm
Copy link
Contributor

I believe rosdistro is not similar to debian packages such that the you may compile the source code in your local machine.

I am not familiar with this. But if you are confident that it handles my concerns, then great.
Regardless, this PR can be merged now, worse case it won't change anything from the current state of things.

@angry-crab angry-crab marked this pull request as ready for review January 31, 2023 05:29
@angry-crab
Copy link
Contributor Author

@wep21 Do you have any comment/concern about this pr? Thanks.

@angry-crab angry-crab requested a review from wep21 January 31, 2023 05:30
@wep21
Copy link
Collaborator

wep21 commented Jan 31, 2023

@angry-crab I have no concern about this pr, but will you add this package into autoware repos because ros buildfarm environment does not have cuda?

@angry-crab
Copy link
Contributor Author

angry-crab commented Jan 31, 2023

ros buildfarm environment does not have cuda

I see. That will be a problem. But we probably don't want to add this package to autoware repo because it takes time to compile. I guess it means the user needs to clone and rebuild this package if he/she wants to use cuda.

I thought that when we call rosdistro install, ros would clone this package and compile it locally. Does it mean the user have to clone and rebuild the package anyway if he/she wants to use cuda?

@wep21
Copy link
Collaborator

wep21 commented Feb 1, 2023

Does it mean the user have to clone and rebuild the package anyway if he/she wants to use cuda?

Yes, I think so.

@angry-crab
Copy link
Contributor Author

Yes, I think so.

I guess then we don't have to make a rosdistro release at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable CUDA
3 participants