pycuda issue with CUDA toolkit 12.0 #39

Closed
Overcraft90 opened this issue Mar 18, 2023 · 8 comments

@Overcraft90

Hi there,

I've been using instaGRAAL for a while and recently upgraded my GPU. I then installed the most recent NVIDIA driver (see below)

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 5000                 On | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P8                5W / 110W|   1881MiB / 16384MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5133      G   /usr/lib/xorg/Xorg                          793MiB |
|    0   N/A  N/A      6811      G   /usr/bin/gnome-shell                        120MiB |
|    0   N/A  N/A      7746      G   ...vice,SpareRendererForSitePerProcess      287MiB |
|    0   N/A  N/A      7851      G   ...,WinRetrieveSuggestionsOnlyOnDemand      164MiB |
|    0   N/A  N/A      8132      G   ...ures=SpareRendererForSitePerProcess      318MiB |
|    0   N/A  N/A   3421075      G   ...7741775,17899381394246000989,131072       89MiB |
+---------------------------------------------------------------------------------------+

and the CUDA toolkit:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0
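A note on the two version numbers above: the "CUDA Version: 12.1" in the nvidia-smi header is the newest CUDA version the installed driver supports, whereas `nvcc --version` reports the toolkit that actually compiles the kernels (12.0 here; the pycuda command in the traceback below invokes `nvcc` directly). When comparing setups, the toolkit release is the number that matters for compilation. A minimal sketch for extracting it; `parse_nvcc_release` and `installed_nvcc_release` are hypothetical helpers, not part of instaGRAAL or pycuda:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from the output of `nvcc --version`."""
    # The toolkit version appears as e.g. "Cuda compilation tools, release 12.0, V12.0.76".
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run the nvcc found on PATH and parse its version; None if nvcc is absent."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=True
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Output copied from the report above.
    sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0"""
    major, minor = parse_nvcc_release(sample)
    print(major, minor)  # 12 0
    print("toolkit is 12 or newer:", major >= 12)
    print("local nvcc:", installed_nvcc_release())
```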

Unfortunately, after the pyramids build correctly and the tool starts loading them, I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/instagraal", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/instagraal/instagraal.py", line 2146, in main
    p2 = window(
  File "/usr/local/lib/python3.10/dist-packages/instagraal/instagraal.py", line 212, in __init__
    self.simulation = simulation(
  File "/usr/local/lib/python3.10/dist-packages/instagraal/simu_single.py", line 139, in __init__
    self.sampler = sampler_lib(
  File "/usr/local/lib/python3.10/dist-packages/instagraal/cuda_lib_gl_single.py", line 297, in __init__
    self.loadProgram(kernel_adapt_entry_point)
  File "/usr/local/lib/python3.10/dist-packages/instagraal/cuda_lib_gl_single.py", line 1981, in loadProgram
    self.module = pycuda.compiler.SourceModule(
  File "/usr/local/lib/python3.10/dist-packages/pycuda-2022.2.2-py3.10-linux-x86_64.egg/pycuda/compiler.py", line 355, in __init__
    cubin = compile(
  File "/usr/local/lib/python3.10/dist-packages/pycuda-2022.2.2-py3.10-linux-x86_64.egg/pycuda/compiler.py", line 304, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/usr/local/lib/python3.10/dist-packages/pycuda-2022.2.2-py3.10-linux-x86_64.egg/pycuda/compiler.py", line 154, in compile_plain
    raise CompileError(
pycuda.driver.CompileError: nvcc compilation of /tmp/tmp08zy1h7q/kernel.cu failed
[command: nvcc --cubin -arch sm_75 -I/usr/local/lib/python3.10/dist-packages/pycuda-2022.2.2-py3.10-linux-x86_64.egg/pycuda/cuda kernel.cu]
[stderr:
kernel.cu(39): error: texture is not a template

kernel.cu(438): warning #550-D: variable "selec_smem" was set but never used

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

kernel.cu(443): warning #550-D: variable "local_count" was set but never used

kernel.cu(511): warning #177-D: variable "pos_fa_down" was declared but never referenced

kernel.cu(527): warning #177-D: variable "id_alter_contig" was declared but never referenced

kernel.cu(527): warning #177-D: variable "condition_3" was declared but never referenced

kernel.cu(527): warning #177-D: variable "condition_4" was declared but never referenced

kernel.cu(746): warning #550-D: variable "sub_pos_f_pop" was set but never used

kernel.cu(750): warning #550-D: variable "sub_l_cont_f_pop" was set but never used

kernel.cu(752): warning #550-D: variable "l_cont_bp_f_pop" was set but never used

kernel.cu(757): warning #550-D: variable "start_bp_f_pop" was set but never used

kernel.cu(760): warning #550-D: variable "or_f_pop" was set but never used

kernel.cu(1089): warning #550-D: variable "contig_f_pop" was set but never used

kernel.cu(1090): warning #550-D: variable "pos_f_pop" was set but never used

kernel.cu(1091): warning #550-D: variable "sub_pos_f_pop" was set but never used

kernel.cu(1092): warning #550-D: variable "l_cont_f_pop" was set but never used

kernel.cu(1093): warning #550-D: variable "sub_l_cont_f_pop" was set but never used

kernel.cu(1094): warning #550-D: variable "l_cont_bp_f_pop" was set but never used

kernel.cu(1097): warning #550-D: variable "start_bp_f_pop" was set but never used

kernel.cu(1098): warning #550-D: variable "id_prev_f_pop" was set but never used

kernel.cu(1099): warning #550-D: variable "id_next_f_pop" was set but never used

kernel.cu(1381): warning #550-D: variable "contig_f_pop" was set but never used

kernel.cu(1382): warning #550-D: variable "pos_f_pop" was set but never used

kernel.cu(1383): warning #550-D: variable "sub_pos_f_pop" was set but never used

kernel.cu(1384): warning #550-D: variable "l_cont_f_pop" was set but never used

kernel.cu(1385): warning #550-D: variable "l_cont_bp_f_pop" was set but never used

kernel.cu(1386): warning #550-D: variable "sub_l_cont_f_pop" was set but never used

kernel.cu(1389): warning #550-D: variable "start_bp_f_pop" was set but never used

kernel.cu(1390): warning #550-D: variable "id_prev_f_pop" was set but never used

kernel.cu(1391): warning #550-D: variable "id_next_f_pop" was set but never used

kernel.cu(1696): warning #550-D: variable "contig_f_pop" was set but never used

kernel.cu(1697): warning #550-D: variable "pos_f_pop" was set but never used

kernel.cu(1698): warning #550-D: variable "sub_pos_f_pop" was set but never used

kernel.cu(1699): warning #550-D: variable "l_cont_f_pop" was set but never used

kernel.cu(1700): warning #550-D: variable "l_cont_bp_f_pop" was set but never used

kernel.cu(1701): warning #550-D: variable "sub_l_cont_f_pop" was set but never used

kernel.cu(1704): warning #550-D: variable "start_bp_f_pop" was set but never used

kernel.cu(1705): warning #550-D: variable "id_prev_f_pop" was set but never used

kernel.cu(1706): warning #550-D: variable "id_next_f_pop" was set but never used

kernel.cu(1719): warning #550-D: variable "id_prev_f_ins" was set but never used

kernel.cu(2147): warning #550-D: variable "pop_is_ext" was set but never used

kernel.cu(2427): warning #550-D: variable "or_f_cut_a" was set but never used

kernel.cu(2437): warning #550-D: variable "or_f_cut_b" was set but never used

kernel.cu(2755): warning #550-D: variable "id_prev_f_ins" was set but never used

kernel.cu(2997): warning #550-D: variable "or_f_cut" was set but never used

kernel.cu(3375): warning #550-D: variable "sub_pos_fA" was set but never used

kernel.cu(3379): warning #550-D: variable "len_bp_fA" was set but never used

kernel.cu(3380): warning #550-D: variable "sub_len_fA" was set but never used

kernel.cu(3381): warning #550-D: variable "start_bp_fA" was set but never used

kernel.cu(3382): warning #550-D: variable "id_prev_fA" was set but never used

kernel.cu(3383): warning #550-D: variable "id_next_fA" was set but never used

kernel.cu(3384): warning #550-D: variable "circ_fA" was set but never used

kernel.cu(3385): warning #550-D: variable "or_fA" was set but never used

kernel.cu(3390): warning #550-D: variable "sub_pos_fB" was set but never used

kernel.cu(3394): warning #550-D: variable "len_bp_fB" was set but never used

kernel.cu(3395): warning #550-D: variable "sub_len_fB" was set but never used

kernel.cu(3396): warning #550-D: variable "start_bp_fB" was set but never used

kernel.cu(3397): warning #550-D: variable "id_prev_fB" was set but never used

kernel.cu(3398): warning #550-D: variable "id_next_fB" was set but never used

kernel.cu(3399): warning #550-D: variable "circ_fB" was set but never used

kernel.cu(3400): warning #550-D: variable "or_fB" was set but never used

kernel.cu(3742): error: identifier "int2float" is undefined

kernel.cu(3718): warning #550-D: variable "is_activ_fi" was set but never used

kernel.cu(3718): warning #550-D: variable "is_rep_fi" was set but never used

kernel.cu(3718): warning #177-D: variable "swap" was declared but never referenced

kernel.cu(3720): warning #177-D: variable "s" was declared but never referenced

kernel.cu(3722): warning #550-D: variable "is_circle" was set but never used

kernel.cu(3804): error: identifier "int2float" is undefined

kernel.cu(3781): warning #550-D: variable "is_activ_fi" was set but never used

kernel.cu(3781): warning #550-D: variable "is_rep_fi" was set but never used

kernel.cu(3781): warning #177-D: variable "swap" was declared but never referenced

kernel.cu(3783): warning #177-D: variable "s" was declared but never referenced

kernel.cu(3886): error: identifier "int2float" is undefined

kernel.cu(3959): error: identifier "int2float" is undefined

kernel.cu(3942): warning #177-D: variable "tmp_val" was declared but never referenced

kernel.cu(4016): warning #177-D: variable "id" was declared but never referenced

kernel.cu(4062): warning #177-D: variable "start" was declared but never referenced

kernel.cu(4150): error: identifier "int2float" is undefined

kernel.cu(4177): error: identifier "int2float" is undefined

kernel.cu(4290): error: identifier "int2float" is undefined

kernel.cu(4323): error: identifier "int2float" is undefined

kernel.cu(4431): error: identifier "int2float" is undefined

kernel.cu(4404): warning #177-D: variable "row" was declared but never referenced

kernel.cu(4404): warning #177-D: variable "col" was declared but never referenced

kernel.cu(4405): warning #177-D: variable "is_circle" was declared but never referenced

kernel.cu(4642): warning #550-D: variable "max_len" was set but never used

kernel.cu(4650): warning #177-D: variable "min_id_c_new" was declared but never referenced

kernel.cu(4685): error: identifier "int2float" is undefined

kernel.cu(4767): error: identifier "int2float" is undefined

kernel.cu(4810): error: identifier "int2float" is undefined

kernel.cu(4891): error: identifier "int2float" is undefined

kernel.cu(4886): warning #550-D: variable "condition_pix" was set but never used

kernel.cu(4887): warning #177-D: variable "start" was declared but never referenced

14 errors detected in the compilation of "kernel.cu".
]
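Nearly all of the stderr above consists of warnings, which do not stop compilation. The 14 errors are of only two kinds: `texture is not a template` (line 39) and `identifier "int2float" is undefined` (the other 13). A minimal sketch for pulling just the errors out of an nvcc log like this one, assuming nvcc's `file(line): error: message` layout seen above; `nvcc_errors` and `summarize` are hypothetical helper names:

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches nvcc diagnostics such as: kernel.cu(39): error: texture is not a template
ERROR_RE = re.compile(
    r"^(?P<file>[^\s(]+)\((?P<line>\d+)\): error: (?P<msg>.+)$", re.MULTILINE
)


def nvcc_errors(stderr: str) -> List[Tuple[int, str]]:
    """Return (source line, message) for every error, skipping warnings."""
    return [
        (int(m.group("line")), m.group("msg").strip())
        for m in ERROR_RE.finditer(stderr)
    ]


def summarize(stderr: str) -> Counter:
    """Count how often each distinct error message occurs."""
    return Counter(msg for _, msg in nvcc_errors(stderr))


if __name__ == "__main__":
    # Short excerpt of the stderr above (most warnings omitted).
    excerpt = """kernel.cu(39): error: texture is not a template

kernel.cu(438): warning #550-D: variable "selec_smem" was set but never used

kernel.cu(3742): error: identifier "int2float" is undefined

kernel.cu(3804): error: identifier "int2float" is undefined
"""
    for msg, count in summarize(excerpt).most_common():
        print(count, msg)
```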

At first this seemed related to kernel.cu, but the problem persists even after following the procedure in issue #18. So my first thought was that something changed between CUDA toolkit 11 and 12, so that the lines in kernel_sparse.cu and kernel_sparse_adapt.cu that handle the v11 changes no longer cope with the latest NVIDIA updates.
Alternatively, it could be an issue with the most recent version of pycuda. However, when I tried to go back to last year's version (2021.1), I wasn't able to install it, possibly because of the newer NVIDIA driver and toolkit.
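Both error kinds are consistent with the CUDA 11 to 12 hypothesis: CUDA 12.0 removed the legacy texture reference API (`texture<...>` declarations), and the `int2float` errors suggest the old conversion helper is no longer declared either, while intrinsics such as `__int2float_rn` and `__int2float_rz` are still documented in the CUDA Math API. A rough sketch of patching the kernel source string before it reaches `pycuda.compiler.SourceModule`, covering only the `int2float` part. It assumes every call takes a single `int` argument and that round-toward-zero (`__int2float_rz`) matches what the old helper did; both are assumptions to verify against the kernels. The texture declarations would still need porting to texture objects, which is not shown. `patch_int2float` is a hypothetical helper:

```python
import re

# Assumption: every int2float call takes a single int argument and relied on
# round-toward-zero. Calls passing an explicit rounding-mode argument would
# need to be rewritten by hand.
# \b stops the pattern from matching inside __int2float_rn( or uint2float(.
LEGACY_CALL = re.compile(r"\bint2float\s*\(")


def patch_int2float(source: str, intrinsic: str = "__int2float_rz") -> str:
    """Replace legacy int2float( calls with a call to a CUDA intrinsic."""
    return LEGACY_CALL.sub(intrinsic + "(", source)


if __name__ == "__main__":
    # Hypothetical kernel line, not taken from instaGRAAL.
    kernel_src = "float f = int2float(id_frag); float g = __int2float_rn(k);"
    print(patch_int2float(kernel_src))
```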

Please let me know whether there is a solution to this. Thanks!

@ABignaud
Member

Hi,

It works on CUDA 10; I am already surprised that you managed to make it work with CUDA 11. It also doesn't work with the latest version of pycuda. For the Docker image I used a pycuda commit from early 2021 (when it was still working), and that works. So basically, the solution is to downgrade your versions and drivers.

If you want to use it without downgrading your drivers, you can either run the Docker image locally on your computer (see the installation section in the README) or use Galaxy: it has been installed on the European Galaxy server. You can find it at this address: https://usegalaxy.eu/?tool_id=instagraal&version=0.1.6.

Amaury

@Overcraft90
Author

Overcraft90 commented Mar 20, 2023

Hi @ABignaud, thanks a lot for the info.

I believe I will resort to Docker, as I'm not a big fan of Galaxy (and at this point I believe my machine would perform much better than Galaxy without a paid account). That said, the only thing that concerns me about Docker is this:

Note: Running the container requires the dependency nvidia-docker2 [installation]

I'm not entirely sure how to manage this, or whether I already satisfy this requirement. Since you got it working on Docker, please let me know. Thanks!

P.S. Is there, by any chance, any interest in tuning instaGRAAL to work with CUDA toolkit v12, and possibly later releases?

@ABignaud
Member

Hi,

nvidia-docker2 is just a Docker package that lets you set up NVIDIA GPUs inside a Docker container. Once it's installed, just pull the image and run it with the --gpus option; it will run like a normal container. You just won't have the graphical plot during scaffolding.

As for the P.S., there is actually no one in the lab with enough knowledge and time to do it. We are only maintaining the tool and providing Docker images and other means to use it with modern configurations. Thus, there are no plans to tune it for CUDA 12. However, if you want to open a pull request, or if you know someone who wants to, we will welcome it.

Amaury

@Overcraft90
Author

Overcraft90 commented Mar 23, 2023

Hi @ABignaud

I got nvidia-docker2 running and might switch to the Docker version of the tool; however, it's a pity that it won't show the movie. It was quite useful for judging when the process had reached convergence and could be stopped.

I'm in contact with some people on the NVIDIA forum; maybe I will switch back to CUDA toolkit v11.x.

@ABignaud
Member

Hi,
If you want an idea of what the matrix looks like at each iteration, you can still use the --save-matrix option. It's not as good as the movie, but it still gives you an idea of the convergence.

@Overcraft90
Author

Overcraft90 commented Mar 23, 2023

@ABignaud good to know!

For now I'm just trying to get it to work: when I launch it from Docker with the following command

sudo docker run -v `pwd`:/media/storage/ --gpus all koszullab/instagraal instagraal -l 6 -n 50 /media/storage/2.curation/HG00733/instaGRAAL_hap2/ /media/storage/1.assemblies/HG00733_hifi.hic.hap2.fasta /media/storage/2.curation/HG00733/out_hap2/

I get this error message:

Traceback (most recent call last):
  File "/usr/local/bin/instagraal", line 11, in <module>
    load_entry_point('instagraal', 'console_scripts', 'instagraal')()
  File "/src/instagraal/instagraal/instagraal.py", line 1178, in main
    output_folder=output_folder,
  File "/src/instagraal/instagraal/instagraal.py", line 160, in __init__
    output_folder=output_folder,
  File "/src/instagraal/instagraal/simu_single.py", line 74, in __init__
    self.select_data_set(name)
  File "/src/instagraal/instagraal/simu_single.py", line 743, in select_data_set
    thresh_factor=self.thresh_factor,
  File "/src/instagraal/instagraal/pyramid_sparse.py", line 58, in build_and_filter
    os.mkdir(all_pyramid_folder)
FileNotFoundError: [Errno 2] No such file or directory: '/media/storage/2.curation/HG00733/instaGRAAL_hap2/pyramids'

This seems to be addressed in #13 and #20, but for some reason I still get this message. Maybe there is something wrong with my command?
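The last frame calls `os.mkdir`, which creates only the final path component. It raises `FileNotFoundError` (errno 2, as in the traceback) when a parent directory is missing from the path as the process sees it, and `PermissionError` when the parent exists but is not writable, so the exception type hints at which condition the container hit. A small self-contained demonstration using a temporary directory; `mkdir_outcome` is a hypothetical helper:

```python
import os
import tempfile


def mkdir_outcome(path: str) -> str:
    """Try os.mkdir (as pyramid_sparse.py does) and report what happened."""
    try:
        os.mkdir(path)
    except FileNotFoundError:
        return "FileNotFoundError: a parent directory does not exist"
    except PermissionError:
        return "PermissionError: parent exists but is not writable"
    return "created"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "instaGRAAL_hap2", "pyramids")
        print(mkdir_outcome(target))          # parent missing -> FileNotFoundError
        os.makedirs(os.path.dirname(target))  # create the missing parent chain
        print(mkdir_outcome(target))          # created
```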

@ABignaud
Member

It seems that you don't have write permission on the folder you mount into the Docker container. Can you try opening up the permissions before running Docker:
chmod 777 /media/storage/2.curation/HG00733/out_hap2/
chmod 777 $WORKDIR

Also add a working directory to the docker run command so that the log file is written to a folder where you have permission:

docker run \
    --gpus all \
    --workdir $WORKDIR \
    --mount "type=bind,src=$WORKDIR,dst=$WORKDIR" \
    koszullab/instagraal
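The same preparation can be scripted. A minimal Python sketch, assuming `$WORKDIR` is the directory holding both the inputs and the output folder; `prepare_workdir` and `docker_command` are hypothetical helpers, and any instagraal arguments are passed through unchanged. Because the bind mount uses the same path on both sides, paths given to instagraal inside the container match the host paths.

```python
import os
import shlex
from typing import List, Sequence


def prepare_workdir(workdir: str) -> None:
    """Create the directory if needed and open its permissions (like `chmod 777`)."""
    os.makedirs(workdir, exist_ok=True)
    # chmod after creation: the mode passed to makedirs is filtered by the umask.
    os.chmod(workdir, 0o777)


def docker_command(workdir: str, image: str = "koszullab/instagraal",
                   args: Sequence[str] = ()) -> List[str]:
    """Assemble the `docker run` invocation above as an argv list."""
    return [
        "docker", "run",
        "--gpus", "all",
        "--workdir", workdir,
        # Same path inside and outside the container.
        "--mount", f"type=bind,src={workdir},dst={workdir}",
        image,
        *args,
    ]


if __name__ == "__main__":
    # Mirrors the shell version: $WORKDIR, falling back to the current directory.
    workdir = os.path.abspath(os.environ.get("WORKDIR", "."))
    cmd = docker_command(workdir)
    print(shlex.join(cmd))
    # Before launching: prepare_workdir(workdir); subprocess.run(cmd, check=True)
```

Passing an argument list (rather than one shell string) to `subprocess.run` avoids quoting problems if `$WORKDIR` contains spaces.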

@Overcraft90
Author

Hi @ABignaud, thanks a lot, this solved the issue!

If someday someone gets the chance to make instaGRAAL work with the latest CUDA releases, that would be very cool.
