CUDA error: invalid device function when compiling and running for amd gfx 1032 #4762

Closed
nasawyer7 opened this issue Jan 3, 2024 · 4 comments

Comments

@nasawyer7

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
I have an AMD 6700S GPU with 8 GB of VRAM. I got oobabooga (text-generation-webui) working on this machine, but I can't get llama.cpp to work. I compiled with
make clean && make -j16 LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1032
And everything went fine. However, when I try to run, I first do export HSA_OVERRIDE_GFX_VERSION=10.3.0
and then HIP_VISIBLE_DEVICES=0 ./main -ngl 50 -m /home/lenovoubuntu/Downloads/text-generation-webui-main/models/dolphin-2.6-mistral-7b-dpo.Q4_K_M.gguf -p "Write a function in TypeScript that sums numbers".
(I set HIP_VISIBLE_DEVICES because the machine also has an iGPU.)
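
To be explicit, the full sequence I'm running is roughly the following (same flags, paths, and layer count as above):

# build with ROCm/HIP support, targeting the RX 6700S (gfx1032)
make clean && make -j16 LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1032
# make the HSA runtime report the GPU as gfx1030 (10.3.0), which ROCm's prebuilt libraries support
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# restrict HIP to the dedicated GPU (device 0) and offload 50 layers
HIP_VISIBLE_DEVICES=0 ./main -ngl 50 -m /home/lenovoubuntu/Downloads/text-generation-webui-main/models/dolphin-2.6-mistral-7b-dpo.Q4_K_M.gguf -p "Write a function in TypeScript that sums numbers"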

It returns .................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 64.00 MB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 76.19 MiB
llama_new_context_with_model: VRAM scratch buffer: 73.00 MiB
llama_new_context_with_model: total VRAM used: 4232.06 MiB (model: 4095.06 MiB, context: 137.00 MiB)
CUDA error: invalid device function
current device: 0, in function ggml_cuda_op_flatten at ggml-cuda.cu:7971
hipGetLastError()
GGML_ASSERT: ggml-cuda.cu:226: !"CUDA error"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
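
For reference, the ptrace restriction mentioned in that message can also be relaxed without running everything as root (the setting resets on reboot):

cat /proc/sys/kernel/yama/ptrace_scope      # show the current Yama setting (1 = restricted)
sudo sysctl -w kernel.yama.ptrace_scope=0   # temporarily allow attaching so the built-in backtrace can run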

So I ran it as sudo, as the error output suggested, using this command: sudo LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH HSA_OVERRIDE_GFX_VERSION=10.3.0 HIP_VISIBLE_DEVICES=0 ./main -ngl 50 -m /home/lenovoubuntu/Downloads/text-generation-webui-main/models/dolphin-2.6-mistral-7b-dpo.Q4_K_M.gguf -p "Write a function in TypeScript that sums numbers"
I used all of those environment variables since oobabooga required them, and I was hoping they would fix things here too.

However, that just returns the following after the model appears to load.

CUDA error: invalid device function
current device: 0, in function ggml_cuda_op_flatten at ggml-cuda.cu:7971
hipGetLastError()
GGML_ASSERT: ggml-cuda.cu:226: !"CUDA error"
[New LWP 23593]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f34398ea42f in __GI___wait4 (pid=23599, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0 0x00007f34398ea42f in __GI___wait4 (pid=23599, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000055fb56cca7fb in ggml_print_backtrace ()
#2 0x000055fb56d90f95 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#3 0x000055fb56d9da1e in ggml_cuda_op_flatten(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, float const*, float const*, float*, ihipStream_t*)) ()
#4 0x000055fb56d92df3 in ggml_cuda_compute_forward ()
#5 0x000055fb56cf8898 in ggml_graph_compute_thread ()
#6 0x000055fb56cfca98 in ggml_graph_compute ()
#7 0x000055fb56dbc41e in ggml_backend_cpu_graph_compute ()
#8 0x000055fb56dbcf0b in ggml_backend_graph_compute ()
#9 0x000055fb56d2b046 in llama_decode_internal(llama_context&, llama_batch) ()
#10 0x000055fb56d2bb63 in llama_decode ()
#11 0x000055fb56d66316 in llama_init_from_gpt_params(gpt_params&) ()
#12 0x000055fb56cbc31a in main ()
[Inferior 1 (process 23582) detached]
Aborted

@dariox1337

dariox1337 commented Jan 4, 2024

I get a similar error (CUDA error: invalid device function, current device: 0, in function ggml_cuda_op_flatten at ggml-cuda.cu:7971) on an AMD 780M (iGPU) while trying to run any model.
llama.cpp was compiled with LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1100 and run with HSA_OVERRIDE_GFX_VERSION=gfx1100.
ROCm version 5.7.1.

@TheAceBlock

I also had a similar error when running on my gfx90c device (which needs to be overridden to gfx900).

What solved the problem for me was also setting the environment variable HSA_OVERRIDE_GFX_VERSION when running make (together with AMDGPU_TARGETS, although I'm not sure whether that value actually changes anything).

So for me, the make command would look like this:

HSA_OVERRIDE_GFX_VERSION=9.0.0 make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx900

I honestly didn't think this would work at all, but it certainly did! In my case, though, since my iGPU lacks INT8 operations, performance was worse than just using the CPU, but the model did run on the iGPU (verified with nvtop).
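
For the gfx1032 card in this issue, I'd expect the equivalent build command to be something like this (untested on that hardware; 10.3.0 matches the override already being used at runtime):

HSA_OVERRIDE_GFX_VERSION=10.3.0 make -j16 LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1032

with the same HSA_OVERRIDE_GFX_VERSION=10.3.0 still set in the environment when running ./main.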

Hope that this works for you too!


My guess on why this hasn't been reported much

I would guess that quite a few people have already run export HSA_OVERRIDE_GFX_VERSION=xxx beforehand, which makes the variable available to every program started from that shell (including make), so an explicit declaration at build time isn't needed.
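
In other words, many people probably had something like this in their shell already, so make inherited the override without it ever being set explicitly (values below are the ones from this issue, for illustration):

export HSA_OVERRIDE_GFX_VERSION=10.3.0               # exported once, earlier in the session
make -j16 LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1032     # make and the compilers it invokes inherit the override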

@dariox1337

> What solved the problem for me was also setting the environment variable HSA_OVERRIDE_GFX_VERSION when running make (together with AMDGPU_TARGETS, although I'm not sure whether that value actually changes anything).

Thank you! This hint finally allowed me to run all 33 layers of Mixtral Q5_K_M on the iGPU. Since it's an APU with shared RAM, it can't compete with dGPUs, but the speedup is close to 70% nonetheless.
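
Presumably the build that worked here was along these lines (my assumption, using the numeric form HSA_OVERRIDE_GFX_VERSION expects, 11.0.0 for gfx1100):

HSA_OVERRIDE_GFX_VERSION=11.0.0 make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1100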

CPU (7840u):

llama_print_timings:        load time =    2052.23 ms
llama_print_timings:      sample time =     111.57 ms /   727 runs   (    0.15 ms per token,  6515.97 tokens per second)
llama_print_timings: prompt eval time =   34619.23 ms /   538 tokens (   64.35 ms per token,    15.54 tokens per second)
llama_print_timings:        eval time =  248061.72 ms /   726 runs   (  341.68 ms per token,     2.93 tokens per second)
llama_print_timings:       total time =  283023.52 ms

GPU (780m):

llama_print_timings:        load time =   39038.83 ms
llama_print_timings:      sample time =     132.02 ms /   867 runs   (    0.15 ms per token,  6567.34 tokens per second)
llama_print_timings: prompt eval time =   44011.30 ms /   538 tokens (   81.81 ms per token,    12.22 tokens per second)
llama_print_timings:        eval time =  181460.51 ms /   866 runs   (  209.54 ms per token,     4.77 tokens per second)
llama_print_timings:       total time =  225876.68 ms

Strangely, prompt processing is slower on GPU.


github-actions bot commented May 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed May 9, 2024