
vulkan : reuse parent extra for views #7806

Merged: 2 commits merged into master on Jun 7, 2024
Conversation

slaren (Collaborator) commented Jun 6, 2024

Should fix #7730 #7769

@rhjdvsgsgks @metal3d @stduhpf Can you check if this fixes the issue?

@slaren slaren requested a review from 0cc4m June 6, 2024 18:14
@slaren slaren linked an issue Jun 6, 2024 that may be closed by this pull request
stduhpf (Contributor) commented Jun 6, 2024

This does fix the issue for me!

rhjdvsgsgks (Contributor) commented

I can confirm it fixed my issue.

0cc4m (Collaborator) commented Jun 6, 2024

@slaren Has something changed about how views are handled? Because extra->offset should contain the view_offs; they get added during tensor initialization, here.

slaren (Collaborator, Author) commented Jun 6, 2024

This is what changed (from #7730 (comment)):

> @0cc4m this is probably my bad, I made some changes to the way views are initialized in ggml-backend that may have created this issue. Views are now initialized in the buffer of their parent tensor, instead of on the compute buffer. The reason I made this change is that I came to the conclusion that allocating views on the compute buffer cannot work reliably, because the compute buffer is not always of the same type as the buffer used to allocate the tensor originally, and backends should be able to use the same extra as their parent anyway. I thought it was safe to make this change because the CUDA backend no longer needs extras for normal buffers, but I didn't realize that the vulkan backend still does.

Looking at the ggml_tensor_extra_gpu of the vulkan backend I think it should be possible to do this, the only change is that you would have to calculate the offset as t->extra->offset + t->view_offs. Essentially, add the offset of the view to the offset of the extra. Does that sound right?
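
To make that concrete, here is a minimal sketch of the suggested offset calculation, with simplified stand-ins for the real ggml types; vk_buffer_offset is a hypothetical helper name, not part of the actual ggml-vulkan code:

```cpp
// Sketch only: simplified stand-ins for the real ggml types.
#include <cstddef>
#include <cstdint>

struct ggml_tensor_extra_gpu {
    uint64_t offset;   // base offset of the tensor within its device buffer
    // ... device buffer handle, etc.
};

struct ggml_tensor {
    void * extra;      // backend-specific data (ggml_tensor_extra_gpu *)
    size_t view_offs;  // byte offset of a view into its parent (0 otherwise)
};

// Effective offset of a tensor (view or not) within its device buffer:
// the extra's base offset plus the view's own offset,
// i.e. t->extra->offset + t->view_offs.
static uint64_t vk_buffer_offset(const ggml_tensor * t) {
    const ggml_tensor_extra_gpu * extra = (const ggml_tensor_extra_gpu *) t->extra;
    return extra->offset + t->view_offs;
}
```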

0cc4m (Collaborator) commented Jun 6, 2024

Sounds right, but the offset should be added either when the tensor is used or at initialization, not both. So can the part that I linked be removed?

I can't test it today, but I should be able to tomorrow. Thanks for looking into it.

slaren (Collaborator, Author) commented Jun 6, 2024

I already changed the part that you linked to reuse the extra of the parent tensor instead of allocating a new one. The problem is that without doing this, the extras are allocated in the buffer of the KV cache, since these are views of the KV tensors, and eventually overwrite the extras of the original tensors. To prevent this it is necessary either to allocate the views in the compute buffer (what happened before), or simply avoid allocating extras for views, which is what this change does.
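
As a rough illustration of "avoid allocating extras for views": the buffer's tensor-initialization hook can simply share the parent's extra. This is a self-contained sketch under simplified types; vk_buffer_init_tensor and buffer_base are hypothetical names, and only the reuse-the-parent's-extra logic comes from the discussion above.

```cpp
// Sketch only: simplified stand-ins for the real ggml/Vulkan types.
#include <cstdint>

struct ggml_tensor_extra_gpu {
    uint64_t offset;          // base offset within the device buffer
};

struct ggml_tensor {
    void *        data;       // address of the tensor inside the buffer
    ggml_tensor * view_src;   // non-null if this tensor is a view
    void *        extra;      // backend-specific data
};

static void vk_buffer_init_tensor(void * buffer_base, ggml_tensor * tensor) {
    if (tensor->view_src != nullptr) {
        // Views are initialized in the buffer of their parent tensor, so the
        // parent's extra is valid for the view too. Sharing it avoids writing
        // a fresh extra into, e.g., the KV cache buffer, where it could
        // overwrite the extras of the original tensors.
        tensor->extra = tensor->view_src->extra;
        return;
    }
    // Non-view tensors get their own extra, as before.
    ggml_tensor_extra_gpu * extra = new ggml_tensor_extra_gpu{};
    extra->offset = (uint64_t) ((char *) tensor->data - (char *) buffer_base);
    tensor->extra = extra;
}
```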

metal3d (Contributor) commented Jun 6, 2024

Seems to be ok for me. Thanks!

The github-actions bot added the Vulkan (Issues specific to the Vulkan backend) label on Jun 6, 2024
0cc4m (Collaborator) commented Jun 7, 2024

> I already changed the part that you linked to reuse the extra of the parent tensor instead of allocating a new one. The problem is that without doing this, the extras are allocated in the buffer of the KV cache, since these are views of the KV tensors, and eventually overwrite the extras of the original tensors. To prevent this it is necessary either to allocate the views in the compute buffer (what happened before), or simply avoid allocating extras for views, which is what this change does.

Right, sorry. I just glanced over the code and missed that.

0cc4m (Collaborator) commented Jun 7, 2024

This causes a validation error when run with LLAMA_VULKAN_CHECK_RESULTS: some issue with the fences in ggml_vk_compute_forward. I'll debug that this evening.

VUID-vkQueueSubmit-fence-00064(ERROR / SPEC): msgNum: -1082734195 - Validation Error: [ VUID-vkQueueSubmit-fence-00064 ] Object 0: handle = 0x5ad034261e90, type = VK_OBJECT_TYPE_QUEUE; Object 1: handle = 0xeae97200000006d2, type = VK_OBJECT_TYPE_FENCE; | MessageID = 0xbf76c98d | vkQueueSubmit():  (VkFence 0xeae97200000006d2[]) is already in use by another submission. The Vulkan spec states: If fence is not VK_NULL_HANDLE, fence must not be associated with any other queue command that has not yet completed execution on that queue (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkQueueSubmit-fence-00064)
    Objects: 2
        [0] 0x5ad034261e90, type: 4, name: NULL
        [1] 0xeae97200000006d2, type: 7, name: NULL
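
For context on the error above: VUID-vkQueueSubmit-fence-00064 fires when a VkFence is passed to vkQueueSubmit while an earlier submission using that fence has not yet completed. The usual pattern that avoids it is to wait on and reset the fence before reuse. A generic Vulkan C API sketch, not the actual ggml_vk_compute_forward code; submit_reusing_fence is a hypothetical helper, and device, queue, fence, and submit are assumed to be valid objects created elsewhere:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

void submit_reusing_fence(VkDevice device, VkQueue queue,
                          VkFence fence, const VkSubmitInfo * submit) {
    // Block until any prior submission that used this fence has completed.
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    // Return the fence to the unsignaled state before reusing it.
    vkResetFences(device, 1, &fence);
    // Only now is it valid to associate the fence with a new submission.
    vkQueueSubmit(queue, 1, submit, fence);
}
```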

0cc4m (Collaborator) left a review comment

Found the issue and fixed it.

0cc4m merged commit da799b4 into master on Jun 7, 2024 (55 of 68 checks passed).
slaren deleted the sl/fix-vk-view-extra branch on June 12, 2024.
Labels: Vulkan (Issues specific to the Vulkan backend)

Successfully merging this pull request may close these issues:

Bug: server (at least) crashes using VULKAN
Bug: gpu hang after bde7cd3cd949c1a85d3a199498ac98e78039d46f

5 participants