
vulkan : reuse parent extra for views #7806

Merged: 2 commits merged into master on Jun 7, 2024
Conversation

slaren (Collaborator) commented Jun 6, 2024

Should fix #7730 #7769

@rhjdvsgsgks @metal3d @stduhpf Can you check if this fixes the issue?

@slaren slaren requested a review from 0cc4m June 6, 2024 18:14
@slaren slaren linked an issue Jun 6, 2024 that may be closed by this pull request
stduhpf (Contributor) commented Jun 6, 2024

This does fix the issue for me!

rhjdvsgsgks (Contributor) commented

I can confirm it fixed my issue.

0cc4m (Collaborator) commented Jun 6, 2024

@slaren Has something changed about how views are handled? Because extra->offset should contain the view_offs; they get added during tensor initialization, here.

slaren (Collaborator, Author) commented Jun 6, 2024

This is what changed (from #7730 (comment)):

> @0cc4m this is probably my bad, I made some changes to the way views are initialized in ggml-backend that may have created this issue. Views are now initialized in the buffer of their parent tensor, instead of on the compute buffer. The reason I made this change is that I came to the conclusion that allocating views on the compute buffer cannot work reliably, because the compute buffer is not always of the same type as the buffer used to allocate the tensor originally, and backends should be able to use the same extra as their parent anyway. I thought it was safe to make this change because the CUDA backend no longer needs extras for normal buffers, but I didn't realize that the vulkan backend still does.

Looking at the ggml_tensor_extra_gpu of the vulkan backend I think it should be possible to do this, the only change is that you would have to calculate the offset as t->extra->offset + t->view_offs. Essentially, add the offset of the view to the offset of the extra. Does that sound right?
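
To make that concrete, here is a minimal sketch of the suggested offset calculation, with simplified stand-ins for the real ggml types; vk_buffer_offset is a hypothetical helper name, not part of the actual ggml-vulkan code:

```cpp
// Sketch only: simplified stand-ins for the real ggml types.
#include <cstddef>
#include <cstdint>

struct ggml_tensor_extra_gpu {
    uint64_t offset;   // base offset of the tensor within its device buffer
    // ... device buffer handle, etc.
};

struct ggml_tensor {
    void * extra;      // backend-specific data (ggml_tensor_extra_gpu *)
    size_t view_offs;  // byte offset of a view into its parent (0 otherwise)
};

// Effective offset of a tensor (view or not) within its device buffer:
// the extra's base offset plus the view's own offset,
// i.e. t->extra->offset + t->view_offs.
static uint64_t vk_buffer_offset(const ggml_tensor * t) {
    const ggml_tensor_extra_gpu * extra = (const ggml_tensor_extra_gpu *) t->extra;
    return extra->offset + t->view_offs;
}
```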

0cc4m (Collaborator) commented Jun 6, 2024

Sounds right, but the offset should be added either when the tensor is used or at initialization, not both. So can the part that I linked be removed?

I can't test it today, but I should be able to tomorrow. Thanks for looking into it.

slaren (Collaborator, Author) commented Jun 6, 2024

I already changed the part that you linked to reuse the extra of the parent tensor instead of allocating a new one. The problem is that without doing this, the extras are allocated in the buffer of the KV cache, since these are views of the KV tensors, and eventually overwrite the extras of the original tensors. To prevent this it is necessary either to allocate the views in the compute buffer (what happened before), or simply avoid allocating extras for views, which is what this change does.
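
As a rough illustration of "avoid allocating extras for views": the buffer's tensor-initialization hook can simply share the parent's extra. This is a self-contained sketch under simplified types; vk_buffer_init_tensor and buffer_base are hypothetical names, and only the reuse-the-parent's-extra logic comes from the discussion above.

```cpp
// Sketch only: simplified stand-ins for the real ggml/Vulkan types.
#include <cstdint>

struct ggml_tensor_extra_gpu {
    uint64_t offset;          // base offset within the device buffer
};

struct ggml_tensor {
    void *        data;       // address of the tensor inside the buffer
    ggml_tensor * view_src;   // non-null if this tensor is a view
    void *        extra;      // backend-specific data
};

static void vk_buffer_init_tensor(void * buffer_base, ggml_tensor * tensor) {
    if (tensor->view_src != nullptr) {
        // Views are initialized in the buffer of their parent tensor, so the
        // parent's extra is valid for the view too. Sharing it avoids writing
        // a fresh extra into, e.g., the KV cache buffer, where it could
        // overwrite the extras of the original tensors.
        tensor->extra = tensor->view_src->extra;
        return;
    }
    // Non-view tensors get their own extra, as before.
    ggml_tensor_extra_gpu * extra = new ggml_tensor_extra_gpu{};
    extra->offset = (uint64_t) ((char *) tensor->data - (char *) buffer_base);
    tensor->extra = extra;
}
```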

metal3d (Contributor) commented Jun 6, 2024

Seems to be ok for me. Thanks!

The github-actions bot added the Vulkan (Issues specific to the Vulkan backend) label on Jun 6, 2024
0cc4m (Collaborator) commented Jun 7, 2024

> I already changed the part that you linked to reuse the extra of the parent tensor instead of allocating a new one. The problem is that without doing this, the extras are allocated in the buffer of the KV cache, since these are views of the KV tensors, and eventually overwrite the extras of the original tensors. To prevent this it is necessary either to allocate the views in the compute buffer (what happened before), or simply avoid allocating extras for views, which is what this change does.

Right, sorry. I just glanced over the code and missed that.

0cc4m (Collaborator) commented Jun 7, 2024

This causes a validation error when run with LLAMA_VULKAN_CHECK_RESULTS: some issue with the fences in ggml_vk_compute_forward. I'll debug that this evening.

VUID-vkQueueSubmit-fence-00064(ERROR / SPEC): msgNum: -1082734195 - Validation Error: [ VUID-vkQueueSubmit-fence-00064 ] Object 0: handle = 0x5ad034261e90, type = VK_OBJECT_TYPE_QUEUE; Object 1: handle = 0xeae97200000006d2, type = VK_OBJECT_TYPE_FENCE; | MessageID = 0xbf76c98d | vkQueueSubmit():  (VkFence 0xeae97200000006d2[]) is already in use by another submission. The Vulkan spec states: If fence is not VK_NULL_HANDLE, fence must not be associated with any other queue command that has not yet completed execution on that queue (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkQueueSubmit-fence-00064)
    Objects: 2
        [0] 0x5ad034261e90, type: 4, name: NULL
        [1] 0xeae97200000006d2, type: 7, name: NULL
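
For context on the error above: VUID-vkQueueSubmit-fence-00064 fires when a VkFence is passed to vkQueueSubmit while an earlier submission using that fence has not yet completed. The usual pattern that avoids it is to wait on and reset the fence before reuse. A generic Vulkan C API sketch, not the actual ggml_vk_compute_forward code; submit_reusing_fence is a hypothetical helper, and device, queue, fence, and submit are assumed to be valid objects created elsewhere:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

void submit_reusing_fence(VkDevice device, VkQueue queue,
                          VkFence fence, const VkSubmitInfo * submit) {
    // Block until any prior submission that used this fence has completed.
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    // Return the fence to the unsignaled state before reusing it.
    vkResetFences(device, 1, &fence);
    // Only now is it valid to associate the fence with a new submission.
    vkQueueSubmit(queue, 1, submit, fence);
}
```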

0cc4m (Collaborator) left a review comment

Found the issue and fixed it.

0cc4m merged commit da799b4 into master on Jun 7, 2024 (55 of 68 checks passed).
slaren deleted the sl/fix-vk-view-extra branch on June 12, 2024.
Labels: Vulkan (Issues specific to the Vulkan backend)

Successfully merging this pull request may close these issues:

Bug: server (at least) crashes using VULKAN
Bug: gpu hang after bde7cd3cd949c1a85d3a199498ac98e78039d46f

5 participants