Different Android devices produce different results. #4

Open
busiyg opened this issue Aug 17, 2024 · 26 comments
Comments

@busiyg

busiyg commented Aug 17, 2024

Hello, thank you for your work. I conducted tests in your repository, and the results differ on various Android devices. I have some comparison results and would like to understand where the specific issue lies.
I use this project: https://github.com/b0nes164/UnityGaussianSplatting
I use the Vulkan API to build the Android APK.

device1 (pico 4):https://vulkan.gpuinfo.org/displayreport.php?id=22263
Works properly

device2 (quest pro):https://vulkan.gpuinfo.org/displayreport.php?id=32037
At first it seems to work normally, but within a few seconds the Gaussian splats gradually decrease until only a few remain.

device3 (xiaomi pad 6):https://vulkan.gpuinfo.org/displayreport.php?id=27171
The Gaussian splats never appear.

All three devices have the Adreno™ 650 GPU, yet the results are different. Could you provide some suggestions? Thank you.

@b0nes164
Owner

Hi @busiyg, thanks for your interest and the data you've collected.

As your data suggests, this bug has been very difficult to reason about. I also don't have any Qualcomm devices myself, so it has been difficult for me to test fixes.

With testing help from another user, I know that the issue is occurring somewhere in the multisplit key ranking here, but why it fails is still a mystery. What I do know is that it has nothing to do with the wave/warp/subgroup size: based on testing, the Qualcomm devices have chosen wave size 64 in every test I've run. I suspect that the issue is occurring somewhere in the transpilation from HLSL -> SPIR-V, but based on your data this might not even be the case.

I will push one more attempted fix to my GaussianSplatting fork, but if it still does not work, I would suggest replacing DeviceRadixSort with the previous sort here. Please let me know if the fix works, thanks!

@busiyg
Author

busiyg commented Aug 20, 2024

Hi, I tried the update, and now the Pico 4 doesn't work properly: at first it seems to work normally, but within a few seconds the Gaussian splats gradually decrease until only a few remain. The other devices still have the same problems as before...

@b0nes164
Owner

You can try changing waveFlags &= t ? ballot : ~ballot; to waveFlags &= (t ? ballot : (~ballot)); in the function WarpLevelMultiSplitWGE16; with this change a small Gaussian Splatting model displays. @b0nes164

Originally posted by @LeeSYSU in #3 (comment)
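The update quoted above (waveFlags &= t ? ballot : ~ballot) looks like the core of the common ballot-based warp-level multisplit pattern. The Python sketch below is a hypothetical CPU model of that pattern, not the repository's HLSL. It assumes a fully active wave, one ballot per digit bit, and made-up names (ballot, warp_multisplit_rank). After all bits are processed, each lane's flags hold exactly the lanes whose digit matches its own. The lane's rank is the number of those peers with a lower lane index. As an aside, HLSL follows C-style precedence, where ?: binds more tightly than &=, so both spellings in the suggestion should parse the same way.

```python
# CPU model of ballot-based warp-level multisplit ranking (hypothetical names).
# Each lane holds a radix digit; after one ballot per digit bit, each lane's
# flags contain exactly the lanes whose digit equals its own.

def ballot(preds):
    """Model of WaveActiveBallot: bit i is set iff lane i's predicate is true."""
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask


def warp_multisplit_rank(digits, radix_bits):
    """Return (peer_mask, rank) per lane: the lanes sharing its digit, and how
    many of those peers have a lower lane index."""
    wave_size = len(digits)
    full = (1 << wave_size) - 1          # every lane in the wave is active
    flags = [full] * wave_size
    for k in range(radix_bits):
        b = ballot([(d >> k) & 1 for d in digits])
        for lane, d in enumerate(digits):
            t = (d >> k) & 1
            # Same update as the HLSL line: waveFlags &= t ? ballot : ~ballot;
            # Python ints are unbounded, so ~b is masked to the wave width.
            flags[lane] &= b if t else (~b & full)
    out = []
    for lane in range(wave_size):
        below = (1 << lane) - 1          # lanes with a smaller index
        out.append((flags[lane], bin(flags[lane] & below).count("1")))
    return out


print(warp_multisplit_rank([3, 1, 3, 0, 1, 3], radix_bits=2))
# [(37, 0), (18, 0), (37, 1), (8, 0), (18, 1), (37, 2)]
```

The per-lane ranks are what a radix sort would add to each digit's base offset when scattering keys. Two lanes with the same digit must never receive the same rank.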

@b0nes164
Owner

I will try that, thanks.

@b0nes164
Owner

New patch.

@busiyg
Author

busiyg commented Aug 21, 2024

Same result bro.

@LeeSYSU

LeeSYSU commented Aug 21, 2024

I guess some operator precedence becomes inconsistent on some platforms after transpilation from HLSL -> SPIR-V!
I may have modified something unintentionally.
The video below is displayed on an Android phone (Vulkan backend).

20240821_GaussianTest.mp4

@sam598

sam598 commented Aug 22, 2024

Glad to see there might be some progress on this. Because the Gaussian splats degrade over time and don't recover unless the GPU memory is reset, could this have something to do with a race condition, or with the offset being out of sync?

@LeeSYSU

LeeSYSU commented Aug 23, 2024

I tested this library on the following three devices, built with Unity 6 Preview (6000.0.14f1, IL2CPP, .NET Framework):

  1. XiaoMi 14: works, GPU is Adreno 750, Vulkan API 1.3
  2. XiaoMi 13: works, GPU is Adreno 740, Vulkan API 1.3
  3. RongYao ChangXiang 7: does not work, GPU is Adreno 308, does not support Vulkan

The Vulkan info is from this.

@b0nes164
Owner

Thank you for this data. Would you happen to have an Adreno 6xx series GPU to test on?

@edhyah

edhyah commented Aug 26, 2024

Hey @b0nes164, thank you for the awesome work. Any update on a fix for this bug? I'm looking into the code to help out, but wanted to ask in case there's been more progress.

@LeeSYSU

LeeSYSU commented Aug 27, 2024

Thank you for this data. Would you happen to have an Adreno 6xx series GPU to test on?

I will test it.

@b0nes164
Owner

Hi @edhyah, could you link me to which commit of UnityGaussianSplatting you used in this comment here, as I just want to confirm which version you got these results on.

As for helping with the code, one immediate thing you can do is try to replicate @LeeSYSU's results on your Meta Quest Pro: update your project to Unity 6 Preview and see if it runs. Make sure to use the code on my fork of the UnityGaussianSplatting repo though, so you're up to date with the fixes I've attempted.

@edhyah

edhyah commented Aug 28, 2024

@b0nes164 I'm using commit b6c0dca from aras-p/UnityGaussianSplatting (v0.7) which leads to a black screen.

I tried reproducing LeeSYSU's results (Unity 6 Preview, IL2CPP, .NET Framework), including the commit they used (cf41aa9), but I still get disappearing splats on my Quest Pro and Quest 3.

That being said... I did notice that the splats don't disappear when I'm using a much smaller scene (282k splats versus the original 1 million in the train scene). Maybe memory is being overwritten somehow?

@b0nes164
Owner

I'm using commit b6c0dca from aras-p/UnityGaussianSplatting (v0.7) which leads to a black screen.

Hmm, that is extremely weird. As you noted in the other thread, the original thread author said that the splatting ran fine with the older version that didn't use my sort. I also know for a fact that my sort is breaking when run on Adreno 650.

I did notice that the splats don't disappear when I'm using a much smaller scene (282k splats versus the original

A smaller buffer size working correctly seems to suggest an out-of-bounds (OOB) error on one of the buffers, but from testing in isolation I haven't seen OOB issues. All the issues point to the multisplitting not being counted correctly on the Adreno chips.

@b0nes164
Owner

b0nes164 commented Sep 7, 2024

Ok, so after encountering a very similar issue when coding in GLSL, I'm pretty sure I figured out what's going on. The issue is not in the transpilation from HLSL -> SPIR-V, but in SPIR-V itself prior to SPIR-V 1.6.

To quote David Neto's response from this issue thread here:

  1. In the beginning there was Vulkan 1.1, SPIRV-1.3. That added the subgroupSize in VkPhysicalDeviceSubgroupProperties. Things were simple: a GPU had a single subgroup size and gl_SubgroupSize gave it to you in the shader.
  2. Then Intel introduced GPUs that could choose a subgroup size of 8, 16, or 32, which is more flexible than what Vulkan 1.1 anticipated. So VK_EXT_subgroup_size_control was created, and then incorporated into Vulkan 1.3. This is when gl_SubgroupSize gets the unexpected behaviour. The unexpected behaviour is now deprecated. In SPIR-V 1.6 and later, or if you specify VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT then gl_SubgroupSize is the "real" size of the subgroup. But it doesn't have to match the subgroupSize physical device property. Instead it's bounded between minSubgroupSize and maxSubgroupSize from the VkPhysicalDeviceSubgroupSizeControlProperties or VkPhysicalDeviceVulkan13Properties. (There's more: you can control the subgroup size at pipeline creation time.....)

Basically, on Qualcomm devices, the wave/subgroup size can vary between 64 and 128. However, before Vulkan 1.3/SPIR-V 1.6, gl_SubgroupSize returns a constant value from the API, causing the code to be incorrect.

My testing did not catch this because I did the majority of my testing in D3D12/HLSL, and in HLSL, WaveGetLaneCount() returns the actual wave size. Furthermore, because I did my Vulkan testing on an Nvidia card, which has a fixed wave size, I was not able to reproduce the issue.

I'm not aware of a way to specify a shader compilation target in Unity, so we can't target SPIR-V 1.6. Instead, we get the correct wave size by counting the set bits returned by WaveActiveBallot().

@edhyah or @LeeSYSU could you test this patch? The specific Unity version does not matter.
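The Python sketch below is a hypothetical CPU model of the two points above, not the actual patch. First, if every active lane votes true in a ballot, the set-bit count equals the number of active lanes. Second, wave-relative positions computed from the wrong wave size produce colliding ranks. The sizes 64 and 128 come from the thread. The function names and the all-equal-digits input are made up for illustration. The bit count only matches the wave size when the whole wave is active, which is one reason convergence matters.

```python
# CPU model (hypothetical names): why wave-relative indices must use the
# wave size the hardware actually dispatched, and how a ballot can measure it.

def popcount(x):
    return bin(x).count("1")


def ballot(preds):
    """Model of WaveActiveBallot: bit i is set iff lane i's predicate is true."""
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask


def wave_size_from_ballot(active):
    """Every active lane votes true, so the set-bit count is the number of
    active lanes. This equals the wave size only when the wave is fully active."""
    return popcount(ballot(active))


def ranks_among_equal(digits, assumed_wave_size):
    """Rank of each lane among lanes with the same digit. The peer mask spans
    the real wave, but each lane's position is computed as lane % assumed size."""
    ranks = []
    for lane, d in enumerate(digits):
        peers = ballot([x == d for x in digits])
        pos = lane % assumed_wave_size
        ranks.append(popcount(peers & ((1 << pos) - 1)))
    return ranks


ACTUAL = 128    # size the driver chose (one of the sizes mentioned for Qualcomm)
REPORTED = 64   # constant the API reports (the pre SPIR-V 1.6 behaviour quoted above)

wave = [True] * ACTUAL
digits = [0] * ACTUAL            # worst case: every lane has the same digit
measured = wave_size_from_ballot(wave)

print(measured)                                          # 128
print(len(set(ranks_among_equal(digits, measured))))     # 128 distinct ranks
print(len(set(ranks_among_equal(digits, REPORTED))))     # 64: ranks collide
```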

@sam598

sam598 commented Sep 8, 2024

@b0nes164 where is this patch available to test?

@edhyah

edhyah commented Sep 8, 2024

@sam598 https://github.com/b0nes164/UnityGaussianSplatting

I'll try it out later tonight. Thanks in advance for your awesome work!

@tangxianbo

When running on iPhone 12 mini, the screen is black and the following error is reported:
[Screenshot 2024-09-15 21 12 42]

@b0nes164
Owner

Hi @tangxianbo, are you sure the issue comes specifically from the sort? There are other issues unrelated to the sort that may cause crashes, and I cannot tell from this screenshot what is happening.

@sam598 @edhyah Any update? Thanks.

@tangxianbo

Hi @b0nes164 , there was no problem when I used the original version https://github.com/aras-p/UnityGaussianSplatting.
After switching to your forked version, it reported this error, so I first thought that the problem might be in the sort.
Of course, it could also be something else.

@b0nes164
Owner

@tangxianbo That makes a lot of sense now, thanks for the info.

b0nes164 added the Unity label Sep 23, 2024
@tangxianbo

When running on iPhone 12 mini, the screen is black and the following error is reported: [Screenshot 2024-09-15 21 12 42]

Hi @b0nes164, when I use Unity 6000.0.20f1, this error is gone.

@ninjamode

ninjamode commented Oct 2, 2024

Hi @b0nes164

As no one seems to have chimed in here so far, I'm sorry to report that your current changes do not fix the sorting issues on a Meta Quest Pro / Adreno 650. I tried your repo at https://github.com/b0nes164/UnityGaussianSplatting with current Unity 2023.3 and Unity 6, and both still have the disappearing splats issue.

I am not sure if it's related, but compared to your sort on other devices, where there is very little jitter in runtimes, and also compared to FFX sort on the Quest Pro, I am seeing massive performance drops every few seconds. Might be the Quest being weird though.

@b0nes164
Owner

b0nes164 commented Oct 5, 2024

Hi @ninjamode, thanks for the updates. I'm still not 100% sure whether my fix from last month even works correctly on Qualcomm, but another issue could be subgroup convergence, which is a whole other can of worms. The overriding issue is that even though the latest Unity version has access to Vulkan 1.3, which has all the nice extensions that could possibly address this issue, I'm not sure if Unity exposes access to them.

@moddyz

moddyz commented Oct 7, 2024

Hey @b0nes164, I'm currently testing your DeviceRadixSort (from GPUSorting, not UnityGaussianSplatting) with my Unity editor (2022.3.37f1) running on macOS (specifically a MacBook M3 Pro), deploying to a Meta Quest 3.

What I ran into:

  • It works on the Windows, Mac, Linux (Metal) platform which is great.
  • When I switch my Editor build platform to "Android", the sorting algorithm starts to produce erratic results in the editor's Player. It turns out WaveGetLaneCount() returns 32 but WaveGetLaneIndex() always returns values from 0-3 instead of 0-31. The platform is "Android (Metal)"; I don't know the difference between that and "Windows, Mac, Linux (Metal)". Would it use a different shader compiler (how do I check)? I've added a local helper function that returns gtid % WaveGetLaneCount() to stand in for WaveGetLaneIndex(), which seems to fix the sort.
  • The sort is not working on the Meta Quest 3, where WaveGetLaneCount() returns 64. I'm sorting pairs of (float, uint) and the keys become NaN. After a bunch of debugging with a simple scene of 10 splats, I suspect it's failing in WaveHistReductionExclusiveScanWGE16 during wave prefix summing, as the g_d shared memory values start to diverge from the values I'm seeing on my MacBook (but I don't know the algorithm well enough to say whether that's expected, given that the wave lane counts are different).
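On the question of whether diverging g_d values are expected: the sketch below is a generic two-level exclusive scan (per-wave prefix sums, then a scan of the wave totals), written in Python with hypothetical names. It is not WaveHistReductionExclusiveScanWGE16, and it assumes full waves. In this generic form, the per-wave intermediate values depend on the wave size, while the final group-wide prefix does not. So differing intermediates between a 32-lane and a 64-lane device are not by themselves evidence of a bug. Whether the repository's shared-memory layout behaves the same way is not established here. Each lane's position within its wave is gtid % wave_size, the same expression used as the stand-in above.

```python
# Generic two-level exclusive scan over one thread group (hypothetical names).
# Not the repo's WaveHistReductionExclusiveScanWGE16; assumes full waves.

def wave_exclusive_prefix(values):
    """Model of a wave-wide exclusive prefix sum (like WavePrefixSum):
    each lane receives the sum of the values held by lower lanes."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out, running            # per-lane prefixes, wave total


def group_exclusive_scan(values, wave_size):
    """Scan one value per thread using waves of `wave_size` lanes.
    Returns (final exclusive prefix per thread, per-wave intermediates)."""
    if len(values) % wave_size != 0:
        raise ValueError("sketch assumes the group is made of full waves")
    intermediate, totals = [], []
    for start in range(0, len(values), wave_size):
        # values[start:start + wave_size] are lanes 0..wave_size-1 of one wave,
        # i.e. lane = gtid % wave_size and wave = gtid // wave_size.
        prefix, total = wave_exclusive_prefix(values[start:start + wave_size])
        intermediate.extend(prefix)
        totals.append(total)
    # Second level: exclusive scan of the wave totals gives each wave's base.
    bases, _ = wave_exclusive_prefix(totals)
    final = [intermediate[gtid] + bases[gtid // wave_size]
             for gtid in range(len(values))]
    return final, intermediate


counts = [(7 * i + 3) % 5 for i in range(128)]   # made-up per-thread counts
final32, mid32 = group_exclusive_scan(counts, 32)
final64, mid64 = group_exclusive_scan(counts, 64)
print(final32 == final64)   # True: the final prefix does not depend on wave size
print(mid32 == mid64)       # False: the per-wave intermediates do
```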

If you had any additional insight or pointers in the right direction that would be amazing. Happy to test any new code / dev branches / pull any diagnostics that could help.

No branches or pull requests

8 participants