"Steamwebhelper is not responding" crash menu with home folder on NFS #10431
I agree that this is not the same problem as #10412, despite the superficially similar symptoms. In #10412, we don't get as far as the container runtime starting. In this issue, the container starts up fine and hands over control to This in
8018 seems to be
I should mention here that one possible route towards solving #10412 is to use the I would recommend putting the Steam installation (usually |
Steam Beta is borked; I can reproduce this crash on Arch Linux with clean Steam (removed
Current workaround is to switch back to non-Beta:
|
@smcv I'm not sure what to look for with regard to the certificates. I don't believe anything is non-standard there, so pointers as to what to check would be appreciated. The machine is my daily driver and I haven't seen anything else fail in relation to certificates, TLS, etc.
As to the flock issue: the core problem is that NFS doesn't support flock(), which only works on local filesystems. The previously mentioned workaround was a tiny C program called "fakeflock.c" that someone contributed to #5788.
I'd rather not move everything to a local filesystem, because I use the NFS server for convenient backups and rolling snapshots (every 15 minutes) for the entire family (thankfully, they're not on the beta). Snapshot rollbacks have saved each of us from various catastrophes numerous times. I've been using it this way for well over a decade and it has worked very well for us so far. If NFS is the issue, we're likely not the only ones who will be affected, given the popularity of DIY/home NAS solutions.
In an attempt to test this, I unmounted, unshared, and relocated my Steam filesystem on the server to prevent automatic mounting, then created a Steam subdirectory on a local filesystem with enough space, linked ~/.local/share/Steam back to this location, removed ~/.steam*, then relaunched Steam to force a full re-install. I then logged into Steam and switched over to the beta, at which point it segfaulted (I grabbed the logs, below). I then restarted Steam again and the beta came right up. Crash text:
|
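A quick way to test the claim about `flock()` on a particular mount, without involving Steam at all, is to try taking a lock on a scratch file there. The sketch below is illustrative only: `flock_works` is a hypothetical helper, not part of Steam or the container runtime. Keep in mind that the flock(2) manual page documents that Linux NFS clients have emulated `flock()` with whole-file byte-range locks since Linux 2.6.12, so results can differ between kernels, NFS versions and mount options, and a success here only shows that this one call was accepted.

```python
import fcntl
import os
import tempfile


def flock_works(directory: str) -> bool:
    """Return True if an exclusive, non-blocking flock() succeeds on a
    scratch file created inside *directory*."""
    # Create a throwaway file on the filesystem under test.
    fd, path = tempfile.mkstemp(prefix=".flock-probe-", dir=directory)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # e.g. ENOLCK or EOPNOTSUPP: the filesystem refused the lock.
            return False
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the scratch file, whatever happened.
        os.close(fd)
        os.unlink(path)
```

For example, run `flock_works(os.path.expanduser("~/.local/share/Steam"))` on the NFS client and again with a directory on local disk, and compare.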
@davispuh Is your home directory (or Steam installation path) mounted via NFS? |
Well, it's a bit complicated 😂 My home directory is not using NFS but local Here's a summary of mounts:
That last
And
PS: these are just the relevant excerpts. |
@davispuh Unfortunately, I know very little about btrfs, so I don't know where the overlap between its capabilities and NFSv4 would be. @smcv I forgot to mention that I'm running NFSv4, which has different locking mechanisms from NFSv3 (neither of which natively supports flock). |
I did some testing due to getting pinged. After switching to the beta, at first the Steam UI didn't show up at all and I also didn't get this "not responding" dialog. This reproduced a couple of times. I then tried running Steam on a local drive (ext4 filesystem), which worked. After that running on the NFS mount worked too. To further confirm the functionality I tried it on another computer, with the same NFS mount. There I got the "not responding" dialog, but the Steam UI showed up and worked as well. After choosing to restart either steam or just steamwebhelper the dialog did not reappear. I have to get on with other things, but maybe I'll try a clean install on NFS later. Both computers are running Debian unstable and have Nvidia GPUs with driver version 525.147.05. Edit: I ran the local Steam installation by changing HOME to point at a local mount. It's not impossible that it could have affected the installation on NFS, though it definitely did run from the local drive. |
Following DataBeaver's lead, I redid my fresh local install per my above description (using a link). However, one thing I noticed was a large number of errors when running a diff between the local and NFS copies. I believe all of these were dead links.
The confusing part is that this is true of both installations, so I'm not even sure how sniper can run in the local installation if this is the cause in the NFS installation (so this may very well be a red herring). Digging into them in more detail, most appear to be dead links into the /run/ hierarchy, so they can be ignored.
Filtering down to absolute path links, I see the certificate issue you mentioned. It appears to be looking for different filenames than I have on my system, as I see a mixture of near-misses (e.g. different numbers) and completely missing ones (e.g. Staat*). Next, I see that it's linking to font configs that don't exist in /usr/share/fontconfig/conf.avail/ (I have 29 in total in that directory; perhaps I'm missing a package?). Digging into a few of the oddballs at the end of the list, it has become apparent to me that this runs in some sort of chrooted environment with multiple filesystem overlays or something along those lines (sorry, I'm unfamiliar with exactly what the steam-runtime-* installations do), as I'm finding the absolute paths stuffed under various different hierarchies in the Steam installation. Extrapolating this back to the certs, it looks like they are actually there too, but in yet another subdirectory. For example, Backing up and filtering for relative paths, I find a few additional causes. Here's the list of dead links with the /run/ links stripped out: |
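The dead-link survey described above can be reproduced with a short script. This is an illustrative sketch with a hypothetical helper name (`dangling_symlinks`): it walks a tree without following symlinks, skips targets under `/run/` as the filtering above did, and labels each remaining target as absolute or relative. As explained later in the thread, links that dangle on the host may still resolve correctly inside the container, so the output is a list of candidates, not a list of bugs.

```python
import os


def dangling_symlinks(root, skip_prefixes=("/run/",)):
    """List (link, target, kind) for symlinks under *root* whose target is missing.

    *kind* is "absolute" or "relative". Targets starting with any of
    *skip_prefixes* are ignored."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinks
        # Symlinks to existing directories are listed in dirnames, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if target.startswith(skip_prefixes):
                continue
            if not os.path.exists(path):  # exists() follows the link
                kind = "absolute" if os.path.isabs(target) else "relative"
                found.append((path, target, kind))
    return sorted(found)
```

Running it against each installation (for example `dangling_symlinks(os.path.expanduser("~/.local/share/Steam"))`) and diffing the two results gives the same comparison without the noise from `/run/`.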
I can't tell at this stage whether the problem that @glabifrons is having is to do with NFS or not, so this might all be a red herring. But, we are going to need this sooner or later, so...
Does it support POSIX process-associated record locks ( We are going to need to put some sort of locking into place, otherwise we get bizarre failure modes like one process deleting a temporary runtime that another process is still using. Sorry, but avoiding that is more important than supporting NFS. If At the moment, the container runtime tries to use the Linux-specific
(If necessary, copy the whole The The last command (with
Disabling locking like this is not a solution. This will lead to concurrent processes all believing that they have the lock at the same time, and overwriting or deleting files that the other concurrent processes were using. |
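To answer the question about POSIX process-associated record locks on a given filesystem, a test needs two processes, because a process never conflicts with its own record locks. The following sketch (the helper name `record_lock_conflicts` is hypothetical) takes a whole-file write lock with Python's `fcntl.lockf()`, which wraps the `fcntl()` locking calls, then forks and checks that the child is refused. It only exercises a single client machine; behaviour across two NFS clients would need the same test run from two hosts.

```python
import fcntl
import os
import tempfile


def record_lock_conflicts(directory: str) -> bool:
    """Return True if a POSIX record lock held by this process stops a second
    process from locking the same file in *directory*."""
    fd, path = tempfile.mkstemp(prefix=".lockf-probe-", dir=directory)
    try:
        # Whole-file write lock via fcntl(F_SETLK); raises OSError if refused.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        pid = os.fork()
        if pid == 0:
            # Child: record locks are not inherited across fork(), so open the
            # file independently and try to lock it without blocking.
            code = 2  # unexpected error
            try:
                child_fd = os.open(path, os.O_RDWR)
                try:
                    fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    code = 1  # lock granted: no mutual exclusion
                except OSError:
                    code = 0  # lock refused: locking works as intended
            finally:
                # Leave immediately, skipping the parent's cleanup code.
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)
        os.unlink(path)
```

If the parent cannot take the lock at all, `fcntl.lockf()` raises `OSError`, which is itself a useful answer for the filesystem being tested.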
We do not have enough information on this issue to be able to guess whether the failure mode you are seeing with the beta is the same as @glabifrons is seeing, or the same as #10412, or some different thing. Please look at the logs in If we try to handle multiple different problems on the same issue number, it quickly becomes really confusing, which makes it take longer to solve any of the problems that were reported; so we should reserve this issue number for the specific problem that @glabifrons is experiencing (which unfortunately we have not yet been able to identify). If we can identify that something different is going wrong for you, please open a separate issue for that, with a title that is as specific as possible.
I don't know whether any of these will interfere with the container runtime. My first guess would be that RAID shouldn't matter, because that's at a lower level than anything we're doing, but the others might. If you can try launching Steam on the same system but from a home directory that is as "ordinary and boring" as possible (perhaps by creating a temporary user whose home directory is on local disk and is not NFS-exported, and logging in as that user) then that will help to narrow down whether any of these less-usual configurations are involved. |
Unfortunately, I think this is normal if it takes an unusually long time for the If your NFS mount has enough latency to make small metadata operations like |
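If the suspicion is that metadata round-trips on the NFS mount are slow, they can be measured directly. The sketch below is illustrative only: `stat_latencies` and `summarize` are hypothetical helpers, and because NFS clients cache file attributes, a second run over the same tree will usually look faster than the first. It says nothing about how long Steam waits before showing the "not responding" dialog.

```python
import os
import statistics
import time


def stat_latencies(root: str, limit: int = 2000) -> list[float]:
    """Time os.lstat() on up to *limit* entries below *root*.

    Returns one duration in seconds per entry. The directory reads done by
    os.walk() itself are not included in the timings."""
    samples = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.lstat(path)
            except FileNotFoundError:
                continue  # removed while we were walking
            samples.append(time.perf_counter() - start)
            if len(samples) >= limit:
                return samples
    return samples


def summarize(samples: list[float]) -> dict:
    """Median and worst-case latency in milliseconds."""
    if not samples:
        return {"count": 0, "median_ms": 0.0, "max_ms": 0.0}
    return {
        "count": len(samples),
        "median_ms": statistics.median(samples) * 1000,
        "max_ms": max(samples) * 1000,
    }
```

Comparing `summarize(stat_latencies(...))` for the NFS installation and a local copy gives a rough sense of how much slower each small metadata operation is.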
Back to @glabifrons:
Yes, it's the Steam container runtime, which has quite a lot of code in common with Flatpak. It's normal that some of the files below Looking at your list of dangling symlinks, the majority of them are very likely to work as intended inside the container. I do notice one bug, but it's a bug that will only affect developers who are running this stuff in a non-default configuration that isn't relevant to end-user systems. If you are copying Steam installations between filesystems, you can delete all of You can verify that
This checks both metadata and content of all of the files in there, so expect it to take up to 30 seconds on HDD, and perhaps longer on NFS. You can also get an interactive shell inside the container by running:
Inside that It would be useful for me to see a detailed log from the container runtime framework, which you can get by running:
(You can just exit from the The log file will appear in |
I'm sure that's desirable, but remote filesystems have functionality and performance characteristics that are very much unlike local filesystems, and we can't support every possible scenario. As currently implemented, the whole At the moment the way it's implemented doesn't allow for it to be a symlink or a mount point, but I'll see whether that can become possible in future. |
A quick look at the relevant manpages tells me that while NFS doesn't support
This works for me on my NFS home directory. As does Steam itself. So I think NFS is at most a contributing factor, not the root cause. It will be interesting to see glabifrons's results for the lock test. Could be that we have some configuration differences.
Understandable. It's a relatively minor annoyance, but if you want to do something about it, maybe add an option to wait a bit longer without restarting anything? Or even keep checking for responsiveness while the dialog is up, and hide it if steamwebhelper starts responding after all. |
Hmm, I thought this was only an issue for My crash is not #10412 because my
but that didn't change anything and
works fine without issues. Nothing in particular stands out in the logs
And here are the backtraces, but I don't know how to get symbols for them.
|
The steamwebhelper has had several major changes in the new beta. Some issues caused by those changes (like #10412) are to do with the fact that it is now running inside a container runtime, like Counter-Strike 2 and Dota 2 do. Others could be to do with changes inside the steamwebhelper itself. At the moment, unfortunately I don't see enough information here to be able to say whether you are experiencing the same crash as the original reporter of this particular issue or not.
In the beta that was active over the weekend, the log was truncated every time the steamwebhelper restarted, which was unhelpful because it meant that previous error messages could be lost. Please try updating Steam to the current beta 1706390103, which has stopped truncating the log every time, so should get you better logs. You might need to do this by swapping to the stable branch (completely exit from Steam,
The general public cannot get debug symbols for the proprietary parts of Steam (and neither can I), but Valve can. In your original report, you quoted log output that said Steam has uploaded a crash dump, |
All the information in my comment is from the latest Beta version (installed today); there you can see Looks like it crashes very early in startup, as there aren't any other log entries after In the non-Beta version I see
But this is not present in the Beta; we never reach it. The crash seems to be inside So they need to look into this. My latest |
@smcv I tried the adverb commands both on the local filesystem and on the NFS installation, and surprisingly they worked for both. So it looks like it's not a locking issue. Just in case the information is useful, though, this post has the best description of the limitations of the NFSv4 calls (IIRC, there's one for read and one for write, but none for both):
Another thing you mentioned is compatibility with old kernels. I may actually have the opposite problem, as I'm running the HWE kernel: 6.5.0-15-generic. I wonder whether there might be a negative interaction with newer kernels. @davispuh, what kernel are you using?
I ran pv-verify in both installations: on my NVMe drive it took 2.8 s, and on NFS it took 5.5 s. I tried your other command to launch an xterm within sniper and verified that all the symlinks look good.
I tried to generate a log for you as requested, but no logs ever appear in the var directory under Steam, either on NFS or in the local installation. There is only the .ref file and a temp subdirectory (tmp-$randomchars) in my local installation, and several of those subdirectories in my NFS one.
I like your thoughts on relocation. I do have one thought that I hope is a stupid question: Steam doesn't attempt to launch anything as root, does it?
One other thing: you mentioned you don't have access to debug symbols, but Valve can. With as much effort as you're putting into this, and as knowledgeable as you are about the inner workings and even the development direction, I thought you were an employee of Valve! |
I was thinking about NFS and root-squash and the overlays, and I think I figured out at least part of what's going on. I don't pretend to understand what type of container sniper is using or how it's overlaying filesystems, but I find this really strange, as up until the 20th not only did it work, but many of the games I play use Proton (I recall you saying in the other issue thread that sniper is related), and I even use Proton Experimental for some of them, like Space Engineers.
It's still difficult to tell from the information available, but the best diagnosis I can make so far is that @glabifrons might be seeing a @glabifrons, if you can find a I still think that @davispuh, am I correct to think that you don't see
Sorry, I was forgetting which layer is responsible for implementing
This should record a log like I said. Another way to get logs would be to exit from Steam completely, and then run it as:
which should record one log in
This is interesting. Normally (if you don't use It sounds as though your installation on a local disk is working correctly, but the old subdirectories are not being garbage-collected on your NFS installation. I'd be interested to see why not. If you can get a log file, it should tell us why.
Normally you would use As @davispuh said, one good indication is that each time Steam runs the
OK, good. It sounds as though we cannot rely on
This may seem weird, but is normal. When an unprivileged user creates a new user namespace, as we do in the Steam container runtime, the kernel will only allow us to create one user ID mapping (our own user ID) and one group ID mapping (our own primary group ID). Everything else is mapped to the "overflow uid" and "overflow gid" (normally nobody:nogroup), very similar to NFS root-squashing. Files owned by root and files owned by any other user will show up inside the container as though they were owned by the overflow uid, which you should interpret as meaning "owned by someone who is not me". Flatpak apps have the same behaviour, for the same reason.
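The overflow IDs described above can be inspected from userspace. This sketch uses hypothetical helper names; it reads the kernel's `overflowuid`/`overflowgid` sysctls and parses the current process's `uid_map`. On the host, `uid_map` typically shows the identity mapping `0 0 4294967295`; inside an unprivileged user namespace like the one described above, it should contain a single line that maps your own uid.

```python
import os
from pathlib import Path


def overflow_ids() -> tuple[int, int]:
    """The kernel's overflow uid/gid, shown for IDs that have no mapping."""
    uid = int(Path("/proc/sys/kernel/overflowuid").read_text())
    gid = int(Path("/proc/sys/kernel/overflowgid").read_text())
    return uid, gid


def uid_map(pid: str = "self") -> list[tuple[int, int, int]]:
    """Parse /proc/<pid>/uid_map into (inside_id, outside_id, count) rows."""
    rows = []
    for line in Path(f"/proc/{pid}/uid_map").read_text().splitlines():
        inside, outside, count = (int(field) for field in line.split())
        rows.append((inside, outside, count))
    return rows


def describe_owner(st_uid: int) -> str:
    """Interpret a file's owner as explained above."""
    if st_uid == os.getuid():
        return "me"
    if st_uid == overflow_ids()[0]:
        return "someone who is not me (unmapped)"
    return f"uid {st_uid}"
```

Running `describe_owner(os.stat(path).st_uid)` from the interactive shell inside the container should report a file owned by root on the host as unmapped, rather than as root.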
Not usually, and not on the critical path for basic UI functionality. In some situations (mainly either related to VR, or on the Steam Deck) it will try to run commands via
My main test environments for new container runtime releases are Ubuntu 22.04 (with the same HWE kernel you're using) and Arch (with a very new kernel, currently 6.7), so it's heavily tested on modern kernels.
I'm a consultant helping them with the Steam Runtime and related topics. If your particular issue is a problem with the container runtime, my team might be able to fix it; if it's a problem with |
I just launched the beta in the foreground again and got two crash-IDs.
I can provide more output for context if needed. I was able to get a log using the _v2-entry-point, so thank you for the correction. Thank you very much for the overflow uid explanation. That's a huge relief that it's not what I was afraid of, as that would have meant that solution wasn't NFS compatible. I'm glad you mentioned Steam's VR... I guess I'll be putting off playing with that for a while (was looking at it recently due to a deal on woot that I almost bought). Hopefully the above log is helpful, but if not, we now have crashdumps as well. |
Sifting through the log, I find the error about not finding libvdpau.so.1 interesting, as it placed a copy into Steam/ubuntu12_64/steam-runtime-sniper/var/tmp-92DII2/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/. I saw it clean out the tmp dirs and then give errors saying that they're not empty. Most were empty by the time I looked. I did an rmdir * in the var directory under sniper and it removed most of them (7 of the 19 that were there before remain). This could be purely a timing/sync issue with a file being removed server-side. A one-second sleep should be more than enough to solve that. As issues go, that's incredibly minor. Other than that, I don't see anything jumping out at me in that log. You'll know better than I what to look for, though. |
Probably it found that your host system had a 64-bit
No, anything that is created in
Sorry, I am not going to slow down each container startup for every Steam-on-Linux user just to benefit NFS users. If it's leaving behind nearly-empty directories, then the disk space cost is trivially small. I suspect that what might be happening here is that we're deleting the directory |
Please could a Valve developer look up the backtraces for the two crash IDs referenced by @glabifrons in #10431 (comment)
and check whether they are the same thing that @davispuh is experiencing, which is this?
|
From the log in #10431 (comment), we are likely to be using the container's root CA certificates (derived from Debian 11's @glabifrons is using Ubuntu, which is Debian-derived, so this is not a mismatch between Debian and e.g. Fedora search paths for root CA certificates, or anything like that. A potentially interesting factor is that we have pulled in
so one possible factor for Valve developers to investigate would be whether these can somehow collide? [Edited to add: the fact that #10431 (comment) didn't solve this, according to #10431 (comment), suggests that this probably wasn't the problem.] |
@glabifrons, if you are comfortable with using unreleased software, one thing you could try is:
If that makes it work, then my theory about |
@smcv Thank you very much for coming up with more ways to narrow this down. |
I created a directory on a local disk, /mnt/username, and installed Steam there; it worked correctly. However, when I run 7 Days to Die I get "Anti Cheat Launcher Error". I tried symlinking ~/.steam to /mnt/username/.steam and ~/.local/share/Steam to /mnt/username. I set $HOME to /mnt/username, and I also tried updating /etc/passwd for my account to point to /mnt/username. I know I probably will not get an answer, but what do I need to add to get the game to run? If possible, please send me (or post here) which files I need to edit to get the game to play. I have legitimately purchased the game and have about 400 hours in it. It does not make sense that I can't have Steam installed in a directory other than my home directory, since I have to log into Steam to run the game. |
Personally, after installing, I verify that it's NOT launching with Proton, as it's a Linux-native game thanks to Unity. Then I use the launcher that pops up on first run to switch to Vulkan for graphics: even though it says experimental, it's been working better than OpenGL for me since before amdgpu came out. If both of those are true, my guess would be that having Steam outside of home is messing with Pressure-Vessel. Pressure-Vessel is a modified Flatpak that tries to load whichever libraries are newer between the various steam-runtime versions and your actual OS files. I'm guessing it's having issues with symlinks not always being followed as they should be. I would try mounting to ~/.steam; it's not a good solution for multiple users, but you can put the Steam library outside of that mount just fine. If you still have issues after trying that, run Steam from the terminal, then after the game crashes, post whatever you think is relevant. And lastly, a big thank you @rcornwell: since you also play 7 Days to Die and posted your hours, I peeked at mine and fate smiled at me. |
I got Steam running several games from the local directory, but I can't seem to get the anti-cheat to run correctly. I am the only user on my machine. I do have a test user account, but I can only play sound on one user at a time. Note that I am not trying to disable the anti-cheat software, just to make it run so I can play the game. |
Anti-cheat all works fine over here; you can disable loading it in 7DTD with the start popup. Go to Properties on the game in the Steam library side panel, and under launch options on the right, pick "show launcher". I don't run any client-side mods, so maybe that makes a difference. |
This issue is about |
It is still an issue. I got Steam to work by messing around: I now have Steam installed on a local disk, with symbolic links in my home directory from .steam and .local/share/Steam to the local drive. I am still unable to install Steam on my NFS-mounted home directory, as I had been able to do for over a year prior. These are workarounds, not fixes. I am a Unix/Linux developer who has been using Linux since basically day one, and Unix for a long time before that, and I see no reason why I can't install Steam on an NFS filesystem; I also see no reason things should break when I install it on a local disk. Something in the update that was pushed out at the beginning of May caused things to break. |
@smcv steamwebhelper crashes don't prevent Steam from loading for me: if I let it run with the crash dialog up, Steam does indeed load, in a broken state that allows at least some gameplay, but with no networking features; for example, the friends list refuses to go online. I did state that previously. If there's anything I can do to help troubleshoot the issues, please let me know. |
Do "steam -no-browser" |
The problem is that it keeps looping during the install. It never brings up the browser. Installed on a local disk on the same machine, it works fine. Until it updated early this month, it worked fine on the NFS system. This is an installation issue, not a runtime issue. |
@MrFrog222 adding -no-browser has no effect here, vmtouch is the only work around with any real impact for me. Keep the suggestions coming please. |
@tsukasa1234567 For me, this was the only thing that worked, in combination with installing steam-native-runtime, but I don't know if that's important. |
I am seeing the same issue. Ubuntu 20.04, suddenly cannot see family-shared games, pull up the friends list, or do anything else. I do get the steamwebhelper crash notification and then the UI loads anyways. |
-no-browser doesn't fix anything for me |
I just copied ~/.local/share/Steam to another SSD using the XFS filesystem, and I changed the stuff in ~/.steam to link to the stuff in /mnt/steam |
From what I've seen trying to debug and work around various issues with Steam, the pressure-vessel (bubblewrap, Flatpak-style) containerization breaks pretty easily when symlinks are added. If you instead mounted that SSD onto ~/.local/share/Steam, it's got a pretty decent chance of working.
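To check which setup you actually have (whether `~/.local/share/Steam` is a symlink, a mount point, or an ordinary directory, and which filesystem it resolves to), something like the following illustrative sketch can help. `mount_for` and `describe_steam_dir` are hypothetical helpers; they parse `/proc/self/mounts` and only decode the `\040` escape used there for spaces.

```python
import os


def mount_for(path: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) for the filesystem holding *path*."""
    real = os.path.realpath(path)  # follow symlinks such as ~/.local/share/Steam
    best = ("/", "unknown")
    with open("/proc/self/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            mount_point, fs_type = fields[1], fields[2]
            mount_point = mount_point.replace("\\040", " ")  # escaped spaces
            prefix = mount_point.rstrip("/") + "/"
            if real == mount_point or real.startswith(prefix):
                # Longest match wins; ">=" lets later (over-)mounts take precedence.
                if len(mount_point) >= len(best[0]):
                    best = (mount_point, fs_type)
    return best


def describe_steam_dir(path: str) -> dict:
    """Summarize how *path* is set up: symlink, mount point, or plain directory."""
    return {
        "is_symlink": os.path.islink(path),
        "is_mount_point": os.path.ismount(path),
        "resolved": os.path.realpath(path),
        "filesystem": mount_for(path),
    }
```

For example, `describe_steam_dir(os.path.expanduser("~/.local/share/Steam"))` distinguishes the symlink approach from the mount approach discussed above.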
mounting /dev/nvme0n1 to ~/.local/share/Steam does seem to work EDIT: |
I was able to run just symlinking The 3GB of local space for Steam app is a small price to pay for allowing game storage to be from nfs share: |
I tried this too, but it would not let me select the NFS store to save games in. Also, it saved games under ~/.local/share/Steam rather than ~/.local/share, where it did before. So we are looking at a large amount of storage on my local disk. My local disk is not much bigger than I need for the system, and I can flush it if my system gets messed up. With Steam on it, I would have to back it up before trashing it. |
I have a few machines with no hard drives at all, so the only alternative that I tried was using zram/ext4 mounted at ~/.local/share/Steam, it works for using steam without the steamwebhelper crashes, but it won't simply let me pick a folder for using as the steam library, and I'm not about to install games to RAM lol. |
So, Steam had an update in the last few days; luckily I got to grab some good games from the Steam sale before rebooting. Now even using vmtouch doesn't seem to help my chances of getting into Steam without steamwebhelper crashing. I used to be able to just leave the error window alone, and it would load into Steam with friends and some other things not working, but I could still play games.
Please try to copy/paste text, instead of pictures of text: pictures of text aren't searchable or available to accessibility tools. The error message shown in the screenshot is:
This is a different issue, which is not the same thing discussed here. It's probably best if you report it separately. (The issue described in the initial report here is the "Steamwebhelper is not responding" crash dialog. In the error message shown in your screenshot, the |
Sorry, I'll try to copy the text next time. Also, I still had the "steamwebhelper has crashed" error window; like I said before, I can usually still play games by ignoring that error window. So steamwebhelper was not communicating correctly. Don't worry about it too much; it seems to be a kernel bug. I tried updating Mesa, rebooting, and testing, then the kernel, rebooting, and testing. I didn't see anything in the shortlogs that seemed fitting. I'll try to stay more on topic. Are there any links you can share about pressure-vessel's general execution flow? I'd like to mess around with it, since not much has been discovered about what caused this issue way back in January. I'll do some digging, but tips from the pro would be helpful. |
CrashID=bp-dfd0d27a-89ab-43fb-965a-9a24e2240713 |
Looks like there are multiple crash IDs, so I ran it again to paste all of them. That's from one start-up attempt; yes, it opens a steamwebhelper crash dialog, and yes, it opens the black screen saying What other information can I provide that'll help get this resolved? |
I've made lots of progress: I can get Steam to start without errors now on the second attempt. The issue is with the tmp-RANDOM folders being created; I haven't figured out the why part at all yet. Forcing only those onto tmpfs wastes about 3.5 GB of RAM, so it's not ideal, but it sure beats looping restarts at around 40 seconds apiece, closing the errors and all. @smcv do you have any ideas how I can narrow things down any further? The two tmp folders have the issue. |
So the first tmp-blah folder is created while trying to load Steam. After the crash, reloading Steam usually works. If it doesn't, I have to run "env-update && . /etc/profile" to regenerate the ld.so cache, then try again, and it works. The second tmp-blah folder seems to be generated by Proton. I made a file-name listing of both tmp-blah folders and compared them: they are identical. So it might be worth looking into having Proton reuse the already-generated tmp folder of libraries to load; that would waste 1.7 GB instead of 3.5 GB. |
I believe all that's needed is to check that the fallback copying of the libs is complete before moving on. I get the "For best results" ... "should both be on the same fully-featured Linux filesystem" message for quite a while after the steamwebhelper crash. |
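The comparison of the two tmp-blah folders described above can be scripted. This sketch compares file names only (not contents or symlink targets), like the manual comparison; `relative_listing` and `compare_trees` are hypothetical helper names, and symlinks to directories are neither followed nor listed.

```python
import os


def relative_listing(root: str) -> set[str]:
    """All non-directory entries below *root*, as paths relative to *root*."""
    listing = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            listing.add(os.path.relpath(os.path.join(dirpath, name), root))
    return listing


def compare_trees(a: str, b: str) -> dict:
    """Compare two trees by file name only, not by contents."""
    left, right = relative_listing(a), relative_listing(b)
    return {
        "only_in_a": sorted(left - right),
        "only_in_b": sorted(right - left),
        "common": len(left & right),
    }
```

Two identical listings show up as empty `only_in_a` and `only_in_b` lists; comparing file contents as well would need a hash per file.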
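Whether two directories are on the same filesystem, and whether that filesystem allows hard links between them, can be checked directly. This sketch assumes (it is not confirmed in this thread) that the "same fully-featured Linux filesystem" message relates to being able to hard-link files instead of copying them; `same_filesystem` and `can_hardlink_between` are hypothetical helpers, not part of pressure-vessel.

```python
import errno
import os
import tempfile
from typing import Optional


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths are on the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def can_hardlink_between(src_dir: str, dst_dir: str) -> tuple[bool, Optional[str]]:
    """Try to hard-link a scratch file from *src_dir* into *dst_dir*.

    Returns (True, None) on success, or (False, errno name) on failure,
    e.g. EXDEV when the two directories are on different filesystems."""
    fd, src = tempfile.mkstemp(prefix=".link-probe-", dir=src_dir)
    os.close(fd)
    dst = os.path.join(dst_dir, os.path.basename(src) + ".lnk")
    try:
        os.link(src, dst)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    else:
        os.unlink(dst)
        return True, None
    finally:
        # Remove the scratch source file in every case.
        os.unlink(src)
```

Call both helpers with the two directories you suspect are involved; an `EXDEV` result would confirm they are on different filesystems.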
Your system information
Please describe your issue in as much detail as possible:
Expected: Steam launches as normal.
Result: "Steamwebhelper is not responding" crash dialog appears.
Steps for reproducing this issue:
Details:
This started several days back. By my logs, it looks like the last time I successfully launched Steam's beta was on 2024-01-19.
I went through issue #10412 which had the same initial dialog, but ruled out the same root cause.
While writing this up, I noticed another issue (#10417) that indicated some people were having better luck upgrading to NVIDIA driver 545 from 535 (which I was using).
I upgraded to 545 using Ubuntu's packages and tried switching back to Steam's beta after the upgrade (and reboot) with the same results reported above.
To be absolutely sure I followed each tip in #10412, I even removed steam-runtime-sniper before switching from release to beta on the last attempt. No change in symptoms.
Observation:
On a couple attempts, I noticed that Steam was going through the various Proton installations (one by one) and running .local/share/Steam/ubuntu12_32/../bin/d3ddriverquery64.exe even after I selected the exit option from the dialog.
I left this running to completion hoping that would solve the issue (figuring that maybe it's an incomplete driver installation within Proton or something similar), but this appeared to make no difference.
Other:
I doubt this matters, but it was related to two Steam bugs in the past so I will note it here: My home directory is mounted via NFS with the Solaris server's backing filesystem being ZFS. Several years back I had to create a 2TB quota on my steam installation share to work around #4982. The other issue (with using flock on NFS) has since been resolved (I no longer use the workaround).
These are the only things I would consider odd or unusual about my installation.