"Steamwebhelper is not responding" crash menu with home folder on NFS #10431

Open
glabifrons opened this issue Jan 26, 2024 · 114 comments

@glabifrons

Your system information

  • Steam client version (build number or date): Current beta as of 2024-01-25
  • Distribution (e.g. Ubuntu): Ubuntu MATE 22.04.3 LTS
  • Opted into Steam client beta?: Yes
  • Have you checked for system updates?: Yes
  • Steam Logs: steam-logs.tar.gz
  • GPU: Nvidia
  • GPU drivers: tested both 535 & 545
  • Steam installation/packaging from steampowered.com (not Ubuntu's repository version).

Please describe your issue in as much detail as possible:

Expected: Steam launches as normal.

Result: "Steamwebhelper is not responding" crash dialog appears.

Steps for reproducing this issue:

  1. Switch to Steam Beta
  2. Restart Steam
  3. Observe that the crash dialog's options either cause Steam to exit completely or appear to do nothing (in the latter case the dialog goes away, with Steam still running in the background).

Details:
This started several days back. By my logs, it looks like the last time I successfully launched Steam's beta was on 2024-01-19.
I went through issue #10412 which had the same initial dialog, but ruled out the same root cause.

  1. Executing "./run-in-sniper vkcube" from the steam-runtime-sniper directory worked and displayed a spinning cube.
  2. Even though the above appeared to confirm the sniper installation, I verified Steam was not running (ps -ef | grep -i steam) then renamed the steam-runtime-sniper directory to force it to be re-extracted per instructions in the other ticket. Restarting Steam at this point still resulted in the crash dialog.
  3. I switched back to the production Steam using "steam -clearbeta" and Steam came up properly at the next launch.
  4. Switching back to Public Beta again resulted in the same crash (this was repeated multiple times).

While writing this up, I noticed another issue (#10417) that indicated some people were having better luck upgrading to NVidia driver 545 from 535 (which I was using).
I upgraded to 545 using Ubuntu's packages and tried switching back to Steam's beta after the upgrade (and reboot) with the same results reported above.
To be absolutely sure I followed each tip in #10412, I even removed steam-runtime-sniper before switching from release to beta on the last attempt. No change in symptoms.

Observation:
On a couple of attempts, I noticed that Steam was going through the various Proton installations (one by one) and running .local/share/Steam/ubuntu12_32/../bin/d3ddriverquery64.exe even after I selected the exit option from the dialog.
I left this running to completion hoping that would solve the issue (figuring that maybe it's an incomplete driver installation within Proton or something similar), but this appeared to make no difference.

Other:
I doubt this matters, but it was related to two Steam bugs in the past so I will note it here: My home directory is mounted via NFS with the Solaris server's backing filesystem being ZFS. Several years back I had to create a 2TB quota on my steam installation share to work around #4982. The other issue (with using flock on NFS) has since been resolved (I no longer use the workaround).
These are the only things I would consider odd or unusual about my installation.

@smcv
Contributor

smcv commented Jan 26, 2024

I agree that this is not the same problem as #10412, despite the superficially similar symptoms. In #10412, we don't get as far as the container runtime starting. In this issue, the container starts up fine and hands over control to steamwebhelper, but the steamwebhelper is crashing.

This in steamwebhelper.log looks bad, and perhaps distinctive:

[0125/020544.602831:ERROR:nss_util.cc(357)] After loading Root Certs, loaded==false: NSS error code: -8018

-8018 seems to be SEC_ERROR_UNKNOWN_PKCS11_ERROR. Is there anything unusually set up on your system involving PKCS11 or certificates, perhaps?
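The mapping from the number in the log to a named error can be checked by hand. NSS defines its SEC_ERROR_* codes in secerr.h as offsets from SEC_ERROR_BASE (-0x2000, that is -8192), so the offset identifies the entry. A minimal sketch; the function name is hypothetical and only the arithmetic is shown:

```shell
# Convert a numeric NSS SEC_ERROR_* code into its offset from SEC_ERROR_BASE.
# SEC_ERROR_BASE is -0x2000 (-8192) in NSS's secerr.h.
nss_sec_error_offset() {
    sec_error_base=-8192
    echo $(( $1 - sec_error_base ))
}
```

Running nss_sec_error_offset -8018 gives the offset to look up in secerr.h, which is where the SEC_ERROR_UNKNOWN_PKCS11_ERROR name comes from.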

> My home directory is mounted via NFS ... The other issue (with using flock on NFS)

I should mention here that one possible route towards solving #10412 is to use the flock(1) utility to take out a lock on a file or directory in ~/.steam/root/ubuntu12_64, to prevent concurrent access - so perhaps preemptively check that you can do that.
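The preemptive check suggested above could look something like the following. This is a rough sketch, assuming the flock(1) utility from util-linux is installed; the function name and return codes are hypothetical, and it only shows whether a lock can be taken and is enforced against a second attempt on the same machine:

```shell
# Return 0 if flock(1) can take an exclusive lock on "$1" and a second,
# non-blocking attempt is refused while the first lock is held.
check_flock() {
    target=$1
    # Take and release a lock without blocking.
    flock -n "$target" true || return 1
    # Hold the lock in the background for a moment...
    flock "$target" sleep 2 &
    holder=$!
    sleep 1
    # ...and confirm that a second attempt fails while it is held.
    if flock -n "$target" true; then
        wait "$holder"
        return 2   # both attempts succeeded: the lock is not being enforced
    fi
    wait "$holder"
    return 0
}
```

For example, with Steam not running: check_flock "$HOME/.steam/root/ubuntu12_64" && echo "flock works here".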

I would recommend putting the Steam installation (usually ~/.local/share/Steam) on a local filesystem that is well-supported by Linux (ext4, xfs, btrfs, that sort of thing), and doing the same for the Steam library that contains your compatibility tools (all versions of Proton and Steam Linux Runtime) if different, even if you also have a secondary Steam library on NFS. Some of the things that the container runtime framework needs to do are metadata operations that have really bad performance on remote filesystems.

@davispuh

Steam Beta is borked: I can reproduce this crash on Arch Linux with a clean Steam installation (removed ~/.steam and ~/.local/share/Steam) after enabling the Beta.

XRRGetOutputInfo Workaround: initialized with override: 0 real: 0xe3432dc0
XRRGetCrtcInfo Workaround: initialized with override: 0 real: 0xe3431500
steamwebhelper.sh[119628]: === piektdiena, 2024. gada 26. janvāris, 22:46:21 EET ===
steamwebhelper.sh[119628]: Starting steamwebhelper under bootstrap sniper steam runtime at ~/.local/share/Steam/ubuntu12_64/steam-runtime-sniper
Steam Runtime Launch Service: starting steam-runtime-launcher-service
Steam Runtime Launch Service: steam-runtime-launcher-service is running pid 119719
bus_name=com.steampowered.PressureVessel.LaunchAlongsideSteam
CAppInfoCacheReadFromDiskThread took 515 milliseconds to initialize
steamwebhelper.sh[119917]: === piektdiena, 2024. gada 26. janvāris, 22:46:31 EET ===
steamwebhelper.sh[119917]: Starting steamwebhelper under bootstrap sniper steam runtime at ~/.local/share/Steam/ubuntu12_64/steam-runtime-sniper
steamwebhelper.sh[120126]: === piektdiena, 2024. gada 26. janvāris, 22:46:42 EET ===
steamwebhelper.sh[120126]: Starting steamwebhelper under bootstrap sniper steam runtime at ~/.local/share/Steam/ubuntu12_64/steam-runtime-sniper
src/steamUI/steamuisharedjscontroller.cpp (545) : Failed creating offscreen shared JS context
src/steamUI/steamuisharedjscontroller.cpp (545) : Failed creating offscreen shared JS context
01/26 22:46:45 Init: Installing breakpad exception handler for appid(steam)/version(1706155871)/tid(119545)
assert_20240126224645_30.dmp[120310]: Uploading dump (out-of-process)
/tmp/dumps/assert_20240126224645_30.dmp
assert_20240126224645_30.dmp[120310]: Finished uploading minidump (out-of-process): success = yes
assert_20240126224645_30.dmp[120310]: response: CrashID=bp-1e48b135-8144-4efd-a813-f1b892240126
assert_20240126224645_30.dmp[120310]: file ''/tmp/dumps/assert_20240126224645_30.dmp'', upload yes: ''CrashID=bp-1e48b135-8144-4efd-a813-f1b892240126''
[2024-01-26 22:46:54] Shutdown

Current workaround is to switch back to non-Beta:

$ rm -f ~/.local/share/Steam/package/beta

@glabifrons
Author

@smcv I'm not sure what to look for with regard to the certificates. I don't believe anything is non-standard there. Pointers as to what to check would be appreciated. The machine is my daily-driver and I've not seen anything else fail relating to certs/TLS/etc.

As to the flock issue: The core problem is NFS doesn't support flock(), that only works for local filesystems. The previously mentioned workaround was a tiny C program someone contributed to #5788 called "fakeflock.c".
I just tried that same workaround with this beta, and it did not change the ultimate outcome. However, I did notice a dialog with a moving progress bar that I didn't see yesterday. In the other ticket you mentioned that there is likely going to be another update to re-add it. I'm guessing that code has been updated and the dialog is not due to my preloading the libfakeflock library.

I'd rather not move everything to a local filesystem as I have the NFS server for convenient backups and rolling snapshots (every 15 minutes) for the entire family (thankfully, they're not on the beta). Snapshot rollbacks have saved each of us from various catastrophes numerous times. I've been using it this way for well over a decade and it's worked very well for us so far. If NFS is the issue, we're likely not the only ones that will be impacted by this due to the popularity of various DIY/home NAS solutions out there.

In an attempt to test it out, I unmounted, unshared, and relocated my Steam filesystem on the server to prevent automatic mounting, then created a Steam subdirectory on a local filesystem with enough space, linked ~/.local/share/Steam back to this location, removed ~/.steam*, then relaunched Steam to force a full re-install. I then logged into Steam and switched over to the beta, at which point it segfaulted (I grabbed the logs, below). I then restarted Steam again and the beta came right up.
So it seems we've managed to narrow it down to NFS.
steam-logs-local.tar.gz

Crash text:

crash_20240126194512_39.dmp[451629]: Uploading dump (out-of-process)
/tmp/dumps/crash_20240126194512_39.dmp
crash_20240126194512_39.dmp[451629]: Finished uploading minidump (out-of-process): success = yes
crash_20240126194512_39.dmp[451629]: response: CrashID=bp-9cc63e34-48bf-4e6b-bbf3-a3ec02240126
crash_20240126194512_39.dmp[451629]: file ''/tmp/dumps/crash_20240126194512_39.dmp'', upload yes: ''CrashID=bp-9cc63e34-48bf-4e6b-bbf3-a3ec02240126''
/var/cache/fscache/Steam/steam.sh: line 798: 450752 Segmentation fault      (core dumped) "$STEAMROOT/$STEAMEXEPATH" "$@"

@glabifrons
Author

@davispuh Is your home directory (or Steam installation path) mounted via NFS?
If not, what filesystem are you using? What mount options? I think we're making progress on narrowing this down.
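One quick way to answer this is to ask which filesystem backs the Steam directory. A minimal sketch, assuming a stat that supports -f and -c (GNU coreutils does); the function name is hypothetical:

```shell
# Print the type of the filesystem that holds the given path,
# for example "nfs", "btrfs" or "ext2/ext3".
fs_type_of() {
    stat -f -c %T "$1"
}
```

For example, fs_type_of "$HOME/.local/share/Steam" reports the filesystem of the default Steam installation path. The mount options still have to be read from the output of mount or from /proc/mounts.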

@davispuh

davispuh commented Jan 27, 2024

> @davispuh Is your home directory (or Steam installation path) mounted via NFS? If not, what filesystem are you using? What mount options? I think we're making progress on narrowing this down.

Well, it's a bit complicated 😂 My home directory is not on NFS but on local btrfs, but I do have it exported over NFS, and I'm also using bind mounts, subvolumes, btrfs raid1 and md raid6.

Here's a summary of the mounts:

/dev/nvme1n1p2 on / type btrfs (rw,noatime,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/Arch)
/dev/sdp       on /home type btrfs (rw,noatime,compress=lzo,space_cache=v2,subvolid=593,subvol=/home,x-systemd.automount)
/dev/md127     on /mnt/Data type btrfs (rw,noatime,compress=zstd:3,space_cache=v2,subvolid=257,subvol=/Data,x-systemd.automount,x-systemd.mount-timeout=10m)
/dev/sdp       on /mnt/RAID type btrfs (rw,noatime,compress=lzo,space_cache=v2,subvolid=395,subvol=/RAID,x-systemd.automount,x-systemd.mount-timeout=10m)
/dev/sdp       on /srv/nfs/RAID type btrfs (rw,noatime,compress=lzo,space_cache=v2,subvolid=395,subvol=/RAID,x-systemd.automount)

That last /dev/sdp is a bind mount; fstab looks like this:

UUID=ee7e665c-3de5-43e3-80b8-d312bdf58dae  /  btrfs  rw,noatime,noautodefrag,ssd,space_cache=v2,subvol=Arch  0  0
UUID=cf489774-f2f9-4d80-9cb7-08ebad25bfb3  /home  btrfs  rw,noatime,noautodefrag,space_cache=v2,compress=lzo,subvol=home,noauto,x-systemd.automount  0  0
UUID=cf489774-f2f9-4d80-9cb7-08ebad25bfb3  /mnt/RAID  btrfs  rw,noatime,noautodefrag,space_cache=v2,compress=lzo,subvol=RAID,noauto,x-systemd.automount,x-systemd.device-timeout=10m,x-systemd.mount-timeout=10m  0  0
UUID=502744d5-e441-47d1-ab41-bcf2eb800e2f  /mnt/Data  btrfs  rw,noatime,space_cache=v2,compress=zstd,subvol=Data,noauto,x-systemd.automount,x-systemd.device-timeout=10m,x-systemd.mount-timeout=10m  0  0
/mnt/RAID                                  /srv/nfs/RAID  none   bind,noauto,x-systemd.automount  0  0

And exports

/srv/nfs/RAID       172.16.0.0/25(rw,fsid=100,insecure,no_subtree_check,anonuid=20000,anongid=20000) 172.24.0.0/25(rw,fsid=100,insecure,no_subtree_check,anonuid=20000,anongid=20000)
/home/Dāvis         172.16.0.0/28(rw,fsid=101,insecure,no_subtree_check,anonuid=20000,anongid=20000) 172.24.0.0/28(rw,fsid=101,insecure,no_subtree_check,anonuid=20000,anongid=20000)

PS. These are just relevant excerpts

@glabifrons
Author

@davispuh Unfortunately, I know very little about btrfs, so I don't know where the overlap between its capabilities and NFSv4 would be.

@smcv I forgot to mention that I'm running NFSv4, which has different locking mechanisms than NFSv3 (neither of which natively supports flock).
I just tried flock_to_setlk from #5788 by @DataBeaver (compiled into both 32 bit and 64 bit versions, both preloaded via LD_PRELOAD) with no luck. While flock is triggered many times, this workaround (which is more functional than the fakeflock one) did not solve the problem.
Further down in that issue discussion, @eqvinox has an excellent description of the limitations of NFSv4 and which calls to use.

@DataBeaver

DataBeaver commented Jan 27, 2024

I did some testing due to getting pinged. After switching to the beta, at first the Steam UI didn't show up at all and I also didn't get this "not responding" dialog. This reproduced a couple of times. I then tried running Steam on a local drive (ext4 filesystem), which worked. After that running on the NFS mount worked too. To further confirm the functionality I tried it on another computer, with the same NFS mount. There I got the "not responding" dialog, but the Steam UI showed up and worked as well. After choosing to restart either steam or just steamwebhelper the dialog did not reappear. I have to get on with other things, but maybe I'll try a clean install on NFS later.

Both computers are running Debian unstable and have Nvidia GPUs with driver version 525.147.05.

Edit: I ran the local Steam installation by changing HOME to point at a local mount. It's not impossible that it could have affected the installation on NFS, though it definitely did run from the local drive.

@glabifrons
Author

Following DataBeaver's lead, I redid my fresh local install per my above description (using a link).
I then copied (via tar to preserve links) the installation back to NFS to see if it was only the installer with the issue, or if it was a post-installation problem.
I launched Steam and ended up with the same dialog, so I believe the installer is ruled out.

However, one thing I noticed was a large number of errors when running a diff between the local and NFS copies. I believe all of these were dead links.

~/.steam$ find root/ -xtype l -exec test ! -e {} \; -ls | wc -l
411

The confusing part is that this is true of both installations, so I'm not even sure how sniper can run in the local installation if this is causal in the NFS installation (so this may very well be a red herring).

Digging into them in more detail, most appear to be dead links to the /run/ hierarchy, so can be ignored.

~/.steam$ find root/ -xtype l -exec test ! -e {} \; -ls  | grep -v ' /run/' | wc -l
101

Filtering down to absolute path links, I see the certificate issue you mentioned. It appears to be looking for different filenames than I have on my system, as I see a mixture of near-misses (eg: different numbers) and completely missing ones (eg: Staat*).
From my below findings, this is likely not relevant, but I'm leaving this here for context.

Next I see that it's linking to font configs that don't exist in /usr/share/fontconfig/conf.avail/ (I have 29 total in that directory, perhaps I'm missing a package?).
Again, this is likely not relevant, given the below.

Digging into a few of the oddballs at the end of the list, it's become apparent to me that this runs in some sort of chrooted environment with multiple filesystem overlays or something along those lines (sorry, I'm unfamiliar with exactly what the steam-runtime-* installations do), as I'm finding the absolute paths stuffed under various different hierarchies in the Steam installation.
Examples:
/etc/python3.9/sitecustomize.py is found under ubuntu12_64/steam-runtime-sniper/sniper_platform_0.20240125.75305/files/
/usr/lib/i386-linux-gnu/libXaw.so.7 is found under ubuntu12_32/steam-runtime/

Extrapolating this back to the certs, it looks like they are actually there too, but in yet another subdirectory. For example,
/usr/share/ca-certificates/mozilla/Staat_der_Nederlanden_EV_Root_CA.crt is in ubuntu12_64/steam-runtime-sniper/var/tmp-74JLI2/

Backing up and filtering for relative paths, I find a few additional causes.
Some are simply missing targets (eg: 4 occurrences of an selinux link that point to a non-existing entry in its parent) where the file doesn't exist in the entire installation.
The certs seem to be pointing to the wrong directory. For example the links in the */usr/etc/ssl/certs/ directory point to the current directory, while the files are actually found in */etc/ssl/certs/ instead (/etc instead of /usr/etc, but both the link and extant file paths are under the ubuntu12_64/steam-runtime-sniper/var/tmp-74JLI2 path).
The os-release link is specified one level too deep (would resolve to */usr/usr/lib/os-release).
This appears to also be true of all of the dead library links.

Here's the list of dead links with the /run/ links stripped out:
dead-links.txt
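The filtering steps above can be condensed into a single pass that buckets dangling links by the kind of target they have. This is only a sketch, assuming GNU find (for -xtype and -printf) and awk; the function name and the three bucket names are hypothetical:

```shell
# Count dangling symlinks below "$1", grouped by target:
# links into /run/, other absolute targets, and relative targets.
classify_dangling() {
    find "$1" -xtype l -printf '%l\n' | awk '
        /^\/run\// { run++; next }
        /^\//      { abs++; next }
                   { rel++ }
        END { printf "run=%d absolute=%d relative=%d\n", run, abs, rel }'
}
```

Run as classify_dangling root/ from ~/.steam, the run= figure corresponds to the links that can be ignored, and the other two to the absolute-path and relative-path groups discussed above.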

@smcv
Contributor

smcv commented Jan 29, 2024

I can't tell at this stage whether the problem that @glabifrons is having is to do with NFS or not, so this might all be a red herring. But, we are going to need this sooner or later, so...

> The core problem is NFS doesn't support flock()

Does it support POSIX process-associated record locks (fcntl F_SETLKW) and/or Linux open-file-description locks (fcntl F_OFD_SETLKW)?

We are going to need to put some sort of locking into place, otherwise we get bizarre failure modes like one process deleting a temporary runtime that another process is still using. Sorry, but avoiding that is more important than supporting NFS. If flock(1) and flock(2) are unavailable, a different locking mechanism is a possibility, but having no locking at all is not really an option.

At the moment, the container runtime tries to use the Linux-specific fcntl F_OFD_SETLKW, falling back to POSIX fcntl F_SETLKW on ancient kernels. You could test this: with Steam not running,

adverb="$HOME/.steam/root/ubuntu12_64/steam-runtime-sniper/pressure-vessel/bin/pressure-vessel-adverb"
ref="$HOME/.steam/root/ubuntu12_64/steam-runtime-sniper/.ref"

"$adverb" --write --wait --lock-file="$ref" -- sleep 600 &
"$adverb" --write --lock-file="$ref" -- true
"$adverb" --write --wait --lock-file="$ref" -- true

(If necessary, copy the whole steam-runtime-sniper directory into a temporary location on NFS, and adjust the paths accordingly.)

The "$adverb" --write --lock-file="$ref" -- true command (without --wait) should fail with error message "E: Unable to lock ... for writing: file is busy".

The last command (with --wait) should block, with no output, until you kill the sleep process (or wait 10 minutes for it to exit on its own), at which point the last command should exit successfully.

> The previously mentioned workaround was a tiny C program someone contributed to #5788 called "fakeflock.c"

Disabling locking like this is not a solution. This will lead to concurrent processes all believing that they have the lock at the same time, and overwriting or deleting files that the other concurrent processes were using.

@smcv
Contributor

smcv commented Jan 29, 2024

@davispuh:

We do not have enough information on this issue to be able to guess whether the failure mode you are seeing with the beta is the same as @glabifrons is seeing, or the same as #10412, or some different thing. Please look at the logs in ~/.steam/root/logs/, especially steamwebhelper.log and webhelper.txt.

If we try to handle multiple different problems on the same issue number, it quickly becomes really confusing, which makes it take longer to solve any of the problems that were reported; so we should reserve this issue number for the specific problem that @glabifrons is experiencing (which unfortunately we have not yet been able to identify). If we can identify that something different is going wrong for you, please open a separate issue for that, with a title that is as specific as possible.

> My home directory is not using NFS but local btrfs but I do have it exported with NFS and I'm also using bind mounts, subvolumes, btrfs raid1 and md raid6.

I don't know whether any of these will interfere with the container runtime. My first guess would be that RAID shouldn't matter, because that's at a lower level than anything we're doing, but the others might. If you can try launching Steam on the same system but from a home directory that is as "ordinary and boring" as possible (perhaps by creating a temporary user whose home directory is on local disk and is not NFS-exported, and logging in as that user) then that will help to narrow down whether any of these less-usual configurations are involved.

@smcv
Contributor

smcv commented Jan 29, 2024

@DataBeaver:

> There I got the "not responding" dialog, but the Steam UI showed up and worked as well

Unfortunately, I think this is normal if it takes an unusually long time for the steamwebhelper to start. Steam cannot currently distinguish between multiple different reasons why the steamwebhelper might fail to start, and it also cannot currently distinguish between "it's taking a long time, but might still work" and "it will never work, however long we wait".

If your NFS mount has enough latency to make small metadata operations like link(2) and chmod(2) unexpectedly slow, then it's going to take a while to start. We've seen this before with Steam libraries on other network filesystems like SMB.
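To get a rough feel for whether metadata operations are slow on a given mount, something like the following could be run against a directory on NFS and again on local disk. This is an illustrative sketch only, not how the container runtime measures anything. It assumes GNU date (for %N), and the function name, the default count of 200 and the scratch directory name are hypothetical:

```shell
# Time n hard-link + chmod pairs inside a scratch directory created under "$1".
# Prints the elapsed time in milliseconds and removes the scratch directory.
time_metadata_ops() {
    dir=$1
    n=${2:-200}
    work=$(mktemp -d "$dir/metadata-test.XXXXXX") || return 1
    : > "$work/src"
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$n" ]; do
        ln "$work/src" "$work/link.$i" && chmod 0644 "$work/link.$i"
        i=$((i + 1))
    done
    end=$(date +%s%N)
    rm -rf "$work"
    echo $(( (end - start) / 1000000 ))
}
```

Comparing time_metadata_ops "$HOME" with time_metadata_ops /var/tmp (or any other local directory) gives a crude ratio. The absolute numbers also include the cost of starting ln and chmod, so only the difference between the two filesystems is meaningful.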

@smcv
Contributor

smcv commented Jan 29, 2024

Back to @glabifrons:

> it's become apparent to me that this runs in some sort of chrooted environment with multiple filesystem overlays or something along those lines

Yes, it's the Steam container runtime, which has quite a lot of code in common with Flatpak. It's normal that some of the files below steam-runtime-sniper/ are symbolic links to filenames that don't exist on your host system. As long as those symlinks work correctly inside the container, everything is fine.

Looking at your list of dangling symlinks, the majority of them are very likely to work as intended inside the container. I do notice one bug, but it's a bug that will only affect developers who are running this stuff in a non-default configuration that isn't relevant to end-user systems.

If you are copying Steam installations between filesystems, you can delete all of steam-runtime-sniper/var/ instead of copying it: the subdirectories in there are temporary, and are deleted and re-created automatically. In fact, you could even delete all of steam-runtime-sniper/, because Steam will automatically unpack it from steam-runtime-sniper.tar.xz.
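A copy that preserves symbolic links but leaves out the temporary directory might look like this. It is a sketch assuming GNU tar; the function name is hypothetical, and the excluded path assumes the usual layout in which steam-runtime-sniper/ sits under ubuntu12_64/ in the installation being copied:

```shell
# Copy a Steam installation from "$1" to "$2" with tar, keeping symlinks as
# symlinks and skipping the temporary steam-runtime-sniper/var/ directory.
copy_steam_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    tar -C "$src" --exclude='./ubuntu12_64/steam-runtime-sniper/var' -cf - . |
        tar -C "$dst" -xf -
}
```

For example: copy_steam_tree /path/to/local/Steam "$HOME/.local/share/Steam", with Steam not running.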

You can verify that steam-runtime-sniper/ has the expected contents by running:

~/.steam/root/ubuntu12_64/steam-runtime-sniper/pressure-vessel/bin/pv-verify

This checks both metadata and content of all of the files in there, so expect it to take up to 30 seconds on HDD, and perhaps longer on NFS.

You can also get an interactive shell inside the container by running:

~/.steam/root/ubuntu12_64/steam-runtime-sniper/run -- xterm

Inside that xterm, you should find that all the symbolic links in /etc/ssl/certs are working (they have a valid target, and if you use ls --color they will typically appear in cyan rather than red).

It would be useful for me to see a detailed log from the container runtime framework, which you can get by running:

STEAM_LINUX_RUNTIME_LOG=1 \
STEAM_LINUX_RUNTIME_VERBOSE=1 \
~/.steam/root/ubuntu12_64/steam-runtime-sniper/run -- xterm

(You can just exit from the xterm when it has opened, or run a simpler command like true.)

The log file will appear in steam-runtime-sniper/var/, with a symbolic link slr-latest.log that points to it.

@smcv
Contributor

smcv commented Jan 29, 2024

> I'd rather not move everything to a local filesystem as I have the NFS server for convenient backups and rolling snapshots

I'm sure that's desirable, but remote filesystems have functionality and performance characteristics that are very much unlike local filesystems, and we can't support every possible scenario.

As currently implemented, the whole steam-runtime-sniper/ directory is actually "expendable": it needs to exist while Steam is running, but it does not contain any user data, so the only thing that is lost when it's deleted (assuming it isn't in active use) is some time. If we can set up some sort of mechanism for redirecting this from its normal location onto a local disk, then that would make it faster and more robust for you, and would also avoid it wasting space and time in your backups.

At the moment the way it's implemented doesn't allow for it to be a symlink or a mount point, but I'll see whether that can become possible in future.

@DataBeaver

> The core problem is NFS doesn't support flock()

> Does it support POSIX process-associated record locks (fcntl F_SETLKW) and/or Linux open-file-description locks (fcntl F_OFD_SETLKW)?

A quick look at the relevant manpages tells me that while NFS doesn't support flock natively, Linux can emulate it using fcntl locks, albeit with slightly different semantics. There's no mention of whether it's F_SETLKW or F_OFD_SETLKW, nor does the fcntl manpage say whether those two differ over NFS. I assume they work the same, since the only difference is that OFD locks will also block other file descriptors from the same process, and from the NFS protocol's point of view it doesn't matter if it's two different processes on the same client or two fds in the same process.

> The "$adverb" --write --lock-file="$ref" -- true command (without --wait) should fail with error message "E: Unable to lock ... for writing: file is busy".

> The last command (with --wait) should block, with no output, until you kill the sleep process (or wait 10 minutes for it to exit on its own), at which point the last command should exit successfully.

This works for me on my NFS home directory. As does Steam itself. So I think NFS is at most a contributing factor, not the root cause. It will be interesting to see glabifrons's results for the lock test. Could be that we have some configuration differences.

> There I got the "not responding" dialog, but the Steam UI showed up and worked as well

> Unfortunately, I think this is normal if it takes an unusually long time for the steamwebhelper to start. Steam cannot currently distinguish between multiple different reasons why the steamwebhelper might fail to start, and it also cannot currently distinguish between "it's taking a long time, but might still work" and "it will never work, however long we wait".

Understandable. It's a relatively minor annoyance, but if you want to do something about it, maybe add an option to wait a bit longer without restarting anything? Or even keep checking for responsiveness while the dialog is up, and hide it if steamwebhelper starts responding after all.

@davispuh

Hmm, I thought this was the only issue behind the steamwebhelper crash in the new Beta, but it looks like there are several issues causing crashes...

My crash is not #10412, because my steam-runtime-sniper is complete with no missing files. I also tried

$ rm -rf ~/.steam/root/ubuntu12_64/steam-runtime-sniper

but that didn't change anything and

$ ~/.steam/root/ubuntu12_64/steam-runtime-sniper/run-in-sniper vkcube

works fine without issues.

Nothing in particular stands out in the logs:

$ cat ~/.steam/steam/logs/steamwebhelper.log
steamwebhelper.sh[43847]: Starting steamwebhelper with sniper steam runtime at /mnt/Games/SteamLinux/ubuntu12_64/steam-runtime-sniper
exec ./steamwebhelper --no-sandbox --no-sandbox -lang=en_US -cachedir=/mnt/Games/SteamLinux/config/htmlcache -steampid=42835 -buildid=1706390103 -steamid=xxx -logdir=/mnt/Games/SteamLinux/logs -uimode=7 -startcount=2 -steamuniverse=Public -realm=Global -clientui=/mnt/Games/SteamLinux/clientui -steampath=/mnt/Games/SteamLinux/ubuntu12_32/steam -launcher=0 -no-restart-on-ui-mode-change --enable-media-stream --enable-smooth-scrolling --password-store=basic --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --disable-quick-menu --disable-features=DcheckIsFatal
[0129/194030.399588:ERROR:context.cc(100)] The browser_subprocess_path directory (./steamwebhelper) is not an absolute path. Defaulting to empty.
[0129/194030.426813:WARNING:crash_reporting.cc(278)] Failed to set crash key: UserID with value: 0
[0129/194030.426864:WARNING:crash_reporting.cc(278)] Failed to set crash key: BuildID with value: 1706389061
[0129/194030.426867:WARNING:crash_reporting.cc(278)] Failed to set crash key: SteamUniverse with value: Public
[0129/194030.426870:WARNING:crash_reporting.cc(278)] Failed to set crash key: Vendor with value: Valve
[0129/194030.426873:WARNING:crash_reporting.cc(278)] Failed to set crash key: Platform with value: Linux
[0129/194030.427856:INFO:crash_reporting.cc(239)] Crash reporting enabled for process: browser
[0129/194030.429320:WARNING:task_impl.cc(32)] No task runner for threadId 0
[0129/194030.430817:WARNING:task_impl.cc(32)] No task runner for threadId 0
[0129/194030.457887:WARNING:crash_reporting.cc(278)] Failed to set crash key: UserID with value: xxx
[0129/194030.457969:WARNING:crash_reporting.cc(278)] Failed to set crash key: BuildID with value: 1706390103
[0129/194030.457973:WARNING:crash_reporting.cc(278)] Failed to set crash key: SteamUniverse with value: Public
[0129/194030.457976:WARNING:crash_reporting.cc(278)] Failed to set crash key: Vendor with value: Valve
[0129/194030.457980:WARNING:crash_reporting.cc(278)] Failed to set crash key: Platform with value: Linux
[0129/194030.461011:WARNING:crash_reporting.cc(278)] Failed to set crash key: UserID with value: xxx
[0129/194030.461099:WARNING:crash_reporting.cc(278)] Failed to set crash key: BuildID with value: 1706390103
[0129/194030.461102:WARNING:crash_reporting.cc(278)] Failed to set crash key: SteamUniverse with value: Public
[0129/194030.461105:WARNING:crash_reporting.cc(278)] Failed to set crash key: Vendor with value: Valve
[0129/194030.461108:WARNING:crash_reporting.cc(278)] Failed to set crash key: Platform with value: Linux
[0129/194031.302189:ERROR:file_io_posix.cc(144)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[0129/194031.302222:ERROR:file_io_posix.cc(144)] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2)
$ cat ~/.steam/steam/logs/webhelper.txt
[...]
[1970-01-01 03:00:00] Client version: no bootstrapper found
[1970-01-01 03:00:00] Startup - webhelper launched with: ./steamwebhelper --no-sandbox -lang=en_US -cachedir=/mnt/Games/SteamLinux/config/htmlcache -steampid=42835 -buildid=1706390103 -steamid=xxx -logdir=/mnt/Games/SteamLinux/logs -uimode=7 -startcount=2 -steamuniverse=Public -realm=Global -clientui=/mnt/Games/SteamLinux/clientui -steampath=/mnt/Games/SteamLinux/ubuntu12_32/steam -launcher=0 -no-restart-on-ui-mode-change --enable-media-stream --enable-smooth-scrolling --password-store=basic --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --disable-quick-menu --disable-features=DcheckIsFatal
[1970-01-01 03:00:00] Disabling sandbox due to a previous crash in CefInitialize with the sandbox enabled
[1970-01-01 03:00:00] Browser - launching child process with: /mnt/Games/SteamLinux/ubuntu12_64/steamwebhelper --type=zygote --no-zygote-sandbox --no-sandbox --user-agent-product=Valve Steam Client --lang=en_US.UTF-8 --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --crashpad-handler-pid=43969 --buildid=1706390103 --steamid=xxx
[1970-01-01 03:00:00] Browser - launching child process with: /mnt/Games/SteamLinux/ubuntu12_64/steamwebhelper --type=zygote --no-sandbox --user-agent-product=Valve Steam Client --lang=en_US.UTF-8 --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --crashpad-handler-pid=43969 --buildid=1706390103 --steamid=xxx


[1970-01-01 03:00:00] Client version: no bootstrapper found
[1970-01-01 03:00:00] Startup - webhelper launched with: ./steamwebhelper --no-sandbox --no-sandbox -lang=en_US -cachedir=/mnt/Games/SteamLinux/config/htmlcache -steampid=42835 -buildid=1706390103 -steamid=xxx -logdir=/mnt/Games/SteamLinux/logs -uimode=7 -startcount=2 -steamuniverse=Public -realm=Global -clientui=/mnt/Games/SteamLinux/clientui -steampath=/mnt/Games/SteamLinux/ubuntu12_32/steam -launcher=0 -no-restart-on-ui-mode-change --enable-media-stream --enable-smooth-scrolling --password-store=basic --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --disable-quick-menu --disable-features=DcheckIsFatal
[1970-01-01 03:00:00] Disabling sandbox due to a previous crash in CefInitialize with the sandbox enabled
[1970-01-01 03:00:00] Browser - launching child process with: /mnt/Games/SteamLinux/ubuntu12_64/steamwebhelper --type=zygote --no-zygote-sandbox --no-sandbox --user-agent-product=Valve Steam Client --lang=en_US.UTF-8 --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --crashpad-handler-pid=44120 --buildid=1706390103 --steamid=xxx
[1970-01-01 03:00:00] Browser - launching child process with: /mnt/Games/SteamLinux/ubuntu12_64/steamwebhelper --type=zygote --no-sandbox --user-agent-product=Valve Steam Client --lang=en_US.UTF-8 --log-file=/mnt/Games/SteamLinux/logs/cef_log.txt --crashpad-handler-pid=44120 --buildid=1706390103 --steamid=xxx
$ cat ~/.steam/root/logs/cef_log.txt
[0129/194027.772949:INFO:crash_reporting.cc(239)] Crash reporting enabled for process: browser
[0129/194027.774154:WARNING:task_impl.cc(32)] No task runner for threadId 0
[0129/194027.775646:WARNING:task_impl.cc(32)] No task runner for threadId 0
[0129/194030.427856:INFO:crash_reporting.cc(239)] Crash reporting enabled for process: browser
[0129/194030.429320:WARNING:task_impl.cc(32)] No task runner for threadId 0
[0129/194030.430817:WARNING:task_impl.cc(32)] No task runner for threadId 0

And here are backtraces but I don't know how to get symbols for it?

#0  0x000070f2f7d3f003 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x8b3f003)
#1  0x000070f2f7d2dc3f n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x8b2dc3f)
#2  0x000070f2f372235c n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x452235c)
#3  0x000070f2f3c811c5 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x4a811c5)
#4  0x000070f2f372269f n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x452269f)
#5  0x000070f2f3724dec n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x4524dec)
#6  0x000070f2f142cc3d n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x222cc3d)
#7  0x000070f2f14a84ac n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x22a84ac)
#8  0x000070f2f4fa63b0 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x5da63b0)
#9  0x000070f2f4fa7670 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x5da7670)
#10 0x000070f2f4fa7454 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x5da7454)
#11 0x000070f2f4fa4c41 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x5da4c41)
#12 0x000070f2f142bc32 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x222bc32)
#13 0x000070f2f142b995 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x222b995)
#14 0x000070f2f13fdac5 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x21fdac5)
#15 0x000070f2f13fd654 n/a (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x21fd654)
#16 0x000070f2f137336b cef_initialize (/mnt/Games/SteamLinux/ubuntu12_64/libcef.so + 0x217336b)
#17 0x00000000005d077a CefInitialize(CefMainArgs const&, CefStructBase<CefSettingsTraits> const&, scoped_refptr<CefApp>, void*) (steamwebhelper + 0x1d077a)
#18 0x0000000000519b1a CCEFThread::Init(int, char**, scoped_refptr<CefApp>) (steamwebhelper + 0x119b1a)
#19 0x0000000000517839 InitializeCef(int, char**, scoped_refptr<CefApp>) (steamwebhelper + 0x117839)
#20 0x0000000000590323 main (steamwebhelper + 0x190323)
#21 0x000070f2ee924cd0 n/a (/run/host/usr/lib/libc.so.6 + 0x27cd0)
#0  0x00007fefe313f003 n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x8b3f003)
#1  0x00007fefe312dc3f n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x8b2dc3f)
#2  0x00007fefe03a5494 n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x5da5494)
#3  0x00007fefe03a66ff n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x5da66ff)
#4  0x00007fefe03a7428 n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x5da7428)
#5  0x00007fefe03a4d46 n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x5da4d46)
#6  0x00007fefe03a4e3a n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x5da4e3a)
#7  0x00007fefdc82c391 n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x222c391)
#8  0x00007fefdc7fd517 n/a (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x21fd517)
#9  0x00007fefdc773260 cef_execute_process (/mnt/Game/SteamLinux/ubuntu12_64/libcef.so + 0x2173260)
#10 0x00000000005d069f CefExecuteProcess(CefMainArgs const&, scoped_refptr<CefApp>, void*) (steamwebhelper + 0x1d069f)
#11 0x000000000058fc7e RealMain(int, char**) (steamwebhelper + 0x18fc7e)
#12 0x0000000000590e38 main (steamwebhelper + 0x190e38)
#13 0x00007fefd9d24cd0 n/a (/run/host/usr/lib/libc.so.6 + 0x27cd0)

@smcv
Contributor

smcv commented Jan 29, 2024

> Hmm I thought this is only issue for steamwebhelper crash due to new Beta but looks like there are several issues causing crashes...

The steamwebhelper has had several major changes in the new beta. Some issues caused by those changes (like #10412) are to do with the fact that it is now running inside a container runtime, like Counter-Strike 2 and Dota 2 do. Others could be to do with changes inside the steamwebhelper itself.

At the moment, unfortunately I don't see enough information here to be able to say whether you are experiencing the same crash as the original reporter of this particular issue or not.

> $ cat ~/.steam/steam/logs/webhelper.txt

In the beta that was active over the weekend, the log was truncated every time the steamwebhelper restarted, which was unhelpful because it meant that previous error messages could be lost. Please try updating Steam to the current beta 1706390103, which has stopped truncating the log every time, so should get you better logs.

You might need to do this by swapping to the stable branch (completely exit from Steam, rm ~/.steam/root/package/beta, start it again), and then back to the beta.

> And here are backtraces but I don't know how to get symbols for it?

The general public cannot get debug symbols for the proprietary parts of Steam (and neither can I), but Valve can. In your original report, you quoted log output that said Steam has uploaded a crash dump, CrashID=bp-1e48b135-8144-4efd-a813-f1b892240126. That could be useful information to a Valve developer. If you can get a similar CrashID from the current beta, that will also help.

@davispuh

All information in my comment is from the latest Beta version (installed today); there you can see steamwebhelper was run with the -buildid=1706390103 parameter. I also deleted the logs folder before the run, so there isn't any old info.

It looks like it crashes very early in startup, as there aren't any other log entries after "Browser - launching child process with".

In the non-Beta version I see:

[2024-01-29 21:16:20] CreateBrowser 512367304 type:12 flags:0  (-2147483648, -2147483648) 0x0

But this is not present in the Beta; we never reach it.

The crash seems to be inside libcef.so, the Chromium Embedded Framework (CEF), which is open source. So I think that, even with Valve's modifications, it might be possible to match the addresses up with the relevant functions with some reverse engineering, but that's not our job.

So they need to look into this. My latest CrashID=bp-5aaf6174-a286-49b8-a3b3-eb2fe2240129

@glabifrons
Author

@smcv I tried the adverb commands both on the local filesystem and on the NFS installation, and surprisingly it worked for both. So it looks like it's not a locking issue.

Just in case the information is useful though, this post has the best description of the limitations of the NFSv4 calls (IIRC, there's one for read and one for write, but none for both):
#5788 (comment)
Two posts below that, @eqvinox has a short explanation of how the Linux kernel translates the calls (so it does add that, but they don't look like the ones you asked about).

Another thing you mentioned is compatibility for old kernels. I may actually have the opposite problem, as I'm running the HWE kernel: 6.5.0-15-generic. I wonder if there might be negative interaction with the newer kernels. @davispuh what kernel are you using?

I ran pv-verify in both installations: on my NVMe drive it took 2.8s, and on NFS it took 5.5s.
Both came back verified on both checks it performs.

I tried your other command to launch an xterm within sniper and verified all symlinks look good.

I tried to generate a log for you as requested, but no logs ever appear in either the var under steam on NFS or the local installation. Only the .ref file and a temp subdirectory (tmp-$randomchars) in my local installation and several of those subdirs in my NFS one.
I updated the beta to the latest (released on the 26th... how do you identify the version number?) and the results were the same - no log. I tried exporting the variables (thinking maybe there was another subshell being triggered), same results, no log in steam-runtime-sniper/var.
Since I couldn't generate those logs (in local or NFS installations), I re-collected the main logs as initially requested after the above mentioned upgrade to the latest beta.
steam-logs-jan26beta.tar.gz

I like your thoughts on relocation.
Sniper is well under 1GB, so /tmp or /var/tmp would likely work for most (and /tmp would make the most sense).
It could be a manual process with links done by experienced users, or maybe even something from a path entry dialog down the road labeled something like "Copy temporary runtime to local filesystem" available only if the system detects that it's running in a remote filesystem.

I do have one thought that I hope is a stupid question: Steam doesn't attempt to launch anything as root, does it?
I ask because if it does so in my home directory, it will fail, as root is "squashed" on NFS by default (it's a security precaution).
Any actions by root on a client show up as user "nobody" from the server's perspective, so they will fail on any path that doesn't have (at a minimum) world-read/execute perms all the way up.

One other thing... you mentioned you don't have access to debug symbols, but Valve can... with as much effort as you're putting into this and as knowledgeable as you are about the inner workings and even the development direction, I thought you were an employee of Valve!

@glabifrons
Author

I was thinking about NFS and root-squash and the overlays and I think I figured out at least part of what's going on.
From within the sniper shell (the xterm launched per your instructions), all device files (the entire output of ls -l /dev/) are shown as owned by nobody:nogroup.

I don't pretend to understand what type of container sniper is using or how it's overlaying filesystems, but I find this really strange, as up until the 20th, not only did it work, but many of the games I play use Proton (I recall you saying sniper is related in the other issue thread), and I even use Proton Experimental for some of them, like Space Engineers.

@smcv
Contributor

smcv commented Jan 30, 2024

It's still difficult to tell from the information available, but the best diagnosis I can make so far is that @glabifrons might be seeing a steamwebhelper crash that is not directly related to the introduction of the container runtime, similar to @davispuh. If that's the case, then a Steam developer will have to take over investigating this.

@glabifrons, if you can find a CrashID in recent Steam output (it will look similar to the one @davispuh provided) then that would probably be useful information for a Steam developer. Equally, if you can't find a CrashID anywhere, that would also be an interesting data point - it might imply that I'm wrong about this being a steamwebhelper crash.
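
As a rough sketch of how to look for these, assuming Steam's terminal output was saved to a file (the file name steam-output.txt is hypothetical) and that the IDs have the same bp-... shape as the ones quoted in this thread:

```shell
# Print each distinct CrashID found in a saved copy of Steam's terminal output.
# Assumption: IDs look like "CrashID=bp-<hex digits and dashes>", as in this thread.
extract_crash_ids() {
    grep -o 'CrashID=bp-[0-9a-f-]*' "$1" | sort -u
}

# Example (hypothetical file name):
#   steam 2>&1 | tee steam-output.txt
#   extract_crash_ids steam-output.txt
```

If this prints nothing, that is the "no CrashID anywhere" data point mentioned above.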

I still think that [0129/213808.640966:ERROR:nss_util.cc(357)] After loading Root Certs, loaded==false: NSS error code: -8018 in your cef_log.txt might be significant, but I don't know why that would happen to you but not to others. It sounds as though the CA certificates in the sniper container are set up correctly (their symbolic links are not broken in the container environment). A container runtime verbose log might help to figure out what is different about your system.

@davispuh, am I correct to think that you don't see NSS error code: -8018 in your cef_log.txt or other logs?

> I tried to generate a log for you as requested, but no logs ever appear in either the var under steam on NFS or the local installation

Sorry, I was forgetting which layer is responsible for implementing slr-latest.log. The command I should have suggested is:

STEAM_LINUX_RUNTIME_LOG=1 \
STEAM_LINUX_RUNTIME_VERBOSE=1 \
~/.steam/root/ubuntu12_64/steam-runtime-sniper/_v2-entry-point -- xterm

This should record a log like I said.

Another way to get logs would be to exit from Steam completely, and then run it as:

STEAM_LINUX_RUNTIME_LOG=1 \
STEAM_LINUX_RUNTIME_KEEP_LOGS=1 \
STEAM_LINUX_RUNTIME_VERBOSE=1 \
steam

which should record one log in steam-runtime-sniper/var each time it tries to restart the steamwebhelper, with slr-latest.log always pointing to the newest one.
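
A small sketch for locating the newest of those logs afterwards. It assumes slr-latest.log is a symbolic link to the real log file (which is what "pointing to the newest one" suggests, but is not confirmed here); the helper name is made up for illustration:

```shell
# Print the real path of the newest container runtime log.
# $1 is the runtime's var directory, for example
#   ~/.steam/root/ubuntu12_64/steam-runtime-sniper/var
# Assumption: slr-latest.log is a symlink to the most recent log file.
latest_runtime_log() {
    readlink -f "$1/slr-latest.log"
}

# Example:
#   latest_runtime_log ~/.steam/root/ubuntu12_64/steam-runtime-sniper/var
```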

> the .ref file and a temp subdirectory (tmp-$randomchars) in my local installation and several of those subdirs in my NFS one

This is interesting. Normally (if you don't use STEAM_LINUX_RUNTIME_KEEP_LOGS=1), the container runtime is meant to delete old var/tmp-XXXXXXX subdirectories during startup - so at any given time, you should usually only have one.

It sounds as though your installation on a local disk is working correctly, but the old subdirectories are not being garbage-collected on your NFS installation. I'd be interested to see why not. If you can get a log file, it should tell us why.

> how do you identify the version number?

Normally you would use Help → About Steam, but when your issue is that the UI isn't starting, obviously that isn't an option.

As @davispuh said, one good indication is that each time Steam runs the steamwebhelper, it passes it an argument -buildid=1706390103 which gives the build ID. Many of the other logs also mention the build ID, for example [2024-01-29 21:36:16] Client version: 1706390103 in console_log.txt and [2024-01-29 21:37:22] Download skipped: /steam_client_publicbeta_ubuntu12?t=3744699689 version 1706390103, installed version 1706390103, existing pending version 0 in bootstrap_log.txt.
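
When the UI isn't available, the build ID can be pulled out of the logs from a terminal instead. A minimal sketch, assuming the log contains the -buildid=... argument shown above (the helper name is made up for illustration):

```shell
# Print the distinct build IDs that were passed to steamwebhelper,
# according to a log file such as webhelper.txt.
extract_build_ids() {
    # -e is needed because the pattern itself starts with a dash.
    grep -o -e '-buildid=[0-9]*' "$1" | cut -d= -f2 | sort -u
}

# Example:
#   extract_build_ids ~/.steam/steam/logs/webhelper.txt
```

More than one line of output would suggest the log spans an update.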

> I tried the adverb commands both on the local filesystem and on the NFS installation, and surprisingly it worked for both

OK, good. It sounds as though we cannot rely on flock(1) or flock(2) on NFS, but the fcntl locking that we already use should be safe. I'll bear that in mind for future work on this topic.

> From within the sniper shell (the xterm launched per your instructions), all device files (the entire output of ls -l /dev/) are shown as owned by nobody:nogroup.

This may seem weird, but is normal. When an unprivileged user creates a new user namespace, as we do in the Steam container runtime, the kernel will only allow us to create one user ID mapping (our own user ID) and one group ID mapping (our own primary group ID). Everything else is mapped to the "overflow uid" and "overflow gid" (normally nobody:nogroup), very similar to NFS root-squashing. Files owned by root and files owned by any other user will show up inside the container as though they were owned by the overflow uid, which you should interpret as meaning "owned by someone who is not me".

Flatpak apps have the same behaviour, for the same reason.
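
To see which numeric IDs "nobody:nogroup" corresponds to on a given machine, the kernel exposes the overflow IDs under /proc. A minimal sketch (the helper name is made up; the fallback value 65534 is the usual default and is an assumption for systems where the files cannot be read):

```shell
# Print the kernel's overflow uid and gid as "uid:gid".
# Reads the standard Linux procfs files; falls back to 65534 if unreadable.
overflow_ids() {
    uid=$(cat /proc/sys/kernel/overflowuid 2>/dev/null || echo 65534)
    gid=$(cat /proc/sys/kernel/overflowgid 2>/dev/null || echo 65534)
    echo "$uid:$gid"
}

# Example:
#   overflow_ids
#   ls -ln /dev    # inside the container, unmapped owners show these numbers
```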

> Steam doesn't attempt to launch anything as root, does it?

Not usually, and not on the critical path for basic UI functionality. In some situations (mainly either related to VR, or on the Steam Deck) it will try to run commands via pkexec.

> Another thing you mentioned is compatibility for old kernels. I may actually have the opposite problem, as I'm running the HWE kernel: 6.5.0-15-generic.

My main test environments for new container runtime releases are Ubuntu 22.04 (with the same HWE kernel you're using) and Arch (with a very new kernel, currently 6.7), so it's heavily tested on modern kernels.

> I thought you were an employee of Valve

I'm a consultant helping them with the Steam Runtime and related topics. If your particular issue is a problem with the container runtime, my team might be able to fix it; if it's a problem with steamwebhelper itself, probably someone else will need to take over.

@glabifrons
Author

I just launched the beta in the foreground again and got two crash-IDs.

assert_20240130192454_27.dmp[1678661]: response: CrashID=bp-76d14b36-100b-48a3-aec4-615e12240130
assert_20240130192454_27.dmp[1678661]: file ''/tmp/dumps/assert_20240130192454_27.dmp'', upload yes: ''CrashID=bp-76d14b36-100b-48a3-aec4-615e12240130''

assert_20240130192527_134.dmp[1679205]: response: CrashID=bp-722467e1-6953-4b8a-9ad0-01f382240130
assert_20240130192527_134.dmp[1679205]: file ''/tmp/dumps/assert_20240130192527_134.dmp'', upload yes: ''CrashID=bp-722467e1-6953-4b8a-9ad0-01f382240130''

I can provide more output for context if needed.

I was able to get a log using the _v2-entry-point, so thank you for the correction.
Interestingly, it deletes the old log when it creates a new one, but still doesn't clean up the tmp-$random directories.
This one was just using true as the target executable.
slr-non-steam-game-t20240130T203006.log

Thank you very much for the overflow uid explanation. That's a huge relief that it's not what I was afraid of, as that would have meant that solution wasn't NFS compatible. I'm glad you mentioned Steam's VR... I guess I'll be putting off playing with that for a while (was looking at it recently due to a deal on woot that I almost bought).

Hopefully the above log is helpful, but if not, we now have crashdumps as well.

@glabifrons
Author

Sifting through the log, I find the error about not finding libvdpau.so.1 to be interesting, as it placed a copy into Steam/ubuntu12_64/steam-runtime-sniper/var/tmp-92DII2/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu/.
I'm guessing it copied it from Steam/ubuntu12_64/steam-runtime-sniper/sniper_platform_0.20240125.75305/files/lib/x86_64-linux-gnu/libvdpau.so.1.0.0 (note the extra ".0.0" on the end, I'm not sure if it's expecting an *.so.1 in that location or where else it's looking where it's not seeing it).

I saw it clean out the tmp dirs, then give errors that they're not empty. Most were empty by the time I looked. I did an rmdir * in the var under sniper and it removed most of them (7 remain of the 19 that were there before). This could be purely a timing/sync issue with a file being removed server-side. A 1 second sleep should be more than enough to solve that. As issues go, that's incredibly minor.

Other than that, I don't see anything jumping out at me in that log. You'll know better than I what to look for though.

@smcv
Contributor

smcv commented Jan 31, 2024

> Sifting through the log, I find the error about not finding libvdpau.so.1 to be interesting, as it placed a copy into Steam/ubuntu12_64/steam-runtime-sniper/var/tmp-92DII2/usr/lib/pressure-vessel/overrides/lib/x86_64-linux-gnu

Probably it found that your host system had a 64-bit libvdpau.so.1 but not a 32-bit libvdpau.so.1. Ideally you will want both of those on Nvidia-based systems, to get hardware acceleration for video encoding and decoding in both 64- and 32-bit processes. On Ubuntu this means installing both libvdpau1:amd64 and libvdpau1:i386. This is worth fixing, but probably not what is causing your crash, since the steamwebhelper is a 64-bit process that will be unaffected by the absence of a 32-bit library.
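
One way to check which word sizes of a library the host provides is to look at the dynamic linker cache. A minimal sketch, assuming the usual ldconfig -p output in which 64-bit entries carry an "x86-64" tag and 32-bit ones do not (the helper name is made up for illustration):

```shell
# Read `ldconfig -p` output on stdin and report which ABIs provide a soname.
# Assumption: x86-64 entries are tagged "x86-64"; anything else is treated as 32-bit.
soname_abis() {
    awk -v so="$1" '
        $1 == so { if ($0 ~ /x86-64/) a64 = 1; else a32 = 1 }
        END { print "64-bit:", (a64 ? "yes" : "no"), "32-bit:", (a32 ? "yes" : "no") }
    '
}

# Example:
#   ldconfig -p | soname_abis libvdpau.so.1
```

A "32-bit: no" result would be consistent with the missing libvdpau1:i386 package described above.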

> I'm guessing it copied it from Steam/ubuntu12_64/steam-runtime-sniper/sniper_platform_0.20240125.75305/files/lib/x86_64-linux-gnu/libvdpau.so.1.0.0

No, anything that is created in usr/lib/pressure-vessel/overrides is a symbolic link pointing to graphics-stack-related files from your host system (via the /run/host mount point inside the container).

> I saw it clean out the tmp dirs then give errors that they're not empty. [...] This could be purely a timing/sync issue with a file being removed server-side. A 1 second sleep should be more than enough to solve that.

Sorry, I am not going to slow down each container startup for every Steam-on-Linux user just to benefit NFS users. If it's leaving behind nearly-empty directories, then the disk space cost is trivially small.

I suspect that what might be happening here is that we're deleting the directory tmp-XXXXXX while the file tmp-XXXXXX/.ref is open and locked. POSIX guarantees that we can delete files while they are still open, but NFS implements deletion of open files by renaming them to some weird name that will be removed when the file is eventually closed, and then that means the directory isn't empty, so rmdir() refuses to delete it.
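
If that theory is right, the leftover directories should contain only such renamed placeholders. A minimal sketch for checking, assuming the Linux NFS client's usual .nfsXXXX naming for them (the helper name is made up for illustration):

```shell
# List leftover NFS "silly rename" placeholders under the runtime's tmp-* dirs.
# $1 is the var directory, e.g. .../steam-runtime-sniper/var
# Assumption: the NFS client names the placeholders ".nfs<something>".
list_silly_renames() {
    find "$1" -path '*/tmp-*' -name '.nfs*' -print
}

# Example:
#   list_silly_renames ~/.steam/root/ubuntu12_64/steam-runtime-sniper/var
```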

@smcv
Contributor

smcv commented Jan 31, 2024

Please could a Valve developer look up the backtraces for the two crash IDs referenced by @glabifrons in #10431 (comment)

assert_20240130192454_27.dmp[1678661]: response: CrashID=bp-76d14b36-100b-48a3-aec4-615e12240130
assert_20240130192527_134.dmp[1679205]: response: CrashID=bp-722467e1-6953-4b8a-9ad0-01f382240130

and check whether they are the same thing that @davispuh is experiencing, which is this?

CrashID=bp-5aaf6174-a286-49b8-a3b3-eb2fe2240129

@smcv
Contributor

smcv commented Jan 31, 2024

From the log in #10431 (comment), we are likely to be using the container's root CA certificates (derived from Debian 11's ca-certificates package, and available in /etc/ssl/) rather than the host's root CA certificates. This makes it odd that @glabifrons is getting an error message relating to root CA certificates, but I'm not: if there was a problem with the container's /etc/ssl, I would expect it to affect everyone, including me.

@glabifrons is using Ubuntu, which is Debian-derived, so this is not a mismatch between Debian and e.g. Fedora search paths for root CA certificates, or anything like that.

A potentially interesting factor is that we have pulled in /lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.545.29.06 as part of the graphics stack, which has OpenSSL 3 /lib/x86_64-linux-gnu/libcrypto.so.3 as a dependency. This means that code inside the container can see all of these OpenSSL libraries:

  • libssl.so.1.1 from sniper (actually from Debian 11, which sniper is based on)
  • libcrypto.so.1.1 from sniper (actually from Debian 11)
  • libcrypto.so.3 from Ubuntu 22.04
  • whatever version of OpenSSL is statically linked into steamwebhelper, if any

so one possible factor for Valve developers to investigate would be whether these can somehow collide?

[Edited to add: the fact that #10431 (comment) didn't solve this, according to #10431 (comment), suggests that this probably wasn't the problem.]

@smcv
Contributor

smcv commented Feb 2, 2024

@glabifrons, if you are comfortable with using unreleased software, one thing you could try is:

If that makes it work, then my theory about libnvidia-pkcs11-openssl3.so.* in #10431 (comment) is probably going in the right direction. If it still fails with that modified version of pressure-vessel, then my theory was probably wrong, but that could still be a useful data point for someone with more insight into the proprietary parts of Steam.

@glabifrons
Author

@smcv Thank you very much for coming up with more ways to narrow this down.
Unfortunately, I tried it and still got the crash dialog.

@rcornwell

I created a directory on local disk, /mnt/username, and installed Steam there; it worked correctly. However, when I run 7 Days to Die I get "Anti Cheat Launcher Error". I tried symlinking ~/.steam to /mnt/username/.steam and ~/.local/share/Steam to /mnt/username. I set $HOME to /mnt/username, and I also tried updating /etc/passwd for my account to point to /mnt/username. I know I probably will not get an answer, but what do I need to add to get the game to run? Please send me (or post here) which files I need to edit to get my game to play. I have legitimately purchased the game and have about 400 hours in it. It does not make sense that I can't have Steam installed in a directory other than my home directory, since I have to log into Steam to run the game.

@tsukasa1234567

Personally, after installing, I verify it's NOT launching with Proton, as it's a Linux-native game thanks to Unity. Then I use the launcher that pops up on first run to switch to using Vulkan for graphics; even though it says experimental, it's been working better than OpenGL for me since before amdgpu came out. If both of those are true, my guess would be that having Steam outside of home is messing with Pressure-Vessel.

Pressure-Vessel is a modified Flatpak that tries to load whatever libraries are newer between the various steam-runtime versions and your actual OS files. I'm guessing it's having issues with symlinks not always being followed like they should.

I would try mounting to ~/.steam. It's not a good solution for multiple users, but you can make the Steam library outside of that mount just fine.

If you still have issues after trying that, run Steam from the terminal, then after the game crashes post whatever you think is relevant.

And lastly a big thank you @rcornwell since you also play 7 days to die and posted your hours, I peeked at mine and fate smiled at me.

[Screenshot: Untitled]

@rcornwell

I got Steam running several games with a local directory; however, I can't seem to get the anti-cheat to run correctly. I am the only user on my machine. I do have a test user account, but I can only play sound on one user at a time. Note I am trying to disable AntiCheat software, just make it run so I can run the game.

@tsukasa1234567

Anticheats all work fine over here; you can disable loading it in 7DTD with the start popup. Go into Properties on the game in the Steam library side panel, and under launch options on the right, pick show launcher. I don't run any client-side mods, so maybe that makes a difference.

@smcv
Contributor

smcv commented May 20, 2024

This issue is about steamwebhelper specifically. If you are having problems with any game (7 Days to Die or otherwise) then that is not this issue: if Steam gets far enough into launching that you are able to start games, then the steamwebhelper must be working correctly.

@rcornwell

It still is an issue. I got Steam to work by messing around. I now have Steam installed on a local disk, with symbolic links in my home directory from .steam and .local/share/Steam to the local drive. I am still unable to install Steam on my NFS-mounted home directory like I had been able to do for over a year prior. These are workarounds, not fixes. As a Unix/Linux developer who has been using Linux since basically day one, and Unix a long time before, I see no reason I can't install Steam on an NFS filesystem; I also see no reason things should break when I install it on a local disk. Something in the update that was pushed out at the beginning of May caused things to break.

@tsukasa1234567

@smcv steamwebhelper crashes don't prevent Steam from loading for me: if I let it run with the crash dialog up, Steam does indeed load, in a broken state that allows at least some gameplay, but networking features don't work (for example, the friends list refuses to go online). I did previously state that. If there's anything I can do to help troubleshoot the issues, please let me know.

@MrFrog222

Do "steam -no-browser"

@rcornwell

The problem is that it keeps looping during the install. It never brings up the browser. Installed on a local disk on the same machine, it works fine. Until it updated early this month, it worked fine on an NFS system. This is an installation issue, not a runtime issue.

@tsukasa1234567

@MrFrog222 adding -no-browser has no effect here; vmtouch is the only workaround with any real impact for me. Keep the suggestions coming, please.

@MrFrog222

@tsukasa1234567 For me this was the only thing that worked, in combination with installing steam-native-runtime, but I don't know if that's important.

@BellRampion

> @smcv steamwebhelper crashes don't lead to steam not loading for me, if I let it run with the crash dialog up, steam does indeed load in a broken state that does allow at least some gameplay, but there's no networking features like the friends list refuses to go online. I did previously state that. If there's anything I can do to help troubleshoot the issues please let me know.

I am seeing the same issue. Ubuntu 20.04, suddenly cannot see family-shared games, pull up the friends list, or do anything else. I do get the steamwebhelper crash notification and then the UI loads anyways.

@DiarrheaMcgee

-no-browser doesn't fix anything for me.
Also, for some reason, whenever I now try to open some of the games that were not working, instead of just crashing, it freezes, and then I have to restart Xorg to get OpenGL or Vulkan working again.

@DiarrheaMcgee

I just copied ~/.local/share/Steam to another SSD using the XFS filesystem, and I changed the entries in ~/.steam to link to their counterparts in /mnt/steam.
It is actually using that directory, but it still doesn't work, even though using another distro without ZFS does.

@tsukasa1234567

From what I've seen trying to debug and work around various issues with Steam, the pressure-vessel bubblewrap Flatpak-style containerization breaks pretty easily when symlinks are added. If you instead mounted that SSD onto ~/.local/share/Steam, it's got a pretty decent chance of working.
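
Before switching from symlinks to a mount, it can help to confirm what a given path currently is. A minimal sketch, assuming GNU stat is available (the helper name is made up for illustration):

```shell
# Report whether a path is a symlink and which filesystem type backs it.
# stat -f resolves the path, so a symlink reports its target's filesystem.
describe_steam_dir() {
    if [ -L "$1" ]; then kind=symlink; else kind=directory; fi
    fstype=$(stat -f -c %T "$1" 2>/dev/null || echo unknown)
    echo "$kind on $fstype"
}

# Example:
#   describe_steam_dir ~/.local/share/Steam
#   describe_steam_dir ~/.steam/root
```

Including this output in a report makes it clearer whether a setup is on NFS, a symlink to local disk, or a direct mount.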

@DiarrheaMcgee

DiarrheaMcgee commented Jun 8, 2024

Mounting /dev/nvme0n1 to ~/.local/share/Steam does seem to work.

EDIT:
A reboot fixed most of the games on there when using this workaround, but some native games still don't work.

@oscen0

oscen0 commented Jun 12, 2024

I was able to run Steam just by symlinking ~/.local/share/Steam and the other entries in ~/.steam to a local ext4 directory (Steam v021-20240524-1716584667, Nvidia 550.78, Fedora 40, Xfce4).

The 3 GB of local space for the Steam app is a small price to pay for keeping game storage on an NFS share: Settings -> Storage, click the current local drive, Add Drive, choose the NFS mount path where games are stored, click the ... button, Make Default.

@rcornwell

I tried this also; it would not let me select the NFS store to save games in. It also saved games under ~/.local/share/Steam rather than ~/.local/share, where it did before. So we are looking at a large amount of storage on my local disk. My local disk is not much bigger than I need for the system, and I can flush it if my system gets messed up. With Steam on it, I would have to back it up before trashing it.

@tsukasa1234567

I have a few machines with no hard drives at all, so the only alternative I tried was zram/ext4 mounted at ~/.local/share/Steam. That lets me use Steam without the steamwebhelper crashes, but it won't let me pick a folder to use as the Steam library, and I'm not about to install games to RAM lol.

@tsukasa1234567

So, steam had an update in the last few days, luckily I got to grab some good games from the steam sale before rebooting. Now even using vmtouch doesn't seem to be helping the chances of getting into steam without steamwebhelper crashing. I used to be able to just leave the error window alone, and it would load into steam with friends and some other things not working, but I could play games still.

But now I get this
SteamHoliday

@smcv
Contributor

smcv commented Jul 8, 2024

But now I get this

Please try to copy/paste text, instead of pictures of text: pictures of text aren't searchable or available to accessibility tools.

The error message shown in the screenshot is:

Something went wrong while displaying this content. _Refresh_

Error Reference: Shared SteamUI_9015621_3b4c674e2a32af39

Cannot read properties of undefined (reading 'GetPlayer')

This is a different issue, which is not the same thing discussed here. It's probably best if you report it separately.

(The issue described in the initial report here is the "Steamwebhelper is not responding" crash dialog. In the error message shown in your screenshot, the steamwebhelper must be communicating with Steam successfully - otherwise the window content would be blank - but the web content that it's displaying is not working as intended. The fact that it mentions properties of undefined suggests that something has gone wrong at the JavaScript level.)

@tsukasa1234567

Sorry, I'll try to copy the text next time. Also, I still had the "steamwebhelper has crashed" error window; as I said before, I can usually still play games by ignoring that window. So steamwebhelper was not communicating correctly. Don't worry about it too much; it seems to be a kernel bug. I tried updating Mesa, then reboot and test, then the kernel, then reboot and test. I didn't see anything in the shortlogs that seemed fitting. I'll try to stay more on topic.

Are there any links you can share about pressure-vessel's general execution flow? I'd like to mess around with it, since not much has been discovered about what caused this issue way back in January. I'll do some digging, but tips from the pro would be helpful.

@tsukasa1234567

CrashID=bp-dfd0d27a-89ab-43fb-965a-9a24e2240713
I reinstalled Steam; I couldn't even sign in without getting the steamwebhelper crash dialog.
Then I made a zram/ext4 drive and installed Steam to it. Everything worked fine, using about 3.5 GB of RAM.
I couldn't add a library folder anywhere; after I pick a location it just instantly crashes.
I copied the same install back to NFS and am getting the steamwebhelper crashes again, but now I'm seeing this in the terminal, repeated quite a bit:
src/clientdll/compatmanager.cpp (1104) : GetCompatibilityToolCommandLineInternal: inconsistent state! do not call while cache off job is running.

@tsukasa1234567

Looks like there are multiple crash IDs, so I ran it again to paste all of them.
CrashID=bp-ccb122bf-1578-4ee0-a8ab-6f19d2240713
CrashID=bp-d4d9bc45-8a55-42db-a3bf-b78ef2240713
CrashID=bp-bbb68945-13e1-4339-9c4d-e0c122240713
CrashID=bp-9d461bb6-4285-489c-98c1-ba8d62240713

That's in one startup attempt. Yes, it opens a steamwebhelper crash dialog, and yes, it opens the black screen saying:
Something went wrong while displaying this content. Refresh
Error Reference: Shared SteamUI_9015621_3b4c674e2a32af39
Cannot read properties of undefined (reading 'GetPlayer')

What other information can I provide that'll help get this resolved?
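For anyone collecting these from a long terminal log, the crash IDs can be pulled out mechanically. A minimal sketch follows, assuming only the `CrashID=` line format seen in the output pasted above; `extract_crash_ids` is a hypothetical helper name, not part of Steam.

```python
import re

# Assumption: IDs appear as "CrashID=<token>" with no spaces inside the
# token, as in the terminal output pasted in this thread.
CRASH_ID_RE = re.compile(r"CrashID=(\S+)")


def extract_crash_ids(text):
    """Return the distinct crash IDs in `text`, in order of first appearance."""
    seen = []
    for match in CRASH_ID_RE.finditer(text):
        crash_id = match.group(1)
        if crash_id not in seen:
            seen.append(crash_id)
    return seen


if __name__ == "__main__":
    sample = (
        "CrashID=bp-ccb122bf-1578-4ee0-a8ab-6f19d2240713\n"
        "some unrelated log line\n"
        "CrashID=bp-d4d9bc45-8a55-42db-a3bf-b78ef2240713\n"
    )
    for crash_id in extract_crash_ids(sample):
        print(crash_id)
```

Feeding it the saved terminal output of one startup attempt gives a de-duplicated list that is easy to paste as text.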

@tsukasa1234567

I've made lots of progress: I can now get Steam to start without errors on the 2nd attempt. The issue is with the tmp-RANDOM folders being created; I haven't figured out the why part yet at all. Forcing only those onto tmpfs wastes about 3.5 GB of RAM, so it's not ideal, but it sure beats looping restarts at around 40 seconds apiece with closing the errors and all.

@smcv do you have any ideas how I can narrow down things any further? The 2 tmp folders have the issue.

@tsukasa1234567

So the 1st tmp-blah folder is created when trying to load Steam. After the crash, reloading Steam usually works. If it doesn't, I have to run "env-update && . /etc/profile" to regenerate the ld.so cache, then do it again, and it works.

The 2nd tmp-blah folder seems to be generated by Proton. I made a file name listing of both tmp-blah folders and compared them; they are identical. So it might be worth looking into having Proton reuse the already generated tmp folder of libraries to load; that would be 1.7 GB wasted instead of 3.5 GB.
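The listing comparison described above can be reproduced with something like the following. It is a sketch only: the function names are made up, the two folder paths must be supplied by whoever runs it (the real tmp-RANDOM names are not given here), and it compares relative file and directory names, not file contents.

```python
import os


def relative_listing(root):
    """Return the set of file and directory paths under `root`, relative to it."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            names.add(os.path.relpath(full, root))
    return names


def compare_listings(first, second):
    """Return (only_in_first, only_in_second) as sorted lists of relative paths."""
    a = relative_listing(first)
    b = relative_listing(second)
    return sorted(a - b), sorted(b - a)
```

Calling `compare_listings(first_tmp_dir, second_tmp_dir)` with the two folders returns two empty lists when the name listings are identical, as reported above.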

@tsukasa1234567

I believe all that's needed is to check that the fallback copying of the libs is complete before moving on. I get the "For best results" ... "should both be on the same fully-featured Linux filesystem" message for quite a while after the steamwebhelper crash.
