Cannot update games over NFS (locking issue?) #5788

Closed
DataBeaver opened this issue Sep 27, 2018 · 57 comments
@DataBeaver

DataBeaver commented Sep 27, 2018

Your system information

  • Steam client version: Sep 26 2018, at 22:21:27
  • Distribution: Debian unstable, kernel 4.16.5
  • Opted into Steam client beta?: Yes
  • Have you checked for system updates?: Yes

Please describe your issue in as much detail as possible:

I started Steam today after not using it for a couple of weeks, only to find that game updates were failing with "disk write error". Poking it with strace revealed this:

5092 openat(AT_FDCWD, ".../.steam/SteamApps/downloading/state_250820_250823.patch", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0777 <unfinished ...>
5092 <... openat resumed> ) = 54
5092 flock(54, LOCK_SH) = -1 EBADF (Bad file descriptor)

Trying to start SteamVR from the title bar gives an error dialog with that file name, so this seems to be a likely culprit.

I made a simple test program which opens a file in write-only mode and tries to acquire a shared lock on it. It works on a local filesystem but fails on NFS. The manpage for flock says that the open mode should not matter, so it seems the actual error is in the Linux kernel. However, I've so far been unable to trace the exact location of the check that causes this error. It does seem a bit strange to use a shared lock with a write-only file, and LOCK_EX actually does work over NFS.
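
A test along those lines might look like the following minimal sketch. It is not the program used for this report; the helper name `flock_after_open` and the convention of returning the `errno` value are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with `open_flags`, try a non-blocking
 * flock() with `lock_op` (LOCK_SH or LOCK_EX), and return 0 on success or
 * the errno value of the call that failed. */
int flock_after_open(const char *path, int open_flags, int lock_op)
{
    int fd = open(path, open_flags | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    int result = 0;
    if (flock(fd, lock_op | LOCK_NB) != 0)
        result = errno;   /* saved before close() can overwrite it */

    close(fd);            /* closing the descriptor also drops the lock */
    return result;
}
```

Calling `flock_after_open("testfile", O_WRONLY, LOCK_SH)` on a local filesystem should return 0. Going by the strace output above, the same call on the affected NFS mount would be expected to return EBADF.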

@albanpeignier

Same problem with Debian unstable, kernel 4.17.0-3 or 4.18.0-1, with the Steam stable or beta client.

A possible workaround is to move the downloading directory to a local filesystem:

cd .steam/steam/steamapps/
mv downloading /var/tmp/
ln -s /var/tmp/downloading

Games are now downloading normally.

@Reisen-Wandern-Tauchen

Same issue on openSUSE Tumbleweed, kernel 4.18.10-1, with the Steam beta client.
Fortunately, the workaround works for me too.

@cmoncure

Same issue on Arch Linux; workaround is successful.

@Gooberpatrol66

Pretty sure I'm having the same issue. Gentoo, 4.14.65.

@LubosD

LubosD commented Oct 6, 2018

Same issue here, please fix it!

@LubosD

LubosD commented Oct 6, 2018

My workaround - because I run Steam on a diskless system and I don't have enough space in my tmpfs.

fakeflock.c:

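/* Stub that overrides libc's flock(): every request is reported as
   successful without taking any lock, so locking is disabled entirely. */
int flock(int fd, int operation)
{
    (void)fd;
    (void)operation;
    return 0;
}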

Compile as a 32-bit lib:

gcc -shared -m32 -o libfakeflock.so fakeflock.c

and then run Steam with LD_PRELOAD=/path/to/libfakeflock.so.

@milon21

milon21 commented Oct 13, 2018

Yes, I can confirm this issue. The NFS server is Debian 9 and the Steam client is Debian 9, kernel 4.14.71. Tested the stable and beta clients with the same result.
The fakeflock workaround works for me. Nice work, many thanks for that :-)

@Phalen

Phalen commented Oct 16, 2018

Same on Ubuntu 18.04 x64. Please fix this and the NFS size issues!

Unfortunately, I got errors when trying the flock workaround.

@vollschauer

Same problem on Gentoo with nfs4

@lcarlier

lcarlier commented Oct 25, 2018

Same issue here on Ubuntu 16.04. LubosD's workaround worked for me.
The command I used to compile the library is:
gcc -shared -o libfakeflock.so fakeflock.c -m32
I had to install the gcc-multilib package.

@glabifrons

The library hacks worked, but this definitely is still an issue.
Ubuntu MATE 16.04 (issues on my entire family's computers) with NFSv4 mounted from Solaris 11.3 server (ZFS backed NFS).

@jrsantos

jrsantos commented Nov 3, 2018

Same problem here. flock() does not work on NFSv4. It could be made to work on NFSv3 with the local_lock=flock mount option, but this option does not work on NFSv4. It seems the solution would be for the Steam client to use POSIX locks through fcntl() calls.
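
As a rough illustration of that suggestion, a whole-file POSIX lock can be taken with fcntl(F_SETLK). This is a minimal sketch, not anything from the Steam code base; the helper name `lock_whole_file` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take a non-blocking POSIX (fcntl) lock covering the
 * whole file. `type` is F_RDLCK or F_WRLCK; fcntl() requires the descriptor
 * to be open for reading or for writing respectively.
 * Returns 0 on success or the errno value on failure. */
int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through the end of the file" */

    return fcntl(fd, F_SETLK, &fl) == 0 ? 0 : errno;
}
```

On a descriptor opened write-only, `lock_whole_file(fd, F_WRLCK)` is the variant that matches the open mode. Note that POSIX locks belong to the process rather than to the open file description, and are dropped when the process closes any descriptor for that file, so they are not a drop-in replacement for flock() in every program.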

This should be relatively easy to fix depending on the code structure. Please fix. Pretty please with sugar on top.

@tkln

tkln commented Nov 5, 2018

I'm also seeing this issue on my system. Changing the mount options as suggested by @jrsantos seems to work.

@olavgg

olavgg commented Nov 6, 2018

Same issue here on Ubuntu 18.04 (4.15.0-38-generic) with a FreeBSD 11.2 ZFS NFSv4 server.
Neither the symlink nor fakeflock seems to work.

EDIT: Symlink works if you move the download folder to a local disk, not another NFS drive 😅

@dusares

dusares commented Nov 16, 2018

Same issue here with Ubuntu 18.04 as game client and NFSv4 server. Symlinking the downloading folder works of course, but is not really a solution. Do we have a status on a fix?

@L0rdBootysniffer

L0rdBootysniffer commented Nov 22, 2018

The solution @albanpeignier proposed partially fixes the problem. I'm extremely convinced that this has something to do with Proton at this point. Games that are native to Linux install without a hitch, but the installation of games that require the Proton compatibility tool locks every time. Can anyone else try to replicate my suggestion? The other solutions surrounding flock do not work for me.

The client machine is Arch 4.19.2, and the server is Arch w/ ZFS on Linux mounted over NFSv4.

@Oblomov

Oblomov commented Nov 24, 2018

The issue isn't limited to Proton games, I only have native games and all of them fail to update due to the lock issue.

@jagdtigger

Hi all.

I also encountered this. The symlink trick solved it, but it's not ideal since the mini PC I use has a single 250GB NVMe drive. Client: Ubuntu MATE 18.10, server: Synology DSM, NFS version: 4.

@kisak-valve
Member

Although unlikely, it may be worthwhile to check if https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=fde872682e175743e0c3ef939c89e3c6008a1529 influences this issue.

@mianosm

mianosm commented Dec 26, 2018

> Same problem with debian unstable, kernel 4.17.0-3 or 4.18.0-1, with steam stable or beta client.
>
> A possible workaround is moving the downloadings directory in a local filesystem :
>
> cd .steam/steam/steamapps/
> mv downloading /var/tmp/
> ln -s /var/tmp/downloading
>
> Games are now downloading normally.

This is a great workaround, however:

[mianosm@h03 ~]$ df -hPT /var
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/fedora_h03-root ext4 49G 28G 20G 59% /

Trying to get a game like X-Plane, or anything else larger than 20G will now kill my root partition, and I'm slightly averse to creating a new volume for var on my gaming machine (funny enough due to not having enough space to really allocate an additional amount to it)...

@cilki

cilki commented Dec 29, 2018

@kisak-valve That commit was just merged into the mainline: torvalds/linux@f6b1495

If it's a client-side issue, I can test a nightly build soon. If it's server-side, then I'll unfortunately have to wait a while for QNAP to patch their kernel.

Edit: updating to the latest nightly kernel made no difference in the error.

@MasterCATZ

MasterCATZ commented Dec 30, 2018

Decided to play a game as it's the Christmas / New Year break,
and found that my mount containing 36T of "Steam Library" is broken.
Ubuntu 18.04
kernels 4.15 ~ 4.20

Everything has the latest updates, plus beta Proton.

Even this failed:
mount -o bind /mountednetworksharepath /localdrivepath

@nyanloutre

nyanloutre commented Jan 2, 2019

@MasterCATZ if you tried to play Proton games, there is another issue with the startup script, which hangs when running games stored on an NFS share: ValveSoftware/Proton#987

@MasterCATZ

MasterCATZ commented Jan 19, 2019

For me it might even be related to another issue with my mergerfs, which contains 96 3TB drives.
As I could not get games working over the network, I made a direct connection to my main array for programs with 4x 12Gb/s links, and games would not fire up.

The games silently crashing was not helping me find the root cause.

Removing "use_ino" seemed to allow some more to load up, but that is a setting I really want to keep on because of my hard-linking.

mmap does not work when "direct_io" is enabled, so that had to go as well.

/mnt/SnapRaidShelves/* /SnapRaidShelves fuse.mergerfs comment=x-gvfs-show,rw,defaults,allow_other,use_ino,category.create=mfs,moveonenospc=true,minfreespace=20G,fsname=SnapRaidShelves 0 0

Now I have to go back through everything and unbreak all my GPU drivers again; a lot of Proton games are hanging shortly after loading up.

I pretty much gutted my old Steam install and now have to re-tweak everything again, all because some mergerfs issue has recently popped up.

@DataBeaver
Author

I created an improved flock wrapper which translates the flock calls to fcntl(F_SETLK) calls, making sure that the lock type makes sense within the context of the file's open mode (no read locks on files open for writing only and vice versa). It appears to solve the problem for me and Steam is happily updating games again.

The source code is at https://gist.github.com/DataBeaver/0aa46844c8e1788207fc882fc2a221f6. Compile with gcc flock_to_setlk.c -o flock_to_setlk_32.so -shared -m32 and load with LD_PRELOAD as with the previous fakeflock solution.

I also took another go at reading the kernel source code and found that flock on NFS is emulated with F_SETLK, which checks the lock type against the open mode of the file. The NFS protocol does not support flock and the kernel developers have taken the cautious route of not altering the requested lock type but failing the call instead. I think this is a sensible approach for the kernel and the real solution here is to fix Steam to use consistent lock types.
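
The translation described above can be sketched roughly as follows. This is illustrative only and is not the code from the gist; the function name `flock_via_fcntl` and the exact fallback choices are assumptions based on the description (no read locks on write-only files and vice versa).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Illustrative only, not the code from the gist. Emulate flock() with
 * fcntl(F_SETLK / F_SETLKW), choosing a lock type that is valid for the
 * descriptor's open mode. Returns 0, or -1 with errno set. */
int flock_via_fcntl(int fd, int operation)
{
    int mode = fcntl(fd, F_GETFL);
    if (mode < 0)
        return -1;
    mode &= O_ACCMODE;

    struct flock fl = {0};
    fl.l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: whole file */

    switch (operation & ~LOCK_NB) {
    case LOCK_SH:
        /* A read lock needs read access; fall back to a write lock. */
        fl.l_type = (mode == O_WRONLY) ? F_WRLCK : F_RDLCK;
        break;
    case LOCK_EX:
        /* A write lock needs write access; fall back to a read lock. */
        fl.l_type = (mode == O_RDONLY) ? F_RDLCK : F_WRLCK;
        break;
    case LOCK_UN:
        fl.l_type = F_UNLCK;
        break;
    default:
        errno = EINVAL;
        return -1;
    }

    /* LOCK_NB selects the non-blocking fcntl command. */
    return fcntl(fd, (operation & LOCK_NB) ? F_SETLK : F_SETLKW, &fl);
}
```

An LD_PRELOAD wrapper would export a function named flock with this body, in the same way as the fakeflock library.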

@bemug

bemug commented Mar 14, 2019

Just fell into this. Still not fixed.

The symlink does not solve the issue.
I'll try @DataBeaver's workaround.

Edit: Never mind, I just needed to mkdir /var/tmp/downloading

@antpk

antpk commented Mar 17, 2019

I think this also affects verifying games. I'm seeing steam report files as corrupt. Will be testing and let people know.

@antpk

antpk commented Mar 17, 2019

@LubosD Your solution worked for me and I can verify games. One thing to note: I had to uninstall and reinstall some games for it to work.
@DataBeaver I tried your solution and was hoping it would work, but alas I still got downloading and verification errors.

@Oblomov

Oblomov commented May 4, 2019

Running the current Steam client beta, now it seems that even moving (and symlinking) the download directory is broken, all my updates have stalled again.

@eqvinox

eqvinox commented Jul 14, 2019

Copied over from #6392 (didn't see this issue):

Please describe your issue in as much detail as possible:

The Steam downloader (for client updates, game downloads as well as workshop downloads) seems to be locking files using flock(LOCK_SH). However, the files seem to be opened using open(O_WRONLY), which does not work on NFS(v4) file systems. Files must be opened as open(O_RDWR) to take a LOCK_SH lock.

On NFSv4:

  • LOCK_SH only works if the file is opened for reading
  • LOCK_EX only works if the file is opened for writing

In both cases, the "other" direction is irrelevant, i.e. O_RDWR works in all cases.
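
The rule above can be written down as a small predicate. This is only a model of the behaviour described in this comment, not kernel code; the name `nfs4_flock_mode_ok` is hypothetical, and treating unlock as always allowed is an assumption.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/file.h>

/* Hypothetical predicate modelling the rule above: would flock(lock_op)
 * be accepted on NFSv4 for a file opened with `open_flags`? */
bool nfs4_flock_mode_ok(int open_flags, int lock_op)
{
    int mode = open_flags & O_ACCMODE;
    bool readable = (mode == O_RDONLY || mode == O_RDWR);
    bool writable = (mode == O_WRONLY || mode == O_RDWR);

    switch (lock_op & ~LOCK_NB) {
    case LOCK_SH: return readable;   /* mapped to a read lock */
    case LOCK_EX: return writable;   /* mapped to a write lock */
    case LOCK_UN: return true;       /* assumption: no mode requirement */
    default:      return false;
    }
}
```

With the flags from the strace output in the original report (O_WRONLY|O_CREAT|O_TRUNC) and LOCK_SH, the predicate returns false, which matches the observed failure.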

Steps for reproducing this issue:

  1. mount directory used by steam itself (or some game) over NFS(v4)
  2. try downloading anything
  3. the download fails ("Disk write error")

The problem occurs with these directories:

  • ...steam/steamapps/downloading
  • ...steam/steamapps/workshop/downloads
  • ~/.local/share/Steam

Steps for fixing this issue:

  1. grep through your code base for uses of LOCK_SH
  2. make sure the fds used in these flock() calls are opened for read-write access (O_RDWR), not write-only access (O_WRONLY).
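
For step 2, the change could look something like this minimal sketch. The helper name `open_locked_shared` is hypothetical and nothing here is taken from the Steam code base; it only shows the open mode being switched to O_RDWR before the shared lock is taken.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: create/truncate `path` and take a shared lock on it.
 * Opening with O_RDWR instead of O_WRONLY keeps the descriptor valid for
 * both LOCK_SH and LOCK_EX.
 * Returns the locked descriptor, or -1 with errno set. */
int open_locked_shared(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_SH) != 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

If the lock is only meant to keep other writers out, keeping O_WRONLY and switching to LOCK_EX would be the other option consistent with the rule above.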

@eqvinox

eqvinox commented Jul 14, 2019

Great copy-over collision 😁

@eqvinox

eqvinox commented Jul 14, 2019

For reference, this is the relevant Linux kernel code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/nfs/nfs4proc.c#n7055

LOCK_SH => F_RDLCK,
LOCK_EX => F_WRLCK

(To my knowledge this is a limitation in the NFSv4 protocol, i.e. the kernel has no other choice there as the operation would fail server-side otherwise.)

Either way this is a bug that needs to be fixed in the Steam codebase. Also, I need to point out that a LOCK_SH on a file opened write-only has really no purpose. Maybe you meant to use LOCK_EX? To prevent simultaneous writing to the same file?

@eqvinox

eqvinox commented Jul 14, 2019

Here are some bonus info nuggets:

> I made a simple test program which opens a file in write-only mode and tries to acquire a shared lock on it. It works on a local filesystem but fails on NFS. The manpage for flock says that the open mode should not matter, so it seems the actual error is in the Linux kernel. However I've so far been unable to trace the exact location of the check that causes this error. It does seems a bit strange to use a shared lock with a write-only file, and LOCK_EX actually does work over NFS.

The manpage is simply incorrect because it fails to consider that NFSv4 maps flock() to fcntl(F_SETLK) internally. The fcntl man page correctly states:

> In order to place a read lock, fd must be open for reading. In order to place a write lock, fd must be open for writing. To place both types of lock, open a file read-write.

(This is worded slightly ambiguously; opening a file read-write works for all locks.)

> <serveraddr>:<export> <destination> nfs defaults,local_lock=all,vers=3 0 0
> Gets games downloading for me

The NFSv3 locking protocol is completely different from NFSv4. It might even work without the local_lock option, but I haven't tried.

> To my knowledge this is a limitation in the NFSv4 protocol, i.e. the kernel has no other choice there as the operation would fail server-side otherwise.

https://tools.ietf.org/html/rfc7530#section-16.10.5

> LOCK operations are subject to permission checks and to checks against the access type of the associated file. However, the specific rights and modes required for various types of locks reflect the semantics of the server-exported file system, and are not specified by the protocol. For example, Windows 2000 allows a write lock of a file open for READ, while a POSIX-compliant system does not.

@c-janousek

Does anyone know if there are worthwhile bugs to follow upstream?

@eqvinox

eqvinox commented Aug 9, 2019

> Does anyone know if there are worthwhile bugs to follow upstream?

There aren't because this is not an upstream bug. It's a bug in Steam.

(Unless someone finds out the bug is in a library that Steam uses, but so far there is no indication of that.)

@TTimo
Collaborator

TTimo commented Jan 8, 2020

Sorry, this fell through the cracks. Next Steam client beta (> Dec 20) will have a fix.

@kisak-valve
Member

Hello, per "Fix for Steam Library on some NFS mounts" in the 2019-01-09 Steam client beta update, please opt into Steam's beta and retest.

@john-tho

john-tho commented Jan 9, 2020

With client updated to 2019-01-09,
and using default library location in an NFS mountpoint

An update download & new install both worked for me without an flock_to_setlk LD_PRELOAD.

These had failed with DISK_WRITE_ERROR without flock_to_setlk on the Dec 20 client.

Cheers!

@Omar007

Omar007 commented Jan 9, 2020

Before the beta update to the 2019-01-09 release, my updates got stuck on DISK WRITE ERROR.
I verified just now that this was still the case, with 'Onward' still getting that error.

After update to the 2019-01-09 beta, the same Onward update was applied successfully. LGTM as well :)

@glabifrons

Confirmed!
I just updated the Steam beta client, shut it down, started it using the conventional method (instead of the script I've been using) and was able to install two games (one was ~22GB). :)

Thank you very much for fixing this!

@DataBeaver
Author

Confirmed working. I updated the Steam client and restarted it without the wrapper library. Games installed on NFS are now updating properly.

I'm glad to see there are still developers who care about NFS.

@anthr76

anthr76 commented Jan 12, 2020

Has this been pushed to Flatpak yet? I'm still receiving the error using autofs.

Edit: Opting into the beta client kicked off all the downloads. Thank you!
Edit 2: Some downloaded successfully, but I'm still getting a disk write error on others. Will investigate later.

@And4713

And4713 commented Jan 21, 2020

This still does not work for me after yesterday's update (package version 1579321278 on Ubuntu 18.04). Editing the fstab to allow NFSv4 and rebooting results in the following for every acf file during client startup, which takes longer as a result and doesn't show games installed on the mount:
flock /mysteamappspath/appmanifest_appid.acf LOCK_SH failed. errno = 37
Using NFSv3 and local_lock=all continues to operate properly.

@DataBeaver
Author

Updating continues to work for me with Steam package version 1579321278 on an NFSv4 mount. It should be noted that errno 37 corresponds to ENOLCK rather than EBADF as in the original report. This suggests that the NFS mount does not allow locks for some reason. Are other programs able to use locks (and particularly the flock variant) on it?
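
For anyone who wants to decode such a number from a log line, a small helper along these lines would do. It is a sketch only; the name `describe_lock_errno` is hypothetical, and numeric errno values differ between platforms.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a lock-related errno value as
 * "NAME (description)" into buf and return buf. Only the two codes
 * discussed in this thread are named. */
const char *describe_lock_errno(int err, char *buf, size_t len)
{
    const char *name = (err == EBADF)  ? "EBADF"
                     : (err == ENOLCK) ? "ENOLCK"
                     : "other";
    snprintf(buf, len, "%s (%s)", name, strerror(err));
    return buf;
}
```

On x86 Linux, where ENOLCK is 37 and EBADF is 9, passing 37 gives a string that starts with ENOLCK.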

@And4713

And4713 commented Jan 21, 2020

@DataBeaver What would be the best way (in regards to this issue) to test this?

@DataBeaver
Author

Upon closer inspection of the flock(2) man page, ENOLCK refers to lack of memory. Which would normally seem pretty strange, but I wonder if file lock limits (ulimit -x) can cause that to happen. Can you check the value of that limit?

You can also use the flock utility to see if locks work. Since the number of simultaneous locks may be an issue, try it at a time when Steam is trying and failing to acquire locks.

If these do not reveal anything, an strace of Steam may be useful (with the -f flag so threads and child processes are included).

@kisak-valve
Member

Hello @And4713, please open a separate issue report so your follow up issue can be tracked properly.

@And4713

And4713 commented Jan 22, 2020

@DataBeaver Using flock --verbose -n -e ./appmanifest_8000.acf -c 'sleep 6' succeeds on both client and server and ulimit -x returns unlimited, however:
@kisak-valve before opening the issue I noticed something I may have overlooked. I cannot start Steam to be sure until I am physically in front of that machine again. (~4-5 hours for that)

@And4713

And4713 commented Jan 22, 2020

@DataBeaver @kisak-valve The problem was an oversight on my part. Thanks for helping me stumble over the reason.
Detail:
A mount defined in fstab silently falls back to NFSv3, even when the fstype is nfs4, if any v3-only options remain. In this case the leftover mountproto=tcp option counts as v3-only, even though NFSv4 uses TCP anyway. This becomes apparent when mount is forced to use specific option combinations. Of course v3 has inferior locking capability, so it wasn't working.

@bphd

bphd commented Sep 2, 2021

Trying to install MSFS 2020

GameAction [AppID 1250410, ActionID 1] : InstallApps changed task to CreateNextApp with ""
GameAction [AppID 1250410, ActionID 1] : InstallApps failed with AppError_11 with ""

This happens if I try to install at the root of my NVMe drive. But if I choose a folder deeper on the NVMe drive that already contains the game, Steam tries to update it just fine, producing other kinds of errors about missing files, but none about the location.
