Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timeout on systems with large numbers of loops #872

Merged
merged 1 commit into from
May 23, 2024

Conversation

waveform80
Copy link
Contributor

On systems with a very large number of snap packages installed (for example), there are a considerable number of loop devices. In this case, the lsblk command in linuxdrivelist fills the stdout pipe, blocks, and the rpi-imager process assumes it has timed out 1.

This PR is a trivial work-around that simply excludes loop devices (major=7) from the lsblk output. Given subsequent code 2 excludes everything starting with /dev/loop anyway, there should be no change in user experience with this exclusion other than rpi-imager not failing on such systems (with a very large number of loop devices).

However, it may be preferable to fix the call to lsblk so that it doesn't (potentially) block the main UI thread for up to two seconds and die when the lsblk output is sufficiently large to block the pipe (i.e. have it loop on QProcess.waitForReadyRead instead of just blocking on QProcess.waitForFinished). If this is preferred, I'm happy to submit a PR for that instead.

On systems with a very large number of snap packages installed, there
are a considerable number of loop devices. In this case, the `lsblk`
command in linuxdrivelist fills the stdout pipe, blocks, and the
rpi-imager process assumes it has timed out [1].

This is a trivial work-around that simply excludes loop devices
(major=7) from the `lsblk` output. Given subsequent code excludes
everything starting with `/dev/loop` anyway, there should be no change
in user experience with this exclusion.

[1]: waveform80/imager-snap#6
@maxnet
Copy link
Collaborator

maxnet commented May 21, 2024

In this case, the lsblk command in linuxdrivelist fills the stdout pipe

That isn't supposed to be happening as QProcess reads stuff to its own buffers that grow very well.

However, it may be preferable to fix the call to lsblk so that it doesn't (potentially) block the main UI thread

Code isn't run from main thread. (method that calls waitForFinished() is run from DriveListModelPollThread)

@kenvandine
Copy link

I don't think it actually blocks the UI, but it never populates the drive list.

@tdewey-rpi
Copy link
Collaborator

Happy with the workaround, thanks for the submission @waveform80

@tdewey-rpi tdewey-rpi merged commit b008150 into raspberrypi:qml May 23, 2024
@waveform80
Copy link
Contributor Author

In this case, the lsblk command in linuxdrivelist fills the stdout pipe

That isn't supposed to be happening as QProcess reads stuff to its own buffers that grow very well.

I was quite surprised at that; I'd thought the calling process needed to respond to readyRead in order for QProcess to continue reading from the process; i.e. that it would only fill a single limited-size buffer without intervention, but a quick experiment confirmed you're absolutely right (read from a little Python script that spews out infinite garbage, and QProcess merrily eats gigabytes of memory while the calling process twiddles its thumbs!).

Anyway, thanks for merging that!

@waveform80 waveform80 deleted the lsblk-timeout branch May 23, 2024 09:44
@maxnet
Copy link
Collaborator

maxnet commented May 23, 2024

Happy with the workaround

While this patch does not hurt to have, at the same time do note that it is unlikely to solve the actual problem reported.

Hint: we already print a debug warning message if lsblk takes longer than 1 second to complete.
But do not actually kill the process if it is less than 2 seconds. And even if killed, it retries a second later.

As you can see in kenvandine's debug output in the other issue, it is indeed true that lsblk got killed 4 times due to timeouts, but it did also run at least 6 times successfully.
If the disk did not appear at all even briefly, there is some other issue as well.
Be it that the security restrictions imposed by snap hide the drive (note that he is not running stock rpi-imager), or that there is another problem. E.g. that the drive is not seen as removable by lsblk and therefore is hidden by Imager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants