Kernel 5.15.5: [drm] *ERROR* CPU pipe A FIFO underrun on Intel Mesa #89

Open
why-not-try-calmer opened this issue Dec 3, 2021 · 8 comments

@why-not-try-calmer

Distribution (run cat /etc/os-release):
Pop!_OS 21.04.

Related Application and/or Package Version (run apt policy $PACKAGE NAME):
Linux kernel.

Issue/Bug Description:
Show-stopping graphical glitches: pinkish horizontal lines blur the bottom half of the screen, as if that part of the screen could not be refreshed properly. The system is not usable in that state.

Relevant error log:

pop-os kernel: i915 0000:00:02.0: [drm] *ERROR* CPU pipe A FIFO underrun

I was not able to extract more specific logs.
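
For anyone trying to collect more detail, a minimal sketch of counting how often the message above appears in a kernel log. The helper name `count_underruns` is hypothetical, and the commented usage assumes a systemd journal is available:

```shell
# Hypothetical helper: count kernel log lines that mention a FIFO underrun.
# Reads the log text from standard input and prints the number of matches.
count_underruns() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the helper's exit status clean.
    grep -c 'FIFO underrun' || true
}

# Usage (assumes systemd's journalctl; -k selects kernel messages, -b the current boot):
#   journalctl -k -b | count_underruns
```

A count that rises together with the visible glitches would support the link between the log line and the symptom.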

Steps to reproduce (if you know):
Boot the machine. Every hour or so, the bug occurs, making the machine unusable except by rebooting.

Expected behavior:
No issue at all on 5.13.

Other Notes:

lspci
> 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)

I also have an NVIDIA GPU but not using it (nouveau.modeset=1 and no proprietary driver used either).

@jackpot51 jackpot51 transferred this issue from pop-os/pop Dec 3, 2021
@jackpot51
Member

Could you describe if there are any specific steps you take before the issue happens? Also, could you describe your hardware?

@jackpot51 jackpot51 self-assigned this Dec 3, 2021
@why-not-try-calmer
Author

why-not-try-calmer commented Dec 3, 2021

> Could you describe if there are any specific steps you take before the issue happens? Also, could you describe your hardware?

Hello Jeremy, very honored!

My hardware: lwhw.txt

Steps taken before the issue happens: Actually nothing special, writing code in VSCode with an email client and web browser running in the background. I noticed the issue seems to recur periodically -- about every hour or every other hour.

Not sure if it is related, but it resonates quite a bit with https://bbs.archlinux.org/viewtopic.php?id=263720. I've tried to adapt the fix by adding intel_idle.max_cstate=3 to my kernel parameters; the only difference is that with the extra parameter I am able to avoid rebooting -- the screen will go back to normal after some pain and contortions.
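
To confirm that a parameter such as intel_idle.max_cstate actually reached the running kernel, a small sketch like the following could be used. The helper name `cmdline_value` is hypothetical; it reads `/proc/cmdline` unless a command-line string is passed explicitly:

```shell
# Hypothetical helper: print the value of a kernel parameter.
# Usage: cmdline_value NAME [CMDLINE]
# CMDLINE defaults to the contents of /proc/cmdline (the booted kernel's parameters).
cmdline_value() {
    cv_name=$1
    cv_cmdline=${2-$(cat /proc/cmdline)}
    # Split the command line on whitespace and look for NAME=VALUE.
    for cv_word in $cv_cmdline; do
        case $cv_word in
            "$cv_name"=*)
                printf '%s\n' "${cv_word#*=}"
                return 0
                ;;
        esac
    done
    return 1  # parameter not present
}

# Usage:
#   cmdline_value intel_idle.max_cstate || echo "parameter not set"
#
# Assumption: on Pop!_OS the parameter is usually added with kernelstub, e.g.
#   sudo kernelstub -a "intel_idle.max_cstate=3"
# followed by a reboot; other distributions use their own bootloader configuration.
```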

@jackpot51
Member

Which web browser?

@why-not-try-calmer
Author

why-not-try-calmer commented Dec 3, 2021

> Which web browser?

Brave from https://brave-browser-apt-release.s3.brave.com/, V1.32.113

@why-not-try-calmer
Author

I've looked into #86 and the solution mentioned there fully solves all issues mentioned here. Kudos to @spxak1 for his work!

I think you can mark this Issue as a duplicate of #86, or I can close it.

@spxak1

spxak1 commented Dec 7, 2021

> I've looked into #86 and the solution mentioned there fully solves all issues mentioned here. Kudos to @spxak1 for his work!
>
> I think you can mark this Issue as a duplicate of #86, or I can close it.

Ah, you said Intel HD 530, that's 6th Gen Intel, I should have picked that up. It's part of the same issue apparently. Thanks for confirming.

Hopefully this will be fixed soon, and the fact that disabling VT-d solves it is probably a pointer for the devs.

Thanks again.

@spxak1

spxak1 commented Dec 14, 2021

I can confirm this is an issue with Haswell (4th Gen) Intel CPUs too.

When audio is routed via HDMI, all video playback will stop (or won't play). Also there is no audio at all (video playback or not).

The problem is solved with any kernel before 5.15.5 (5.15.0, 5.15.4 and 5.13 tested) or by disabling VT-d in the BIOS.
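
A rough sketch of checking whether a given kernel version falls before 5.15.5, the first version reported as affected in this thread. The helper name `version_lt` is hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils does):

```shell
# Hypothetical helper: succeed if version $1 is strictly older than version $2.
# Any distribution suffix after the first "-" (e.g. "-76051505-generic") is ignored.
version_lt() {
    vl_a=${1%%-*}
    vl_b=${2%%-*}
    # Equal versions are not "older"; otherwise the older one sorts first under sort -V.
    [ "$vl_a" != "$vl_b" ] &&
        [ "$(printf '%s\n%s\n' "$vl_a" "$vl_b" | sort -V | head -n 1)" = "$vl_a" ]
}

# Usage:
#   if version_lt "$(uname -r)" 5.15.5; then
#       echo "running a kernel older than 5.15.5"
#   fi
```

This only compares version numbers; it says nothing about whether a later kernel already carries a fix.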

Which means this is related to #86.

Here is a screen capture of the issue in all its glory: https://www.youtube.com/watch?v=XprqYp9iMtc

13r0ck pushed a commit that referenced this issue Mar 13, 2023
commit 996d120 upstream.

If IOMMU domain for device group is not setup properly then we may hit
IOMMU page fault. Current page fault handler assumes that domain is
always setup and it will hit NULL pointer derefence (see below sample log).

Lets check whether domain is setup or not and log appropriate message.

Sample log:
----------
 amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 6
 BUG: kernel NULL pointer dereference, address: 0000000000000058
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Not tainted 6.2.0-rc2+ #89
 Hardware name: xxx
 RIP: 0010:report_iommu_fault+0x11/0x90
 [...]
 Call Trace:
  <TASK>
  amd_iommu_int_thread+0x60c/0x760
  ? __pfx_irq_thread_fn+0x10/0x10
  irq_thread_fn+0x1f/0x60
  irq_thread+0xea/0x1a0
  ? preempt_count_add+0x6a/0xa0
  ? __pfx_irq_thread_dtor+0x10/0x10
  ? __pfx_irq_thread+0x10/0x10
  kthread+0xe9/0x110
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

Reported-by: Matt Fagnani <matt.fagnani@bell.net>
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216865
Link: https://lore.kernel.org/lkml/15d0f9ff-2a56-b3e9-5b45-e6b23300ae3b@leemhuis.info/
Link: https://lore.kernel.org/r/20230215052642.6016-3-vasant.hegde@amd.com
Cc: stable@vger.kernel.org
[joro: Edit commit message]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@why-not-try-calmer
Author

It should be added that intel_idle.max_cstate=4 also works around the issue.
