[RPi5] [rpi-6.14.y] kernel panic with some PCIe I/O activity #6696

Closed
ElDavoo opened this issue Mar 2, 2025 · 5 comments

Comments

ElDavoo commented Mar 2, 2025

Describe the bug

Hi,
On 6.14 kernels, some I/O activity over PCIe (possibly any kind of activity) results in a kernel panic.

It doesn't happen on any other branch (6.13, 6.12...).

Steps to reproduce the behaviour

I haven't worked out a precise pattern for triggering it. What I do is start the bees deduplication daemon, which performs read/write activity; after a few seconds the kernel crashes.

Device(s)

Raspberry Pi 5

System

Raspberry Pi reference 2024-11-13
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 919f1e86b959321edaa8266ee271e5d0870f5298, stage2
2025/02/12 10:51:52 
Copyright (c) 2012 Broadcom
version f788aab6 (release) (embedded)

Linux pi5 6.14.0-rc4-v8-16k+ #68 SMP Sun Mar 2 12:23:52 CET 2025 aarch64 GNU/Linux

Logs

kmsg.txt

Additional context

No response

Contributor

pelwell commented Mar 2, 2025

The log you posted includes the following message:

read-write for sector size 4096 with page size 16384 is experimental

Please repeat your test with kernel=kernel8.img in config.txt, which will switch your environment to the usual 4kB pages.
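To confirm which page size the running kernel actually uses after changing config.txt, something like the following could be run on the Pi. This is a minimal sketch, not part of the original comment; the helper name `page_size_bytes` is hypothetical, while `os.sysconf("SC_PAGE_SIZE")` is the standard Python call.

```python
import os


def page_size_bytes() -> int:
    """Return the memory page size of the running kernel, in bytes."""
    # SC_PAGE_SIZE is the POSIX sysconf name for the page size.
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    size = page_size_bytes()
    # A 4 kB-page kernel reports 4096; a 16 kB-page kernel reports 16384.
    print(f"page size: {size} bytes ({size // 1024} kB)")
```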

Author

ElDavoo commented Mar 3, 2025

> The log you posted includes the following message:

That message has been useless for many releases; they're planning to remove it in a few versions.

The same thing happened with a 4 kB page size anyway. I'll try to bisect.

kmsg2.txt

Collaborator

popcornmix commented Mar 3, 2025

Might be worth trying with #6675.

Author

ElDavoo commented Mar 4, 2025

You're right, with #6675 it's behaving normally.

Contributor

P33M commented Mar 4, 2025

The associated patch set downgrades a PCIe read failure (a timeout or a negative response) to returning all 1s as the read data, which would also make the nvme driver complain and offline the controller. As you don't get that, the relevant change is probably the other feature of the patch set, which greatly extends the read completion timeout. The drive is probably just slow to generate responses under certain conditions.
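As an illustration of the "all 1s" convention mentioned above, a failed read shows up as a value with every bit set. The following is a hypothetical Python sketch only; it is not code from the patch set or the nvme driver, and the function name and the 32-bit default width are assumptions.

```python
def is_all_ones(value: int, width_bits: int = 32) -> bool:
    """Return True if every bit of a width_bits-wide register read is set.

    Illustrative only: a read that comes back as all 1s may indicate that
    the device did not answer, rather than genuine register contents.
    """
    mask = (1 << width_bits) - 1
    return (value & mask) == mask


if __name__ == "__main__":
    # A plausible register value versus an all-ones response.
    for sample in (0x00000001, 0xFFFFFFFF):
        print(f"{sample:#010x}: all ones = {is_all_ones(sample)}")
```

A driver that sees such a value for a register that can never legitimately read as all 1s can treat the device as unreachable, which matches the "complain and offline the controller" behaviour described above.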

@P33M P33M closed this as completed Mar 4, 2025