Block nvme0n1: No UUID available providing old NGUID

I am getting a lot of kernel messages on my Raspberry Pi 5, running Home Assistant OS, using the Argon Neo 5 m.2 nvme case with a Lexar NM790 2TB.

kernel: block nvme0n1: No UUID available providing old NGUID

I also get the following error from time to time:

kernel: pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
kernel: pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
kernel: pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00000040/00002000
kernel: pcieport 0000:00:00.0:    [ 6] BadTLP
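For readers decoding the AER lines above: the status/mask pair is a bitfield taken from the PCIe Correctable Error Status and Mask registers. A minimal Python sketch of the decoding follows, assuming the short bit names that the Linux AER driver prints. The function name `decode_aer_correctable` is made up for this illustration.

```python
# Bit positions of the PCIe Correctable Error Status register, with the
# short names the Linux AER driver prints (for example "[ 6] BadTLP").
AER_CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode_aer_correctable(value: int) -> list[str]:
    """Return the names of all bits set in a status or mask value.

    Hypothetical helper; bits without a known name are reported as "bitN".
    """
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


# Values copied from "error status/mask=00000040/00002000" in the log.
status, mask = 0x00000040, 0x00002000
print("reported:", decode_aer_correctable(status))
print("masked:  ", decode_aer_correctable(mask))
```

The status value has only bit 6 set, which matches the `[ 6] BadTLP` line. The mask has only bit 13 set, so Advisory Non-Fatal errors are masked while BadTLP is reported.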

My Pi5 eeprom is configured to include:

WAKE_ON_GPIO=0
POWER_OFF_ON_HALT=1
BOOT_ORDER=0xf416
PCIE_PROBE=1

The /mnt/boot/config.txt is configured as follows:

usb_max_current_enable=1
dtparam=nvme
dtparam=pciex1_gen=3

NOTE: The Argon user manual says to use dtparam=pciex1_1=gen3, but that doesn't work!

I also once got the following scary message:

kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
kernel: nvme nvme0: 4/0/0 default/read/poll queues

This is a very new setup, from today actually :slight_smile:
So far, everything seems to work. But I am afraid that these messages might mean that I have an unstable setup.

Can you help me troubleshoot?
Should I be worried? Is the Lexar NM790 not supported by the Neo 5 nvme hat?

As the warning line recommends, please add these parameters to the kernel command line in your cmdline.txt file and see what happens.

[quote="fbsdmon, post:1, topic:2775"]
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
[/quote]

That didn't help :frowning:

Here is what I configured, as shown by # cat /mnt/boot/cmdline.txt

zram.enabled=1 zram.num_devices=3 rootwait cgroup_enable=memory fsck.repair=yes console=tty1 root=PARTUUID=a3ec664e-32ce-4665-95ea-7ae90ce9aa20 ro rauc.slot=B systemd.machine_id=f7b27aa3ab9a4540aaa33dd76e829add nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
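Edits to cmdline.txt only take effect after a reboot, and the command line the running kernel actually booted with can be read from /proc/cmdline. As a minimal sketch, the check below reports which of the two suggested parameters are absent from a command line. The helper name `missing_kernel_params` is hypothetical, and the sample string is a shortened copy of the line above.

```python
def missing_kernel_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that do not appear on the command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


WANTED = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]

# Shortened copy of the command line quoted above; on a live system the
# string would come from reading /proc/cmdline instead.
sample = (
    "zram.enabled=1 rootwait console=tty1 ro rauc.slot=B "
    "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
)

print(missing_kernel_params(sample, WANTED))             # nothing missing
print(missing_kernel_params("console=tty1 ro", WANTED))  # both missing
```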

Here are some dmesg messages

[   33.779590] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[   33.779597] nvme nvme0: Does your device have a faulty power saving mode enabled?
[   33.779599] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[   33.815708] nvme0n1: I/O Cmd(0x2) @ LBA 1313794, 2 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815721] I/O error, dev nvme0n1, sector 1313794 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[   33.815740] nvme0n1: I/O Cmd(0x2) @ LBA 5687296, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815746] I/O error, dev nvme0n1, sector 5687296 op 0x0:(READ) flags 0x83700 phys_seg 16 prio class 2
[   33.815761] nvme0n1: I/O Cmd(0x2) @ LBA 9881600, 56 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815767] I/O error, dev nvme0n1, sector 9881600 op 0x0:(READ) flags 0x83700 phys_seg 7 prio class 2
[   33.815778] nvme0n1: I/O Cmd(0x2) @ LBA 14075904, 72 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815784] I/O error, dev nvme0n1, sector 14075904 op 0x0:(READ) flags 0x83700 phys_seg 9 prio class 2
[   33.815794] nvme0n1: I/O Cmd(0x2) @ LBA 18270232, 16 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815800] I/O error, dev nvme0n1, sector 18270232 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2
[   33.815808] nvme0n1: I/O Cmd(0x2) @ LBA 18270256, 16 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815813] I/O error, dev nvme0n1, sector 18270256 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2
[   33.815820] nvme0n1: I/O Cmd(0x2) @ LBA 18270320, 16 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815826] I/O error, dev nvme0n1, sector 18270320 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2
[   33.815833] nvme0n1: I/O Cmd(0x2) @ LBA 22464560, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815838] I/O error, dev nvme0n1, sector 22464560 op 0x0:(READ) flags 0x83700 phys_seg 1 prio class 2
[   33.815845] nvme0n1: I/O Cmd(0x2) @ LBA 26658816, 16 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815851] I/O error, dev nvme0n1, sector 26658816 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2
[   33.815858] nvme0n1: I/O Cmd(0x2) @ LBA 26658848, 16 blocks, I/O Error (sct 0x3 / sc 0x71)
[   33.815863] I/O error, dev nvme0n1, sector 26658848 op 0x0:(READ) flags 0x83700 phys_seg 2 prio class 2
[   33.843609] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[   33.849950] nvme nvme0: 4/0/0 default/read/poll queues
[   36.487094] audit: type=1334 audit(1714466591.599:14): prog-id=15 op=LOAD
[   66.547581] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[   66.547588] nvme nvme0: Does your device have a faulty power saving mode enabled?
[   66.547591] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[   66.595617] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[   66.601977] nvme nvme0: 4/0/0 default/read/poll queues
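The `sct`/`sc` pair in the I/O error lines above is the NVMe status code type and status code. The sketch below decodes it, assuming the naming used in the NVMe specification and covering only the two host-side path codes. The helper `describe_status` is hypothetical, and the tables are deliberately incomplete.

```python
import re
from typing import Optional, Tuple

# Status code types, named after the NVMe specification's categories.
STATUS_CODE_TYPES = {
    0x0: "Generic Command Status",
    0x1: "Command Specific Status",
    0x2: "Media and Data Integrity Errors",
    0x3: "Path Related Status",
}

# Only the host-side path codes are listed here (assumption: enough
# for the log lines quoted in this thread).
PATH_RELATED_CODES = {
    0x70: "Host Pathing Error",
    0x71: "Command Aborted By Host",
}

STATUS_RE = re.compile(r"sct (0x[0-9a-fA-F]+) / sc (0x[0-9a-fA-F]+)")


def describe_status(line: str) -> Optional[Tuple[str, str]]:
    """Decode the "(sct 0x.. / sc 0x..)" part of a kernel I/O error line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    sct = int(match.group(1), 16)
    sc = int(match.group(2), 16)
    type_name = STATUS_CODE_TYPES.get(sct, f"unknown type {sct:#x}")
    if sct == 0x3:
        code_name = PATH_RELATED_CODES.get(sc, f"unlisted code {sc:#x}")
    else:
        code_name = f"code {sc:#x} (not decoded in this sketch)"
    return type_name, code_name


line = "nvme0n1: I/O Cmd(0x2) @ LBA 1313794, 2 blocks, I/O Error (sct 0x3 / sc 0x71)"
print(describe_status(line))
```

If that reading is right, the failed reads were cancelled on the host side around the controller reset rather than reported by the drive as media errors, which would fit a link or power problem better than bad flash.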

And here is the output of # journalctl -b | grep "nvme "

Apr 30 08:42:36 homeassistant kernel: nvme nvme0: pci function 0000:01:00.0
Apr 30 08:42:36 homeassistant kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Apr 30 08:42:36 homeassistant kernel: nvme nvme0: missing or invalid SUBNQN field.
Apr 30 08:42:36 homeassistant kernel: nvme nvme0: allocated 32 MiB host memory buffer.
Apr 30 08:42:36 homeassistant kernel: nvme nvme0: 4/0/0 default/read/poll queues
Apr 30 08:43:08 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 30 08:43:08 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Apr 30 08:43:08 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Apr 30 08:43:08 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Apr 30 08:43:08 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
Apr 30 08:43:41 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 30 08:43:41 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Apr 30 08:43:41 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Apr 30 08:43:41 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Apr 30 08:43:41 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues

Seems to have made it worse :frowning:
Also, the block nvme0n1: No UUID available providing old NGUID messages seem more frequent now.

Any ideas what I could try?

I think the UUID message can be ignored if it only happens once during boot. But the other messages indicate a problem on the PCIe bus. The flexible PCB (cable) is very sensitive to nearby devices that emit high frequencies. Do you have Bluetooth or WiFi enabled on your RPi5?

It's also important that your bootloader and firmware/kernel are up to date. I don't know how current your Home Assistant OS installation is.
Are you already using Home Assistant OS 12.3.rc2?
If so, do you have a chance to test first with a separate SD card with PiOS on it?

Because the RPi5 isn't certified for Gen 3, you can also try
dtparam=pciex1_gen=2
or
dtparam=pciex1_gen=1
and see whether the errors go away.
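Whether a pciex1_gen setting actually took effect can be checked from the negotiated link speed, which Linux exposes in sysfs as `current_link_speed` for each PCI device. The sketch below maps that string to a PCIe generation. The helper names are hypothetical, and the device address 0000:01:00.0 is taken from the logs in this thread.

```python
import os
from typing import Optional

# Transfer rate in GT/s for each PCIe generation relevant to the Pi 5.
GT_PER_S_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3}


def pcie_generation(link_speed: str) -> Optional[int]:
    """Map a sysfs link speed string such as "8.0 GT/s PCIe" to a generation."""
    try:
        rate = float(link_speed.split()[0])
    except (IndexError, ValueError):
        return None
    return GT_PER_S_TO_GEN.get(rate)


def read_link_speed(device: str = "0000:01:00.0") -> Optional[str]:
    """Read current_link_speed for a PCI device, or None if it is absent."""
    path = f"/sys/bus/pci/devices/{device}/current_link_speed"
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


speed = read_link_speed()
if speed is None:
    print("no such PCI device on this machine")
else:
    print(speed, "-> Gen", pcie_generation(speed))
```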

I get the "No UUID available providing old NGUID" error constantly.

# journalctl -b | grep NGUID
May 01 11:27:06 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:06 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:09 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:09 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:14 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:14 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:35 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:35 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:36 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:36 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:36 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:36 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:50 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:50 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:52 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:27:52 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:00 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:00 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:20 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:20 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:26 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:26 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:40 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:40 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:43 raza kernel: block nvme0n1: No UUID available providing old NGUID
May 01 11:28:43 raza kernel: block nvme0n1: No UUID available providing old NGUID

I also get other concerning messages.

# journalctl -b | grep -i nvme | grep -v NGUID
May 01 10:12:11 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:12:11 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:12:11 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:12:11 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:12:11 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:12:11 raza udisksd[148]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Lexar_SSD_NM790_2TB_NLE638R000039P2202: Error updating Health Information: Failed to open device '/dev/nvme0': Resource temporarily unavailable (g-bd-nvme-error-quark, 2)
May 01 10:18:15 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:18:15 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:18:15 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:18:15 raza kernel: nvme0n1: I/O Cmd(0x2) @ LBA 2773608, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
May 01 10:18:15 raza kernel: I/O error, dev nvme0n1, sector 2773608 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 01 10:18:15 raza kernel: nvme0n1: I/O Cmd(0x2) @ LBA 471205192, 32 blocks, I/O Error (sct 0x3 / sc 0x71)
May 01 10:18:15 raza kernel: I/O error, dev nvme0n1, sector 471205192 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
May 01 10:18:15 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:18:15 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:22:16 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:22:16 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:22:16 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:22:16 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:22:16 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:22:16 raza udisksd[148]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Lexar_SSD_NM790_2TB_NLE638R000039P2202: Error updating Health Information: Failed to open device '/dev/nvme0': Resource temporarily unavailable (g-bd-nvme-error-quark, 2)
May 01 10:38:24 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:38:24 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:38:24 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:38:24 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:38:24 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:42:26 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:42:26 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:42:26 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:42:26 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:42:26 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:42:26 raza udisksd[148]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Lexar_SSD_NM790_2TB_NLE638R000039P2202: Error updating Health Information: Failed to open device '/dev/nvme0': Resource temporarily unavailable (g-bd-nvme-error-quark, 2)
May 01 10:56:34 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:56:34 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:56:34 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:56:34 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:56:34 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:58:34 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:58:34 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 10:58:34 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 10:58:34 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 10:58:34 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 11:02:36 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 11:02:36 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 11:02:36 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 11:02:36 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 11:02:36 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 11:02:36 raza udisksd[148]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Lexar_SSD_NM790_2TB_NLE638R000039P2202: Error updating Health Information: Failed to open device '/dev/nvme0': Resource temporarily unavailable (g-bd-nvme-error-quark, 2)
May 01 11:04:37 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 11:04:37 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 11:04:37 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 11:04:37 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 11:04:37 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 11:10:39 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 11:10:39 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 11:10:39 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 11:10:39 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 11:10:39 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 11:12:41 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 11:12:41 raza kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 01 11:12:41 raza kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 01 11:12:41 raza kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 01 11:12:41 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 11:12:41 raza udisksd[148]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Lexar_SSD_NM790_2TB_NLE638R000039P2202: Error updating Health Information: Failed to open device '/dev/nvme0': Resource temporarily unavailable (g-bd-nvme-error-quark, 2)
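To see whether the resets follow a pattern, the gaps between the "controller is down" lines can be computed from the journal text. A minimal sketch follows. The function `reset_gaps` is hypothetical, the sample holds only the first few lines of the excerpt above, and the year 2024 is an assumption because the journal's short format omits it.

```python
import re
from datetime import datetime

RESET_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: nvme \S+: "
    r"controller is down; will reset",
    re.MULTILINE,
)


def reset_gaps(journal_text: str, year: int = 2024) -> list[int]:
    """Return the seconds between consecutive controller resets."""
    stamps = [
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp in RESET_RE.findall(journal_text)
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


SAMPLE = """\
May 01 10:12:11 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:12:11 raza kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 01 10:18:15 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:22:16 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 01 10:38:24 raza kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
"""

print(reset_gaps(SAMPLE))
```

For the full excerpt above, the gaps work out to 364, 241, 968, 242, 848, 120, 242, 121, 362 and 122 seconds, all close to multiples of two minutes. That hints at a periodic trigger rather than purely random failures, although the log alone does not identify it.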

My WiFi is disabled, but the Bluetooth is enabled. And I kinda need my bluetooth to be active :slight_smile:

I am running the latest HA OS v12.2, I do not run release candidates, only stable versions, and this is the latest stable one... I also run the latest Pi eeprom.

I just tried running dtparam=pciex1_gen=2 and I got the exact same errors.

I don't have another SSD to test with, but I can try reformatting the SSD with PiOS.

But I doubt that will change anything. It looks like the Lexar NM790 is not compatible with your SSD Pi HAT.

Can you recommend any other settings to try? Or at least recommend a compatible 2TB SSD for the Argon Neo 5 m.2 NVME case?

Perhaps that NVMe device draws too much power.

It looks like you may have misunderstood me. I'm not affiliated with Argon40. I only share knowledge that I collected here and there during my research into making the LibreELEC add-on compatible with the Argon ONE V3 case.

I assume this message is triggered every time the NVMe returns from an unavailable/disconnected state. So it indicates that the PCIe bus is very unstable.
Just to be sure, have you already tried the combination of "dtparam=pciex1_gen=3" in config.txt and "pcie_aspm=off" in cmdline.txt? I'm asking because I didn't mean for you to add "nvme_core.default_ps_max_latency_us=0". Please remove that parameter if it is currently there.

Normally this is a good list to start researching a possibly compatible device: NVMe Base for Raspberry Pi 5 - NVMe Base
I want to stress "possibly" because it looks like a lottery to me: you never know what you get until you have tested it (different hardware revisions, firmware versions, bad cables without impedance correction, manufacturing quality...).

You can format the NVMe, but you don't need to for the first attempt. It would be enough to use an SD card with a known working, current OS (PiOS). "Current" is the most important term, because the kernel must be current (6.6.28+) to ensure that most known issues have been fixed. With Home Assistant OS 12.2 you are using a much older kernel.

Regarding NVMe support, it's important to have the most recent combination of firmware and kernel.
For every new/unknown NVMe this can, but need not, be the difference between "it works like a charm" and "it doesn't work".

Sorry, but such sentences unfortunately carry no information for anyone other than the reporter.

I also run the latest Pi eeprom.

Even if you run the command rpi-eeprom-update -a correctly, some OS, for example Ubuntu 23.10, doesn't deliver the most current bootloader. The only reliable way to match a bootloader version to its changelog entry is the release date. So please don't say "the latest"; give the release date instead. Someone may read this a day after a new bootloader was released, and that version may not behave the same as yours.

some OS for example Ubuntu 23.10 doesn't deliver the most current bootloader

Unfortunately neither does the just released Ubuntu 24.04. Ubuntu users can go straight to the source for the rpi-eeprom package for current firmware:

I restored my Home Assistant on my old Pi4b, so I can play around a bit more with the Pi5 + Argon Neo 5 + Lexar NM790 2TB SSD.
I loaded the latest Raspberry Pi OS on the SSD, and here are the results...

Here is my Pi OS version (freshly updated)

fbsdmon@pi5:~ $ uname -a
Linux pi5 6.6.29-v8-16k+ #1760 SMP PREEMPT Mon Apr 29 14:44:20 BST 2024 aarch64 GNU/Linux

fbsdmon@pi5:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

rpi-update is the latest as well

fbsdmon@pi5:~ $ sudo rpi-update
 *** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
 *** Performing self-update
 *** Relaunching after update
 *** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
FW_REV:42312de156405e637a0a171d0cd58b1b2980df11
BOOTLOADER_REV:2bfd7cb74e6bc16559e040d0f5d788a4411819e4
 *** Your firmware is already up to date (delete /boot/firmware/.firmware_revision and /boot/firmware/.bootloader_revision to force an update anyway)

Here is my eeprom config and version

fbsdmon@pi5:~ $ sudo rpi-eeprom-config -e
Updating bootloader EEPROM
 image: /lib/firmware/raspberrypi/bootloader-2712/default/pieeprom-2024-02-16.bin
config_src: blconfig device
config: /tmp/tmp3d_ubb0m/boot.conf
################################################################################
[all]
BOOT_UART=1
WAKE_ON_GPIO=0
POWER_OFF_ON_HALT=1
BOOT_ORDER=0xf416
PCIE_PROBE=1


################################################################################

*** To cancel this update run 'sudo rpi-eeprom-update -r' ***

*** CREATED UPDATE /tmp/tmp3d_ubb0m/pieeprom.upd  ***

   WARNING: Installing an older bootloader version.
            Update the rpi-eeprom package to fetch the latest bootloader images.

   CURRENT: Sat 20 Apr 10:53:30 UTC 2024 (1713610410)
    UPDATE: Fri 16 Feb 15:28:41 UTC 2024 (1708097321)
    BOOTFS: /boot/firmware
'/tmp/tmp.FgVMwmasNb' -> '/boot/firmware/pieeprom.upd'

UPDATING bootloader.

*** WARNING: Do not disconnect the power until the update is complete ***
If a problem occurs then the Raspberry Pi Imager may be used to create
a bootloader rescue SD card image which restores the default bootloader image.

flashrom -p linux_spi:dev=/dev/spidev10.0,spispeed=16000 -w /boot/firmware/pieeprom.upd
UPDATE SUCCESSFUL
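The CURRENT and UPDATE lines in the output above each end with a Unix timestamp, which makes it easy to state a bootloader's release date and to notice a downgrade. The sketch below assumes that line format stays as printed here. `bootloader_dates` is a hypothetical helper, not part of rpi-eeprom.

```python
import re
from datetime import datetime, timezone

# Matches lines such as "CURRENT: Sat 20 Apr 10:53:30 UTC 2024 (1713610410)".
STAMP_RE = re.compile(r"^\s*(CURRENT|UPDATE):.*\((\d+)\)\s*$", re.MULTILINE)


def bootloader_dates(output: str) -> dict[str, datetime]:
    """Extract the CURRENT/UPDATE release times (UTC) from rpi-eeprom output."""
    return {
        label: datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        for label, epoch in STAMP_RE.findall(output)
    }


# The two lines as they appear in the output above.
SAMPLE = """\
   CURRENT: Sat 20 Apr 10:53:30 UTC 2024 (1713610410)
    UPDATE: Fri 16 Feb 15:28:41 UTC 2024 (1708097321)
"""

dates = bootloader_dates(SAMPLE)
for label, when in dates.items():
    print(label, when.date().isoformat())
if dates["UPDATE"] < dates["CURRENT"]:
    print("the update image is older than the installed bootloader")
```

Read this way, the run above replaced the bootloader released on 2024-04-20 with the packaged image from 2024-02-16, which is what the "Installing an older bootloader version" warning refers to.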

Here is my /boot/firmware/config.txt config

[all]
usb_max_current_enable=1
dtparam=nvme
dtparam=pciex1_gen=3

Here is the /boot/firmware/cmdline.txt config

fbsdmon@pi5:~ $ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=323b86fe-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=DE pcie_aspm=off

Looks like I have the same errors as with the Home Assistant OS

fbsdmon@pi5:~ $ journalctl -b | grep -i nvme 
May 03 10:59:42 pi5 kernel: nvme nvme0: pci function 0000:01:00.0
May 03 10:59:42 pi5 kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 03 10:59:42 pi5 kernel: nvme nvme0: missing or invalid SUBNQN field.
May 03 10:59:42 pi5 kernel: nvme nvme0: allocated 32 MiB host memory buffer.
May 03 10:59:42 pi5 kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 03 10:59:42 pi5 kernel:  nvme0n1: p1 p2
May 03 10:59:42 pi5 kernel: EXT4-fs (nvme0n1p2): mounted filesystem fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 ro with ordered data mode. Quota mode: none.
May 03 10:59:42 pi5 kernel: EXT4-fs (nvme0n1p2): re-mounted fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 r/w. Quota mode: none.
May 03 10:59:43 pi5 systemd-fsck[370]: /dev/nvme0n1p1: 378 files, 38294/261115 clusters
May 03 11:00:15 pi5 kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
May 03 11:00:15 pi5 kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 03 11:00:15 pi5 kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 3219696, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1263704, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 9687424, 192 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 1263704 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 3219696 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 9687424 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 2
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 10288352, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 10288352 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1263960, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 1263960 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 9802816, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 9802816 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 03 11:00:15 pi5 kernel: nvme0n1: I/O Cmd(0x2) @ LBA 5452704, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
May 03 11:00:15 pi5 kernel: I/O error, dev nvme0n1, sector 5452704 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 2
May 03 11:00:15 pi5 kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 03 11:00:15 pi5 kernel: nvme nvme0: 4/0/0 default/read/poll queues

The No UUID available providing old NGUID message disappeared though. So that's interesting.

fbsdmon@pi5:~ $ journalctl -b | grep -i NGUID
fbsdmon@pi5:~ $

I tried all combinations of dtparam=pciex1_gen=2 and dtparam=pciex1_gen=3, with nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off (separately and together). Same results :frowning:
It seems a bit more stable, as in the errors are less frequent. I'll let it run for a while to see.

Maybe @demyers is right, and it's a power problem.
I'm struggling to decide what to do next: return the Lexar and try a different SSD, or return the Neo 5 and find an SSD HAT and case separately.

Any last words of advice?

Yes, your current hardware combination has some issues. But it could be any part of it, in the worst case even the RPi5 itself.
You can only isolate the cause by changing one part at a time and testing again. Or you get lucky and replace the troublemaker on your first attempt.

@demeyers' objection is valid. As far as I know, there is a power limit of 5W over the flexible PCB (PCIe). The NVMe model you chose is specified at 3.7W max. So it should stay below this limit and not be the main cause.
Only if the USB-C plug of your power supply isn't seated properly in the RPi5's USB-C socket is there a small remaining risk of a contact issue there.

The Argon ONE V3 tries to avoid this limit with its separate pogo pin connection. But NVMe issues have been reported here with that case as well.

The HATs available from other companies fight the same technical limitations, and some of them have changed hardware revisions multiple times, so there is no guarantee that they work either.

I have the official Pi 5 27W PS, and I haven't noticed any loose connection on the USB-C port. I would rule that out, but as you say, who knows which part/combo is the problem.

I'm off to change the NVMe HAT first, wish me luck :laughing:

FOR THE RECORD
Raspberry Pi 5 + Argon Neo 5 m.2 NVME + Lexar NM790 2TB SSD + Official Power Supply = Does NOT Work!

Some NVMe HATs draw power from the GPIO connector and some don't. You might need the extra power with your device.

I ordered the GeeekPi N07. It's a bottom HAT that connects to the GPIO pins for power. Let's see how this one fares.

I did some research on the Lexar NM790, and it looks like one of the better SSDs in terms of power management, especially low power consumption. So, I'm hoping it's just the NVMe HAT that is the issue here.

I really like the Argon Neo 5 form factor and design, especially the bottom HAT. The Argon One v3 is overkill for me, as my Pi sits in the pantry :slight_smile: and I don't need full-size ports, or anything really.
It looks like I'll end up with a bare Pi, with a bottom HAT and a cooler on top. But who cares, if it works.

Maybe you can move or copy your FOR THE RECORD content to here?

To continue the saga, I bought a Kingston KC3000 PCIe 4.0 NVMe M.2 SSD, since this post noted that it works: [Q] Does Argon NEO 5 Case supports 2TB or more storage?

But it doesn't even fit :slight_smile:
The KC3000 2TB SSD has chips on both sides of the board, and the Neo 5 case has a plastic bump that prevents installing an SSD with chips on the back side. So, it cannot even be installed.

I ended up using the KC3000 2TB SSD with a GeeekPi N07, which has been running for a week now, without any issues.