NVMe unstable with Argon ONE V3 case

aalborger · April 14, 2024, 4:26pm

Hi,

I recently purchased an RPi 5 and an Argon ONE V3 M.2 NVMe PCIe case to use as a home automation server. Installation and booting from the NVMe went smoothly. After 2 days of uptime, the server suddenly became unresponsive. After rebooting, I found several kernel messages saying “nvme nvme0: controller is down; will reset” (see example below):

Apr 09 01:00:07 iotserver kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 09 01:00:07 iotserver kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Apr 09 01:00:07 iotserver kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Apr 09 01:00:07 iotserver kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Apr 09 01:00:07 iotserver kernel: nvme nvme0: 4/0/0 default/read/poll queues
Apr 09 01:04:19 iotserver kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 09 01:04:19 iotserver kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Apr 09 01:04:19 iotserver kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Apr 09 01:04:19 iotserver kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Apr 09 01:04:19 iotserver kernel: nvme nvme0: 4/0/0 default/read/poll queues

After a later crash, the error messages never reached the logs because the root filesystem had been remounted read-only, so I hooked the server up to a screen and captured them from there (hand-typed from a photo, so there may be some “spelling errors”):

[118555.548795] EXT4-fs error (device nvme0n1p2) in ext4_reserve_inode_write:5764: Journal has aborted
[118555.548807] EXT4-fs error (device nvme0n1p2) in ext4_reserve_inode_write:5764: Journal has aborted
[118555.550145] EXT4-fs error (device nvme0n1p2): ext4_dirty_inode:5968: inode #120009: comm systemd-journal: mark_inode_dirty error
[118555.551476] EXT4-fs error (device nvme0n1p2): ext4_dirty_inode:5968: inode #545366: comm python3: mark_inode_dirty error
[118555.552829] EXT4-fs error (device nvme0n1p2) in ext4_dirty_inode:5969: Journal has aborted
[118555.552927] EXT4-fs error (device nvme0n1p2): ext4_journal_check_starting:84: comm systemd-journal: Detected aborted journal
[118555.617595] EXT4-fs (nvme0n1p2): Remounting filesystem read-only

I use the “official” RPi 5 power adapter. The NVMe drive is a Kingston NV2 M.2 500GB. I did follow the instructions for setting up the EEPROM config and installing the Argon scripts.

Sometimes the error happens after a few days; other times it happens within 10 seconds of booting. The error does not seem to be related to temperature: it has happened within 30 seconds of booting after the device had been switched off for 30 minutes.

I have tried the recommended “nvme_core.default_ps_max_latency_us=0 pcie_aspm=off” kernel parameters without success.
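Before ruling those parameters out, it may be worth confirming that they were actually applied at boot, since cmdline.txt must be a single line and a typo or an accidental second line is easy to miss. A minimal bash sketch, assuming the running command line is in /proc/cmdline and that the nvme_core module exposes its effective `default_ps_max_latency_us` under /sys/module (both paths are arguments, so the functions can be tried on copies):

```shell
#!/usr/bin/env bash
# Check that each given parameter appears as a whole word on the kernel
# command line. Prints "missing: <param>" for each absent one and returns
# non-zero if any are missing.
params_active() {
  local cmdline_file="$1"; shift
  local cmdline missing=0 p
  # Pad with spaces so the first and last parameters also match whole-word
  cmdline=" $(cat "$cmdline_file") "
  for p in "$@"; do
    if [[ "$cmdline" != *" $p "* ]]; then
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}

# Print the value the nvme_core module is actually using (assumed sysfs path;
# 0 should mean APST is disabled).
nvme_ps_max_latency() {
  local f="${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
  cat "$f"
}
```

Typical use after a reboot: `params_active /proc/cmdline nvme_core.default_ps_max_latency_us=0 pcie_aspm=off && nvme_ps_max_latency`.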

Having spent an inordinate amount of time troubleshooting the device, at this point I’m seriously regretting not paying €50 more for a NUC.

I hope someone here might be able to give me some advice before it all goes into the trash.

Thank you so much in advance!

HarryH · April 14, 2024, 6:03pm

Have you tried ‘dtparam=pciex1_gen=2’ in your config.txt to get it stable? “…gen=3” isn’t officially supported by the Pi 5.

aalborger · April 15, 2024, 8:56pm

Hi HarryH,

Thank you for your response.

I am currently running without any dtparam=pciex1_gen=xxx parameter in config.txt. My understanding is that it defaults to gen2? I can try to set it explicitly to see if it makes a difference.

I remember seeing some rather counter-intuitive posts somewhere on the internet from someone who had managed to get an NVMe drive working by changing from gen2 to gen3. Will try that as well.

Lately, I have not been able to boot the RPi at all. The problem seems to be getting worse, considering that it used to run for 24-48 hours before crashing. I had to pull out the NVMe drive and change the boot order so that I could boot and check the config.txt file.
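The boot order lives in the bootloader EEPROM (edited with `sudo rpi-eeprom-config --edit`, `BOOT_ORDER=` line). The value is read right to left, one hex digit per boot mode, which makes it easy to misread. A small bash sketch that decodes it; the mode table is my reading of the Raspberry Pi bootloader documentation, so treat it as an assumption and check the current docs:

```shell
#!/usr/bin/env bash
# Decode a BOOT_ORDER value such as 0xf416 into the boot modes the
# bootloader tries, starting from the least-significant hex digit.
decode_boot_order() {
  local hex="${1#0x}" out=() i nibble name
  for (( i=${#hex}-1; i>=0; i-- )); do
    nibble="${hex:i:1}"
    # Mode table (assumed, from the bootloader docs)
    case "${nibble,,}" in
      1) name=SD ;;
      2) name=NETWORK ;;
      3) name=RPIBOOT ;;
      4) name=USB-MSD ;;
      5) name=BCM-USB-MSD ;;
      6) name=NVME ;;
      7) name=HTTP ;;
      e) name=STOP ;;
      f) name=RESTART ;;
      *) name="UNKNOWN($nibble)" ;;
    esac
    out+=("$name")
  done
  echo "${out[*]}"
}
```

For example, `decode_boot_order 0xf416` gives `NVME SD USB-MSD RESTART` (NVMe first), while `0xf461` tries the SD card before the NVMe, which is the kind of order needed to boot from SD while the drive is being investigated.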

In the meantime, any other ideas or tips that I could try out?

Again, thank you so much for your help and input!

HarryH · April 15, 2024, 9:05pm

Yes, dtparam=pciex1_gen=2 is the default if you don’t specify this line, so setting it explicitly should make no difference.
Additionally, it should be possible to slow down the PCIe bus with dtparam=pciex1_gen=1 for troubleshooting purposes.
dtparam=pciex1_gen=3 is a kind of overclocking, but you can’t lose more than you already have. Normally this speed should be more critical, because of the higher frequencies on the flexible PCB. But if the firmware of your NVMe has trouble with lower speeds, perhaps it helps.
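Switching between gen1, gen2 and gen3 while troubleshooting means editing config.txt repeatedly. A minimal bash sketch that sets or adds the line, assuming Raspberry Pi OS Bookworm (where the file is /boot/firmware/config.txt) and GNU sed; it is safest to try it on a copy first:

```shell
#!/usr/bin/env bash
# Set (or add) dtparam=pciex1_gen=N in a Raspberry Pi config.txt.
set_pcie_gen() {
  local file="$1" gen="$2"
  case "$gen" in
    1|2|3) ;;
    *) echo "gen must be 1, 2 or 3" >&2; return 2 ;;
  esac
  if grep -q '^dtparam=pciex1_gen=' "$file"; then
    # Edit the existing line in place (it stays in whatever section it was in)
    sed -i "s/^dtparam=pciex1_gen=.*/dtparam=pciex1_gen=$gen/" "$file"
  else
    # Ensure the file ends with a newline before appending
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      echo >> "$file"
    fi
    # Append under an explicit [all] filter so the line is not captured by a
    # conditional section (e.g. [cm4]) at the end of the file
    printf '[all]\ndtparam=pciex1_gen=%s\n' "$gen" >> "$file"
  fi
}
```

From a root shell: `set_pcie_gen /boot/firmware/config.txt 3`, then reboot.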

Because it’s somewhat fiddly, have you re-inserted the flexible PCB to make sure it is seated properly? The system was running for 2 days, so trying another power supply could also be an option.
Do you know if a firmware update is available for your Kingston NVMe?
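To compare against the vendor’s firmware releases, you first need the drive’s model and current firmware revision. A minimal bash sketch, assuming the kernel exposes them as `model` and `firmware_rev` under /sys/class/nvme/nvme0/ (`sudo nvme list` from nvme-cli shows the same information):

```shell
#!/usr/bin/env bash
# Print "<model> firmware <revision>" for an NVMe controller's sysfs directory.
nvme_fw_info() {
  local dir="${1:-/sys/class/nvme/nvme0}" model fw
  model=$(< "$dir/model")
  fw=$(< "$dir/firmware_rev")
  # The strings may be padded with trailing spaces; trim them
  model="${model%"${model##*[![:space:]]}"}"
  fw="${fw%"${fw##*[![:space:]]}"}"
  printf '%s firmware %s\n' "$model" "$fw"
}
```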

aalborger · April 15, 2024, 9:18pm

Hi again!

And thanks for the ultra-fast reply.

I managed to boot with gen3. Since this is the first NVMe boot in more than 20 attempts that has reached the login prompt without crashing, it is a good start. But let’s see. It’s been running for 10 minutes now …

While counter-intuitive, it could very well be that my Kingston NV2 actually struggles with the lower PCIe speeds. I’ve seen some reviews indicating that the NV2 drives are really low-end, with cheap components and even different controllers and NAND flash from drive to drive. I would not be surprised if the drive does not properly support the low-speed PCIe generations that are no longer used in mainstream setups.

However, if the firmware is the issue (rather than the hardware), it might be worth checking if the firmware can be upgraded. Will check that as well.

If gen3 turns out to be as unstable as gen2 and a firmware update does not help, I will try gen1 as well.
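When comparing generations, it helps to check which speed the link actually negotiated after each reboot, since the dtparam sets the maximum the link may train to (as I understand it) rather than guaranteeing it. A minimal bash sketch that reads `lspci -vv` output on stdin and maps the `LnkSta` speed to a generation (2.5 GT/s = gen1, 5 GT/s = gen2, 8 GT/s = gen3):

```shell
#!/usr/bin/env bash
# Report the negotiated PCIe generation from "lspci -vv" output on stdin.
pcie_link_gen() {
  local line speed re='LnkSta:[[:space:]]*Speed[[:space:]]+([0-9.]+)GT/s'
  while IFS= read -r line; do
    # Only the LnkSta (link status) line matters; LnkCap is the capability
    if [[ $line =~ $re ]]; then
      speed="${BASH_REMATCH[1]}"
      case "$speed" in
        2.5) echo "gen1 ($speed GT/s)" ;;
        5)   echo "gen2 ($speed GT/s)" ;;
        8)   echo "gen3 ($speed GT/s)" ;;
        *)   echo "unknown ($speed GT/s)" ;;
      esac
      return 0
    fi
  done
  echo "no LnkSta line found" >&2
  return 1
}
```

Usage, with the device address from the logs in this thread: `sudo lspci -vv -s 01:00.0 | pcie_link_gen`.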

Crossing my fingers …

Again, thank you!

aalborger · April 18, 2024, 6:59pm

Quick update: The RPi has been running without issues with gen3 for 72 hours now. Way too early to call it a win, but a good sign nevertheless. Still crossing my fingers …

graemev · January 30, 2025, 3:16pm

OK, this is rather a “me too”.

It looks like I’ve had the Neo case + Pi 5 for a year (based on boot logs). It’s been little used, as I’ve been working on some software to use it as a NAS on another box (a Pi 4 [CM4] Waveshare box),
and I’m now ready to move the code to the Argon Neo Pi 5. While doing compiles, I’ve started getting these errors and the system freezes.

root@argon:/home/graeme# journalctl  -g NVME
-- Boot eae0decfa0cc48ad8c6aef8befd85f00 --
-- Boot 7d0c4b8e1d774485a20179f228f016dd --
-- Boot cbf3d450e0e34ad88a24c962bda84268 --
-- Boot 168735eabe0540d6af01dbb60da27c80 --
-- Boot 89085c99fe7b4d5085fd8193c476d097 --
-- Boot 0255ee913fb24aaaa8ca7c795ec94656 --
-- Boot c8d9db64bde14c10a62d97da899dd707 --
-- Boot 34afa4b412d04edd8c95d2690495a221 --
-- Boot 8c997e80dbcc49288d97aa92c83c2ffd --
-- Boot 29c3434594d94ac8a4d415ee905ea9ec --
-- Boot 461c0f2311554777abf4721ab752daf3 --
-- Boot 288ef7b83f7346b3a083ada5a88c11fa --
-- Boot 1b7e889499f341a0b847ba4766358fe2 --
-- Boot 3896febec3db4bd6be755e7aaae42025 --
-- Boot e788473d854e4794a47e2030a4affd71 --
-- Boot 0c6e995188804ec0950f22077a777ff7 --
-- Boot 903d8a2be2464c97b6a79b06ede4af53 --
-- Boot d8a153810d1542cea0aebf1aecad9d03 --
-- Boot 1a385b3b60ef42b18fa48d8503fd206e --
-- Boot 00593181ff854475adcf054fa6d9af5b --
-- Boot 41f9f34ccc4b4fc6b785bafeff2a0a25 --
-- Boot a242187c71ed4cef87d6f9f697dfe88c --
-- Boot 246f3905369d47f095760c6029b9f126 --
-- Boot 84a93707ec8843f495eb38fb1336436d --
-- Boot 8c47bb7e1af445659d02068490e2eb8f --
-- Boot b85b43d5b76d462a88269c6129969899 --
-- Boot 254925c6ae6f4eddb348c38e529f5079 --
-- Boot b9f55dcf810c4074aaf064be9d1bd5ca --
-- Boot 2395af9dabdf486283d83886ca66a33b --
-- Boot eca6f38a03724ba5b870cbced7ef573b --
-- Boot c58f2a810e1c4c53929218a82f868d2a --
-- Boot c18cae0f5043460192e5292697fbbb68 --
-- Boot ed42558253734a05b908315e4462cdb2 --
-- Boot fd080c5dc5b84c7ebec15d40853c4d89 --
-- Boot c4add7f1315e4d1ba93b50c8ba468247 --
-- Boot 6dff526e04c34fb8a721a76b94741d15 --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
[... the two boot lines above repeat, alternating, 14 more times ...]
-- Boot 347e18980ad94fe381739bc493bdeec7 --
-- Boot a0d3f7e9ebb24b3caddaf45891a22066 --
-- Boot 0e4d994962794327b01fd1ce40941823 --
-- Boot 1aeffadee4934217a4c8857a467ce082 --
-- Boot 93d496c6b7b14716bd7a09f69008ff42 --
-- Boot 018adf4e74a046d2bb872ac048abc93c --
-- Boot 7a1cfe1f266d4c92be79acbebd303d25 --
-- Boot dc59d3dccfa042cfa442033bc923e1c1 --
-- Boot b392bfa0c8f4477f84eb41e8deb9c5cc --
-- Boot b5696bc182eb4b68a7964ebffefdbdce --
-- Boot fb7c20bbcefc493f8a40a864ef37d4f1 --
-- Boot fc74917aa63e4a6ab366efb6f3b9afa9 --
-- Boot 1e4d29324c5c48a0bfa0583f187b485d --
Jan 30 14:15:26 argon systemd[1]: nvmefc-boot-connections.service - Auto-connect to subsystems on FC-NVME devices found during boot was skipped because of an unmet condition check (ConditionPathExists=/sys/class/fc/fc_udev_device/nvme_discovery).
root@argon:/home/graeme# journalctl  -g nvme
-- Boot eae0decfa0cc48ad8c6aef8befd85f00 --
-- Boot 7d0c4b8e1d774485a20179f228f016dd --
-- Boot cbf3d450e0e34ad88a24c962bda84268 --
-- Boot 168735eabe0540d6af01dbb60da27c80 --
-- Boot 89085c99fe7b4d5085fd8193c476d097 --
Jan 26 21:48:36 argon kernel: nvme nvme0: pci function 0000:01:00.0
Jan 26 21:48:36 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jan 26 21:48:36 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
Jan 26 21:48:36 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
Jan 26 21:48:36 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jan 26 21:48:36 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
May 16 18:07:39 argon sudo[9207]:   graeme : user NOT in sudoers ; TTY=pts/1 ; PWD=/home/graeme/src ; USER=root ; COMMAND=/usr/sbin/fdisk -l /dev/nvme0n1
-- Boot 0255ee913fb24aaaa8ca7c795ec94656 --
-- Boot c8d9db64bde14c10a62d97da899dd707 --
May 17 21:36:35 argon kernel: nvme nvme0: pci function 0000:01:00.0
May 17 21:36:35 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 17 21:36:35 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
May 17 21:36:35 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
May 17 21:36:35 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 17 21:36:35 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
-- Boot 34afa4b412d04edd8c95d2690495a221 --
May 17 21:47:54 argon kernel: nvme nvme0: pci function 0000:01:00.0
May 17 21:47:54 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 17 21:47:54 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
May 17 21:47:54 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
May 17 21:47:54 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 17 21:47:54 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
-- Boot 8c997e80dbcc49288d97aa92c83c2ffd --
-- Boot 29c3434594d94ac8a4d415ee905ea9ec --
-- Boot 461c0f2311554777abf4721ab752daf3 --
May 18 22:09:20 argon kernel: nvme nvme0: pci function 0000:01:00.0
May 18 22:09:20 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 18 22:09:20 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
May 18 22:09:20 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
May 18 22:09:20 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 18 22:09:20 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
-- Boot 288ef7b83f7346b3a083ada5a88c11fa --
-- Boot 1b7e889499f341a0b847ba4766358fe2 --
May 19 13:56:30 argon kernel: nvme nvme0: pci function 0000:01:00.0
May 19 13:56:30 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
May 19 13:56:30 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
May 19 13:56:30 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
May 19 13:56:30 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
May 19 13:56:30 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers

-- many rows elided --

a1f5-1b5927e6c5f9 ro with ordered data mode. Quota mode: none.
Jun 20 20:38:26 argon kernel: EXT4-fs (nvme0n1p2): re-mounted fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 r/w. Quota mode: none.
Jun 20 20:38:26 argon systemd-fsck[415]: /dev/nvme0n1p1: 374 files, 20930/130812 clusters
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
Jun 20 20:39:31 argon kernel: nvme nvme0: pci function 0000:01:00.0
Jun 20 20:39:31 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jun 20 20:39:31 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
Jun 20 20:39:31 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
Jun 20 20:39:31 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jun 20 20:39:31 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
Jun 20 20:39:31 argon kernel:  nvme0n1: p1 p2 p3
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
Jun 20 20:39:31 argon kernel: EXT4-fs (nvme0n1p2): mounted filesystem fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 ro with ordered data mode. Quota mode: none.
Jun 20 20:39:31 argon kernel: EXT4-fs (nvme0n1p2): re-mounted fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 r/w. Quota mode: none.
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --
-- Boot 99386174e13f4e9cb2044baee8704fa6 --
-- Boot 125f4ded3d824bd9b2b0465ee2b9f2bf --

-- many rows elided --

-- Boot b392bfa0c8f4477f84eb41e8deb9c5cc --
Jan 27 17:52:48 argon kernel: nvme nvme0: pci function 0000:01:00.0
Jan 27 17:52:48 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jan 27 17:52:48 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
Jan 27 17:52:48 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
Jan 27 17:52:48 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jan 27 17:52:48 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
Jan 27 17:52:48 argon kernel:  nvme0n1: p1 p2 p3
Jan 27 17:52:48 argon kernel: EXT4-fs (nvme0n1p2): mounted filesystem fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 ro with ordered data mode. Quota mode: none.
Jan 27 17:52:48 argon kernel: EXT4-fs (nvme0n1p2): re-mounted fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 r/w. Quota mode: none.
Jan 27 17:52:49 argon systemd-fsck[397]: /dev/nvme0n1p1: 374 files, 20951/130812 clusters
Jan 27 17:52:49 argon kernel: EXT4-fs (nvme0n1p3): mounted filesystem 08870205-494a-4ccd-a77b-5d0acea2400f r/w with ordered data mode. Quota mode: none.
Jan 27 19:06:12 argon kernel: EXT4-fs (nvme0n1p3): unmounting filesystem 08870205-494a-4ccd-a77b-5d0acea2400f.
-- Boot b5696bc182eb4b68a7964ebffefdbdce --
Jan 27 19:06:11 argon kernel: nvme nvme0: pci function 0000:01:00.0
Jan 27 19:06:11 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jan 27 19:06:11 argon kernel: nvme nvme0: missing or invalid SUBNQN field.
Jan 27 19:06:11 argon kernel: nvme nvme0: allocated 16 MiB host memory buffer.
Jan 27 19:06:11 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jan 27 19:06:11 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
Jan 27 19:06:11 argon kernel:  nvme0n1: p1 p2 p3
Jan 27 19:06:11 argon kernel: EXT4-fs (nvme0n1p2): mounted filesystem fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 ro with ordered data mode. Quota mode: none.
Jan 27 19:06:11 argon kernel: EXT4-fs (nvme0n1p2): re-mounted fc7a1f9e-4967-4f41-a1f5-1b5927e6c5f9 r/w. Quota mode: none.
Jan 27 19:06:11 argon systemd-fsck[434]: /dev/nvme0n1p1: 374 files, 20951/130812 clusters
Jan 27 19:06:12 argon kernel: EXT4-fs (nvme0n1p3): mounted filesystem 08870205-494a-4ccd-a77b-5d0acea2400f r/w with ordered data mode. Quota mode: none.


Up to this point (1 year), no NVMe errors yet... and then:


Jan 28 14:50:33 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 28 14:50:33 argon kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jan 28 14:50:33 argon kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jan 28 14:50:33 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jan 28 14:50:33 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jan 28 14:50:33 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
Jan 28 14:54:37 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 28 14:54:37 argon kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jan 28 14:54:37 argon kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jan 28 14:54:37 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jan 28 14:54:37 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jan 28 14:54:37 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
Jan 28 16:15:20 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, 

...many rows elided, because the website rejected the longer text (can I upload a file somewhere else?)


Jan 30 14:22:14 argon kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 30 14:22:14 argon kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jan 30 14:22:14 argon kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Jan 30 14:22:14 argon kernel: nvme0n1: I/O Cmd(0x2) @ LBA 4766720, 256 blocks, I/O Error (sct 0x3 / sc 0x71) 
Jan 30 14:22:14 argon kernel: I/O error, dev nvme0n1, sector 4766720 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 2
Jan 30 14:22:14 argon kernel: nvme 0000:01:00.0: enabling device (0000 -> 0002)
Jan 30 14:22:14 argon kernel: nvme nvme0: 4/0/0 default/read/poll queues
Jan 30 14:22:14 argon kernel: nvme nvme0: Ignoring bogus Namespace Identifiers
root@argon:/home/graeme#
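Two notes on the searches above. First, journalctl’s `--grep` is case-insensitive only when the pattern is all lowercase, so `-g NVME` matched just the “FC-NVME” line while `-g nvme` matched the kernel messages. Second, instead of pasting the whole journal, a per-day count of the resets may be enough to share. A minimal bash sketch that reads journal text on stdin, assuming the default short output format where each line starts with “Mon DD HH:MM:SS”:

```shell
#!/usr/bin/env bash
# Count "controller is down; will reset" messages per day, in log order.
nvme_resets_per_day() {
  grep 'nvme.*controller is down; will reset' |
    awk '{ d = $1 " " $2
           if (!(d in count)) order[n++] = d   # remember first-seen order
           count[d]++ }
         END { for (i = 0; i < n; i++) print order[i] ": " count[order[i]] }'
}
```

Usage: `journalctl -g 'controller is down' --no-pager | nvme_resets_per_day` (add `-b -1` to look only at the previous boot). The grep also drops the `-- Boot ... --` header lines.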

The system is running cool:

It got up to the giddy heights of 29C while I was running a find / > /dev/null just to exercise it. FYI, this was running under watch(1), and it got an interrupted system call around the time of the error.
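To line temperature up against the reset timestamps in the journal, a timestamped log can be kept alongside. A minimal bash sketch, assuming thermal_zone0 is the SoC sensor and reports millidegrees Celsius (non-negative values assumed); it does not cover the drive itself, whose temperature `sudo nvme smart-log /dev/nvme0` from nvme-cli reports:

```shell
#!/usr/bin/env bash
# Convert millidegrees Celsius to "DD.D" (non-negative values only).
millideg_to_c() {
  local m="$1"
  printf '%d.%d\n' $(( m / 1000 )) $(( (m % 1000) / 100 ))
}

# Print "Mon DD HH:MM:SS <temp> C" every <interval> seconds, forever.
log_soc_temp() {
  local zone="${1:-/sys/class/thermal/thermal_zone0/temp}" interval="${2:-60}"
  while :; do
    printf '%s %s C\n' "$(date '+%b %d %H:%M:%S')" "$(millideg_to_c "$(< "$zone")")"
    sleep "$interval"
  done
}
```

For example, `log_soc_temp >> ~/soc-temp.log &` uses the same date prefix as the journal lines above, so the two can be compared side by side.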

graemev · February 9, 2025, 6:43pm

For anybody following this: I added the following to /boot/firmware/cmdline.txt:

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off
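cmdline.txt must stay a single line, so these parameters go at the end of the existing line, not on a new one. A minimal bash sketch that appends parameters idempotently; it assumes the file’s first line is the whole command line, and it does not remove an existing setting of the same parameter with a different value, so check for that by hand:

```shell
#!/usr/bin/env bash
# Append kernel parameters to the single line of cmdline.txt, skipping
# any that are already present (whole-word match).
add_cmdline_params() {
  local file="$1"; shift
  local line p
  line=$(head -n 1 "$file")
  for p in "$@"; do
    [[ " $line " == *" $p "* ]] || line="$line $p"
  done
  # Rewrite the file as exactly one line
  printf '%s\n' "$line" > "$file"
}
```

From a root shell, after taking a backup with `cp /boot/firmware/cmdline.txt /boot/firmware/cmdline.txt.bak`: `add_cmdline_params /boot/firmware/cmdline.txt nvme_core.default_ps_max_latency_us=0`.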

This gets rid of the errors. In fact, all you need is:

nvme_core.default_ps_max_latency_us=0

This turns off “power saving” on the NVMe (it was going to sleep).

I also tried: nvme_core.default_ps_max_latency_us=15500

This should have allowed all but the deepest power-saving states.
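Whether 15500 really excludes only the deepest state depends on the drive’s own power-state latencies. As I understand the Linux nvme driver, it only lets the drive autonomously enter a non-operational power state whose entry plus exit latency fits within `default_ps_max_latency_us`, and a value of 0 turns APST off. The bash sketch below is a simplified model of that rule, not the driver code; the input format and the example latencies in the usage note are hypothetical (real values can be read from the power-state table printed by `sudo nvme id-ctrl /dev/nvme0`):

```shell
#!/usr/bin/env bash
# Simplified model: list the power states APST could use for a given limit.
# stdin: one state per line, "state enlat_us exlat_us nonop" (nonop=1 means
# non-operational). Hypothetical format, for illustration only.
apst_eligible_states() {
  local max="$1" ps en ex nonop out=()
  if (( max == 0 )); then
    echo "APST disabled"
    return 0
  fi
  while read -r ps en ex nonop; do
    [[ -z "$ps" ]] && continue
    # Only non-operational states whose total latency fits the limit
    if (( nonop == 1 && en + ex <= max )); then
      out+=("PS$ps")
    fi
  done
  echo "${out[*]:-none}"
}
```

With a hypothetical table where PS3 has 5000 + 10000 µs and PS4 has 3900 + 45700 µs, a limit of 15500 allows only PS3, so the drive could still drop into a low-power state, which may be why the errors returned later.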

All of these remove the “problem” messages; however, the Neo is now running at around 60C.

…so it’s no use to me as an “always on” server.

graemev · March 15, 2025, 3:41pm

After this, the errors came back.

In the end, I simply had to discard the Faxiang NVMe stick and got a WD Green (2TB) instead, which seems fine (and, oddly, is faster).