How do I diagnose this persistent issue - missing file system

DuncanJ · April 22, 2022, 10:47am

Hi everyone. A bit of background first: I am very new to OMV and am using the Argon EON to learn about NAS drives and OMV. I have set up OMV 6 on the Pi with 4 drives and a simple rsync mirror between them as a backup (RAID is a bit beyond me right now and probably overkill for my needs anyway).

On to the actual problem: everything appears to work fine, but at some point overnight my file systems go missing. I don’t think it’s the rsync job, because that is scheduled to run only on Fridays. Once I reboot the system everything is fine, but by the next morning the same thing has happened again. When I say ‘File Systems Missing’, that is what the dashboard is telling me, but when I dig deeper and go into Storage, the disks are not available, almost as if they have gone to sleep and a reboot wakes them again.

I have temporarily worked around the issue by rebooting every night, but that seems like a band-aid more than an actual fix, and occasionally I still have the same problem.

I don’t have the skill or knowledge to diagnose this any deeper than that. Could someone help me with some tips, point me to the correct system log, or give me some insight into what is going on? There is a very high likelihood that I have done something wrong during configuration; I would just like to learn what that is and try fixing it.

Thank you

BlackRose67 · April 23, 2022, 1:46am

While I don’t have your exact problem, I am having issues where my EON will just stop responding, which I believe is caused by some sort of problem in OMV 6.

The other day I added a new shared drive to my OMV configuration, and OMV lost all of my drives.
I had to reboot to resolve that.

It seems every time I update OMV, something goes wrong with it.

I am running other Pi 4 devices on 64-bit Bullseye Lite and they don’t have any issues, so I believe OMV is the source of my issues.

Just prior to writing this message, I had to power off my EON using the power button because the device stopped responding to any network requests.

NHHiker · April 27, 2022, 1:59pm

@DuncanJ

Well, if it were me, I would connect via ssh while the system is having the issue and run

$ lsblk

You should see the devices, something like this:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 931.5G  0 disk 
sdb      8:16   0 465.8G  0 disk 
├─sdb1   8:17   0   256M  0 part /boot
└─sdb2   8:18   0 465.5G  0 part /
sdc      8:32   0 894.3G  0 disk 
sdd      8:48   0 894.3G  0 disk 
sde      8:64   0 931.5G  0 disk

Then run:

dmesg -T

And look for errors.
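
If the dmesg output is long, a filter can narrow it down. This is only a rough sketch: the function name filter_disk_errors is made up for illustration, and the keyword list is an assumption about messages that often show up when a disk or USB bridge drops out or the Pi is short on power. It is not exhaustive, so it is still worth reading the full log around any hit.

```shell
# Read kernel log text on stdin and keep only lines that often accompany
# a disk or USB bridge dropping out, or a power problem.
# The keyword list is an assumption for illustration, not a complete set.
filter_disk_errors() {
    grep -iE 'i/o error|usb disconnect|reset (high|super)-?speed|over-current|under-?voltage|offline'
}

# Usage on the live system (sudo may or may not be needed):
#   sudo dmesg -T | filter_disk_errors
```

An empty result does not prove the system is healthy; it only means none of these particular phrases appeared.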

If for some reason you can’t connect to the box, reboot. Once the system is working properly, connect again over ssh and monitor dmesg:

dmesg -Tw

This will display messages, and update as they come in. You may actually capture the issue when it happens.
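
Since the kernel message buffer is cleared by a reboot, it may help to save the stream to a file so the overnight messages survive. A minimal sketch, assuming the operating system runs from a separate boot device and not from one of the data disks that disappear; the helper name save_kernel_log and the default file name are hypothetical.

```shell
# Append everything read on stdin to a log file while still printing it.
# The default path is a hypothetical choice; keep it on the boot device,
# not on a data disk that may vanish during the night.
save_kernel_log() {
    tee -a "${1:-$HOME/dmesg-watch.log}"
}

# Usage (leave it running overnight, e.g. inside tmux or screen):
#   sudo dmesg -Tw | save_kernel_log
```

After the next failure, the tail of that file should show what the kernel logged just before the disks went away.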

Just a thought … is anything connected to the external USB connectors, other than the little dongle? I read someplace that some folks have problems with Raspberry Pis when using both USB 3.0 connectors.

DuncanJ · April 27, 2022, 2:58pm

Thank you @NHHiker that’s really useful, I will give that a bash. Nothing is connected to the second USB port so all good there.

Something I might try is removing all the HDDs and letting it stand idle like that, to see whether it’s a hardware-triggered issue or a software problem. One of my HDDs is currently showing a warning that it has a few bad sectors, and I’m thinking that might be creating some kind of issue.

NHHiker · April 27, 2022, 3:37pm

@DuncanJ

A few bad sectors on a drive? You could install smartmontools, and run:

smartctl -d sat -A /dev/sda

Replace /dev/sda with the actual device that’s reporting the warning. Hard disks have spare sectors for reallocation, so when a bad sector is detected, the drive automatically remaps that location to a spare sector the next time the same location is written.

A problem arises when a drive has run out of spare sectors and needs another one. It then does one of two things:

Take the drive offline
Leave the drive online, BUT switch to read-only.

If you use the above smartctl command and post the result, I’ll interpret it for you.
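
For a quick first look before posting the full output, the sector-related attributes can be pulled out of the table. This is a sketch that assumes the usual smartctl -A layout with the attribute name in the second column and the raw value in the tenth; the helper name sector_health is made up, and some drives (SSDs especially) report these counters under different names, in which case nothing will be printed.

```shell
# Read `smartctl -A` output on stdin and print the name and raw value of
# the attributes that track remapped or unreadable sectors.
# Assumes the standard 10-column attribute table; names vary by vendor.
sector_health() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# Usage (replace /dev/sda with the device that shows the warning):
#   sudo smartctl -d sat -A /dev/sda | sector_health
```

Raw values of 0 are what a healthy drive normally shows; a non-zero count that keeps growing between runs is the thing to watch for.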

DuncanJ · February 21, 2023, 9:31am

SOLVED UPDATE FOR FUTURE READERS:
This issue was solved by Argon very kindly replacing the SATA Board on my EON under warranty. For some reason the board was not delivering enough power to the drives and the system has now been running perfectly for months.
I have heard of only one other person having this issue, so it is not a common problem, but it can happen.