ISSUE: teaparty's file system was read-only for about nine hours (0012Z-0931Z, 19-04-2025).
SUBSYSTEMS AFFECTED: anything that requires changing the contents of the file system. This includes, but is not limited to, receiving email and logging the sending of email. It did not seem to affect teaparty's ability to serve static content, so DNS should have been unaffected.
DETAILS: teaparty was essentially read-only for about nine hours.
At 0012Z today the root file system was unable to store a directory leaf checksum, so to preserve the integrity of the stored data it remounted itself read-only, which prevented any further changes from being made to the file system.
It turns out this is harder to detect than one might think (you can be sure I'll be adding a monitoring check for it shortly), so I didn't spot it for about half an hour. We're currently in the US, and it was very late at night on the 18th local time when the issue was noticed. To avoid doing anything precipitate that might leave the whole thing down until we get back in two weeks, we decided to sleep on the problem.
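The monitoring check isn't described in detail here. What follows is a minimal sketch in Python, assuming a Linux host and a Nagios-style plugin convention (exit status 0 for OK, 2 for CRITICAL). It reports the root file system as read-only if either the mount flags returned by `os.statvfs` include `ST_RDONLY`, or a scratch file cannot be created because of `EROFS`. The probe directory `/var/tmp` is a hypothetical choice and should be on the file system being watched.

```python
import errno
import os
import sys
import tempfile

# Hypothetical settings: the mount point to watch and a writable directory on it.
ROOT = "/"
PROBE_DIR = "/var/tmp"


def is_read_only(path: str) -> bool:
    """Return True if the file system containing `path` is mounted read-only."""
    # ST_RDONLY is set in f_flag whenever the file system is mounted read-only,
    # including after the kernel remounts it read-only because of an error.
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def write_probe(directory: str) -> bool:
    """Return True if a scratch file can be created (and removed) in `directory`."""
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return False
        raise  # other failures (missing dir, permissions) are a different problem
    return True


def main() -> int:
    if is_read_only(ROOT) or not write_probe(PROBE_DIR):
        print(f"CRITICAL: {ROOT} is read-only")
        return 2
    print(f"OK: {ROOT} is writable")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run from cron or an existing monitoring system every few minutes, a check like this would have flagged the remount within minutes rather than half an hour. The write probe is a belt-and-braces check in case the mount flag is not reported as expected.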
This morning (local time) I ran the recommended fix, a command that compacts all the existing directory structures, held my breath, and rebooted.
It came back fully read-write at 0931Z. Any email that people were trying to send should have queued up on the sending servers in the interim, and will now be delivered as those servers notice that teaparty is back. As far as I can tell no email was lost, and almost no data; the exception is one user, whom I will contact directly, who lost about 200 mail files when the file system was repaired.
I'm really sorry about this. I'll be reading more about the problem today as time allows, to find out whether there's any routine maintenance that could prevent a recurrence.
fellow-users@lists.teaparty.net