-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
I now have the first of the high-speed discs (an SSD, or solid-state drive) to swap in. That means teaparty has to go down again, so I can open up the case. This will likely happen this afternoon, though I can't yet say exactly when.
Teaparty will be down initially for about 30 minutes while I remove one of the existing hard disc drives (HDDs) and insert the SSD. I'll then do some speed testing, probably with teaparty running. If this device is indeed faster (i.e., it's the discs that are the problem, not some other aspect of the machine on the path to the discs), I will then have to take teaparty down for a further hour or so while I shrink the existing filesystems by about 10% so they'll fit on the new SSD (the HDDs are 2TB but, annoyingly, enterprise-grade SSDs are 1.92TB).
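As a quick check on the sizes quoted above, here is a minimal arithmetic sketch. It uses only the nominal decimal capacities (2TB and 1.92TB); the function name is made up for illustration. The nominal sizes alone call for a 4% reduction, so shrinking by about 10% leaves headroom. What that headroom is for is not stated above, so any guess (partitioning overhead, mirror metadata) is an assumption.

```python
# Illustrative arithmetic only, using the nominal capacities quoted in the message.
TB = 10**12  # drive vendors quote decimal terabytes


def min_shrink_fraction(old_bytes: int, new_bytes: int) -> float:
    """Smallest fraction by which something sized for old_bytes must shrink to fit new_bytes."""
    if new_bytes >= old_bytes:
        return 0.0  # already fits, nothing to shrink
    return 1 - new_bytes / old_bytes


hdd = 2 * TB                 # existing HDDs
ssd = 1_920_000_000_000      # 1.92TB enterprise SSD
needed = min_shrink_fraction(hdd, ssd)
print(f"minimum shrink: {needed:.1%}")  # prints "minimum shrink: 4.0%"
```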
At that point teaparty will come back up and run slow while the new SSD syncs up. At some point next week we'll have to repeat this to replace the other HDD.
If the SSD is no faster, it means the bottleneck is elsewhere, and the second hour of downtime won't happen. There will instead be a brief second outage while I put the old disc back, then teaparty will run slow for another 24 hours while it resyncs the old second HDD.
Hopefully all this is clear. Call me on my mobile if this is going to be a disaster for anyone.
- Tom
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
> If the SSD is no faster, it means the bottleneck is elsewhere, and the second hour of downtime won't happen. There will instead be a brief second outage while I put the old disc back, then teaparty will run slow for another 24 hours while it resyncs the old second HDD.
So the outage happened, and the SSD was temporarily installed. Interestingly, tests revealed it was no faster than the old HDD, and in some cases slower. So the discs themselves aren't the bottleneck.
I took the opportunity of being at the colo to test another hypothesis: now that UEFI systems are much more common than they used to be, the new kernel is basically designed for systems booted via UEFI rather than via old-fashioned BIOS, and teaparty is installed to boot via BIOS.
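For anyone wanting to check which way a Linux machine booted, here is a minimal sketch, assuming a Linux host with sysfs mounted in the usual place. The function name is made up for illustration. The kernel exposes the directory /sys/firmware/efi only when it was booted through UEFI firmware, so its absence indicates a legacy BIOS boot.

```python
import os


def booted_via_uefi(efi_dir: str = "/sys/firmware/efi") -> bool:
    """Return True if the running Linux kernel was booted through UEFI firmware.

    The efi_dir parameter exists only so the check can be pointed at a
    different path for testing; on a real system leave it at the default.
    """
    return os.path.isdir(efi_dir)


if __name__ == "__main__":
    print("UEFI" if booted_via_uefi() else "BIOS (or not Linux)")
```

Note that this only reports how the running system booted; it says nothing about why one mode performs differently from the other.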
This seems to be the root of the problem. The current secondary disc, sdb, writes at about 10-15MB/s when booted under BIOS, but 150MB/s when booted under UEFI.
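The messages don't say which tool produced these figures. As a rough illustration of how a sequential write rate in MB/s can be measured, here is a minimal sketch, assuming a directory on the filesystem under test; the function name and default sizes are invented for the example. It writes incompressible data, forces it to the device with fsync, and divides bytes written by elapsed time.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory: str, total_mb: int = 64, chunk_mb: int = 4) -> float:
    """Write roughly total_mb of random data to a temp file in directory; return MB/s."""
    # Random data, generated once outside the timed region, so compression
    # in the drive or filesystem cannot inflate the result.
    chunk = os.urandom(chunk_mb * 1_000_000)
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(max(1, total_mb // chunk_mb)):
                written += f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push data to the device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    return (written / 1_000_000) / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(f"{sequential_write_mb_per_s('.'):.0f} MB/s")
```

A small test file like this mostly measures burst behaviour; a fair comparison between boot modes would want a file much larger than the machine's RAM and the same target disc each time.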
There is a process for turning a BIOS-booting system into a UEFI-booting one, but it's quite fiddly, and (naturally) involves more downtime. At the moment, I expect this will happen this Wednesday, 20/3, and will probably last most of the morning, and possibly most of the day.
Again, if this will be a disaster for you, please let me know asap.
- Tom
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
> There is a process for turning a BIOS-booting system into a UEFI-booting one, but it's quite fiddly, and (naturally) involves more downtime. At the moment, I expect this will happen this Wednesday, 20/3, and will probably last most of the morning, and possibly most of the day.
OK, well that didn't go very well. When I said "fiddly" I should have said "fiddly and incomplete". We're currently running off a single hard drive, so data must be considered at risk. I hope to have a procedure tested for the BIOS-to-UEFI change by Friday, but that will depend on how much spare time I can muster. At the very least, I'll go back to the colo on Friday and reattach the second HDD (assuming I can make the mirroring work with it on, which is the current problem).
If I can't get a tested procedure, I'll have to reinstall the system from scratch. That should be quicker than the first time, since now all the CentOS to Debian migration is done, but it won't be instantaneous.
Sorry about all this.
- Tom
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
> At the very least, I'll go back to the colo on Friday and reattach the second HDD (assuming I can make the mirroring work with it on, which is the current problem).
> If I can't get a tested procedure, I'll have to reinstall the system from scratch.
After careful consideration, and running through the options and probabilities, Caroline and I have decided that the course of action that minimises both recovery time and the risk of data loss is to reinstall.
So fairly early on Friday, I'll down teaparty, back it up to removable media, and reinstall cleanly under UEFI booting. This means it'll be much like last Monday-Wednesday, in that services will come back one at a time. However, things should go faster than last time, because the time cost of translating all the service configs to Debian has already been paid; I'm hopeful that by the end of Friday we'll have email back, at least.
I'm really sorry about more extended downtime, but we think this is the option that has the best chance of success in bounded time. If it's going to be a disaster for you, please let me know asap.
- Tom
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
> So fairly early on Friday, I'll down teaparty, back it up to removable media, and reinstall cleanly under UEFI booting. This means it'll be much like last Monday-Wednesday, in that services will come back one at a time. However, things should go faster than last time, because the time cost of translating all the service configs to Debian has already been paid; I'm hopeful that by the end of Friday we'll have email back, at least.
Well, that reinstall happened, and email was indeed back by Friday night (I hope it's working for you all). Websites aren't back yet, but other than that we're where we were on Wednesday.
Sadly, we're *exactly* where we were. The performance problems haven't gone away. I'm out of ideas and, I must say, out of enthusiasm at the moment. I suspect I'll just leave it as-is for a while, so we can have some stability, then at some point shotgun the entire system (i.e., replace all the hardware). But not now.
- Tom
fellow-users@lists.teaparty.net