[Matrix server] Upcoming maintenance and backup restore July 31

posted 1 year ago

Milan@discuss.tchncs.deM

tchncs@discuss.tchncs.de

Well hello again, I have just learned that the host that recently had both NVMe drives fail upon drive replacement now has new problems: the filesystem reports permanent data errors affecting the databases of both the Matrix server and the Telegram bridge.

I have just rented a new machine and am about to restore the database snapshot of July 26, just in case. All the troubleshooting over the recent days was very exhausting, however, I will try to do or at least prepare this within the upcoming hours.

Update

After a rescan the errors have gone away, however the drives logged errors too. The question now is whether the data integrity can be trusted.

Status August 1st

Well … good question… optimizations were made last night, the restore was successful and … we are back to debugging outgoing federation :(

The new hardware will also be a bit more powerful… and yes, I have not forgotten that I wanted to update that database. It’s just that I was busy debugging federation problems.

References

federation issues after restore: https://github.com/matrix-org/synapse/issues/16025
why we had to restore initially: https://text.tchncs.de/tchncs/about-the-matrix-incident-on-july-26-2023

erAck@discuss.tchncs.de

0 points

1 year ago

Milan, in compensation for all this hassle you should take the next three Mondays off.

Milan@discuss.tchncs.deOPM

1 point

1 year ago

i like that idea

Haui@discuss.tchncs.de

0 points

1 year ago

Thanks for putting in the work. Is there anything we can help you with? From what I understood the domain is German, is the server in Germany as well? I’m located in Germany and do sysadmin work. Fighting with hosting companies is part of my job. ;) Let me know if I can do anything. Have a good one!

Milan@discuss.tchncs.deOPM

1 point

1 year ago

Thank you :) Well, I am not sure there was anything to fight over except maybe some sort of refund… for now it seems to be fine on the new machine. And yes, I am from Germany, however I think it’s a Helsinki DC from Hetzner.

Haui@discuss.tchncs.de

0 points

1 year ago

You’re very welcome. Hetzner is generally a good host afaik. It does depend on the configuration, I suppose. Are you using the shared VPS or something else? If the storage is guaranteed (as in not custom hardware) they are technically responsible for its condition. A host I’m working with (also located at Hetzner, but in Falkenstein) does 2 backups a day, which also avoids having to revert far back.

Milan@discuss.tchncs.deOPM

1 point

1 year ago

On Hetzner it’s all dedicated servers: out goes an AX51-NVMe, in comes an AX102. They tried a connector cable swap to bring the NVMe(s) back to life, and I was wondering if this could have something to do with the SMART errors logged and the temporary zpool errors. However, I think the CPU upgrade is at least very welcome for the Matrix server 😅

jasondaigo@discuss.tchncs.de

0 points

1 year ago

I feel for you. Is #tchncs:tchncs.de still the correct room when it’s running fine again?

Milan@discuss.tchncs.deOPM