[Matrix server] Upcoming maintenance and backup restore July 31

posted 2 years ago

Milan@discuss.tchncs.de

tchncs@discuss.tchncs.de

Well, hello again. I have just learned that the host that recently had both NVMe drives fail upon drive replacement now has new problems: the filesystem reports permanent data errors affecting the databases of both the Matrix server and the Telegram bridge.

I have just rented a new machine and am about to restore the database snapshot from July 26, just in case. All the troubleshooting over the past few days has been very exhausting; however, I will try to do this, or at least prepare it, within the next few hours.

Update

After a rescan the errors have gone away; however, the drives logged errors too. The question now is whether the data integrity can be trusted.

Status August 1st

Well… good question… Optimizations were made last night, the restore was successful, and… we are back to debugging outgoing federation :(

The new hardware will also be a bit more powerful… and yes, I have not forgotten that I wanted to update that database. It's just that I was busy debugging federation problems.

References

Federation issues after restore: https://github.com/matrix-org/synapse/issues/16025
Why we had to restore initially: https://text.tchncs.de/tchncs/about-the-matrix-incident-on-july-26-2023


Milan@discuss.tchncs.de (OP)

1 point

1 year ago

Thank you :) Well, I am not sure there was anything to fight over, except maybe some sort of refund… For now it seems to be fine on the new machine. And yes, I am from Germany; however, I think it's a Hetzner DC in Helsinki.


Haui@discuss.tchncs.de

0 points

1 year ago

You’re very welcome. Hetzner is generally a good host afaik. It does depend on the configuration, I suppose. Are you using the shared VPS or something else? If the storage is guaranteed (as in not custom hardware), they are technically responsible for its condition. A host I’m working with (also located at Hetzner, but in Falkenstein) does two backups a day, which also prevents having to revert far back.


Milan@discuss.tchncs.de (OP)

1 point

1 year ago

On Hetzner it's all dedicated servers: out goes an AX51-NVMe, in comes an AX102. They tried a connector cable swap to bring the NVMe(s) back to life, and I was wondering if this could have something to do with the SMART errors logged and the temporary zpool errors. However, I think the CPU upgrade is at least very welcome for the Matrix server 😅
