Well, hello again. I have just learned that the host that recently had both NVMe drives fail during a drive replacement now has new problems: the filesystem reports permanent data errors affecting the databases of both the Matrix server and the Telegram bridge.

I have just rented a new machine and am about to restore the database snapshot from July 26, just in case. All the troubleshooting over the past few days has been very exhausting; however, I will try to do this, or at least prepare it, within the next few hours.

Update

After a rescan, the errors have gone away; however, the drives logged errors too. The question now is whether the data integrity can be trusted.

Status, August 1st

Well… good question… Optimizations were made last night, the restore was successful, and… we are back to debugging outgoing federation :(


The new hardware will also be a bit more powerful… and yes, I have not forgotten that I wanted to update that database. I have just been busy debugging federation problems.



Milan, in compensation for all this hassle, you should take the next three Mondays off.


I like that idea.


Thanks for putting in the work. Is there anything we can help you with? From what I understood, the domain is German; is the server in Germany as well? I'm located in Germany and do sysadmin work. Fighting with hosting companies is part of my job. ;) Let me know if I can do anything. Have a good one!


Thank you :) Well, I am not sure there was anything to fight over, except maybe some sort of refund… For now, everything seems to be fine on the new machine. And yes, I am from Germany; however, I think it's a Hetzner data center in Helsinki.


You're very welcome. Hetzner is generally a good host, AFAIK. It does depend on the configuration, I suppose. Are you using a shared VPS or something else? If the storage is guaranteed (as in, not custom hardware), they are technically responsible for its condition. A host I'm working with (also at Hetzner, but in Falkenstein) runs two backups a day, which also means never having to revert very far back.


On Hetzner, it's all dedicated servers: out goes an AX51-NVMe, in comes an AX102. They tried swapping a connector cable to bring the NVMe drive(s) back to life, and I was wondering whether this could have something to do with the logged SMART errors and the temporary zpool errors. At least the CPU upgrade is very welcome for the Matrix server 😅


I feel for you. Is #tchncs:tchncs.de still the correct room once it's running fine again?


What do you mean?


..:: tchncs ::..

!tchncs@discuss.tchncs.de


Your friendly https://tchncs.de community! Discuss what's happening in the tchncs world and/or just use it as a community forum.

German and English are allowed.

If you are looking for a way to support tchncs, please check out https://tchncs.de/donate

