Guidelines on how to handle the recent network instability

rothbardian · July 24, 2020, 1:30pm

Today, as those slow nodes were disabled, it seems other changes were made to the network (or did it get overloaded?)

Whatever the reason, now a lot of nodes are in “error” state (what does that even mean?) while some still manage to hang onto their 100% uptime.

A couple of hours ago only 15-16 nodes were online, some for several hours.

What should a node owner do if their node ends up in “error” state? Reboot? Restart service? Wait for it to recover on its own?

lonewolf · July 24, 2020, 2:19pm

You can restart the services to resume processing rounds and get back “online”. But the node may keep reverting to “error” until the team fixes the problems caused by yesterday’s updates.

jota · July 24, 2020, 5:30pm

Today I checked my node and it was in the error state.
I rebooted and it was OK, but a few hours later it is in Error again.

rothbardian · July 25, 2020, 1:08am

Right, so rinse & repeat until you see your version updated (if auto-update is ON as per the handbook) as lonewolf suggested.

Keith · July 25, 2020, 9:09am

The yellow ERROR indicates that a node has not participated in a round in 5 minutes.
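
As a rough illustration of that rule, here is a minimal Python sketch. It assumes you already have the time of the node's last round from somewhere (the `last_round_time` input and the function name are hypothetical, not part of any node software), and it treats "in 5 minutes" as "5 minutes or more":

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the description above: no round in 5 minutes.
ERROR_AFTER = timedelta(minutes=5)

def is_in_error(last_round_time, now=None):
    """Return True if the last round is 5 minutes old or older.

    last_round_time: timezone-aware datetime of the last round the node
    took part in (hypothetical input; how to obtain it is not shown).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_round_time >= ERROR_AFTER
```

For example, a node whose last round was 6 minutes ago would be reported as in ERROR, while one that took part in a round 2 minutes ago would not.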

I’ve received varying reports from people, but one common report is that the node process doesn’t crash, so it is never restarted automatically.

We are trying to determine why this is happening.

In the meantime, one remedy for not participating in rounds is to restart the node and gateway services. However, as stated earlier, this is not expected to be a solution to the ERROR state.
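
For anyone who wants to script that restart, a minimal sketch in Python follows. It assumes the node and gateway run as systemd services; the unit names `node.service` and `gateway.service` are placeholders, not the real names, so substitute whatever your installation uses. The restart order shown is arbitrary.

```python
import subprocess

# Placeholder unit names (assumption); replace with your actual service names.
SERVICES = ("node.service", "gateway.service")

def restart_commands(services=SERVICES):
    """Build one `systemctl restart` command per service."""
    return [["systemctl", "restart", name] for name in services]

def restart_services(services=SERVICES, run=subprocess.run):
    """Restart each service in turn.

    With the default runner, a failed restart raises
    subprocess.CalledProcessError because check=True is passed.
    """
    for cmd in restart_commands(services):
        run(cmd, check=True)
```

Calling `restart_services()` typically needs root privileges. As noted above, this only gets the node participating in rounds again; it is not expected to cure the ERROR state.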