What are the requirements for November? I hate having any downtime at all!
What can I do to prevent this from happening? I trust a comforting message from xxnetwork, but I need to be able to verify that myself.
You used just slightly more than 13h of your 180h downtime allowance for November. Relax, and monitor that your node process is always running.
The process was running, that's the whole point.
Likely the wrapper process, not the node process.
The node process exits when it encounters an unexpected error, and the wrapper doesn't restart it in such a case "by design".
Very strange. When checking
sudo systemctl status xxnetwork-node.service
this was the message:
Active: active (running)
That's the wrapper. If you look at your node.log (if you still have it) you'll probably see that for several hours there was no activity.
Is there a smart way to check node.log automatically for specific terms that indicate something is really wrong? There are a lot of messages in between that do not mean the node process has stopped (it may just be waiting for something), yet in the bad case the whole process needs to be restarted.
"What are the requirements for November?"
Re: Requirements - Each month you can find the requirements which need to be met at the bottom of the page to the right. https://xx.network/nodes/run
Re: November Requirements - https://xx.network/nodes/uptime-policy-nov-2020.pdf
"What can I do to prevent this from happening, I trust a comforting message from xxnetwork but I have to control that?"
We are in BetaNet and things don't always work as expected. We have anticipated these kinds of things, which is why we have set the requirements we have. We feel they're reasonable and shouldn't be too much of a burden.
watch -n1 "ps -A | grep xxnet"
Will show you the state of the processes rather than the service. It will switch to <defunct> if there is a round failure, but the services usually do a good job of restarting the processes. If it gets stuck in the <defunct> state for more than a few minutes, that indicates the process has crashed.
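If you want to automate that instead of eyeballing watch, here is a minimal sketch; it assumes a 30-second poll interval and reuses the same "xxnet" pattern as the watch command above:
#!/bin/bash
# Poll every 30s and log an alert once the node process has been
# <defunct> for 3 minutes or more (the threshold is an assumption).
DEFUNCT_FOR=0
while true; do
  if ps -A | grep xxnet | grep -q defunct; then
    DEFUNCT_FOR=$((DEFUNCT_FOR + 30))
  else
    DEFUNCT_FOR=0
  fi
  if [ "$DEFUNCT_FOR" -ge 180 ]; then
    logger -t xx-check "xxnetwork node <defunct> for over 3 minutes, likely crashed"
  fi
  sleep 30
done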
For a better understanding check out https://staging-forum.xx.network/t/what-defunct-is-wrong/1945 and https://staging-forum.xx.network/t/the-xxnetwork-logs-and-useful-tools/1958
You may also want to look into a monitoring solution that another user proposed. https://staging-forum.xx.network/t/monitor-your-node-online-status/2399
Thanks for the hints and the help. As I understand it, the options are searching the logs for FATAL, PANIC and ERROR, apart from checking whether the process is running. Since the node process can stop and be restarted successfully by the wrapper, it is not that easy to just search for stopped states: a confirmed ERROR or FATAL state does not mean the process cannot be restarted, but you do not know for sure. Those states appear frequently in the log. I did not find PANIC so far.
When checking directly you can use watch, but the check needs to run automatically. The idea is to make a script that checks the last 10 lines or so of the log for ERROR or FATAL states, checks again at an interval when such a state occurs, repeats that a couple of times, and after that checks whether the node process is running, because then you can be sure the wrapper script did not restart it.
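Something like this minimal sketch of that idea; the log path and the "xxnetwork-node" process name are assumptions, adjust them to your setup:
#!/bin/bash
# Check the last 10 lines of the log for ERROR/FATAL a few times in a row,
# then only alert if the node process is also gone, i.e. the wrapper did
# not manage to restart it.
LOG=/opt/xxnetwork/node-logs/node.log   # assumed location, adjust to yours
HITS=0
for i in 1 2 3; do
  if tail -n 10 "$LOG" | grep -Eq 'ERROR|FATAL'; then
    HITS=$((HITS + 1))
  fi
  sleep 60
done
# Process name is an assumption; it may also match the wrapper on some setups.
if [ "$HITS" -eq 3 ] && ! pgrep -f xxnetwork-node >/dev/null; then
  logger -t xx-check "repeated ERROR/FATAL and node process not running"
fi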
Does the PANIC state mean the node process cannot be restarted by the wrapper script? So in that case, just restart it manually?
What's the average time it takes to complete the restart process once the wrapper script detects it needs to restart?
It'd probably be easier and more accurate to check whether the log file's modification time is older than 3 minutes.
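A minimal sketch of that check, assuming the log lives at /opt/xxnetwork/node-logs/node.log (adjust the path to your setup):
# Log a warning if node.log has not been modified for over 3 minutes.
LOG=/opt/xxnetwork/node-logs/node.log   # assumed location
if [ -n "$(find "$LOG" -mmin +3)" ]; then
  logger -t xx-check "node.log stale for over 3 minutes"
fi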
Right, easy and clear
So if that's the case, the node process needs to be restarted?
To not be marked as offline, yes. You should also notify the xx team, though, because I suppose that was the rationale behind this "design".
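As a sketch of acting on that (log path assumed; the service name matches the systemctl command quoted earlier), you could run something like this from root's cron every minute:
# Restart the service when the log goes stale, and leave a trace as a
# reminder to notify the xx team.
LOG=/opt/xxnetwork/node-logs/node.log   # assumed location
if [ -n "$(find "$LOG" -mmin +3)" ]; then
  systemctl restart xxnetwork-node.service
  logger -t xx-check "restarted xxnetwork-node.service after stale node.log; report this to the xx team"
fi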