Network Dashboard shows an error

The same thing is happening here. What do I have to do? The dashboard is completely wrong, and I do not want to lose any uptime or be marked offline. It looks like

The issue is on our side with the permissioning server. Once it has been resolved, your node will connect.

is not true. So am I being punished now for a remote mistake? It would be nice to have 0% downtime instead of 55.72%, and Timeout was about 1% instead of 14%.

What is your node ID?

vj+z74EX070KbLWlozZ6zZgjKGcZXiaRCruUMJkp8oEC

Please be sure you are looking at the correct time period. Your uptime is 84% for the month of November, and the current average, based on just 4-5 days, will even out over the next 25 days.

I apologize that the node didn’t recover from the permissioning error and that you had to reboot your computer.

I am sorry, but I cannot go along with just your remark about the errors and the apologies. The consequences fall on me. You have to adjust my figures. Why? Because you posted this message:

There is nothing you need to do. The issue is on our side with the permissioning server.

Once it has been resolved, your node will connect.

Sorry for the inconvenience.

Before the permissioning server went offline, my offline time was 0%, and your message reassured me. I am the one who brought the error up in the first place, and now you punish me for this? Not acceptable.

If at the end of November, your node does not meet the Uptime requirements, please contact me and I will ensure you are not punished for the downtime you accrued due to my mistake.

What are the requirements for November? I hate any downtime at all!
What can I do to prevent this from happening? I trusted the reassuring message from xxnetwork, but do I have to verify that myself?

You have used just slightly more than 13 hours of your 180 hours of allowed downtime for November. Relax, and just monitor that your node process is always running.

The process was running; that’s the whole point.

Likely the wrapper process, not the node process.

The node process exits when it encounters an unexpected error, and the wrapper doesn’t restart it in that case, “by design”.

Very strange, because when checking

sudo systemctl status xxnetwork-node.service

this was the message

Active: active (running)

That’s the wrapper. If you look at your node.log (if you still have it) you’ll probably see that for several hours there was no activity.

Is there a smart way to check node.log automatically for specific terms that indicate something is really wrong? There are a lot of messages in between that do not mean the node process has stopped, just that it is waiting for something, so how do I know when the whole process needs to be restarted?

“What are the requirements for November?”
Re: Requirements - Each month you can find the requirements that need to be met at the bottom right of the page: https://xx.network/nodes/run
Re: November Requirements - https://xx.network/nodes/uptime-policy-nov-2020.pdf

“What can I do to prevent this from happening? I trusted the reassuring message from xxnetwork, but do I have to verify that myself?”

We are in BetaNet and things don’t always work as expected. We have anticipated these kinds of things, which is why we have set the requirements we have. We feel they’re reasonable and shouldn’t be too much of a burden.


watch -n1 "ps -A | grep xxnet"

Will show you the state of the processes rather than the service. It will switch to <defunct> if there is a round failure, but the services usually do a good job of restarting the processes. If it stays stuck in <defunct> for more than a few minutes, that indicates the process has crashed.
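
If you want to automate that check rather than watch manually, here is a minimal sketch; the process name xxnetwork-node and the 5-minute threshold are assumptions for illustration, and a leading “Z” in the ps STAT column is what marks a defunct/zombie process:

#!/bin/bash
# Warn if the node process stays <defunct> for several consecutive checks.
# NAME and LIMIT are assumptions for this sketch; adjust to your setup.
NAME="xxnetwork-node"
LIMIT=5                       # consecutive minutes in <defunct> before warning
count=0
while true; do
  state=$(ps -C "$NAME" -o stat= | head -n 1)
  if [[ "$state" == Z* ]]; then
    count=$((count + 1))      # still defunct, keep counting
  else
    count=0                   # recovered (or not running), reset the counter
  fi
  if (( count >= LIMIT )); then
    echo "$(date): $NAME has been <defunct> for $LIMIT minutes" >&2
    count=0
  fi
  sleep 60
done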

For a better understanding, check out https://staging-forum.xx.network/t/what-defunct-is-wrong/1945 and https://staging-forum.xx.network/t/the-xxnetwork-logs-and-useful-tools/1958

You may also want to look into a monitoring solution that another user proposed. https://staging-forum.xx.network/t/monitor-your-node-online-status/2399

Thanks for the hints and the help. As I understand it, searching for FATAL, PANIC and ERROR in the logs, apart from checking whether the process is running, are both possibilities. Since the node process can stop and then be restarted successfully by the wrapper, it is not that easy to just search for stopped states: a confirmed ERROR or FATAL does not mean the node cannot be restarted, but you do not know for sure either way. Those states appear frequently in the log. I have not found PANIC so far.

When checking directly you can use watch, but this needs to run automatically. The idea is to make a script that checks the last 10 lines or so of the log for ERROR or FATAL, checks again after an interval when such a state occurs, repeats that a couple of times, and after that checks whether the node process is running, because then you are sure the wrapper script did not restart it.
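
A rough sketch of that idea, assuming a placeholder log path and process name (adjust both, and the intervals, to your setup):

#!/bin/bash
# Check the tail of node.log for ERROR/FATAL; if it keeps showing up across a
# few re-checks, verify that the node process itself is still running.
# LOG, NAME, RETRIES and INTERVAL are placeholders for this sketch.
LOG="/path/to/node.log"
NAME="xxnetwork-node"
RETRIES=3
INTERVAL=60                   # seconds between re-checks

for ((i = 0; i < RETRIES; i++)); do
  if ! tail -n 10 "$LOG" | grep -Eq "ERROR|FATAL"; then
    exit 0                    # last lines look clean, nothing to do
  fi
  sleep "$INTERVAL"
done

# ERROR/FATAL kept appearing; check whether the wrapper restarted the node.
if ! pgrep -f "$NAME" > /dev/null; then
  echo "$(date): $NAME is not running after repeated ERROR/FATAL in $LOG" >&2
fi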

Does a PANIC state mean the node process cannot be restarted by the wrapper script? So in that case, just restart it manually?

What’s the average time it takes to complete the restart process once the wrapper script detects it needs to restart?

It’d probably be easier and more accurate to check whether the log file’s modification time is older than 3 minutes.
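
For example, assuming node.log’s location, find can do that check in one line; -mmin +3 matches files last modified more than 3 minutes ago, so any output means the log has gone stale:

# Prints the path only if node.log has not been modified in the last 3 minutes.
# /path/to/node.log is a placeholder for wherever your node writes its log.
find /path/to/node.log -mmin +3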

Right, easy and clear.
So if that’s the case, the node process needs to be restarted?

To not be marked as offline, yes. You should also notify the xx team, though, because I suppose that was the rationale behind this “design”.
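
Putting the two together, a hedged sketch of a watchdog that could run from cron every few minutes; the log path and the watchdog log are placeholders, and whether restarting xxnetwork-node.service is the right recovery step (rather than only notifying the team) is an assumption to confirm first:

#!/bin/bash
# If node.log has not been touched for more than 3 minutes, assume the node
# process is stuck, restart the service, and record the event so the xx team
# can be notified. LOG and WATCHDOG_LOG are placeholders for this sketch.
LOG="/path/to/node.log"
WATCHDOG_LOG="/var/log/xxnode-watchdog.log"

if [ -n "$(find "$LOG" -mmin +3 2>/dev/null)" ]; then
  echo "$(date): $LOG stale for more than 3 minutes, restarting xxnetwork-node.service" >> "$WATCHDOG_LOG"
  sudo systemctl restart xxnetwork-node.service
fi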