Sudden restart: Unable to start comms server: listen tcp 0.0.0.0:11420: bind: address already in use

The node server was running when it suddenly appeared to restart and produced the errors below. Port 11420 is not free: the server had been running and the port was not released …

… Transmission Token: [] Reception Token: [185 50 93 68 135 248 104 68 158 234 135 89 18 22 213 238 235 31 116 190 101 75 212 253 144 22 66 39 68 24 212 145] EnableAuth: true MaxRetries: 100 ConnState: IDLE TLS ServerName: xx.network TLS ProtocolVersion: TLS SecurityVersion: 1.2 TLS SecurityProtocol: tls

}, Failed to authenticate id: vj+z74EX070KbLWlozZ6zZgjKGcZXiaRCruUMJkp8oEB

INFO 2020/07/02 09:36:19 Log Filename: /opt/xxnetwork/node-logs/node.log

INFO 2020/07/02 09:36:19 Config Filename: /opt/xxnetwork/node.yaml

WARN 2020/07/02 09:36:19 Could not get CPU usage info: no CPU time progression since last call

INFO 2020/07/02 09:36:19 Loaded params: &{KeepBuffers:false UseGPU:true RngScalingFactor:10000 SignedCertPath: SignedGatewayCertPath: RegistrationCode:rvYzmLX2wImApDBSKnPaciDdNNfSYgwIRDIBouSC15s= Node:{Paths:{Idf:/opt/xxnetwork/node-logs/nodeIDF.json Cert:/opt/xxnetwork/creds/node_cert.crt Key:/opt/xxnetwork/creds/node_key.key Log:/opt/xxnetwork/node-logs/node.log} ListeningAddress:0.0.0.0 Port:11420} Database:{Name:nodedb Username:node Password:Bassieenadriaan24 Address:127.0.0.1:5432} Gateway:{Paths:{Idf: Cert:/opt/xxnetwork/creds/gateway_cert.crt Key: Log:}} Permissioning:{Paths:{Idf: Cert:/opt/xxnetwork/creds/permissioning_cert.crt Key: Log:} Address:permissioning.prod.cmix.rip:11420} Metrics:{Log:/opt/xxnetwork/node-logs/metrics.log} GraphGen:{minInputSize:4 defaultNumTh:16 outputSize:4 outputThreshold:0} PhaseOverrides:[] OverrideRound:-1 RecoveredErrPath:/opt/xxnetwork/node-logs/node-err.log}

INFO 2020/07/02 09:36:19 Initalizing the backend

INFO 2020/07/02 09:36:19 Using database backend for UserRegistry!

INFO 2020/07/02 09:36:19 Converting params to server definition

INFO 2020/07/02 09:36:19 Creating server instance

INFO 2020/07/02 09:36:19 Initializing GPU maths, CUDA backend, with memory size 268435456

FATAL 2020/07/02 09:36:20 Unable to start comms server: listen tcp 0.0.0.0:11420: bind: address already in use

gitlab.com/elixxir/comms/connect.StartCommServer

/root/go/pkg/mod/gitlab.com/elixxir/[email protected]/connect/comms.go:129

gitlab.com/elixxir/comms/node.StartNode

/root/go/pkg/mod/gitlab.com/elixxir/[email protected]/node/handler.go:33

gitlab.com/elixxir/server/internal.CreateServerInstance

/builds/elixxir/server/internal/instance.go:143

gitlab.com/elixxir/server/cmd.StartServer

/builds/elixxir/server/cmd/node.go:135

gitlab.com/elixxir/server/cmd.glob..func2

/builds/elixxir/server/cmd/root.go:69

github.com/spf13/cobra.(*Command).execute

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:846

github.com/spf13/cobra.(*Command).ExecuteC

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:950

github.com/spf13/cobra.(*Command).Execute

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:887

gitlab.com/elixxir/server/cmd.Execute

/builds/elixxir/server/cmd/root.go:99

main.main

/builds/elixxir/server/main.go:13

runtime.main

/usr/lib/go-1.13/src/runtime/proc.go:203

runtime.goexit

/usr/lib/go-1.13/src/runtime/asm_amd64.s:1357

panic: Unable to start comms server: listen tcp 0.0.0.0:11420: bind: address already in use

gitlab.com/elixxir/comms/connect.StartCommServer

/root/go/pkg/mod/gitlab.com/elixxir/[email protected]/connect/comms.go:129

gitlab.com/elixxir/comms/node.StartNode

/root/go/pkg/mod/gitlab.com/elixxir/[email protected]/node/handler.go:33

gitlab.com/elixxir/server/internal.CreateServerInstance

/builds/elixxir/server/internal/instance.go:143

gitlab.com/elixxir/server/cmd.StartServer

/builds/elixxir/server/cmd/node.go:135

gitlab.com/elixxir/server/cmd.glob..func2

/builds/elixxir/server/cmd/root.go:69

github.com/spf13/cobra.(*Command).execute

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:846

github.com/spf13/cobra.(*Command).ExecuteC

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:950

github.com/spf13/cobra.(*Command).Execute

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:887

gitlab.com/elixxir/server/cmd.Execute

/builds/elixxir/server/cmd/root.go:99

main.main

/builds/elixxir/server/main.go:13

runtime.main

/usr/lib/go-1.13/src/runtime/proc.go:203

runtime.goexit

/usr/lib/go-1.13/src/runtime/asm_amd64.s:1357

goroutine 1 [running]:

log.(*Logger).Panicf(0xc0002f6690, 0xd77df0, 0x21, 0xc00069b870, 0x1, 0x1)

/usr/lib/go-1.13/src/log/log.go:219 +0xc1

gitlab.com/elixxir/comms/node.StartNode(0xc0002f9170, 0xc00048e040, 0xd, 0xeb46e0, 0xc000365600, 0xc00010aa80, 0x815, 0xa15, 0xc00064c000, 0xcc4, …)

/root/go/pkg/mod/gitlab.com/elixxir/[email protected]25-1f2aa35414c0/node/handler.go:36 +0x13d

gitlab.com/elixxir/server/internal.CreateServerInstance(0xc0008cec00, 0xda6630, 0xc00048f18c, 0xc00087f600, 0xc0008dbb10, 0xda6528, 0xc0008dbb20, 0xda6530, 0xc0008dbb30, 0xda6538, …)

/builds/elixxir/server/internal/instance.go:143 +0x531

gitlab.com/elixxir/server/cmd.StartServer(0xc0000d6a20, 0xc000121d30, 0x1)

/builds/elixxir/server/cmd/node.go:135 +0x87f

gitlab.com/elixxir/server/cmd.glob..func2(0x1608ac0, 0x168db10, 0x0, 0x0)

/builds/elixxir/server/cmd/root.go:69 +0x55

github.com/spf13/cobra.(*Command).execute(0x1608ac0, 0xc0000321d0, 0x0, 0x0, 0x1608ac0, 0xc0000321d0)

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:846 +0x2aa

github.com/spf13/cobra.(*Command).ExecuteC(0x1608ac0, 0xc000000180, 0xc000121f50, 0x4078bf)

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:950 +0x349

github.com/spf13/cobra.(*Command).Execute(…)

/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:887

gitlab.com/elixxir/server/cmd.Execute()

/builds/elixxir/server/cmd/root.go:99 +0x31

main.main()

/builds/elixxir/server/main.go:13 +0x20

I had to stop and start it manually to resolve this.

We’re pushing a fix for this today, most likely.

The issue is that when the server shuts down or crashes, it can take a couple of minutes for the port to be freed so that the server can bind to it again.

Will you fix it by setting the socket flag (as I suggested) or by adding a delayed retry?