4k prime update

Today we are releasing an immediate update to the network which significantly increases security in exchange for substantial increases in computation requirements on off-spec machines.

This change, moving from a 2048 bit prime (RFC 3526-3) to a 4096 bit prime (RFC 3526-5), increases the difficulty of modular exponentiation, the most expensive and ubiquitous operation in precomputation, by 691% on CPU while causing only a 7% difficulty increase on GPU.
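
For operators who want a rough feel for the CPU cost on their own machine, the following is a minimal sketch that times modular exponentiation at both sizes using Go's standard math/big package. It rests on assumptions: it uses a random odd modulus of the right bit length rather than the actual RFC 3526 primes, the helper names (randomOddModulus, timeModExp) are made up for illustration, and it does not use the network's own crypto or GPU libraries. The numbers will therefore not match the figures above, but the relative slowdown should be visible.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"time"
)

// randomOddModulus returns a random odd integer with exactly `bits` bits.
// Assumption: this is NOT a prime and NOT an RFC 3526 group; it only has the
// same size, which is what drives the cost of the exponentiation.
func randomOddModulus(bits int) (*big.Int, error) {
	limit := new(big.Int).Lsh(big.NewInt(1), uint(bits))
	m, err := rand.Int(rand.Reader, limit)
	if err != nil {
		return nil, err
	}
	m.SetBit(m, bits-1, 1) // force the top bit so the length is exact
	m.SetBit(m, 0, 1)      // force the low bit so the modulus is odd
	return m, nil
}

// timeModExp returns the average duration of `iters` modular exponentiations
// with a full-width base and exponent under a modulus of `bits` bits.
func timeModExp(bits, iters int) (time.Duration, error) {
	m, err := randomOddModulus(bits)
	if err != nil {
		return 0, err
	}
	base, err := rand.Int(rand.Reader, m)
	if err != nil {
		return 0, err
	}
	exp, err := rand.Int(rand.Reader, m)
	if err != nil {
		return 0, err
	}
	out := new(big.Int)
	start := time.Now()
	for i := 0; i < iters; i++ {
		out.Exp(base, exp, m) // out = base^exp mod m
	}
	return time.Since(start) / time.Duration(iters), nil
}

func main() {
	for _, bits := range []int{2048, 4096} {
		d, err := timeModExp(bits, 20)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d bit modexp: %v per operation\n", bits, d)
	}
}
```

Running it on a CPU-only machine and comparing the two lines gives a quick, unofficial indication of how much slower the larger prime is on that hardware.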

This work has always been in the pipeline to future-proof the security of the network, and it was also requested because the larger payload size better supports Praxxis transactions and the updated User Discovery key negotiations. This update also addresses an issue where many node operators are taking advantage of the currently low utilization of the specified hardware.

Over the last week, community members observed that a portion of the network (~⅓) is running on cheap VPSs which do not meet the xx BetaNet spec. Ultimately, these will not be sufficient or cost-effective for the network. They also add significant variability to our testing and negatively impact the decentralization of the network.

This release is being done as an urgent release per the handbook and is a mandatory update. It will be automatically distributed to nodes that have automatic updates enabled. Node operators who compile updates themselves will have the same grace period as normal.

GPUMaths

MR: https://gitlab.com/elixxir/gpumathsgo/-/merge_requests/41

ChangeLog:

  • This update was made to allow us to better benchmark variations in different primes.

Primitives

MR: https://gitlab.com/elixxir/primitives/-/merge_requests/118

ChangeLog:

  • Moved to the new message format already designed for the xx messenger release. It is more flexible and allows for variable prime sizes.

Crypto

MR: https://gitlab.com/elixxir/crypto/-/merge_requests/224

ChangeLog:

  • Updated internal dependencies
  • Updated for changes to message structure

Gateway

MR: https://gitlab.com/elixxir/gateway/-/merge_requests/175/diffs

ChangeLog:

  • Updated internal dependencies
  • Updated for changes to message structure
  • Dummy messages are generated in the 4096 bit space

Server

MR: https://gitlab.com/elixxir/server/-/merge_requests/594

ChangeLog:

  • Updated internal dependencies
  • Removed some unused code
  • Modified tests to better test variable configurations of the GPU library

Network Configuration

  • Move to the 4096 bit prime as described in RFC 3526-5
  • Round timeout set to 2 minutes from 1 minute

Expected Outcomes

Based upon testing, we expect the Precomputation on CPU nodes to become 3~5x slower, with a negligible change for GPU nodes.

On top of computation, moving to a prime which is twice as large also doubles the payload size and the amount of data communicated across the network. As a result, latency due to communication is expected to increase.
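
The doubling can be checked with simple arithmetic. The sketch below assumes a payload is exactly one group element, so its size in bytes is the prime's bit length divided by 8. The real message format may carry several payloads plus overhead, and payloadBytes is a hypothetical helper, not a function from the network's libraries.

```go
package main

import "fmt"

// payloadBytes returns the number of bytes needed to hold one element of a
// group whose prime is primeBits long.
// Assumption: one payload equals one group element, with no framing overhead.
func payloadBytes(primeBits int) int {
	return (primeBits + 7) / 8 // round up to whole bytes
}

func main() {
	oldSize := payloadBytes(2048)
	newSize := payloadBytes(4096)
	fmt.Printf("2048 bit prime: %d bytes per payload\n", oldSize)
	fmt.Printf("4096 bit prime: %d bytes per payload\n", newSize)
	fmt.Printf("growth factor: %dx\n", newSize/oldSize)
}
```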

After a few days of testing, the team will begin disabling nodes which are not performing adequately until network performance is brought in line with expectations.

We will make sure those nodes who need it have adequate time to upgrade before we make any final decisions, and we will make sure to fully address the community and give time for community input before any decisions are finalized.

For those nodes who have met the specs since launch, we will make sure they are treated fairly and will announce our approach in the near future.

Next Steps

This update will cause Realtime operations to be 1.5~2x slower for all nodes. Realtime is currently being done on the CPU on all nodes. A future update will have it operate on GPU instead, bringing Realtime execution time back on par with previous numbers.

Once these updates fully stabilize, the team will look into next steps. Our goal will be to increase the network's batch size and team size. Increasing the batch size will increase throughput, anonymity, and latency. Increasing team size will improve the security of the network while decreasing throughput and increasing latency.

The team is also hard at work on the Arrow SDK, the xx messenger, and xx consensus. We are happy to report that messages have been sent over internal test nets with the rewritten Arrow SDK and we are excited to bring more news very soon.


Can we not go to the maximum possible prime for an RTX 2070 (the minimum requirement)? In my opinion this would make sense until there are FPGAs.