Why the 100Mbit Bandwidth Requirement?

Last year, we published the “Elixxir BetaNet Nodes Guide,” which stated nodes would require a “500 Megabit+” internet connection, with a strong lean towards 1 Gbit. When we published those specifications, there was concern because those numbers are not achievable for many, so the team worked to find a solution.

The previous requirement was a result of how nodes communicate, so the Elixxir developers found a better way. Node-to-node comms were previously written to use unary communications rather than streaming communications. Now nodes communicate via streaming comms, greatly reducing the bandwidth requirements. We realize some people are still not able to meet these requirements, but a reduction from 1 Gbit to 500 Mbit+ to 100 Mbit allows many more people to participate.

Ben recently published the revised node requirements. Today I’d like to show you how we tested and why we require a 100 Mbit internet connection.

First, the how. In the NodeLab I used a team of 5 nodes to complete 11 rounds. A round consists of the precomputation and real-time phases needed to deliver 10,000 messages, with the precomputation taking the bulk of the time. Each set of 11 rounds was conducted at 1 Gbit, 500 Mbit, 250 Mbit, 100 Mbit, 50 Mbit and 10 Mbit network speeds. The network in the NodeLab is 1 Gbit LAN, so I used Linux Traffic Control (tc) with Hierarchical Token Bucket (HTB) to limit the bandwidth of the ports over which the nodes communicate.
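For anyone who wants to try something similar, here is a rough sketch of the kind of tc + HTB setup involved. This is not the exact NodeLab configuration: the interface name (eth0), the 100 Mbit rate, and the port number (11420, purely a placeholder) are assumptions to adapt, and it only shapes outbound traffic on the machine it runs on.

# attach an HTB root qdisc and create one class capped at 100 Mbit
tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit ceil 100mbit

# steer traffic headed for the node comms port (placeholder 11420) into that class
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 11420 0xffff flowid 1:10

# remove the shaping once the test run is finished
tc qdisc del dev eth0 root

Traffic that does not match the filter is left unshaped, so only the node-to-node traffic is capped.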

The results can be seen in the following graph.

[Graph: round times at each of the tested network speeds]

Now for the why of the 100 Mbit requirement.

Like most mix networks, the Elixxir protocol operates sequentially: a batch of messages must be processed by each node in the team, in order. By moving to streaming comms, a node can start working on a batch before the entire batch has been received. This means that as long as the transmission of data from one node to the next is faster than the cryptography, the time it takes to transfer the data does not factor at all into how long the round takes.

From our testing we discovered that for a 10k batch of 4096-bit messages, that limit is roughly 100 megabits per second.
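As a rough sanity check of that number (my own back-of-envelope math, ignoring protocol overhead and everything else exchanged during a round):

# payload of one batch: 10,000 messages x 4096 bits each
echo "10000 * 4096 / 10^6" | bc -l           # ~40.96 Mbit per batch
# time to stream that payload at 100 Mbit/s
echo "10000 * 4096 / (100 * 10^6)" | bc -l   # ~0.41 seconds

With streaming comms, that fraction of a second can overlap with the cryptographic work, and overhead plus the other data exchanged during a round presumably accounts for the gap between the raw ~41 Mbit payload figure and the roughly 100 Mbit limit found in testing.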

xx network was able to solve the problem of slow secure messaging by not doing everything in real time and by moving most of the cryptographic work into a precomputation. Unfortunately, there is nothing we can do about how long it actually takes to send data across the internet.

I hope this helps everyone understand how we came to the decision and why we have the requirements we do.

10 Likes

Thanks so much for keeping us updated! I just wanted to throw in my two cents that, at some point, we do hit a threshold where it’s OK to leave some would-be node operators behind due to slow internet or other hardware considerations. I think I’m not alone in saying that while we operators genuinely want to include as many participants as possible, we also trust the team to define minimum specifications that incentivize a strong, performant network. I could see some folks getting heated about the topic, so I figured I’d point out that “we in the silent majority” understand the balance you’re looking to strike and respect how you’re going about it.

4 Likes

Thanks for the nice explanation and statistics!

Can you tell me what the maximum total data transfer requirements are per month for the BetaNet? Total data transfer (tiers) is a common colocation pricing method. Colocation pricing structures for internet connectivity vary, but I’m guessing that total data transfer is the most appropriate for xx network nodes (i.e. vs. “95% burstable”).
Thanks!

p.s. Really looking forward to what you come up with for the 1U server. Let me know if I can be of assistance.

This could be the difference for the viability of cloud-based instances. If it turns out to be 100 Mbps * 30 * 86400 seconds’ worth of traffic, it’s hard to imagine how it could be competitive vs. home-based setups.

I haven’t carefully calculated, but based on a quick look, 10 MB/s of egress costs around $2,000 per month.
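For reference, the rough math behind a figure like that, assuming ~$0.09/GB, which is a typical big-cloud internet egress rate (actual pricing varies by provider, region, and tier):

# 10 MB/s of egress, sustained over a 30-day month
echo "10 * 86400 * 30 / 1000" | bc -l          # ~25,920 GB (~25.9 TB)
echo "10 * 86400 * 30 / 1000 * 0.09" | bc -l   # ~$2,333 at $0.09/GB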

Cloud-based egress like AWS or GCP is definitely a no-go. But that is not what I’m talking about here.

Data plans often include a monthly cap. For example, one of my available home plans had a monthly cap of 5 TB, with penalty charges for going over. Some data center colocation plans also have a cap (in TB), particularly if a fixed speed like 100 Mbps or 1 Gbps is requested. Data center bandwidth plans are much cheaper than cloud egress charges. The most common plan seems to be a 5-100 Mbps base rate that can be exceeded without penalty 5% of the time (usually up to 1 or 10 Gbps), called a “95% burstable” plan. For 95% burstable plans I have not seen a cap, but they get too expensive at high base rates vs. fixed-rate capped plans. At least one data center I talked to only offered the 95% burstable plan, and I think it is the most common/preferred/cheapest plan on the market. I don’t have a good understanding of how the xx network data usage profile would work with a 95% burstable plan (i.e. what base rate would be required?).

At 100 Mbps constant flow, full duplex (both ways), I calculate usage of 64.8 TB/month.
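That lines up with the simple math, assuming a 30-day month and links saturated in both directions (a worst case):

# 100 Mbit/s = 12.5 MB/s in each direction, sustained for 30 days
echo "12.5 * 86400 * 30 / 10^6" | bc -l        # ~32.4 TB one way
echo "2 * 12.5 * 86400 * 30 / 10^6" | bc -l    # ~64.8 TB both ways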

1 Like

Fair enough. I was looking at it from a cloud vs. home angle, but no doubt many folks share the concern you have. Let’s hope that the real-time part of each round involves comparatively little time relative to processing time.

100 Mbit download is OK, but service providers give only 5 Mbit upload speed on the standard tariff, and charge almost 1,500 USD/month for 100 Mbit upload speed in Turkey. Do you know any other way to upgrade to 100 Mbit upload?

@Selo Some people were looking into using co-location to run their servers from data centers.

1 Like

@Keith, may I ask you to run the same test again with simulated latency added on each node? In your local network you have sub-1 ms latency, and latency has a huge impact…

You could use tc to do that… (latency between 10-100ms and 25% packet drop max)

tc qdisc add dev eth0 root netem delay 100ms 10ms 25%
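If you also want to simulate packet drops explicitly, netem can add random loss in the same command. A minimal sketch, assuming eth0 is the right interface and no other root qdisc is already attached:

# 100ms ± 10ms of delay plus 25% random packet loss on outbound traffic
tc qdisc add dev eth0 root netem delay 100ms 10ms loss 25%

# remove it again after the test
tc qdisc del dev eth0 root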

Would love to see the results in this “more” realistic scenario.

Best regards

Unfortunately, it’s not as realistic as it may seem. Nodes of a team are selected randomly. If all the nodes in a team are in the same geographic bin, the latency between nodes is low (maybe 5-10 ms). If the nodes in a team are in different geographic bins, then the latency between nodes could be anything (1+ ms) and the round-trip time could be short or long.

The only result would be determining the average latency between nodes, and again that’s not realistic, because a round could time out even if two nodes in Asia have very low latency and one node in North America slows the whole thing down.

This causes the added delay to be 100ms ± 10ms, with the next random element depending 25% on the previous one.

I’ve been looking at colocation also. I’m looking at 100 Mbps with 10 TB a month. Would that be sufficient?

As things stand, yes.

1 Like

Ok cool! Going to start with a home setup if I get chosen. Then as things progress I might deploy a dedicated server in a data center.