Speed as in lower latency*, and resiliency in a geographical sense.
* i.e. not higher bandwidth; we’re already good on that front
When a small enough geographical region (e.g. a country, through political intervention) gets isolated from the rest of the network, it will lose access to a lot of files, simply because some of their blocks will have none of their 4 mandatory copies* stored within the isolated area, and will thus be inaccessible.
* Is that still how it works?
I did a small experiment recently, where I randomly placed a million nodes across the world based on population density*, and then I connected them at random. If I remember correctly, the average latency between two nodes was 45 ms, but that would probably double after adjusting for IP routing overhead and the lower speed of light in fiber or copper.
* I used a 1/4 degree grid to select the general area, then added some noise with a 1/2 degree standard deviation.
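The gist of that experiment can be sketched as below. This is a simplified stand-in, not the original code: points are drawn uniformly on the sphere instead of by population density, and latency is great-circle distance over the vacuum speed of light, before the ~2x adjustment mentioned above.

```python
import math
import random

EARTH_R_KM = 6371.0
C_KM_PER_MS = 299.792  # speed of light in vacuum, km per millisecond

def great_circle_km(lat1, lon1, lat2, lon2):
    # haversine formula for the distance along the Earth's surface
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_R_KM * math.asin(math.sqrt(a))

def random_point():
    # uniform on the sphere (the real experiment weighted by population density)
    lon = random.uniform(-180, 180)
    lat = math.degrees(math.asin(random.uniform(-1, 1)))
    return lat, lon

def mean_latency_ms(pairs=10_000):
    total = 0.0
    for _ in range(pairs):
        (lat1, lon1), (lat2, lon2) = random_point(), random_point()
        total += great_circle_km(lat1, lon1, lat2, lon2) / C_KM_PER_MS
    return total / pairs
```

With uniform placement this comes out in the low tens of milliseconds; population-weighted placement clusters nodes on a few continents, which shifts the mean.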
What this means is that each SAFE routing hop will add about 0.2 s to the roundtrip latency, so we’ll have to wait a full second even if we’re only 5 hops away from the vault that hosts the block we want. The good thing is that we can request a bunch of things in parallel, so the overall bandwidth will still be quite high. Still, a one-second (on average! it can easily be 4x as much) pause feels long.
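The back-of-the-envelope arithmetic behind that full second (the numbers are the assumptions stated above, not measurements):

```python
# 45 ms mean one-way latency between two random nodes (from the simulation),
# doubled for IP routing overhead and the slower speed of light in fiber.
mean_one_way_ms = 45
adjusted_one_way_ms = 2 * mean_one_way_ms        # ~90 ms per hop, one way
per_hop_roundtrip_ms = 2 * adjusted_one_way_ms   # ~180 ms, i.e. "about 0.2 s"
hops = 5
total_s = hops * per_hop_roundtrip_ms / 1000
print(total_s)  # 0.9 -- roughly the "full second" for a 5-hop request
```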
How could FEC help?
Let’s say we split each block into 8 pieces (i.e. 125 KB each), then generate 8 more pieces in such a way that any 8 of the 16 are sufficient to restore the whole block. Now we can lose 2.67x as many pieces (8 instead of 3) without losing the block, so it’s much more likely that an isolated area has access to enough of them to put it back together.
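The any-8-of-16 property can be demonstrated with a toy Reed-Solomon-style code over a prime field (a sketch for illustration; a real implementation would work on byte data over GF(2^8)): the 8 data symbols are the values of a degree-7 polynomial at x = 1..8, the 8 parity pieces are its values at x = 9..16, and any 8 points recover the polynomial by Lagrange interpolation.

```python
P = 2**31 - 1  # a prime large enough for our toy symbols

def _lagrange_eval(points, x):
    # evaluate the unique degree-(k-1) polynomial through `points` at x, mod P
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    # data symbols sit at x = 1..k; parity pieces at x = k+1..n
    k = len(data)
    points = list(enumerate(data, start=1))
    return points + [(x, _lagrange_eval(points, x)) for x in range(k + 1, n + 1)]

def decode(pieces, k):
    # ANY k of the n pieces determine the polynomial, hence the data
    subset = pieces[:k]
    return [_lagrange_eval(subset, x) for x in range(1, k + 1)]
```

For example, `encode(data, 16)` with 8 data symbols yields 16 pieces, and `decode` on any 8 surviving pieces returns the original 8 symbols.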
What may not be obvious at first is that we can also lower the average distance between a client and the destination: we just have to pick the 8 pieces out of the 16 that are closest to one of our close nodes, and request those instead of a random selection. (If some of them aren’t returned, we can start requesting the pieces farther away from us.)
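A hypothetical sketch of that selection strategy (the names and the `(piece_id, estimated_rtt_ms)` shape are made up for illustration; `fetch` stands in for the actual network request):

```python
def gather_pieces(holders, k, fetch):
    # holders: list of (piece_id, estimated_rtt_ms), one entry per stored piece
    ranked = sorted(holders, key=lambda h: h[1])  # nearest pieces first
    collected, queue = [], list(ranked)
    while queue and len(collected) < k:
        piece_id, _ = queue.pop(0)
        data = fetch(piece_id)  # may return None on timeout or failure
        if data is not None:
            collected.append((piece_id, data))
    return collected if len(collected) == k else None
```

A real client would of course fire off the first k requests in parallel and only fall back to the farther pieces as failures come in, rather than fetching sequentially.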
Of course, the 8-of-16 setup is just an example. It could be 4 of 12 or 8 of 32, or any other combination.
- Lower latency, happier users.
- Data is transferred over fewer hops, so bandwidth is better utilized.
- Less storage is required for the same (or, actually, much higher) redundancy.
- If part of the network gets isolated, it would still have access to much more of the data.
- More requests are needed for the same amount of data. (But we may still end up with a similar, or not much higher, number of requests network-wide because of the lower hop count. To avoid confusion: these requests are for the 125 KB pieces, not the 1 MB full blocks.)