Update May 26, 2022

What prevents the client from GETting from a different number of elders? Like, if it chooses to only ask 1 elder, or all the elders, how could the client be punished?

Maybe the client has to send a number specifying which subchunk to request from each elder, and then some spam-suppression can be in place for requesting from too many elders?

4 Likes

I have seen that sporadically too

9 Likes

:point_up_2: awesome

Is it a mountain or a mole hill to implement? Next week, next month or next year? Yes, I know, "when" is not really permitted. :slight_smile:

:partying_face:

18 Likes

My understanding is the client would not be able to do that as the logic is handled at the section level. So the client requests a chunk, and all the elders (or three currently) receive the request and then decide how to proceed.

5 Likes

Data being spread out geographically is obviously good for making sure data is always available and never lost. When retrieving data, response times are also important. Asking all nodes to provide a part of the data seems like it would be slow, and it means failures are only detected at the point the data is required.

You could just ask the closest node for the data?

To validate that all nodes have the chunk, you can periodically ask them all to hash a random section of it together with a random integer. (This can also be used to keep statistics on which node is closest to that elder.)
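A rough sketch of that periodic check, in Python. This is only an illustration of the idea in this post, not how the network does it: the function names, the 64-byte section size, the 64-bit integer and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import secrets


def make_challenge(chunk_len: int, length: int = 64):
    """Pick a random section of the chunk and a random 64-bit integer (hypothetical sizes)."""
    length = min(length, chunk_len)
    offset = secrets.randbelow(chunk_len - length + 1)
    nonce = secrets.randbits(64)
    return offset, length, nonce


def prove_possession(chunk: bytes, offset: int, length: int, nonce: int) -> str:
    """Hash the random integer together with the requested section of the chunk."""
    section = chunk[offset:offset + length]
    return hashlib.sha256(nonce.to_bytes(8, "big") + section).hexdigest()


chunk = bytes(range(256)) * 16  # stand-in for a stored chunk
offset, length, nonce = make_challenge(len(chunk))

# The checker computes the expected answer from its own copy of the chunk.
expected = prove_possession(chunk, offset, length, nonce)

# An honest node holding the same bytes gives the same answer.
honest_answer = prove_possession(chunk, offset, length, nonce)

# A node whose copy is damaged inside the requested section gives a different one.
damaged = bytearray(chunk)
damaged[offset] ^= 0xFF
faulty_answer = prove_possession(bytes(damaged), offset, length, nonce)

print(honest_answer == expected, faulty_answer == expected)
```

Because the section and the integer change every round, a node cannot precompute the answer and throw the chunk away. Timing the replies would be one way to collect the "who is closest" statistics.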

Ideally you want to encourage the nodes that hold a chunk to be as far away from each other as possible, but also know the fastest path to retrieve the data. Makes me think of the Babel Routing Protocol

9 Likes

It does sound fantastic.

But yeah, is this potentially a year fix?

Overall, well done team

5 Likes

There are already libs for this. So it’s a matter of the Adults encoding the data and giving the correct piece to each elder, who then returns these to the client. So much more mole hill than a mountain. It should be 1-2 weeks of work, I think. At least of that order.

18 Likes

I’m sorry, I’m confused. How do you break a file into 7 pieces then reassemble it with only 5/7 of those pieces? That doesn’t seem logically possible. Where does the receiver get the 2/7 missing data from? I must have missed something.

3 Likes

Basically, think of it as the file being made larger by circa 40% and split in such a way that you can lose some pieces and still have the file reconstituted. Here’s a quick and easy paper on it: https://www.eecs.harvard.edu/~michaelm/TALKS/RabinIDA.pdf Note that we don’t have a large-file issue, so a normal Rabin IDA (or even Reed-Solomon) algo is enough.
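To make "any 5 of 7" concrete, here is a toy Python sketch. It is not Rabin's IDA and not a production Reed-Solomon library. It is plain polynomial interpolation over the integers modulo 257, chosen only because it fits in a few lines. The names `interpolate`, `encode` and `decode` are made up for the example, and it handles just one 5-byte block.

```python
P = 257  # smallest prime above 255, so every byte value fits in the field


def interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, working modulo P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n: int):
    """Turn k data bytes into n pieces (x, y); the first k pieces are the data itself."""
    k = len(data)
    base = list(zip(range(1, k + 1), data))
    return [(x, interpolate(base, x)) for x in range(1, n + 1)]


def decode(pieces, k: int) -> bytes:
    """Rebuild the k data bytes from any k distinct pieces."""
    pts = pieces[:k]
    return bytes(interpolate(pts, x) for x in range(1, k + 1))


data = b"SAFE!"                  # 5 bytes -> k = 5
pieces = encode(data, 7)         # 7 pieces, 7/5 = 1.4x the original size

# Lose any two pieces, here the 2nd and the 5th.
survivors = [pieces[i] for i in (0, 2, 3, 5, 6)]
print(decode(survivors, 5))
```

The two extra pieces are where the "circa 40%" comes from: 7 pieces for 5 pieces' worth of data. One toy limitation: an extra piece can take the value 256, which does not fit in a byte. Real codes avoid that by working in GF(256).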

12 Likes

I appreciate the link but this is supposed to be easy reading? This is on par with reading a physics paper.

6 Likes

Thanks, but no thanks David. I stopped reading (due to sudden onset of migraine) at page 3, Abstract. An Information Dispersal Algorithm (IDA) is developed… :sunglasses: Southside! Beer Me!

4 Likes

For a quick read I recommend the Erasure Code page on Wikipedia:-

Skip the maths! I am not a mathematician and I refuse to try to understand the weird squiggles.

The key part of the explanation is:-

Alice wants to send her telephone number (555629) to Bob using err-mail. Err-mail works just like e-mail, except

  1. About half of all the mail gets lost.[1]
  2. Messages longer than 5 characters are illegal.
  3. It is very expensive (similar to air-mail).

Instead of asking Bob to acknowledge the messages she sends, Alice devises the following scheme.

  1. She breaks her telephone number up into two parts a = 555, b = 629, and sends 2 messages – “A=555” and “B=629” – to Bob.
  2. She constructs a linear function, f(i) = a + (b-a)(i-1), in this case f(i) = 555 + 74(i-1), such that f(1) = 555 and f(2) = 629.


  3. She computes the values f(3), f(4), and f(5), and then transmits three redundant messages: “C=703”, “D=777” and “E=851”.

Bob knows that f has the form f(i) = a + (b-a)(i-1), where a and b are the two parts of the telephone number. Now suppose Bob receives “D=777” and “E=851”.


Bob can reconstruct Alice’s phone number by computing the values of a and b from the values (f(4) and f(5)) he has received. Bob can perform this procedure using any two err-mails, so the erasure code in this example has a rate of 40%.
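For anyone who prefers code to squiggles, the Alice and Bob scheme above fits in a few lines of Python. The function names are made up for the example. The arithmetic is exactly the f(i) = a + (b-a)(i-1) from the quoted text.

```python
from fractions import Fraction


def encode(a: int, b: int, n: int = 5) -> dict:
    """Alice's side: return {i: f(i)} for i = 1..n, with f(i) = a + (b - a)(i - 1)."""
    return {i: a + (b - a) * (i - 1) for i in range(1, n + 1)}


def decode(i: int, fi: int, j: int, fj: int):
    """Bob's side: recover (a, b) from any two distinct messages f(i) and f(j)."""
    slope = Fraction(fj - fi, j - i)  # this is b - a
    a = fi - slope * (i - 1)
    b = a + slope
    return int(a), int(b)


messages = encode(555, 629)
print(messages)          # {1: 555, 2: 629, 3: 703, 4: 777, 5: 851}

# Bob only receives D (i = 4) and E (i = 5).
a, b = decode(4, messages[4], 5, messages[5])
print(f"{a}{b}")         # 555629
```

Any two of the five messages work, not just D and E, which is where the 2-out-of-5 (40%) rate comes from.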

The key point is that, in order to transmit (or in our case store) the information, new data has been created that is not the information itself but can be used to reconstruct it.

Note that none of the fragments Alice sent:-
“C=703”, “D=777” and “E=851”

are the actual information - her actual phone number:-
555629

or even contain any of the same data!

The downside is that more data has to be transmitted or stored to guarantee the integrity of the information.

I work in IT and have to explain object storage which uses erasure codes to protect data rather than RAID. In fact, RAID is a subset of erasure coding apparently. If you do the maths…

And talking of doing the maths - constructing this synthetic data to store the information and then recreating the information from it needs more processing and takes longer than just storing it. That is why it is slower and generally needs a cache layer in front of it to get any kind of performance.

The attraction is that it is more space efficient than using RAID and definitely more space efficient than replication. When the data is being accessed across the internet the speed of reconstructing the information isn’t generally an issue.
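A quick back-of-the-envelope comparison in Python. The 5-of-7 split is the one discussed earlier in this thread. The 3 copies for replication is just an assumed figure, picked so that both schemes survive the same number of lost pieces.

```python
def replication(copies: int) -> dict:
    """Full copies: storage grows by the number of copies, and all but one copy can be lost."""
    return {"storage_factor": float(copies), "losses_tolerated": copies - 1}


def erasure(k: int, n: int) -> dict:
    """k-of-n erasure code: n pieces, each 1/k of the data, any k pieces rebuild it."""
    return {"storage_factor": n / k, "losses_tolerated": n - k}


three_copies = replication(3)   # assumed replication factor, for comparison only
five_of_seven = erasure(5, 7)

print(three_copies)    # {'storage_factor': 3.0, 'losses_tolerated': 2}
print(five_of_seven)   # {'storage_factor': 1.4, 'losses_tolerated': 2}
```

Both tolerate two losses, but replication stores 3.0x the data while the erasure code stores 1.4x.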

14 Likes

But if I’m understanding this correctly you aren’t actually reconstructing the information you’re just creating a map to where the information is located. It’s like transmitting torn map fragments or gps coordinates to where the data is hidden in XOR space. And since SAFE stores everything ad infinitum it’s just a matter of pointing to the right file location in order to retrieve the data. So perhaps the confusion lies in that you aren’t actually having the end user reconstruct the file at all but rather the coordinates for the file location. Or am I again missing something?

1 Like

I had thought that the system was to send a request packet to the adults and they respond saying they have the chunk ready and then the elders tell one adult to send the data.

Basically that was similar to what is described but only for the request/check packet to the adults and then the fastest adult is requested to actually send the chunk.

Why the change to sending the chunk immediately?

4 Likes

Great update Ants! Interesting that Erasure Codes are being considered. There has been a lot of discussion on the forum in the past on this and I’d recommend this thread for people wanting a refresher:

Always interesting things happening. Thanks Maidsafe team!

12 Likes

Spectacular update! My favorite in a long time. This change is next level tech!:alien: :+1:

10 Likes

Perhaps it helps to phrase it as 7 overlapping pieces.

(how exactly overlapping is done is then more complex to explain, but the simple story of “slicing into pieces” should appear logical, I hope)

14 Likes

Thanks team. We are getting close to testnet!

9 Likes

Thx 4 the update Maidsafe devs

Amazing all the steps taken to ensure data permanence.
Do wonder how this works with archive nodes?

Really love it how everything is being optimised

keep hacking super ants

6 Likes

Thank you for the heavy work team MaidSafe! I add the translations in the first post :dragon:


Privacy. Security. Freedom

7 Likes