My understanding is the client would not be able to do that as the logic is handled at the section level. So the client requests a chunk, and all the elders (or three currently) receive the request and then decide how to proceed.
Spreading data out geographically is obviously good for making sure it is always available and never lost. But response times also matter when retrieving data: asking all nodes to provide a part of the data seems like it would be slow, and failures would only be detected at the point the data is required.
You could just ask the closest node for the data?
To validate that all nodes have the chunk, you can periodically ask them all to hash a random section of it together with a random integer. (This can also be used to keep statistics on which node is closest to that elder.)
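A rough sketch of that kind of spot check, in Python. Everything here is hypothetical (the function names, the 64-byte window, SHA-256 as the hash), and it assumes whoever runs the check has a reference copy of the chunk to compare against; it is not how the network actually does it.

```python
import hashlib
import secrets


def make_challenge(chunk_len: int, window: int = 64) -> tuple[int, int, int]:
    """Pick a random byte range of the chunk plus a random 64-bit integer."""
    window = min(window, chunk_len)
    start = secrets.randbelow(chunk_len - window + 1)
    nonce = secrets.randbits(64)
    return start, start + window, nonce


def answer_challenge(chunk: bytes, start: int, end: int, nonce: int) -> str:
    """What a node holding the chunk would send back."""
    h = hashlib.sha256()
    h.update(nonce.to_bytes(8, "big"))  # random integer stops precomputed answers
    h.update(chunk[start:end])          # random section of the chunk
    return h.hexdigest()


def audit(reference: bytes, holders: dict[str, bytes]) -> dict[str, bool]:
    """Send the same challenge to every holder and compare with the expected hash."""
    start, end, nonce = make_challenge(len(reference))
    expected = answer_challenge(reference, start, end, nonce)
    return {
        name: answer_challenge(data, start, end, nonce) == expected
        for name, data in holders.items()
    }


chunk = bytes(range(256)) * 4
result = audit(chunk, {"good": chunk, "bad": b"\x00" * len(chunk)})
```

A single random window only catches damage that falls inside it, so a check like this would need to be repeated over time. Timing each response could feed the "who is closest" statistics.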
Ideally you want to encourage the nodes that hold a chunk to be as far away from each other as possible, but also know the fastest path to retrieve the data. Makes me think of the Babel Routing Protocol.
There are already libs for this. So it’s a matter of the Adults encoding the data and giving the correct piece to each elder, who then returns these to the client. So it’s much more molehill than mountain. It should be 1-2 man-weeks of work, I think. At least of that order.
I’m sorry, I’m confused. How do you break a file into 7 pieces then reassemble it with only 5/7 of those pieces? That doesn’t seem logically possible. Where does the receiver get the 2/7 missing data from? I must have missed something.
Basically, think of it as the file being made larger by circa 40% and split in such a way that you can lose some pieces and still have the file reconstituted. Here’s a quick and easy paper on it: https://www.eecs.harvard.edu/~michaelm/TALKS/RabinIDA.pdf Note that we don’t have a large file issue, so a normal Rabin IDA (or even Reed-Solomon) algo is enough.
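To make the 5-of-7 idea concrete, here is a minimal Python sketch. It is an assumption-laden toy, not Rabin's IDA as specified in the paper and not a production Reed-Solomon library: it works over the integers mod 257 (so piece symbols can take the value 256 and are kept as lists of ints rather than bytes), and the names `encode`, `decode` and `interpolate` are made up for illustration. Each group of 5 data bytes is treated as the values of a polynomial at x = 1..5, and pieces 6 and 7 are the same polynomial evaluated at x = 6 and x = 7.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def interpolate(points, x):
    """Evaluate at x the unique polynomial (mod P) through the given points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, k: int = 5, n: int = 7):
    """Split data into n pieces; any k of them are enough to rebuild it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    pieces = {x: [] for x in range(1, n + 1)}
    for i in range(0, len(padded), k):
        group = list(enumerate(padded[i:i + k], start=1))  # points (1, b1)..(k, bk)
        for x in range(1, n + 1):
            pieces[x].append(interpolate(group, x))
    return pieces, len(data)


def decode(available: dict, length: int, k: int = 5) -> bytes:
    """Rebuild the original bytes from any k surviving pieces."""
    if len(available) < k:
        raise ValueError("not enough pieces")
    xs = sorted(available)[:k]
    out = bytearray()
    for col in range(len(available[xs[0]])):
        points = [(x, available[x][col]) for x in xs]
        out.extend(interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


original = b"hello erasure coding"
pieces, length = encode(original)
del pieces[2], pieces[6]  # lose any two of the seven pieces
recovered = decode(pieces, length)
```

Each piece is one fifth of the (padded) input, so seven pieces come to 7/5 = 1.4 times the original, which is where the "circa 40%" comes from. The two "missing" pieces are never needed: five points are enough to pin down the polynomial, and the polynomial gives back all the data.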
Instead of asking Bob to acknowledge the messages she sends, Alice devises the following scheme.
She breaks her telephone number up into two parts a = 555, b = 629, and sends 2 messages – “A=555” and “B=629” – to Bob.
She constructs a linear function, f(k) = a + (b - a)(k - 1), in this case f(k) = 555 + 74(k - 1), such that f(1) = 555 and f(2) = 629.
She computes the values f(3), f(4), and f(5), and then transmits three redundant messages: “C=703”, “D=777” and “E=851”.
Bob knows that the form of f(k) is f(k) = a + (b - a)(k - 1), where a and b are the two parts of the telephone number. Now suppose Bob receives “D=777” and “E=851”.
Bob can reconstruct Alice’s phone number by computing the values of a and b from the values (f(4) and f(5)) he has received. Bob can perform this procedure using any two err-mails, so the erasure code in this example has a rate of 40%.
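The arithmetic in this example is small enough to check in a few lines of Python. The function names below are invented for illustration; the only thing taken from the example is the linear function with f(1) = a and f(2) = b.

```python
from itertools import combinations


def alice_fragments(a: int, b: int) -> dict[int, int]:
    """The five messages A..E, i.e. f(1)..f(5) for f(k) = a + (b - a)(k - 1)."""
    return {k: a + (b - a) * (k - 1) for k in range(1, 6)}


def bob_recover(k1: int, y1: int, k2: int, y2: int) -> tuple[int, int]:
    """Recover (a, b) from any two received fragments f(k1) = y1 and f(k2) = y2."""
    slope = (y2 - y1) // (k2 - k1)  # equals b - a; exact, both points lie on the line
    a = y1 - slope * (k1 - 1)       # walk back along the line to k = 1
    b = a + slope                   # f(2)
    return a, b


fragments = alice_fragments(555, 629)

# Bob only received D and E (k = 4 and k = 5)
from_d_and_e = bob_recover(4, fragments[4], 5, fragments[5])

# any other pair of fragments works just as well
all_pairs_ok = all(
    bob_recover(k1, fragments[k1], k2, fragments[k2]) == (555, 629)
    for k1, k2 in combinations(fragments, 2)
)
```

Two fragments out of five are enough, which is the 2/5 = 40% rate mentioned above.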
The key point is that in order to transmit (in our case, store) the information, data has been created that is not the information itself but can be used to reconstruct it.
Note that none of the fragments Alice sent:-
“C=703”, “D=777” and “E=851”
are the actual information - her actual phone number:-
555629
or even contain any of the same data!
The downside is that more data has to be transmitted or stored to guarantee the integrity of the information.
I work in IT and have to explain object storage which uses erasure codes to protect data rather than RAID. In fact, RAID is a subset of erasure coding apparently. If you do the maths…
And talking of doing the maths - constructing this synthetic data to store the information and then recreating the information from it needs more processing and takes longer than just storing it. That is why it is slower and generally has to have a cache layer in front of it to get any kind of performance.
The attraction is that it is more space efficient than using RAID and definitely more space efficient than replication. When the data is being accessed across the internet the speed of reconstructing the information isn’t generally an issue.
But if I’m understanding this correctly you aren’t actually reconstructing the information you’re just creating a map to where the information is located. It’s like transmitting torn map fragments or gps coordinates to where the data is hidden in XOR space. And since SAFE stores everything ad infinitum it’s just a matter of pointing to the right file location in order to retrieve the data. So perhaps the confusion lies in that you aren’t actually having the end user reconstruct the file at all but rather the coordinates for the file location. Or am I again missing something?
Great update Ants! Interesting that Erasure Codes are being considered. There has been a lot of discussion on the forum in the past on this and I’d recommend this thread for people wanting a refresher:
Always interesting things happening. Thanks Maidsafe team!