Nice work @mav and @JoeSmithJr
The big thing for me, from an architect's perspective, is really this (always open to improvements).
When any bit of data is lost, the network must recover it quickly and with certainty. My take has always been: if you can lose a bit, you can lose it all. So our scheme is not replication alone, it is chunking *and* replication. Each chunk can be considered the whole file in terms of loss: we lose one chunk and the file is dead. With erasure coding it could be that you can lose, say, 20 chunks and the file may still survive.
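To make that difference concrete, here is a toy survival model (my own sketch, not our implementation; it assumes independent losses with a common per-copy survival probability `s`, which real churn is not):

```python
from math import comb

def replication_survival(chunks: int, r: int, s: float) -> float:
    """P(file survives) with `chunks` chunks, R=r copies of each,
    each copy independently surviving with probability s.
    A chunk dies only if all r copies die; the file dies if ANY chunk dies."""
    chunk_ok = 1 - (1 - s) ** r
    return chunk_ok ** chunks

def erasure_survival(n: int, p: int, s: float) -> float:
    """P(file survives) with n coded parts, any p of which rebuild it:
    the file lives iff at least p parts survive."""
    return sum(comb(n, k) * s ** k * (1 - s) ** (n - k)
               for k in range(p, n + 1))
```

With one chunk and R=3, `replication_survival(1, 3, 0.9)` and `erasure_survival(3, 1, 0.9)` agree at 0.999, as they should; the schemes diverge as chunk counts and thresholds grow.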
That sounds great, however the big if really does matter.
For each piece you lose, does the network realise that, and when? Let's ignore rebuild times for erasure codes and say the chunk can just be recreated somewhere; there is a copy somewhere we can get. So the time from loss to recovery is hugely important; let's call it T. (I realised after writing this that I do not refer to T again directly, but left it here as it is important.)
For replication we have R replicants arranged in a common group, or network address space. In that space we have nodes that are keen to maintain their chunks (for farming) and not be penalised. A node could lose a single chunk (weird) and recover it by asking? More likely, a node is lost. So we have nodes waiting to take its place as soon as possible (it's a high-priority event). The nodes that take its place already have the chunks they need.

So we set R, or group size, at the level where we cannot lose all holders in the time it takes to replace a single holder. This is similar to Kademlia, although it uses a magic number for the selection of R (well, in Kad it's not R, it is called k, but that's irrelevant).

So we try to recover as fast as possible, but set the safety factor higher than the number of copies we envisage could vanish in that time. So R is used for a particular reason.
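That choice of R can be sketched as a back-of-envelope calculation. This is a hypothetical model, not how we actually tune it: it assumes independent node failures at a constant rate (exponential model), which real churn only approximates:

```python
from math import exp

def min_group_size(node_fail_rate: float, repair_time: float,
                   target_loss: float) -> int:
    """Smallest R such that the chance ALL R holders vanish within one
    repair window stays below target_loss.  Assumes independent failures
    at `node_fail_rate` per unit time."""
    # P(one holder is gone before the repair window closes)
    q = 1 - exp(-node_fail_rate * repair_time)
    r = 1
    while q ** r > target_loss:
        r += 1
    return r
```

For example, with a 10% per-window failure rate and a one-in-a-million loss target, `min_group_size(0.1, 1, 1e-6)` gives R = 6; halving the repair time lets R shrink, which is why recovery speed and R trade off against each other.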
If we had erasure codes then we would have n parts, any p of which reconstruct the file. So the network can lose up to n - p parts, say. The same logic should apply if we want to not lose data at all. This would mean the architecture would need to know when we have lost n - p parts, as one more loss after that and the file is gone. This to me means that when we lose even a single part we must start that process, whatever it is. This is the crunch part though.
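For intuition, a toy single-parity code (RAID-5 style, so n = p + 1) shows the n - p tolerance in miniature: any one missing part can be rebuilt from the survivors, and once you are down to p parts the next loss kills the file. This is purely illustrative, not a real Reed-Solomon:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(parts: list) -> list:
    """Append one XOR parity part: n = p + 1, tolerating n - p = 1 loss."""
    parity = parts[0]
    for part in parts[1:]:
        parity = xor_bytes(parity, part)
    return parts + [parity]

def rebuild(parts_with_one_none: list) -> list:
    """Recover the single missing part (marked None) by XORing the rest,
    since every part equals the XOR of all the others."""
    missing = parts_with_one_none.index(None)
    acc = None
    for i, part in enumerate(parts_with_one_none):
        if i != missing:
            acc = part if acc is None else xor_bytes(acc, part)
    out = list(parts_with_one_none)
    out[missing] = acc
    return out
```

The crunch point in the text shows up here too: nothing in this code *detects* the loss — something outside the code has to notice a part is gone and trigger `rebuild` in time.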
So for us we have sections/groups that are responsible for a single file part, and they do not know the whole file. This is for security and efficiency (mainly the latter, as rainbow lists etc. cover the former). Therefore the group knows a part is missing and acts immediately. We cannot go faster than that, and it's the network minimum refresh setting (in Kad talk).
In EC, when a part goes missing there are no replicants; group members do not know or care that it's gone. However, the other n holders must start acting (even if that means waiting on more parts going missing). So now there is a mesh of RPCs connecting all n parts and tracking the health of each part. That links all holders of that file together (intake-of-breath time for privacy). So even if we could do that, a holder of X parts of X files will need that mesh of connections/RPCs for each part. That will mean multiplexing connections and knowing the IP addresses of a huge number of nodes (again security, DOS attacks etc.), and that is prohibitive for us.
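A rough worst-case count of the relationships involved makes the comms cost visible. This is a hypothetical upper bound of my own: it ignores peers shared between files, so the real number would be somewhat lower:

```python
def replication_links(group_size: int) -> int:
    """Links a node maintains under group replication: just its close
    group, and that one set of links covers every chunk it holds."""
    return group_size - 1

def erasure_links(parts_held: int, n: int) -> int:
    """Worst case if each of the n part-holders of a file must watch
    the other n - 1 holders, for every part a node holds."""
    return parts_held * (n - 1)
```

A node in a group of 8 holding 10,000 chunks keeps 7 links; under EC with n = 32, holding 10,000 parts means up to 310,000 peer relationships to track. That gap is the "web gets huge, fast" problem.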
There is a lot more to it, but erasure codes, as far as we know from the architecture side, would be very heavy on comms and have a big effect on network architecture. There could be things done, like having nodes know which n parts their neighbours hold, but then the web gets huge, fast. So I am interested in how people solve this.
In terms of speed, replication will be faster AFAIK. Even simplified to RAID, where all things are local, chunking and replication (striping) are significantly faster to read and write than erasure-coded data (parity).
If you compare erasure codes to whole-file replication then EC is much faster, but we do not do that; we do chunking and replication, which is a huge difference. I showed the NHS in Scotland this: they had a 300Mb/s disk array to store data. I said we can compress, encrypt and store faster than 300Mb/s, or we can do all these operations on a file and write it faster, and we did. We had over 900Mb/s, because of chunking. This is where the speed improvements come from: not erasure codes, but smaller file parts. This is why BitTorrent is fast, it's the parts. If you then have CPU complexity putting the parts together, as erasure code algorithms do, then that slows it down.
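The chunking win is essentially parallelism over independent parts. A minimal sketch (the 1 MiB chunk size is an arbitrary choice for illustration, and the hash is a stand-in for the real compress/encrypt/store pipeline):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an assumption for this sketch

def chunk(data: bytes) -> list:
    """Split a file into fixed-size independent chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def store_chunk(part: bytes) -> str:
    # Stand-in for per-chunk compress/encrypt/store work.  Because chunks
    # are independent, this runs in parallel across them, which is where
    # the throughput beyond the raw disk figure comes from.
    return hashlib.sha256(part).hexdigest()

def store_file(data: bytes) -> list:
    with ThreadPoolExecutor() as pool:
        return list(pool.map(store_chunk, chunk(data)))
```

Each chunk's pipeline needs no other chunk, so adding workers scales throughput until the disk or network saturates; erasure decode, by contrast, couples parts together at reassembly time.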
To be clear, we do not just put parts back together; we decrypt, decompress etc., as well as hash-check each part. So we are slower than pure striping, but not by much. Erasure codes will be slower than striping as well, but not by much, and both will be faster than whole-file retrieval.
However the hash check is vital; otherwise it would be significant work to find a bad part, say a bit-flipped part or chunk. With pure erasure codes this part is critical on a hostile Internet, not so much in a hard drive though ;-). So I think erasure codes will require more work than just implementing a Reed-Solomon-type algorithm. I would love to see folks attack these problems though, and Storj seems to be looking at least, so it will be great to see how it all pans out. We can all learn all the time.
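That hash check is cheap to sketch when chunks are content-addressed, i.e. named by their own hash: a flipped bit then identifies exactly which part to refetch, with no decode-and-compare over the whole file. A minimal illustration (my own sketch, not our actual storage code):

```python
import hashlib

def chunk_name(chunk: bytes) -> str:
    """Content-addressed name: the chunk's own SHA-256 hash."""
    return hashlib.sha256(chunk).hexdigest()

def find_bad_parts(named_chunks: dict) -> list:
    """Return the names of any parts whose content no longer matches
    their name, e.g. after a bit flip in transit or on disk."""
    return [name for name, data in named_chunks.items()
            if chunk_name(data) != name]
```

With pure erasure codes and no per-part hash, locating a corrupted part means trial decodes over subsets of parts, which is exactly the "significant work" above.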