Are Erasure Codes (Storj) better than Replication for the SAFE network?


#62

Not necessarily. To be honest, almost all arguments against forward error correction were against certain implementation decisions instead of problems with the actual method. Exceptions are arguments about the additional complexity it would bring.

For example, to address the complaint about low-end devices, FEC can be implemented in a way that the first 4 blocks of a 4-of-16 (or similar) scheme would contain the chunk unencoded, in an append-and-read way, so low-end devices would just need to request those first 4 blocks and “decode” them without computation overhead. With caching, this would not place high burden on the sections responsible for those blocks, and the storage cost would still be just 4x for a 13x redundancy. Higher-end devices could request a random selection of 5-6 out of the 16 blocks to achieve a much lower latency.

Anyway, this is a bit off-topic for this thread so I’ll stop here. (Thanks @draw for moving the posts.)


What’s up today?
#63

Erasure coding did require more CPU time, that’s true. Still, a reasonable erasure encoding library can generate encoded data at at least 650 MB/s, which is unlikely to be the major throughput bottleneck over a wide-area network with unreliable storage nodes.

https://storj.io/blog/2018/11/replication-is-bad-for-decentralized-storage-part-1-erasure-codes-for-fun-and-profit/

Yes it needs more CPU time, and we would need to benchmark it and see if the real cost is higher than the gains.


#64

We looked at erasure codes a few years back for data transfers, sending parts of files across groups as opposed to the whole file. The issues were bad part detection and repair if that happened. The answer was hash checks, these added huge overhead to the scheme as a link to the file was no good, you needed another data element to store the hashes and link them to the file. It quickly became a never-ending spiral downwards.

This is of course after looking deeply at them in the initial design, there are significant problems as I posted above. These are significant issues I believe nobody has resolved. Are Erasure Codes (Storj) better than Replication for the SAFE network?

To be clear though, reed-solomon or IDA (Info dispersal algorithm) are not that cpu heavy though. In fact quite light, but when you get a bad part, it becomes poly hard to compute what part is bad (it’s like you need to test 2^(p+1) parts and hope there is only one bad part. So you are forced in a hostile environment to add many more data structures. Locally in RAID devices it is not a problem or not so much of a problem.


#65

I see what you are saying, kinda move to a raid10 type pattern. It makes sense, but I see these as opposing patterns in many ways. I could be wrong/proved wrong though. If loss recovery and bad part detection was in place then I can see FEC etc. working but then we would not need replication at all. The devil is in the network knowing when a file is in danger (or a single replicated chunk) and how much danger to act appropriately. Praying for nodes to stay up is not gonna hack it IMHO. They will churn and we need to handle that.

How much churn verses how many replicants and what group size (relates to data stored) or what n and p you set etc. is where much of the issues are in my opinion. These all directly relate to churn levels and those being within the limits the network can handle. Faster churn means faster loss detection. So we go for the fastest possible loss detection and then say, whatever churn is we need to cope. I suspect doing it the other way around is problematic. So optimising churn work is essential as then we can cope with higher outage issues which is essential. Incentivising a node to stay up is great, but will not help with global outages and we have those :wink: So we need to solve the fundamental issue of huge churn and coping with that.


#66

Seems like optional file “prechunking” client-side would offer an easier route to extra data integrity via Erasure codes without needing to change anything network-side.


#67

I agree you could do that, sort of make raid 10 but a bit backwards :wink: So replicate each part of an erasure code scheme.

To what purpose though? If we say no chunk will be lost and you will get the file back faster than erasure codes (which you will), what would we win?


#68

That is all well and good. But storj is not doing that.

Also for both the problems really arise when there are say 4 missing out of your 16. Say 1 4 8 and 12 then there is the requirement for a better device than the minimum needed for SAFE. And Storj have said that the repairs (on the nodes) don’t happen immediately or quickly because they rely on (virtually) always on servers.


#69

Interesting thread! I feel like this is akin to the old CISC versus RISC debate in hardware. Both have advantages, but to really grok the benefit of either advantage you have to know how it’s going to be used in advance … which you really can only guess at.

So it’s good that Storj and Maidsafe are taking different approaches and we will learn in time what is better (or maybe we will learn that they are both good for different groups/applications). Software can always be changed down the track. Personally I like to follow the KIS principle when venturing into the unknown, so I think I like MAIDSAFE’s replication approach better for now.


#70

Oh, well. I’m more interested in what would do best for the Safe Network than in what Storj is doing, and that sounds like a good match for the title of this thread :wink: Inspirations are great, mindless copying is stupid.

I thought a bit more and I think a “concat first 4 blocks => chunk, rest of blocks => distributed parity” type of scheme would be the best of both worlds.

In the simplest implementation, clients would generate chunks as they do now (with encryption and so on), but then they would slice them up into 4 pieces, store the hash for both the chunk and the pieces in the data map, and then PUT the chunks as usual.

The vault responsible for the chunk (based on its XOR distance like now) would also slice it up into 4 blocks to match the client’s data map, then it would generate 12 additional parity blocks, put their hashes into a “minimap” and send the 16 blocks to the sections responsible for storing them based on their hash, together with the minimap so those vaults can restore the block if it gets lost.

Clients would never have to see any of the parity blocks or the chunk minimaps – they would just request the 4 primary (unencoded) blocks they stored in the data map, concatenate them, and everything is the same as now from that point on. As you can see, low-end clients would not be affected in any way.*

* High-end clients could request the minimaps so they could request more blocks than necessary to achieve a lower latency; this is purely optional and should not be part of the official client library.

The huge win is unprecedented redundancy (any 12 of the 16 blocks can be lost, mixed with Safe’s “instant loss detection” feature) at a comparatively low (4x) storage cost without complexity in (potentially low-end) clients. Only vaults, which are higher-end nodes by definition, would need to ever worry about the more computationally demanding erasure code decoding.

I’m not sure I understand this. I wouldn’t think FEC requires more nodes to stay up than replication; quite the contrary, at least in the schemes I was considering.

Could it be we’re thinking about using FEC in fundamentally different ways?


Possible implementation:

  • M-of-N with M=4 and N=16 – 2 MB chunks, 16 x 512 KB blocks, any 12 of which can be lost at the same time
  • this requires 2 requests / 1 MB data, up from the current 1 request (if we have 1 MB chunks now); this is the primary trade-off for the higher comparative redundancy
  • chunks are “minimaps” that point to the 16 blocks that make it up; the vault that “stores” the chunk stores only the minimap
  • the first 4 blocks are the primary blocks and they are just 4 unencoded slices of the (already encrypted) chunk data; these are the blocks that the data maps point to (that is, clients neither know about, nor request any of the parity blocks)

#71

It was more the storj approach I was meaning, not FEC in general. Sry for any confusion there.

I am not sure, but the thread above I posted about when a bit gets lost and what happens is a huge issue, the traffic to maintain disparate copies is unworkable AFAIK. I have never seen a solution to it and all the rest of the discussion is about potential benefits if we could. So it is more I have not got past the huge elephant in the room as I described above. If that has no fix then FEC IDA etc. all are unworkable.


#72

This question probably has a very simple answer that has been explained elsewhere, but I can’t find it. (well, it’s 2 questions actually).
I’m a primary school teacher at heart so I’m going to keep this simple.
A file saved onto the SAFE network is split into 3 pieces, A,B and C. For the sake of redundancy it is replicated, so we now have A1, A2, A3, B1, B2, B3, C1, C2, C3.
Ax, Bx and Cx are required to rebuild the file where x is any number.

The network refreshes and finds that A2, B3 and C1 nodes are offline and the redundancy falls below what is considered an acceptable level, so Ax, Bx and Cx are copied and distributed to new nodes, creating A4, B4 and C4.
If the nodes hosting the files that weren’t available at last refresh come back online, will A2, B3 and C1 be deleted from those nodes as a punishment for not being available, thereby reducing farming income, or will there simply be 4 copies of each file chunk now available? Will A5, B5 and C5 be produced if necessary? Is there an upper limit to the number of copies of a chunk the network will generate? (Think Agent Smith from the Matrix!)

Also, from my reading (this is my second question, the extra ones above didn’t really count!) about Erasure Codes and replication it seems completely obvious to me that replication will lead to a faster and more reliable network, unless I’m missing something glaringly obvious. Simply because the network says “this bit is missing, get it from another node and copy it” during replication, doesn’t it?

Erasure codes says - “this bit is gone, get the other bits, decode them, use that to rebuild the missing part, re-encode it and put it back on the nodes” which surely requires more time and processing than the simpler copy and paste method of replication.

Sure, this means that replication will use more storage space per file, but storage is crazy cheap compared to processing.

Am I wrong? Am I over-simplifying?

(Also, for anyone wondering why I’m writing this post during work hours, it’s because my Windows machine crapped out and is currently re-installing Windows 10 so I’m on another computer with no admin rights and none of my work so far to continue with!) :joy:


#73

If the nodes pass validation when coming back online then the extra chunks will not be deleted and those chunks will reduce the requirement to replicate if another node goes offline.

Yes. Storj though says that replication requires more bandwidth and the ISPs will not allow it.

Its still not certain just how much the difference in bandwidth between them in practice especially if their nodes are not online as much as they plan for.


#74

I think that’s roughly it.

The EC people would say that corruption/loss is very rare, so we save a lot of space and bandwidth as we don’t have to make so many copies … but they don’t really know as their platform isn’t live.

The replication people would say that space is cheap and our way will be faster, and while it will increase the bandwidth usage of the network, that is manageable … but we don’t really know as our platform isn’t live yet …

lol They are both interesting ways and it’s a great experiment. So I am happy that both are being tried and when the results are in, it won’t be about ego’s over here I think, if EC clearly works better, then it’ll be adopted in some form I suspect.


#75

@TylerAbeoJordan and @neo thanks for your replies.

As far as bandwidth issues and ISPs not allowing the increased network traffic I’d be interested to see how they know what the data being passed actually is. AFAIK the chunks are encrypted and passed through random ports so shutting it down should prove extremely difficult? Unless it’s passed using a different protocol and that can be shut down, but I thought it was TCP/UDP?

If ISPs try to shut down decentralized networks then the whole concept may really backfire and we end up with ultra-strict ISPs with ridiculously tight firewalls, basically a restricted version of the internet, just like China. Realistically ISPs should be cashing (not a typo or a pun!) in on the action by shifting the content of their own data centres to the network and using the free security, then offering up their now redundant servers as vaults!


#76

ISP’s won’t be able to censor the network directly - currently and depending on where you live, they may restrict your bandwidth if you use more and a certain amount of data up/down per some period of time. For the average person, this means that if they try to farm they will likely be restricted by their ISP’s or hit with a large unexpected bill.

Over time I expect more and more bandwidth to come online as more fiber is laid and more satellites go up … so ultimately I don’t think it will be a problem. In the short run we can still pay for more up/down data and/or use data centers to do the job. Some centralization yes, but way less than with something like bitcoin IMO.


#77

I agree that it’s important for us to consider ISP behavior and network infrastructure.
This is a good talk on the topic: https://youtu.be/kW6e1GCpqpE?t=755

More or less, Marco is encouraging the decentralized space to think about how to align our goals with those incentives that will cause ISPs to invest in P2P network behavior.

He proposes general guides, some of which stuck out to me:

  • Solve problems for enterprises, as they tend to drive infrastructure and can pay the premium for ISP services.
  • Participation in network operations community, such as NANOG, to receive insight and provide feedback to network operators.
  • Target market trends that ISP’s are already planning to invest billions of dollars in, such as data privacy solutions, infrastructure for IoT, and inference for machine learning, to name only a few of many.
    • For inference, our RDF efforts may be the foundation for this trend.

#78

Honestly though, that should be a different battle no? Because thats just bullshit regardless what way you look at it. Im sure theres developing countries with better options. (I live in Spain, which has been going through heavy economic crisis for a decade at least and I get 600 symetric fibre for like 60 euro a month, no cap.)


#79

Replication does requires more bandwidth for the same level of redundancy for the same reason it needs more storage space for the same level of redundancy. It’s beyond obvious.

The question is whether it’s worth the complexity (including that around maintaining a heterogeneous set of blocks) for that gain.

@neo has already pointed out, and I had to concede, that the “very simple” answer (where having to decode erasure coded blocks is a given) punishes low-end nodes.

I do believe erasure coding would be a great addition at the vault level, but clients should be served unencoded blocks. See my post just before yours for more details.

Please, not this again. EC in general (ignoring certain approaches about its implementation) is about making the store tolerant to a much higher level of losses at the cost of a comparatively lower storage cost so, assuming the same storage space, EC is more tolerant to losses not less, that is, literally the opposite of what you wrote.


#80

This though requires definition

Very basic version

  1. To store (Put) the original data erasure codes are more bandwidth efficient.
  2. To download (Get) chunked file or full file replication is more bandwidth efficient

It is very likely data is read more than it is stored, the values will change for differing data types though, but in general reads are significantly more frequent than writes.

Maintenance and costs

  1. Erasure codes will need to send RPC’s to other nodes to register where each part is, plus have some kind of monitoring of the parts to know there are still there (so decreases bandwidth efficiency) (but many parts could be lost before the network tries to recover the file)
  2. Chunks will always be replaced when lost, as they are lost as it does constant recovery (so decreases bandwidth efficiency) (but all parts are constantly monitored almost free of charge)

So subtle differences in management overhead may be more tricky to analyse in terms of bandwidth used verses network efficiency.

I could not agree more, the schemes do need a lot more info to figure which works best in which network architecture. Maintenance and reads are where I suspect most work will be and where perf testing should focus.

I agree, but with caveats, if the EC scheme replaced missing parts as quick as chunked parts are replaced in replication then it can be more tolerant for the same space, but both can be equally tolerant to loss, but replication is less space efficient for sure.

Its all very subtle and does need to be looked at in depth in the architecture they are aimed at, i.e. local disks or internet etc.


#81

I don’t understand why. For example, the implementation I outlined a few post ago, chunks would be split into 4 blocks and then 12 additional parity blocks would be generated for them to achieve a 12-fold redundancy. However, clients would still just get just the first 4 blocks which is just the original chunk, so it’s neither more nor less bandwidth (or computation, which may be important for low-end nodes), except for a few bytes due to sending more requests.

Only if implemented wrong. Just consider the following.

Sections would store a) chunks, b) blocks. The chunks, however, are just the “minimaps” containing the addresses of the blocks.

The chunk. When a new chunk is PUT, the section that the chunk belongs to would: a) split up the chunk into the 4 primary blocks, b) generate the 12 parity blocks, c) send each of these 16 blocks to their own sections (in XOR distance for their hash), together with their chunk’s minimap. The chunk’s section can forget about those blocks at this point because they are taken care of by their own respective sections. There’s no coordination between the chunk and the blocks beyond this point.

The blocks. The block’s section now has not only a) the block, which it assigns to one of the vaults, but also b) the “minimap” with all the information necessary to restore it if necessary. So, it will never have to contact the chunk about it. There’s no need for coordination from the block to the chunk, either. The minimap would of course need to be stored separate from the actual block, on the data chain of the section, I think is how it works.

So, from the section’s point of view:

  • there’s no RPC call for the list of other blocks; it’s always at hand
  • it’s not considered with the other blocks of the chunk; other sections look after those
  • side note: it doesn’t even have to know which chunk the block belongs to, kind of like a section doesn’t care which file a chunk belongs to
  • if a block is lost, the section can request any 4 of the blocks from the minimap it has and restore it

From the client’s side, the file data maps would a) store the chunk and its decryption key as now, b) cache the addresses of the 4 primary blocks to speed things up and avoid extra roundtrips: the client can request the blocks directly from the section where they are stored (like it does now with the chunks). The client doesn’t need to know about the parity blocks.

As I wrote above, the section that stores the block would always have the minimap (the list of all the 16 blocks) so it could restore them from a random 4 of those right away.