And I was not disagreeing. Just because I repeat what someone else says does not mean I am disagreeing. Where I do disagree I say so, and only on some specifics: for example, I suggested reading is about 90% of bandwidth usage, whereas Storj are writing as if churn is almost all of it and erasure codes make it ever so much less. But if reading is 75, 80 or 90%, then churn is minuscule compared to the reading bandwidth.
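The bandwidth-share point can be put in numbers. This is a minimal sketch under two assumptions that are not from the thread: reads and churn repair are the only two consumers of bandwidth, and some scheme (hypothetically) halves repair traffic while leaving read traffic unchanged. The overall saving is then capped by the churn share:

```python
def total_saving(read_fraction: float, repair_reduction: float) -> float:
    """Fraction of *total* bandwidth saved when churn-repair traffic shrinks by
    `repair_reduction` (0..1) and read traffic is unchanged.

    Assumes reads + churn repair make up 100% of bandwidth (a simplification)."""
    churn_fraction = 1.0 - read_fraction
    return churn_fraction * repair_reduction


# Hypothetical: a scheme that halves churn-repair traffic.
for reads in (0.75, 0.80, 0.90):
    print(f"reads {reads:.0%}: overall saving {total_saving(reads, 0.5):.1%}")
```

The larger the read share, the less any repair-side optimisation can matter overall.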
It is really because you do not know which blocks exist. You may need a parity block to make up the file, so you either request 4, wait, and if any of those 4 are missing (which is why you have erasure codes in the first place) fetch one or more parity blocks, or you just ask for more to start with. You could ask for just 4, and if none were missing then both schemes are equal; but you use erasure codes because you think some could be missing, so you need to get more pieces.
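The trade-off between asking for exactly 4 blocks and asking for extras can be quantified. A minimal sketch, assuming (not from the thread) that each block is independently available with the same probability: every extra block requested up front costs one more block's worth of bandwidth but raises the chance of finishing in a single round:

```python
from math import comb


def p_first_round(requested: int, needed: int, availability: float) -> float:
    """Probability that at least `needed` of `requested` blocks come back in the
    first round, if each block is independently available with `availability`."""
    return sum(
        comb(requested, i) * availability**i * (1 - availability) ** (requested - i)
        for i in range(needed, requested + 1)
    )


# Hypothetical per-block availability of 99%; 4 blocks rebuild the chunk.
for extra in range(4):
    p = p_first_round(4 + extra, 4, 0.99)
    print(f"request {4 + extra} blocks: first-round success {p:.6f}")
```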
Yes, I see what you are saying here (which is beyond the typical EC pattern, which is great to see), but I think we still have issues.
- It will expose the whole file (the minimap is the whole file I think?)
- The single vault holding the block will go offline, so another vault will need to get the other parts and recreate that one. This is doubly bad, as we (a) never trust a single vault and (b) cannot trust a single vault to rebuild the file. We cannot even trust that a single vault has recreated the block and given it to the new holder, so that needs a few RPCs to ack to the group or similar.
You could have vaults recreate the missing chunk, agree on who should keep it, and ask the selected vault to ack back to the whole section that it got it from the vault we decided should give it. It is a bit more complex, though: while this is all happening, some client could have asked for the chunk and will need to go elsewhere, or we tell them to wait, etc. (more bandwidth, slower response, etc.).
I see what you are looking at here, but I am not sure it buys anything more than potentially saving space (which is good btw), and I think (I need to write the process out in detail) it comes at the cost of bandwidth and/or possible time lags. The vaults will have to hold maps of minimaps -> holders of the data as well. That means when a node goes offline with a lot of blocks, the section needs a way to get those other blocks, recreate those files to produce new blocks to store on new vaults, and update those maps, possibly while a client is asking for the blocks. So we might need a wait RPC, or reply fail and have the client try another block, etc. I suspect there are a few more things to consider here, though; I do need to think more.
Ah. I think this is the cause of the confusion. I love the chunks, I truly do; I would not want to do away with them, not ever.
So, no: the minimap is the chunk, not the file. More precisely, it’s the list of the 16 blocks: the 4 primary blocks (which are just the chunk sliced into 4 pieces; I called them primary blocks) and the 12 parity blocks that provide redundancy. Clients always request the 4 primary blocks: they concatenate them to get the full chunk and decrypt it. It would be useful to store the references to these 4 blocks in the file datamap alongside the respective chunks, because it would be silly and wasteful to have to download the minimaps for each chunk every time.
The 16 blocks that make up a chunk are stored in different sections based on the XOR distance of the block’s hash, each with a copy of the minimap, which contains the list of the 16 corresponding blocks, which is all the required information to restore the block when it’s lost (that is, there’s no need to even know which chunk the block belongs to.)
- the minimap (16 x 256 bits = 512 bytes, I think?), stored for the chunk and then a copy for each block
- potentially more requests / bytes, but that depends on block and chunk size
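The placement rule in the minimap description (each block goes to the section closest, by XOR distance, to the block's hash) can be sketched as follows. This is an illustration under simplifying assumptions: sections are modelled as single random 256-bit IDs rather than SAFE's prefix-based sections, and `address`/`closest` are hypothetical helper names. Nothing here forces the 16 blocks into 16 different sections; with few sections two blocks can land in the same one, which a real design would have to handle:

```python
import hashlib
import os


def address(data: bytes) -> int:
    """Content address: SHA-256 digest of the block as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest(target: int, ids: list[int]) -> int:
    """ID with the smallest XOR distance to `target` (Kademlia-style closeness)."""
    return min(ids, key=lambda candidate: candidate ^ target)


# Hypothetical network of 64 sections, each identified by a random 256-bit ID.
section_ids = [int.from_bytes(os.urandom(32), "big") for _ in range(64)]

# Stand-ins for the 16 blocks of one chunk (real blocks would be much larger).
blocks = [os.urandom(1024) for _ in range(16)]
placement = {i: closest(address(b), section_ids) for i, b in enumerate(blocks)}
print(len(set(placement.values())), "distinct sections hold the 16 blocks")
```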
Nor is it an absolute requirement that each block exist in only one copy (trusting one vault is not wise indeed). Even so, non-identical blocks scattered over the network would still give much higher redundancy for the same storage space than simple replication.
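To make "higher redundancy for the same storage space" concrete, here is a minimal sketch comparing 4x replication with the 4-of-16 scheme, which also stores 4x the chunk size (16 blocks, each a quarter of the chunk). It assumes, hypothetically, that every copy or block is lost independently with the same probability `p` before repair kicks in, which real networks only approximate:

```python
from math import comb


def loss_replication(copies: int, p: float) -> float:
    """A replicated chunk is lost only if every copy is lost."""
    return p**copies


def loss_erasure(n: int, k: int, p: float) -> float:
    """An (n, k) erasure-coded chunk is lost if more than n - k blocks are lost."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n - k + 1, n + 1))


p = 0.1  # hypothetical per-copy loss probability between repairs
print(f"storage: 4 copies = 4.0x chunk; 16 quarter-size blocks = {16 / 4}x chunk")
print("4x replication loss:", loss_replication(4, p))
print("(16,4) erasure loss:", loss_erasure(16, 4, p))
```

Under these assumptions the erasure-coded chunk is orders of magnitude less likely to be lost; the objections raised in the thread are about the cost of detecting and repairing lost blocks, not about this number.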
Ah thanks for that.
Makes much more sense now. I still see a big overhead in keeping these mapped for the chunks (not files). The minimap per section is fine (all nodes will have this plus the map). The 16 sections where these parts are stored, though, do need to be monitored somehow; this is the elephant in the room I was talking about earlier.
If each of these 16 is a single copy in a remote section then how will we know when one vanishes?
Then how do we monitor 16 nodes for every chunk on the network?
I hope you see what I mean a bit more now. (nice ideas here though).
I think you’ve misunderstood … my full statement again:
I wasn’t disagreeing with the first part at all … I was agreeing actually … What I am implying (apparently not very aptly) is that there may be other considerations beyond corruption/loss that may negatively impact … but we cannot know as it’s not tested …
I then went on to say that replication has the same general problem.
One is simpler than the other however and when testing out potential solutions, IMO it’s best to start with the simpler one, before going to the more complex one.
I think it would technically be called MAID-41. I was thinking more of how to mix just a little bit of parity and replication in order to get a lot of ROI with regard to fault tolerance. A RAID-41 scheme, i.e. the network replicates 8 copies of a RAID-4 chunk set, makes the truly paranoid feel safe. Following the self-encryption process, for every N normal self-encrypted chunks, a single parity chunk P could be computed using simple XOR and appended to the data map. AFAIK this simple 4N+1P RAID-4 scheme at the chunk level can effectively double the network replication effect for a 25% increase in PUT cost and storage space. When the data is read, there is no need to send the parity chunks to the client or do any extra computation unless all 8 copies of one of the chunks in the 4N+1P chunk set were somehow destroyed. I suppose the data map itself doesn’t benefit, though.
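The 4N+1P idea fits in a few lines. This is a minimal illustration, not how SAFE's data maps work: it assumes the chunks are padded to equal length (self-encrypted chunks can differ in size, the last one especially), and `parity`/`recover` are hypothetical names. The parity chunk is only read when all replicas of one data chunk are gone:

```python
import os
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("chunks must be padded to equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: list[bytes]) -> bytes:
    """RAID-4 style parity chunk: XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def recover(survivors: list[bytes], parity_chunk: bytes) -> bytes:
    """Rebuild the one missing chunk by XOR-ing the parity with every survivor."""
    return reduce(xor_bytes, survivors, parity_chunk)


chunks = [os.urandom(16) for _ in range(4)]  # stand-ins for 4 self-encrypted chunks
p = parity(chunks)
rebuilt = recover(chunks[:2] + chunks[3:], p)  # pretend chunk 2 lost all its replicas
print(rebuilt == chunks[2], "extra storage:", 5 / 4 - 1)
```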
The fanboy marketing war.
I may have veered off-topic a bit from the Storj comparison. My basic opinion with regard to the OP question is “NO, Storj erasure codes are not better than SAFE replication.” I think it’s more a question of Pareto optimality: minimizing computation, bandwidth, and storage complexity/cost while maximizing data security, privacy, and freedom. The Storj folks raise some interesting points about bandwidth being the limiting factor, which I think do have some merit. A simple MAID-41 scheme seems like a low-cost way to boost effective replication levels for a negligible bandwidth increase. A MAID-61 would be efficient on SSE/AVX processors (it only uses XOR and bit shifting, analogous to Linux RAID-6) and would seem to offer a 3X multiplier to whatever the base network replication factor is. Since the erasure code in this case doesn’t form the backbone of the network, it doesn’t suffer from all the issues Storj will need to deal with. Instead, it’s just a worst-case-scenario insurance policy to help keep things safe.
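For the MAID-61 idea, the two parity chunks would follow the RAID-6 construction used by Linux: P is a plain XOR, and Q weights each chunk by a power of the generator {02} in GF(2^8), so computing it needs only shifts and XORs. A minimal sketch, assuming equal-length chunks; the helper names are hypothetical, and recovering two simultaneous losses (which means solving a 2x2 system using both P and Q) is omitted. The example rebuilds one data chunk from Q alone, as would be needed if P were lost as well:

```python
import os
from typing import Optional

POLY = 0x11D  # GF(2^8) reduction polynomial, as used by Linux RAID-6


def gf_mul2(b: int) -> int:
    """Multiply a byte by the generator {02}: a shift plus a conditional XOR."""
    b <<= 1
    return b ^ POLY if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiplication, built from gf_mul2 and XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def pq_parity(chunks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all chunks; Q = sum of g^i * D_i, evaluated with Horner's rule."""
    n = len(chunks[0])
    p, q = bytearray(n), bytearray(n)
    for d in reversed(chunks):
        for j in range(n):
            p[j] ^= d[j]
            q[j] = gf_mul2(q[j]) ^ d[j]
    return bytes(p), bytes(q)


def recover_from_q(chunks: list[Optional[bytes]], q: bytes, x: int) -> bytes:
    """Rebuild data chunk `x` from Q and the other data chunks (no P needed)."""
    inv_gx = gf_pow(gf_pow(2, x), 254)  # a^254 is a^-1 for non-zero a
    out = bytearray(q)
    for i, d in enumerate(chunks):
        if i == x:
            continue
        if d is None:
            raise ValueError("Q alone can rebuild only one missing chunk")
        g_i = gf_pow(2, i)
        for j in range(len(out)):
            out[j] ^= gf_mul(g_i, d[j])  # strip the known terms out of Q
    return bytes(gf_mul(b, inv_gx) for b in out)  # g^x * D_x divided by g^x


data = [os.urandom(8) for _ in range(6)]  # stand-ins for 6 data chunks
P, Q = pq_parity(data)
damaged = data[:3] + [None] + data[4:]    # chunk 3 lost, and assume P is gone too
print(recover_from_q(damaged, Q, 3) == data[3])
```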
Yes I would also want that, but we need to monitor every part of a file/chunk/part and I cannot see a way to do that efficiently.
I think people are getting me wrong, EC is more space efficient and I completely agree, but in a hostile network that needs to know when a bit goes missing, the maintenance is huge and I cannot see a fix.
I don’t care who has the fix or the best pattern, and am glad many folk are looking at it in different ways. I think EC is clearly space efficient for storage but costly or impossible to maintain efficiently.
It seems we kept talking past each other
If the addresses are 256-bit hashes, it’s 512 bytes per chunk, stored in 17 separate sections: the chunk’s own section (though it’s not even strictly necessary!) and the sections that store each of the 16 blocks.
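The per-chunk metadata cost is easy to tabulate. A minimal sketch, assuming a 1 MiB chunk and counting one minimap copy per section (every vault in a section that keeps its own copy multiplies the metadata figure by the number of such vaults); the 8-replica comparison uses the replication factor mentioned elsewhere in the thread:

```python
HASH_BYTES = 32       # 256-bit block addresses
BLOCKS = 16           # 4 primary + 12 parity blocks
SECTIONS = 17         # the chunk's own section + one per block
CHUNK = 1024 * 1024   # assumed 1 MiB chunk

minimap = HASH_BYTES * BLOCKS              # bytes per minimap
metadata = minimap * SECTIONS              # one minimap copy per section
erasure_storage = BLOCKS * (CHUNK // 4)    # 16 quarter-size blocks
replication_storage = 8 * CHUNK            # 8 full copies, for comparison

print(f"minimap: {minimap} B, all copies: {metadata} B "
      f"({metadata / CHUNK:.2%} of the chunk)")
print(f"block data: {erasure_storage / CHUNK:.0f}x chunk vs "
      f"{replication_storage / CHUNK:.0f}x for 8 replicas")
```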
The chunk’s own entry in its own section is solely for retrieving the minimap. If the 4 primary blocks are always (i.e. not just optionally) stored in the client data maps, then the chunk doesn’t even need to be stored on the network. I still favor storing it though.
This is where I’m confused. Why would you think so? To refer to a similar situation, do sections monitor chunks they are not responsible for? Once the blocks are stored by their respective sections, they are handled independently from both each other and the chunk that they encode. When a block is lost, the section responsible for it can just send 4 GET requests for a random selection of blocks from the minimap and the sections responsible for those will send them; there’s no coordination or monitoring in place.
To expand on what I just wrote, this situation is in all important aspects identical to how chunks relate to files. There’s no place in the network that monitors if the chunks that belong to a file are still there, especially because that isn’t a matter of public record to start with. We entrusted the chunks to their respective sections and now it’s their job to store them. It’s the same with the blocks in the scenario I outlined with the exception that the minimaps are public (they don’t disclose anything secret about the chunk.)
As you previously wrote, it’s not okay to trust a single vault within a section to store a block, so let’s either use two vaults or check up on the vaults to see whether they actually have the blocks they are responsible for (I’m sure this is done for chunks as well). As the blocks are content-addressed, it’s really easy to verify that the returned block is really the block asked for.
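The repair path described in this post (the section that lost a block fetches 4 other blocks listed in the minimap, checks each against its content address, and recomputes the missing one) can be sketched end to end. This is an illustration under stated assumptions, not a design from the thread: the 4-of-16 code is a systematic Reed-Solomon-style code built by Lagrange interpolation over GF(2^8) (primary blocks are a polynomial's values at indices 0 to 3, parity blocks its values at 4 to 15), the chunk length must be divisible by 4, the minimap is just the list of SHA-256 digests, `fetch` stands in for a network GET, and blocks are fetched one at a time until 4 verified ones arrive. It deliberately ignores the questions raised in the thread about who detects the loss and who is trusted to do the rebuild:

```python
import hashlib
import os
import random
from typing import Callable, Optional

K, N = 4, 16  # any 4 of the 16 blocks determine the chunk


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (reduction polynomial 0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254 == a^-1 for non-zero a."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def interpolate(points: dict[int, bytes], x: int) -> bytes:
    """Byte by byte, evaluate at block index `x` the degree-(K-1) polynomial
    through `points` (block index -> block contents): Lagrange interpolation."""
    length = len(next(iter(points.values())))
    out = bytearray(length)
    for xi, block in points.items():
        num = den = 1
        for xj in points:
            if xj != xi:
                num = gf_mul(num, x ^ xj)  # subtraction is XOR in GF(2^8)
                den = gf_mul(den, xi ^ xj)
        coef = gf_mul(num, gf_inv(den))  # Lagrange basis polynomial L_i(x)
        for j in range(length):
            out[j] ^= gf_mul(coef, block[j])
    return bytes(out)


def encode(chunk: bytes) -> tuple[list[bytes], list[bytes]]:
    """Slice into 4 primary blocks, derive 12 parity blocks, build the minimap."""
    if len(chunk) % K:
        raise ValueError("this sketch assumes a chunk length divisible by 4")
    size = len(chunk) // K
    primary = {i: chunk[i * size:(i + 1) * size] for i in range(K)}
    blocks = [primary[i] for i in range(K)]
    blocks += [interpolate(primary, x) for x in range(K, N)]
    minimap = [hashlib.sha256(b).digest() for b in blocks]  # 16 x 32 bytes
    return blocks, minimap


def repair(lost: int, minimap: list[bytes],
           fetch: Callable[[bytes], Optional[bytes]]) -> bytes:
    """Rebuild block `lost` from any 4 verified blocks listed in the minimap."""
    candidates = [i for i in range(N) if i != lost]
    random.shuffle(candidates)  # a random selection of blocks from the minimap
    got: dict[int, bytes] = {}
    for i in candidates:
        block = fetch(minimap[i])  # stand-in for a GET to the responsible section
        # Content addressing: discard anything that does not hash to its address.
        if block is not None and hashlib.sha256(block).digest() == minimap[i]:
            got[i] = block
            if len(got) == K:
                break
    if len(got) < K:
        raise RuntimeError("fewer than 4 blocks retrievable: the chunk is lost")
    rebuilt = interpolate(got, lost)
    if hashlib.sha256(rebuilt).digest() != minimap[lost]:
        raise RuntimeError("rebuilt block does not match its address")
    return rebuilt


chunk = os.urandom(64)                 # a tiny stand-in for a real chunk
blocks, minimap = encode(chunk)
network = dict(zip(minimap, blocks))   # address -> block
for gone in (2, 7, 9):                 # three blocks vanish
    network.pop(minimap[gone])
print(repair(7, minimap, network.get) == blocks[7])
print(b"".join(blocks[:K]) == chunk)   # clients only need the 4 primary blocks
```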
There would be an important positive consequence of distributing each chunk across 16 sections. To perform a directed attack against a single chunk, the attacker would need to take control of not just one but 13 sections. It would provide unprecedented robustness against such directed attacks, enough that maybe even the section size could be lowered while still having higher security than without it. Fewer nodes in a section would also mean less intra-section communication.
I stand corrected, my apologies.
Yes, that is automatic really. The group will all hold the same chunks; one member vanishes and another is given all the same chunks it had from the other members (in SAFE this adult would already have the chunks, as it needs them to join the section). It is a very important aspect; otherwise you need huge meshes of communication across sections and it goes very far down a rabbit hole.
This is the point: how will it know the chunk is lost, and then who gets the parts? (We do not trust individual nodes.) And then who decides who will store it?
This is the start of the rabbit hole: 2 is not enough for a secure quorum.
Who checks? For security, the whole section must check and come to consensus if there is a fault. Otherwise it is not secure (again).
As I say, though, this is automatic: we know they had the chunk and we know they must deliver it. If they do not, another does immediately and that vault is punished; either way the client gets the data, and there is no waiting while we reconstitute, etc.
That is nice for sure, but maybe beyond what is required if you see what I mean. If a section can be DOS’ed then there is more at stake than losing a chunk.
The security goes way beyond data protection numbers, though (i.e. the replication value): it covers data manipulation (amending existing data, safecoin transactions, etc.), so we must secure sections against such attacks. It’s done by having a load of nodes ready to take the place of Elders, merging, and more.
It relates to your above point as well really when you look at it.
In the case the section got killed there would be no section responsible for the block anymore, if that makes sense.
I suspect that when you push this to the level of security we need, it will be more data, more management, more bandwidth, and actually slower. The complexity is much more than it seems at first.
So you take a chunk (in MaidSafe terms) and split it into 16 pieces. Store each in 16 sections (more management, as we need to know when these vanish and replace them). However, we have 4 blocks that the client knows the address of (this would be a group address, to allow the client to get to the nodes closest to the address; they can then tell which node holds the data and confirm that node sends it).
Otherwise, they need to reply nack and the client tries the next group, while this one must reconstitute the chunk (so they all need the minimap or similar, which means a minimap per group that holds any part). I think it goes much deeper, much faster, and the space savings diminish or actually vanish, as the 1Mb chunk now has 8Mb of management data in terms of minimaps (if there is a 512b minimap), if you see what I mean.
I may be missing something, though; perhaps we can somehow draw out the whole process, start to finish, of 1 chunk replicated 8 times versus the scheme you are talking about (or even a slightly different version if that is better).
But hey. What if a network does BOTH like RAID 10?!
You don’t understand the drawback of this and you seriously want to create your own network? God help us…
Sorry for getting back so late, I was away for some time.
I think I have misunderstood something about how chunks were stored. As I gather, sections make decisions about things, but they aren’t responsible for storing the blocks as I had thought. Instead, that job belongs to the close groups, in which case my idea doesn’t quite work.
The storjv3 whitepaper linked at the top of this topic is fantastic, really a great read. The ideas and design are fun and innovative. I read it right through three times and some sections probably twice that. The bibliography is also full of great material.
I’ve taken some notes about aspects of the whitepaper that relate to SAFE. I could write twice as much again about the specifics of storj but it wouldn’t really belong in this forum.
In summary, Storj is about storage more than about a new internet.
p7 “With an anticipated 44 zettabytes of data expected to exist by 2020 and a market that will grow to $92 billion USD in the same time frame” - ie about data storage (source)
p10 “Cloud computing is estimated to be a $186.4 billion dollar market in 2018, and is expected to reach $302.5 billion by 2021” - ie about cloud computing (source)
Interesting figures, storage accounts for about half to a third of the total value of cloud computing.
p10 “We have found that in aggregate, enough small operator environments exist such that their combination over the internet constitutes significant opportunity and advantage for less-expensive and faster storage.”
I guess these findings are from the prior storj networks, since there’s no source for this. But sounds positive for both storj and SAFE viability.
p11 “Fixed costs are born by the network operators, who invest billions of dollars in building out a network of data centers and then enjoy significant economies of scale. The combination of large upfront costs and economies of scale means that there is an extremely limited number of viable suppliers of public cloud storage (arguably, fewer than five major operators worldwide). These few suppliers are also the primary beneficiaries of the economic return.”
This indicates a problem of all decentralized storage - the competitive advantage that centralized services can gain from economies of scale. This advantage is achieved by their ability to organize themselves efficiently.
Hopefully the SAFE network allows efficient enough organization of decentralized entities that it can provide similar economies of scale but without the centralization.
p13 “decentralized systems are susceptible to high churn rates where participants join the network and then leave for various reasons… Rhea et al. found that in many real world peer-to-peer systems, the median time a participant lasts in the network ranges from hours to mere minutes” (source, for some reason not linked in the whitepaper bibliography)
This churn rate is for an altruistic network, not an economically incentivised network. That would probably make a big difference to the participant behaviour.
Diving deeper into the source for this statistic, section “3.1 Emperical studies” says “Elsewhere we have surveyed published studies of deployed file-sharing networks” which links to this paper that says they present a DHT “able to function effectively for median node session times as short as 1.4 minutes, while using less than 900 bytes/s/node of maintenance bandwidth in a 1000-node system. This churn rate is faster than that observed in real file-sharing systems such as Gnutella, Kazaa, Napster, and Overnet.”
So the short session durations were observed in four different altruistic networks. I don’t think this prior research into high churn rates is necessarily applicable here.
p13 “any distributed system intended for high performance applications must continuously and aggressively optimize for low latency not only on an individual process scale but also for the system’s entire architecture.”
But not at the cost of geographical centralization. A tough balance to meet, but one that’s ultimately calculable. It feels like decentralized storage will need to be a two-step UX, where the user initially uploads to their ‘closest’ node for best speed and latency, and the upload appears essentially complete to the user at that time. But in the background the network geographically distributes the data for redundancy (which takes time and should not affect performance from the client perspective). This is just my guess about the future direction of UX for decentralized storage. ‘Uploaded’ will probably come to mean ‘to the nearest point’ rather than ‘as finally distributed’. Like the surface of an ocean vs the undercurrents.
p14 “access to high-bandwidth internet connections is unevenly distributed across the world”
I wonder if this assumption will break in the near future. I suspect it may. I suspect networks such as SAFE and storj will be the motivation for the changes that lead to that assumption breaking.
It’s a bit like saying ‘bitcoin works because CPUs are evenly distributed across the world’. Well, that assumption broke a few years later because bitcoin itself incentivised ASIC chips, and now they’re not evenly distributed as per the original assumption. The network modified the world it exists in.
p15 “…we classify a “large” file as a few megabytes or greater in size”
“The initial product offering by Storj Labs is designed to function primarily as a decentralized object store for larger files.”
“We made protocol design decisions with the assumption that the vast majority of stored objects will be 4MB or larger. While smaller files are supported, they may simply be more costly to store.”
“Users can address this [ie managing lots of files smaller than a megabyte] with a packing strategy by aggregating and storing many small files as one large file.”
“The protocol supports seeking and streaming, which will allow users to download small files without requiring full retrieval of the aggregated object.”
The seeking and streaming is cool. It only adds a little complexity to the retrieval metadata. Could be nice to have a standard for this considering the optimum chunk size in SAFE is 1MB so it will likely want to have a similar packing feature.
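A packing format along these lines only needs an index from file name to byte range, plus the arithmetic that maps a byte range onto chunks so a reader fetches only the chunks it needs. A minimal sketch; the index layout and function names are hypothetical, not a Storj or SAFE format, and it treats the packed object as plain bytes split into fixed-size 1 MiB chunks, ignoring encryption:

```python
import io

CHUNK_SIZE = 1024 * 1024  # SAFE's 1MB chunk size, taken as 1 MiB here


def pack(files: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Concatenate small files into one object and index name -> (offset, length)."""
    buf = io.BytesIO()
    index: dict[str, tuple[int, int]] = {}
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Indices of the chunks a seek to [offset, offset + length) must fetch."""
    if length == 0:
        return range(0)
    return range(offset // chunk_size, (offset + length - 1) // chunk_size + 1)


blob, index = pack({"a.txt": b"x" * 700_000, "b.txt": b"y" * 700_000, "c.txt": b"z" * 10})
off, n = index["b.txt"]
print(list(chunks_for_range(off, n)), blob[off:off + n] == b"y" * 700_000)
```

Reading `c.txt` back only touches one chunk, even though the packed object spans two.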
I would have to ask why chunks in SAFE are 1MB (and not, say, 2MB or 512KB), and likewise why objects in storj are 4MB or larger (rather than, say, 1MB or larger). This doesn’t seem to be justified via calculations in either network.
I mainly wonder this with respect to possible future bandwidth developments. Will these chunk sizes seem short sighted? Can they be upgraded later? Is the chunk size going to be like IPv4 short-sightedness?
p16 “Note that creating a system that is robust in the face of Byzantine behaviour does not require a Byzantine fault tolerant consensus protocol—we avoid Byzantine consensus. See sections 4.9, 6.2, and appendix A for more details”
Important to understand storj is not really trying to detect malice in a distributed manner. The details get a bit specific to storj so I’ll leave it there.
This difference leads to significant impacts on the structure of the storj nodes and it functions at a different level of trust and security to SAFE. Not necessarily more or less trust and security, just very different.
p17 “To get to exabyte scale, minimizing coordination is one of the key components of our strategy.”
Exabyte scale is a nice target. I’m impressed they have such a tangible goal.
p19 “Storage nodes are selected to store data based on various criteria: ping time, latency, throughput, bandwidth caps, sufficient disk space, geographic location, uptime, history of responding accurately to audits, and so forth.”
“node selection is an explicit, non-deterministic process in our framework. This means that we must keep track of which nodes were selected for each upload via a small amount of metadata”
This is a really important aspect to understand about the storj network and one of the major differences to SAFE.
Clients choose their storage destination (maybe via automatic decision algorithms).
This means the structure of the storj network ends up in two distinct layers - a metadata layer and a storage layer.
SAFE combines both these layers using XOR space.
Because storj has a metadata layer it can more easily track files for repair via erasure coding.
SAFE can’t do it as easily since the file metadata is not available in the first place, and if it were it would be distributed across xor space.
The secure messaging algorithm for traversing xor space makes it much less practical to track and repair files via erasure coding.
For this reason I think erasure codes are fundamentally unsuited to being used at the network layer of the SAFE network. However they may still be useful at the client / app layer.
p19 “provides peer reachability, even in the face of firewalls and NATs where possible. This may require techniques like STUN, UPnP, NAT-PMP, etc.”
Equivalent of the Crust project within MaidSafe.
I’m not sure of the exact intended use of STUN, but one thing I’ve always been wary about (from when I was exploring WebRTC) is “the protocol requires assistance from a third-party network server (STUN server) located on the opposing (public) side of the NAT, usually the public Internet.” (source). This seems like a potential privacy leak or DoS target, etc.
p19 “provides authentication as in S/Kademlia, where each participant cryptographically proves the identity of the peer with whom they are speaking to avoid man-in-the-middle attacks.”
Equivalent of the MaidSafe-DHT project.
p19 “3.4 Redundancy”
p35 “4.7 Redundancy”
p63 “6.1 Hot files and content delivery”
p65 “7.1 Object repair costs”
p69 “7.3 Choosing erasure parameters”
These sections cover the main points being discussed in this topic about erasure codes. Quite cool that they use it at the network layer but I think it isn’t practical for SAFE due to differences in network structure.
I looked at the Blake paper that’s used to justify the redundancy scheme. It’s a great paper with valuable insights and ideas. But it bases the real world examples on altruistic networks rather than incentivised networks - “We apply a simple resource usage model to measured behavior from the Gnutella file-sharing network to argue that large-scale cooperative storage is limited by likely dynamics and cross-system bandwidth — not by local disk space.” (source, for some reason not linked in the whitepaper).
The table on p5 for hardware trends is really interesting. It shows 15 years of data, with disk increasing much more rapidly than bandwidth. Would be good to extend it with the next 13 years of data that have become available since then.
1990 - 60 MB Disk and 9.6 Kbps home access bandwidth
2005 - 0.5 TB Disk and 384 Kbps home access bandwidth
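The gap in that table can be quantified from the two data points quoted. A small sketch, treating MB and TB as decimal units (an assumption):

```python
def growth(start: float, end: float, years: int) -> tuple[float, float]:
    """Total growth factor and compound annual growth rate."""
    factor = end / start
    return factor, factor ** (1 / years) - 1


YEARS = 2005 - 1990
disk = growth(60e6, 0.5e12, YEARS)  # bytes: 60 MB -> 0.5 TB
net = growth(9.6e3, 384e3, YEARS)   # bit/s: 9.6 Kbps -> 384 Kbps
print(f"disk: {disk[0]:,.0f}x total, {disk[1]:.0%} per year")
print(f"home bandwidth: {net[0]:,.0f}x total, {net[1]:.0%} per year")
```

On these figures disk capacity grew by a factor of about 8,300 against 40 for home bandwidth, roughly 83% versus 28% per year.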
p24 “Encryption should use a pluggable mechanism that allows users to choose their desired encryption scheme.”
Great to have a pluggable mechanism.
p26 “Storage nodes in our framework should limit their exposure to untrusted payers until confidence is gained that those payers are likely to pay for services rendered.”
This is going to be a limiting factor to the ability to scale.
Either scale happens fast and trust is assumed, or scale is slow and trust is earned.
It’s probably not a big deal in real life but I feel this is one of those edges which is ripe for social engineering, causing uproar and damage to confidence due to deliberately negligent trust of payment.
p26 “While we intend for the STORJ token to be the primary form of payment, in the future other alternate payment types could be implemented, including Bitcoin, Ether, credit or debit card, ACH transfer, or even physical transfer of live goats.”
The ‘transfer of live goats’ comment indicates there are out-of-band ways to make payments, so trust is involved.
It’s worth clarifying some missing context - there are two independent payment flows. One from the client and a second to the storage nodes. Client pays with goats [to the middleman] and the storage nodes receive payment [from the middleman] in storj tokens. This relationship is (to my perception) extremely dubious. The protocol is interesting but timing and trust factors seem to present too many edge cases for my tastes.
p28 “Users have accounts on and trust specific Satellites [ie metadata handlers]. Any user can run their own Satellite, but we expect many users to elect to avoid the operational complexity and create an account on another Satellite hosted by a trusted third party such as Storj Labs, a friend, group, or workplace.”
A Satellite can be interpreted as part of the user’s client software or as part of the broader distributed network ecosystem; both are valid. This makes Storj both a trusted and a trustless system at the same time, depending on how the client uses Satellite infrastructure. It’s a really interesting design.
p30 “there are three major actors in the network: metadata servers, object storage servers, and clients.”
This is a good starting point (as well as the related projects GFS and Lustre file systems) for anyone wanting to understand the structure of storj.
p31 “Storage nodes can choose with which Satellites to work.”
Another difference from SAFE. Vaults do not get to choose which parts of the network they interact with. Clients do not get to choose which vaults they interact with. But on storj, clients and storage nodes get to choose which metadata services they interact with.
This has pros and cons, but is getting a bit specific to storj so I’ll leave it at that.
p31 “Storage nodes are not paid for the initial transfer of data to store (ingress bandwidth). This is to discourage storage nodes from deleting data only to be paid for storing more, which became a problem with our previous version.”
Same as SAFE - pay for retrieval (GET) not for storage (PUT). Nice to see some precedent from real life tests on this concept.
p40 “The most trivial implementation for the metadata storage functionality we require will be to simply have each user use their preferred trusted database, such as MongoDB, MariaDB, Couchbase, PostgreSQL, SQLite, Cassandra, Spanner, or CockroachDB, to name a few.”
To me this removes a lot of the benefit of the storage network. Having to track metadata in a trusted, non-distributed way is a substantial barrier. The whitepaper has a good list of justifications for the pros (Control, Simplicity, Coordination Avoidance) and cons (Availability, Durability, Trust) of this design, and the authors are actively trying to improve it: “We expect and look forward to new systems and improvements specifically this in component of our framework”. And p64: “we plan to architect the Satellite out of the platform”.
p47 “The second subsystem slowly allows nodes to join the network.”
Would be interesting to do some rough calculations about how much time would be required to reach the goal of exabyte scale based on this slowness aspect.
p77 “B.4 Honest Geppetto. In this attack, the attacker operates a large number of “puppet” storage nodes on the network, accumulating reputation and data over time”
Interesting (and I think preferable) name for what has been labelled “The Google Attack” on the SAFE network.
p79 “The previous version of the Storj network had over 150,000 independently operated nodes”
Valuable bit of insight about the market.
@mav thanks for the information! It was very interesting
Nodes on the Storj network are limited to 1 per processor.
So if you have 10 virtual machines each with 4 processors you have 40 nodes… I personally tested this configuration and it worked for several months …
Thanks @mav for doing all that reading and for writing such a useful and referenced summary. Very interesting to read.
Stepping back from the technical differences, it highlights the difference in motivations between Storj and SAFE.
Storj aims to create a market, by enabling new players to participate in the growing cloud storage market, and that necessitates decentralisation for obvious reasons. Their primary objective has been to deliver that viable market for purely business reasons it seems to me, and they have made technical and pragmatic decisions around trust, participation access, privacy and so on, that deliver a very different product, with very different characteristics.
SAFE has very clear technical goals that David and Maidsafe have worked incredibly hard, and taken the time, to try and achieve with little or no compromise, because the goals demand that, and here the market has been introduced in order to support those goals rather than as an aim in itself. Maidsafe too have business goals, but again, they do not override the underlying values and goals of the Maidsafe Foundation. These all fit together in a more stable configuration it seems to me - much less vulnerable to human weakness and the hostile business environment.
A bit off topic perhaps, but worth considering the context and motivations, because they have a significant impact on technical choices, implementation, and end results.
I suspect so, but linked with increased file sizes also over time. So small chunks now for smaller files, but increase for new files later. This is all doable as the chunk size is irrelevant to the data map (or whatever key set folk use). So having a mix of chunk sizes is OK. The problem is when in the future folk try and upload old data that is already there, but with smaller chunks. This can be mitigated in part with public files, but lost on private files.
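The deduplication problem with mixed chunk sizes shows up even with plain content addressing. A simplified sketch: it hashes raw fixed-size slices instead of running SAFE's self-encryption, which changes the addresses but not the conclusion that the same data chunked at two sizes shares no chunk addresses, while re-uploading at the same size deduplicates fully:

```python
import hashlib
import os


def chunk_addresses(data: bytes, chunk_size: int) -> set[str]:
    """Content addresses (SHA-256) of fixed-size chunks of `data`."""
    return {hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)}


data = os.urandom(2 * 1024 * 1024)         # the same 2 MiB file
old = chunk_addresses(data, 512 * 1024)    # uploaded earlier with smaller chunks
new = chunk_addresses(data, 1024 * 1024)   # re-uploaded with larger chunks
print(len(old & new), "chunks deduplicated;", len(new), "stored again")
```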
Yes STUN like is what we have, so encrypted STUN if you like. It is a relay node (more like introduction node) kinda thing that will know your IP and the recipient IP. So we build this functionality into the nodes themselves, so every node is a potential STUN like server, but they appear and disappear randomly. The routing layer tells the crust layer where they are when they are found.
This is why we talk about secure signaling for webrtc etc. we really mean we secure STUN and obfuscate where those “servers” are. STUN/TURN as specced by IETF are not secure and do leak privacy.
This part I am unsure of. Disk size has increased; the issue used to be transfer rates (limited to 30Mb/s), but with SSDs and newer bus tech transfer rates are not so much of an issue. It was like when IBM came out with 25Mb/s ATM networks and ATM cards for ps2/30 machines. I queried at a conference that the ps2/30 had an 8Mb/s bus, so 25Mb/s transfer would not be possible. New bus tech and threaded SSDs etc. resolve much of that, but it’s always worth considering bus speeds.
The increase in disk vs bandwidth seems unresolved though, but again when you are getting chunks or parts in parallel you can serve much faster than both of these numbers. Receipt of them is important though.
I think that also relates to the intended use with filecoin, but not 100%. It is an area I have issues with though. I would love to see this debated on a wider scale as it is probably quite important.
I agree with this. If all projects posted their own version of the network fundamentals in a short summary, it would be much easier to see and understand the motivation, vision, etc., and folk could choose more easily.
FYI, interesting info from a few years back.
I’ve only skimmed the material, but one interesting aspect of OceanStore was that they used both replication and erasure codes. Instead of “chunks” they used “fragments”. They proposed floating replicas of active objects, in addition to a “Deep Archival Storage” that was an immutable sub-layer and used erasure codes to boost durability by orders of magnitude in case all replicants were destroyed. They also destroyed active copies that had not been accessed for a long time, leaving only the erasure coded archival copies intact.