Are Erasure Codes (Storj) better than Replication for the SAFE network?

You don’t understand the drawback of this and you seriously want to create your own network? God help us… :sweat:


Sorry for getting back so late, I was away for some time.

I think I have misunderstood something about how chunks were stored. As I gather, sections make decisions about things, but they aren’t responsible for storing the blocks as I had thought. Instead, that job belongs to the close groups. In which case my idea doesn’t quite work :sweat_smile:


The storjv3 whitepaper linked at the top of this topic is fantastic, really a great read. The ideas and design are fun and innovative. I read it right through three times and some sections probably twice that. The bibliography is also full of great material.

I’ve taken some notes about aspects of the whitepaper that relate to SAFE. I could write twice as much again about the specifics of storj but it wouldn’t really belong in this forum.

In summary, Storj is about storage more than about a new internet.


p7 “With an anticipated 44 zettabytes of data expected to exist by 2020 and a market that will grow to $92 billion USD in the same time frame” - ie about data storage (source)

p10 “Cloud computing is estimated to be a $186.4 billion dollar market in 2018, and is expected to reach $302.5 billion by 2021” - ie about cloud computing (source)

Interesting figures, storage accounts for about half to a third of the total value of cloud computing.
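A quick check of that ratio, using only the figures quoted above. The storage figure is for 2020 while the cloud figures are for 2018 and 2021, so this is only a rough bracket:

```python
# Market figures quoted from the Storj v3 whitepaper, in billions of USD.
storage_market_2020 = 92.0   # p7: data storage market
cloud_market_2018 = 186.4    # p10: cloud computing market, 2018
cloud_market_2021 = 302.5    # p10: cloud computing market, 2021 (projected)

# Storage as a fraction of the whole cloud market at each end of the range.
share_vs_2018 = storage_market_2020 / cloud_market_2018
share_vs_2021 = storage_market_2020 / cloud_market_2021

print(f"storage / cloud (2018): {share_vs_2018:.0%}")  # 49%
print(f"storage / cloud (2021): {share_vs_2021:.0%}")  # 30%
```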


p10 “We have found that in aggregate, enough small operator environments exist such that their combination over the internet constitutes significant opportunity and advantage for less-expensive and faster storage.”

I guess these findings are from the prior storj networks, since there’s no source for this. But sounds positive for both storj and SAFE viability.


p11 “Fixed costs are born by the network operators, who invest billions of dollars in building out a network of data centers and then enjoy significant economies of scale. The combination of large upfront costs and economies of scale means that there is an extremely limited number of viable suppliers of public cloud storage (arguably, fewer than five major operators worldwide). These few suppliers are also the primary beneficiaries of the economic return.”

This indicates a problem for all decentralized storage: the competitive advantage that centralized services can gain from economies of scale. This advantage is achieved by their ability to organize themselves efficiently.

Hopefully the SAFE network allows efficient enough organization of decentralized entities that it can provide similar economies of scale but without the centralization.


p13 “decentralized systems are susceptible to high churn rates where participants join the network and then leave for various reasons… Rhea et al. found that in many real world peer-to-peer systems, the median time a participant lasts in the network ranges from hours to mere minutes” (source, for some reason not linked in the whitepaper bibliography)

This churn rate is for an altruistic network, not an economically incentivised network. That would probably make a big difference to the participant behaviour.

Diving deeper into the source for this statistic, section “3.1 Emperical studies” says “Elsewhere we have surveyed published studies of deployed file-sharing networks” which links to this paper that says they present a DHT “able to function effectively for median node session times as short as 1.4 minutes, while using less than 900 bytes/s/node of maintenance bandwidth in a 1000-node system. This churn rate is faster than that observed in real file-sharing systems such as Gnutella, Kazaa, Napster, and Overnet.”

So the short duration time is observed for four different altruistic networks. I don’t think this prior research into high churn rates is necessarily applicable here.


p13 “any distributed system intended for high performance applications must continuously and aggressively optimize for low latency not only on an individual process scale but also for the system’s entire architecture.”

But not at the cost of geographical centralization. A tough balance to strike, but one that’s ultimately calculable. It feels like decentralized storage will need a two-step UX, where the user initially uploads to their ‘closest’ node for best speed and latency, and the upload appears essentially complete to the user at that time. In the background, the network geographically distributes the data for redundancy (which takes time and should not affect performance from the client perspective). This is just my guess about the future direction of UX for decentralized storage. ‘Uploaded’ will probably come to mean ‘to the nearest point’ rather than ‘as finally distributed’. Like the surface of an ocean vs the undercurrents.


p14 “access to high-bandwidth internet connections is unevenly distributed across the world”

I wonder if this assumption will break in the near future. I suspect it may. I suspect networks such as SAFE and storj will be the motivation for the changes that lead to that assumption breaking.

It’s a bit like saying ‘bitcoin works because CPUs are evenly distributed across the world’ - well, that assumption broke a few years later because bitcoin itself incentivised ASIC chips, and now they’re not evenly distributed as per the original assumption. The network modified the world it exists in.


p15 “…we classify a “large” file as a few megabytes or greater in size”
“The initial product offering by Storj Labs is designed to function primarily as a decentralized object store for larger files.”
“We made protocol design decisions with the assumption that the vast majority of stored objects will be 4MB or larger. While smaller files are supported, they may simply be more costly to store.”
“Users can address this [ie managing lots of files smaller than a megabyte] with a packing strategy by aggregating and storing many small files as one large file.”
“The protocol supports seeking and streaming, which will allow users to download small files without requiring full retrieval of the aggregated object.”

The seeking and streaming is cool. It only adds a little complexity to the retrieval metadata. Could be nice to have a standard for this considering the optimum chunk size in SAFE is 1MB so it will likely want to have a similar packing feature.
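The packing-plus-seeking idea is easy to picture. Below is a minimal sketch, assuming a simple in-memory index of (offset, length) per small file. `pack`, `fetch` and `read_range` are hypothetical names, not Storj or SAFE APIs, and `read_range` just slices bytes where a real client would issue a ranged download:

```python
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length) inside the packed object

def pack(files: List[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate many small files into one large object and record where each one sits."""
    index: Index = {}
    parts: List[bytes] = []
    offset = 0
    for name, data in files:
        index[name] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index

def read_range(blob: bytes, start: int, length: int) -> bytes:
    """Stand-in for a seek/streaming read of part of the packed object."""
    return blob[start:start + length]

def fetch(blob: bytes, index: Index, name: str) -> bytes:
    """Retrieve one small file without reading the whole aggregate."""
    start, length = index[name]
    return read_range(blob, start, length)

blob, index = pack([("a.txt", b"hello"), ("b.txt", b"small"), ("c.txt", b"files")])
print(fetch(blob, index, "b.txt"))  # b'small'
```

The only extra retrieval metadata is the index itself, which is why the added complexity stays small.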

I would have to ask why chunks in SAFE are 1MB (and not, say, 2MB or 512KB), and likewise why objects in storj are 4MB or larger (rather than, say, 1MB or larger). This doesn’t seem to be justified via calculations in either network.

I mainly wonder this with respect to possible future bandwidth developments. Will these chunk sizes seem short-sighted? Can they be upgraded later? Is the chunk size going to be like the short-sightedness of IPv4?


p16 “Note that creating a system that is robust in the face of Byzantine behaviour does not require a Byzantine fault tolerant consensus protocol—we avoid Byzantine consensus. See sections 4.9, 6.2, and appendix A for more details”

Important to understand storj is not really trying to detect malice in a distributed manner. The details get a bit specific to storj so I’ll leave it there.

This difference leads to significant impacts on the structure of the storj nodes and it functions at a different level of trust and security to SAFE. Not necessarily more or less trust and security, just very different.


p17 “To get to exabyte scale, minimizing coordination is one of the key components of our strategy.”

Exabyte scale is a nice target. I’m impressed they have such a tangible goal.


p19 “Storage nodes are selected to store data based on various criteria: ping time, latency, throughput, bandwidth caps, sufficient disk space, geographic location, uptime, history of responding accurately to audits, and so forth.”
“node selection is an explicit, non-deterministic process in our framework. This means that we must keep track of which nodes were selected for each upload via a small amount of metadata”

This is a really important aspect to understand about the storj network and one of the major differences to SAFE.

Clients choose their storage destination (maybe via automatic decision algorithms).

This means the structure of the storj network ends up in two distinct layers - a metadata layer and a storage layer.

SAFE combines both these layers using XOR space.

Because storj has a metadata layer it can more easily track files for repair via erasure coding.

SAFE can’t do it as easily since the file metadata is not available in the first place, and if it were it would be distributed across xor space.

The secure messaging algorithm for traversing xor space makes it much less practical to track and repair files via erasure coding.

For this reason I think erasure codes are fundamentally unsuited to being used at the network layer of the SAFE network. However they may still be useful at the client / app layer.
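To make the metadata-layer point concrete, here is a rough sketch of the kind of repair check an explicit metadata record enables. It is loosely modelled on the whitepaper's idea of a repair threshold sitting above the minimum number of pieces needed to rebuild a segment. The `SegmentPointer` record, its field names, the exact threshold comparison and the example numbers are all hypothetical, not Storj's actual schema:

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class SegmentPointer:
    """Hypothetical metadata record for one erasure-coded segment."""
    segment_id: str
    piece_nodes: List[str]   # piece_nodes[i] is the node chosen to hold piece i
    required: int            # k: any k pieces are enough to rebuild the segment
    repair_threshold: int    # start a repair once healthy pieces fall to this level

def healthy_pieces(ptr: SegmentPointer, online: Set[str]) -> int:
    # Only possible because the record names every holder explicitly.
    return sum(1 for node in ptr.piece_nodes if node in online)

def needs_repair(ptr: SegmentPointer, online: Set[str]) -> bool:
    return healthy_pieces(ptr, online) <= ptr.repair_threshold

def is_lost(ptr: SegmentPointer, online: Set[str]) -> bool:
    return healthy_pieces(ptr, online) < ptr.required

ptr = SegmentPointer("seg-1", [f"node{i}" for i in range(6)], required=3, repair_threshold=4)
online = {"node0", "node1", "node2", "node3"}   # node4 and node5 went offline
print(healthy_pieces(ptr, online), needs_repair(ptr, online), is_lost(ptr, online))  # 4 True False
```

In SAFE there is no such record to scan: the holders of a chunk are implied by its XOR address, so the equivalent check has to happen inside the group of nodes close to that address.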


p19 “provides peer reachability, even in the face of firewalls and NATs where possible. This may require techniques like STUN [29], UPnP [30], NAT-PMP [31], etc.”

Equivalent of the crust project within maidsafe.

I’m not sure of the exact intended use of STUN, but one thing I’ve always been wary of (from when I was exploring WebRTC) is that “the protocol requires assistance from a third-party network server (STUN server) located on the opposing (public) side of the NAT, usually the public Internet.” (source). This seems like a potential privacy leak or DoS target etc.


p19 “provides authentication as in S/Kademlia, where each participant cryptographically proves the identity of the peer with whom they are speaking to avoid man-in-the-middle attacks.”

Equivalent of the MaidSafe-DHT project.


p19 “3.4 Redundancy”
p35 “4.7 Redundancy”
p63 “6.1 Hot files and content delivery”
p65 “7.1 Object repair costs”
p69 “7.3 Choosing erasure parameters”

These sections cover the main points being discussed in this topic about erasure codes. Quite cool that they use it at the network layer but I think it isn’t practical for SAFE due to differences in network structure.

I looked at the Blake paper that’s used to justify the redundancy scheme. It’s a great paper with valuable insights and ideas. But it bases the real world examples on altruistic networks rather than incentivised networks - “We apply a simple resource usage model to measured behavior from the Gnutella file-sharing network to argue that large-scale cooperative storage is limited by likely dynamics and cross-system bandwidth — not by local disk space.” (source, for some reason not linked in the whitepaper).

The table on p5 for hardware trends is really interesting. It shows 15 years of data, with disk increasing much more rapidly than bandwidth. Would be good to extend it with the next 13 years of data that have become available since then.

1990 - 60 MB Disk and 9.6 Kbps home access bandwidth
2005 - 0.5 TB Disk and 384 Kbps home access bandwidth
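Those two rows are enough to see the Blake paper's point. A rough calculation, assuming decimal units (1 TB = 10^12 bytes) and a home link running flat out, of how fast each grew and how long it would take to push a full disk through that link:

```python
MB = 10**6
TB = 10**12
SECONDS_PER_DAY = 86_400

disk = {1990: 60 * MB, 2005: 0.5 * TB}   # bytes
link = {1990: 9.6e3, 2005: 384e3}        # home access bandwidth, bits per second
years = 2005 - 1990

def annual_growth(factor: float, years: int) -> float:
    """Compound yearly growth rate implied by an overall growth factor."""
    return factor ** (1 / years) - 1

def days_to_send(size_bytes: float, bits_per_second: float) -> float:
    """Time to transfer size_bytes over the link at full speed, in days."""
    return size_bytes * 8 / bits_per_second / SECONDS_PER_DAY

disk_growth = disk[2005] / disk[1990]    # about 8,333x
link_growth = link[2005] / link[1990]    # 40x

print(f"disk: {disk_growth:,.0f}x ({annual_growth(disk_growth, years):.0%}/yr)")
print(f"link: {link_growth:,.0f}x ({annual_growth(link_growth, years):.0%}/yr)")
for year in (1990, 2005):
    print(year, f"{days_to_send(disk[year], link[year]):.1f} days to upload a full disk")
```

A full disk went from roughly half a day to roughly four months to upload, which is why the paper argues that cross-system bandwidth, not local disk space, limits how much a churning network can keep re-replicating.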


p24 “Encryption should use a pluggable mechanism that allows users to choose their desired encryption scheme.”

Great to have a pluggable mechanism.

MaidSafe is also considering a pluggable hash structure. Variable encryption schemes may be something that can be added to self_encryption or safe_crypto.


p26 “Storage nodes in our framework should limit their exposure to untrusted payers until confidence is gained that those payers are likely to pay for services rendered.”

This is going to be a limiting factor to the ability to scale.

Either scale happens fast and trust is assumed, or scale is slow and trust is earned.

It’s probably not a big deal in real life but I feel this is one of those edges which is ripe for social engineering, causing uproar and damage to confidence due to deliberately negligent trust of payment.


p26 “While we intend for the STORJ token to be the primary form of payment, in the future other alternate payment types could be implemented, including Bitcoin, Ether, credit or debit card, ACH transfer, or even physical transfer of live goats.”

The ‘transfer of live goats’ comment indicates there are out-of-band ways to make payments, so trust is involved.

It’s worth clarifying some missing context - there are two independent payment flows. One from the client and a second to the storage nodes. Client pays with goats [to the middleman] and the storage nodes receive payment [from the middleman] in storj tokens. This relationship is (to my perception) extremely dubious. The protocol is interesting but timing and trust factors seem to present too many edge cases for my tastes.


p28 “Users have accounts on and trust specific Satellites [ie metadata handlers]. Any user can run their own Satellite, but we expect many users to elect to avoid the operational complexity and create an account on another Satellite hosted by a trusted third party such as Storj Labs, a friend, group, or workplace.”

A Satellite can be interpreted as part of the user’s client software or as part of the broader distributed network ecosystem; both are valid. This makes Storj both a trusted and a trustless system at the same time, depending on how the client uses Satellite infrastructure. It’s a really interesting design.


p30 “there are three major actors in the network: metadata servers, object storage servers, and clients.”

This is a good starting point (as well as the related projects GFS and Lustre file systems) for anyone wanting to understand the structure of storj.


p31 “Storage nodes can choose with which Satellites to work.”

Another difference from SAFE. Vaults do not get to choose which parts of the network they interact with. Clients do not get to choose which vaults they interact with. But on storj, clients and storage nodes get to choose which metadata services they interact with.

This has pros and cons, but is getting a bit specific to storj so I’ll leave it at that.


p31 “Storage nodes are not paid for the initial transfer of data to store (ingress bandwidth). This is to discourage storage nodes from deleting data only to be paid for storing more, which became a problem with our previous version.”

Same as SAFE - pay for retrieval (GET) not for storage (PUT). Nice to see some precedent from real life tests on this concept.


p40 “The most trivial implementation for the metadata storage functionality we require will be to simply have each user use their preferred trusted database, such as MongoDB, MariaDB, Couchbase, PostgreSQL, SQLite, Cassandra, Spanner, or CockroachDB, to name a few.”

To me this removes a lot of the benefit of the storage network. Having to track metadata in a trusted, non-distributed way is a substantial barrier. The whitepaper has a good list of justifications for the pros (Control, Simplicity, Coordination Avoidance) and cons (Availability, Durability, Trust) of this design, and the authors are actively trying to improve it - “We expect and look forward to new systems and improvements specifically this in component of our framework”. And p64 “we plan to architect the Satellite out of the platform”.


p47 “The second subsystem slowly allows nodes to join the network.”

Would be interesting to do some rough calculations about how much time would be required to reach the goal of exabyte scale based on this slowness aspect.


p77 “B.4 Honest Geppetto. In this attack, the attacker operates a large number of “puppet” storage nodes on the network, accumulating reputation and data over time”

Interesting (and I think preferable) name for what has been labelled “The Google Attack” on the SAFE network.


p79 “The previous version of the Storj network had over 150,000 independently operated nodes”

Valuable bit of insight about the market.
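Putting that node count next to the exabyte target from p17 gives a feel for the scale: a back-of-envelope sketch of how much data each node would need to hold if the network stayed at the previous version's size. The `expansion` parameter is my own addition, standing in for whatever redundancy overhead applies; the whitepaper does not frame it this way:

```python
EXABYTE = 10**18
PREVIOUS_NODE_COUNT = 150_000   # p79: nodes on the previous Storj network

def bytes_per_node(target_bytes: float, nodes: int, expansion: float = 1.0) -> float:
    """Average stored bytes per node; expansion > 1 accounts for redundancy overhead."""
    return target_bytes * expansion / nodes

per_node = bytes_per_node(EXABYTE, PREVIOUS_NODE_COUNT)
print(f"{per_node / 10**12:.2f} TB per node")   # 6.67 TB per node
```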


@mav thanks for the information! It was very interesting :slight_smile:

Nodes on the Storj network are limited to 1 per processor.

So if you have 10 virtual machines each with 4 processors you have 40 nodes… I personally tested this configuration and it worked for several months …


Thanks @mav for doing all that reading and for writing such a useful and referenced summary. Very interesting to read.

Stepping back from the technical differences, it highlights the difference in motivations between Storj and SAFE.

Storj aims to create a market, by enabling new players to participate in the growing cloud storage market, and that necessitates decentralisation for obvious reasons. Their primary objective has been to deliver that viable market for purely business reasons it seems to me, and they have made technical and pragmatic decisions around trust, participation access, privacy and so on, that deliver a very different product, with very different characteristics.

SAFE has very clear technical goals that David and Maidsafe have worked incredibly hard, and taken the time, to try and achieve with little or no compromise, because the goals demand that, and here the market has been introduced in order to support those goals rather than as an aim in itself. Maidsafe too have business goals, but again, they do not override the underlying values and goals of the Maidsafe Foundation. These all fit together in a more stable configuration it seems to me - much less vulnerable to human weakness and the hostile business environment.

A bit off topic perhaps, but worth considering the context and motivations, because they have a significant impact on technical choices, implementation, and end results.


I suspect so, but linked with increased file sizes also over time. So small chunks now for smaller files, but increase for new files later. This is all doable as the chunk size is irrelevant to the data map (or whatever key set folk use). So having a mix of chunk sizes is OK. The problem is when in the future folk try and upload old data that is already there, but with smaller chunks. This can be mitigated in part with public files, but lost on private files.
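The deduplication problem with mixed chunk sizes is easy to demonstrate. This sketch names chunks by a plain SHA-256 of their raw content, which is a simplification of what self_encryption actually does, but the effect is the same: the same file split at two different chunk sizes shares no chunk names, so the second upload is not deduplicated against the first.

```python
import hashlib
from typing import List

def chunk_names(data: bytes, chunk_size: int) -> List[str]:
    """Split data into fixed-size chunks and name each chunk by its SHA-256 hash."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

# 16 KiB of deterministic, non-repeating sample content.
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512))

old_upload = chunk_names(data, 1024)   # stored back when chunks were small
new_upload = chunk_names(data, 4096)   # same file, later, with larger chunks

shared = set(old_upload) & set(new_upload)
print(len(old_upload), len(new_upload), len(shared))   # 16 4 0
```

A data map just lists chunk names, so mixing sizes across different files is harmless; the cost only shows up when identical content arrives again under a different chunk size.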

Yes, STUN-like is what we have, so encrypted STUN if you like. It is a relay node (more like an introduction node) kind of thing that will know your IP and the recipient’s IP. We build this functionality into the nodes themselves, so every node is a potential STUN-like server, but they appear and disappear randomly. The routing layer tells the crust layer where they are when they are found.

This is why we talk about secure signaling for WebRTC etc.: we really mean we secure STUN and obfuscate where those “servers” are. STUN/TURN as specced by the IETF are not secure and do leak privacy.

This part I am unsure of. Disk size has increased; the issue used to be transfer rates (limited to 30Mb/s), but with SSDs and newer bus tech, transfer rates are not so much an issue. It was like when IBM came out with 25Mb/s ATM networks and ATM cards for PS/2 Model 30 machines. I raised at a conference that the PS/2 Model 30 had an 8Mb/s bus, so 25Mb/s transfer would not be possible :smiley: However, new bus tech and threaded SSDs etc. resolve much of that, but it’s always worth considering bus speeds.

The faster growth of disk vs bandwidth seems unresolved, though again, when you are getting chunks or parts in parallel you can serve much faster than either of these numbers. Receipt of them is important though.

I think that also relates to the intended use with filecoin, but not 100%. It is an area I have issues with though. I would love to see this debated on a wider scale as it is probably quite important.

I agree with this. If all projects posted their own version of the network fundamentals in a short summary, it would be much easier to see and understand the motivation and vision, and folk could choose more easily.


FYI, interesting info from a few years back.

https://www.researchgate.net/publication/220831946_Efficient_Replica_Maintenance_for_Distributed_Storage_Systems

This paper was written by developers involved with the OceanStore and PlanetLab projects.

I’ve only skimmed the material, but one interesting aspect of OceanStore was that they used both replication and erasure codes. Instead of “chunks” they used “fragments”. They proposed floating replicas of active objects, in addition to a “Deep Archival Storage” that was an immutable sub-layer and used erasure codes to boost durability by orders of magnitude in case all replicas were destroyed. They also destroyed active copies that had not been accessed for a long time, leaving only the erasure-coded archival copies intact.
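The split between hot replicas and an erasure-coded archive can be sketched as a simple eviction policy. This is a hypothetical illustration of the idea described above, not OceanStore's code; the field names, the idle-time rule and the example numbers are assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StoredObject:
    name: str
    last_access: float          # time of last read, in seconds
    active_replicas: int        # floating replicas kept for fast access
    archival_fragments: int     # erasure-coded fragments in the deep archive

def evict_cold_replicas(objects: List[StoredObject], now: float, max_idle: float) -> List[str]:
    """Drop the active replicas of objects idle for longer than max_idle.

    Archival fragments are never touched, so a cold object can still be
    rebuilt from the archive later.
    """
    evicted = []
    for obj in objects:
        if obj.active_replicas > 0 and now - obj.last_access > max_idle:
            obj.active_replicas = 0
            evicted.append(obj.name)
    return evicted

objects = [
    StoredObject("hot", last_access=95.0, active_replicas=3, archival_fragments=32),
    StoredObject("cold", last_access=10.0, active_replicas=3, archival_fragments=32),
]
print(evict_cold_replicas(objects, now=100.0, max_idle=30.0))   # ['cold']
```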


How reputation works in the Storj v3 network:
https://storj.io/blog/2019/01/reputation-matters-when-it-comes-to-storage-nodes/

I thought that each block is 4MB, so if each of the 95 erasure blocks is 4MB, then they store only 145% of the original 256MB file?

Reed-Solomon codes in storage systems:

  • Linux RAID 6 = RS(10,8)
  • Google File System II (Colossus) = RS(9,6)
  • Quantcast File System = RS(9,6)
  • Intel & Cloudera HDFS = RS(9,6)
  • Yahoo Cloud Object Store = RS(11,8)
  • Backblaze’s online backup = RS(20,17)
  • Facebook’s BLOB storage system = RS(14,10)
  • Baidu’s Atlas Cloud Storage = RS(12,8)

References:
H. Dau et al., “Repairing Reed-Solomon Codes with Single and Multiple Erasures,” ITA, 2017, San Diego.
P. Vijay Kumar, “Codes for Big Data: Erasure Coding for Distributed Storage,” The 3rd Annual Storage Developer Conference, Bengaluru, May 25-26, 2017.
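A quick way to compare those schemes with SAFE's 8 copies per chunk is storage overhead (n/k) and how many lost pieces each can survive (n - k). This assumes RS(n, k) means n pieces stored, any k of which rebuild the data, which matches the RAID 6 entry (two parity disks). Plain replication with 8 copies then behaves like RS(8, 1):

```python
from typing import Dict, Tuple

# (n, k) pairs from the list above; SAFE-style replication added as RS(8, 1).
SCHEMES: Dict[str, Tuple[int, int]] = {
    "Linux RAID 6": (10, 8),
    "Google Colossus": (9, 6),
    "Quantcast File System": (9, 6),
    "Intel & Cloudera HDFS": (9, 6),
    "Yahoo Cloud Object Store": (11, 8),
    "Backblaze": (20, 17),
    "Facebook BLOB storage": (14, 10),
    "Baidu Atlas": (12, 8),
    "8x replication": (8, 1),
}

def overhead(n: int, k: int) -> float:
    """Bytes stored per byte of user data."""
    return n / k

def tolerated_losses(n: int, k: int) -> int:
    """Pieces that can disappear before the data is unrecoverable."""
    return n - k

for name, (n, k) in SCHEMES.items():
    print(f"{name:26} overhead {overhead(n, k):.2f}x, survives {tolerated_losses(n, k)} losses")
```

The catch is that repairing an RS-coded object normally means fetching k pieces from k different nodes, which needs the kind of placement metadata discussed for Storj above.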

Is there anywhere I can read the technical detail (and an ELI5 version for newbs) about how replication will work in an active SAFE network for MVP? I have been reading that SIA is facing many issues with hosts only being around a short while, so replication goes below 1 (meaning not enough hosts have the data) and people are losing their files. In a decentralized network such as SAFE, how will the network recognize nodes that have gone away and replicate the chunks those nodes stored somewhere else, to maintain a high replication factor? And how quickly can the network detect and remediate (re-replicate) data on nodes that disappear permanently? I think we will see a lot of this early on, with short-lived interest. I would expect as much as 60-70% of nodes to drop off over time (new nodes will come, but we should fully expect large amounts of churn, maybe not all at once, but still large churn as a percentage over days/weeks/months). Or is this another one of those unsolved questions (and a big one at that if so :stuck_out_tongue:)?


Gamification on SAFE is much better than SIA. The following things exert strong pressure not to exit:

  • to enter the network you are waiting for n-amount of time
  • to get money - you’re waiting to become an adult
  • if you are of the first participants on the network you are more likely to get parts of valuable popular things (all movies and music created to date will be uploaded in the first few months without a shadow of doubt)

So I’m sure there are many people like me who have very well considered their strategy to get the most out of the newly created SafeCoin…

Edit: I had a few SIA nodes and an Obelisk ASIC SC1 miner, and I can guarantee you that SIA is a sad story compared to SAFE…


Child nodes are also farming

It is infant nodes that do not farm.


I agree with what you are saying about the technical incentives for farmers to keep a node up; they are nice and help somewhat. But I want to set that aside for a minute and consider what would happen if, in a given week, 30-40% of the nodes churned: a node that was considered good leaves permanently and a different person joins, so total network capacity never changes. I would like the network to be resilient enough to actually be smart and say: I have detected ____ nodes that hold this bit of data, which is ___ fewer than I had before and under the ___ threshold SAFE considers good. Obviously this level of reconciliation can’t happen globally all the time, or the network would be too bogged down trying to maintain data replication. Or maybe by design the network can’t really figure out how many nodes share a piece of data, in which case I think relying only on the incentives that keep farmers on the network is a false premise. I bet in the wild that would come back to bite you during a crypto winter, or if people are not making enough “income” from being a farmer. IMO there has to be a way that, even if SAFE dropped to 50% of the farmers it had seen, the network checks on replication and still moves files around so the farmers that remain can take on the extra burden from the nodes that left. Finding a balance around how often that has to occur is, I think, the key to a quality decentralized storage network (one aspect of how SAFE can be utilized).
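A crude model of that scenario, assuming each node independently leaves with probability `f` per round (real churn is correlated, so treat these numbers as optimistic) and that a chunk has 8 copies. It compares a network that restores full replication after every round with one that never repairs; `f` and the number of rounds are illustrative inputs, not measurements:

```python
def lost_in_one_round(f: float, copies: int) -> float:
    """P(all copies of one chunk leave in the same round)."""
    return f ** copies

def loss_with_repair(f: float, copies: int, rounds: int) -> float:
    """P(chunk lost within `rounds` rounds if replication is restored after each round)."""
    return 1 - (1 - lost_in_one_round(f, copies)) ** rounds

def loss_without_repair(f: float, copies: int, rounds: int) -> float:
    """P(chunk lost within `rounds` rounds if departed copies are never replaced)."""
    one_copy_survives = (1 - f) ** rounds
    return (1 - one_copy_survives) ** copies

for f in (0.3, 0.4, 0.5):
    print(f"f={f}: with repair {loss_with_repair(f, 8, 4):.2e}, "
          f"without repair {loss_without_repair(f, 8, 4):.2e}")
```

The gap between the two columns is the argument for continuous repair: with it, even heavy churn rarely removes every copy at once; without it, losses accumulate round after round.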


This concept helps, but my understanding is that sections are fluid too, with nodes entering and leaving a section. Will nodes that enter an existing section have data pushed onto them that is shared among the other section nodes, if those older nodes leave the section or are cut off? And when a node moves from one section to another, does the section it just joined push the section’s shared data onto the new node, and will the node that left the prior section drop its old data? How does this work with safecoin? I thought PUTs are directly what cause storage growth, so what accounts for re-replicating existing data, since that’s outside the scope of a direct PUT and more of a network reconciliation feature? As long as MaidSafe feels really good about the existing design, let’s see how it plays out :slight_smile: . I could probably sit and ask questions about this network all day long, heh, because every high-level feature has so many edge cases and underlying requirements that must be satisfied for it to be stable and well engineered.


As indicated by @Antifragile, the invariant that is maintained in the network is that each data chunk is stored in 8 nodes. It is easier to reason by referring to this invariant.

These nodes are the 8 nearest nodes to the address of the chunk, and they necessarily belong to the same section (this is a property of XOR distance). When a node leaves a section, it is no longer expected to manage its old chunks. Copies of the chunks it used to manage are added to the remaining nodes in the section.

This is the reason why, when a section falls below 8 nodes, it is merged with its neighbor: to keep a minimum of 8 nodes in a section so that it can maintain this invariant locally.

When a node joins a section, it receives some chunks, but not all the chunks managed by the section, only those that are near the new node (when the node is among the 8 nearest nodes of the chunk).
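The invariant can be sketched in a few lines: the holders of a chunk are simply the nodes closest to its address by XOR, so joins and departures only change who belongs in that group. This is a simplified illustration with small integer addresses and a configurable group size (8 in the description above); the function names are hypothetical, not the routing or vault code:

```python
from typing import Dict, Iterable, List, Set

GROUP_SIZE = 8  # copies per chunk, as described above

def closest_nodes(address: int, nodes: Iterable[int], k: int = GROUP_SIZE) -> List[int]:
    """The k nodes whose IDs are nearest to `address` by XOR distance."""
    return sorted(nodes, key=lambda node: node ^ address)[:k]

def chunks_for_new_node(new_node: int, nodes: Iterable[int], chunks: Iterable[int],
                        k: int = GROUP_SIZE) -> List[int]:
    """Chunks a joining node must receive: only those whose close group it now belongs to."""
    everyone = list(nodes) + [new_node]
    return [c for c in chunks if new_node in closest_nodes(c, everyone, k)]

def new_holders_after_leave(leaving: int, nodes: Iterable[int], chunks: Iterable[int],
                            k: int = GROUP_SIZE) -> Dict[int, Set[int]]:
    """For each chunk the leaving node held, which remaining node(s) take over a copy."""
    nodes = list(nodes)
    remaining = [n for n in nodes if n != leaving]
    result: Dict[int, Set[int]] = {}
    for c in chunks:
        before = set(closest_nodes(c, nodes, k))
        if leaving in before:
            result[c] = set(closest_nodes(c, remaining, k)) - before
    return result

# Tiny example: 16 nodes with 8-bit IDs, groups of 2 to keep it readable.
nodes = list(range(0, 256, 16))
print(closest_nodes(37, nodes, k=2))                   # [32, 48]
print(chunks_for_new_node(36, nodes, [37, 200], k=2))  # [37]
print(new_holders_after_leave(32, nodes, [37], k=2))   # {37: {0}}
```

No per-chunk list of holders is needed: any node that knows the current membership can recompute who should hold a chunk, which is how the network "knows which chunk is stored by whom".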


These comments in the source code clarify the intended behaviour for chunk redundancy (which is the responsibility of the Data Manager Persona of the vault).

Worth reading the whole lot but here’s some snippets

[Data Managers are] responsible for chunks whose names are close to it in the network address space.

[Data Managers that have recently joined or relocated] will also try to retrieve any chunks for which it is responsible from any close peer which is not currently busy

Chunk replication to this Vault continues repeatedly until it holds all chunks for which it is responsible.

Note these comments were last updated 3 years ago (July 2016) so still reference the old Structured Data which is now replaced by Mutable Data (and possibly in the future replaced again by Appendable Data), but the mechanisms described by the comments seem to still be accurate.

I think ‘the incentive structure’ is perhaps a more accurate / stronger phrase than ‘gamification’. And the incentives are the key for any permissionless network to succeed so you’ve definitely picked the right point to focus on :slight_smile:

My understanding was that the age categories were infant, adult and elder (outlined in the first post of the dev topic on Data Chains - Deeper Dive), with no child category.

To add some links for those that want to dive deeper, in the code this is defined as min_section_size. See the declaration in routing and the usage in vaults for determining which vaults are part of the close group.


They have been mentioned, maybe incorrectly by David, but David saw the child node as a very important part


My bad, I have done that in the past; child == infant to me really. Age takes over, so we will have younger Adults as well as younger Elders, but only age determines that.


I don’t know how SIA works or who its audience is, so I might be off-base here (perhaps SIA has the same target audiences/users and it’s still not enough), but I imagine that on Safe, the big uploaders will themselves be providing a lot of storage to the network. You wouldn’t go to the trouble as a business of providing a lot of content only to lose it down the track; you’d work to ensure that there will be enough storage.

I wonder what kind of tools will be available to get metrics on the network’s various capacities once it’s going. Anyone have any ideas on that? Being such a private anonymous system, I think some metrics might be difficult to ascertain.


Let’s hope we can find a copy of this somewhere! That’s not how the Safe Network does it. The network always knows which chunk is stored by whom, so when one too many of those vaults goes offline, it just creates a new copy.
