Farming Pools and InfiniteGB Nodes

Iirc it’s a defense mechanism. Consider an adversary that spins up 1M vaults. If those were all accepted immediately, they could swamp a network section. Instead they go into a waiting pool. This gives time for many other potential vault operators to join the waiting pool/queue. By the time the network actually needs a new vault resource, enough time has passed for the pool to have accumulated another 1M non-malicious nodes. The probability that it will randomly select a malicious node from the pool is now only 50%, instead of 100% under the direct-join scenario.
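To put numbers on that, here’s a minimal sketch of the arithmetic, assuming the section picks uniformly at random from the pool (the function name and the uniform-selection assumption are mine, not anything from the actual design):

```rust
// Chance of drawing an attacker-controlled vault when one new node is needed,
// assuming a uniform random pick from the waiting pool.
fn p_malicious_pick(malicious: u64, honest: u64) -> f64 {
    malicious as f64 / (malicious + honest) as f64
}

fn main() {
    // Direct join: the attacker's 1M vaults are the only candidates.
    println!("direct join: {:.0}%", 100.0 * p_malicious_pick(1_000_000, 0));
    // Waiting pool: another 1M honest vaults accumulated in the meantime.
    println!("with pool:   {:.0}%", 100.0 * p_malicious_pick(1_000_000, 1_000_000));
}
```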

4 Likes

I thought the point was to make the network resilient against vaults trying to game it by storing no data at all: making GETs to other nodes whenever they need to serve a GET themselves and acting as mere “proxies”.

1 Like

This pokes at a lot of different ideas…

Let’s say there’s no join limit. Anyone can join any time.

This doesn’t mean join is instant. It still takes time to redistribute chunks to the new nodes.

Let’s say a section would split if the two new sections would have 100 nodes each (so probably splitting when there are about 200-250 nodes). If a section has 150 nodes and suddenly 1000 new nodes join all at once, should the section split now and then redistribute chunks, or should it split only after all chunks have been redistributed?

I’m not going for a binary ‘this or that’ answer on this, I’m just trying to conceptualize the relationship between chunk stability and section membership stability. When is a node “a node”? When is a node “queued”? When is a chunk “stored”? When is a chunk “at risk” or “lost”?

Looking at the waiting pool idea by @jlpell (I’ll call it a queue): would I be able to get a hundred new nodes from my laptop into the queue? Or a million? Or only a few? What’s the limit? Presumably being ‘in a queue’ is not a resource-intensive action, otherwise it’s not queuing, it’s joining. Having more nodes in the queue increases my chance of being selected for the next join. So I’m not really clear on how the queue mechanic would function as opposed to simply joining. Maybe I’m not looking carefully enough at the queue and there’s a simple way for it to work?

(If we call the queue a membership pool we can stir up some confusion by calling it mempool, which in bitcoin is short for memory pool! No let’s not do that :slight_smile: )

A disallow rule or a join limit etc is sorta naturally going to happen anyway, since chunk redistribution isn’t instantaneous. But a disallow rule is also sorta naturally not going to happen, because the resources spent managing a queue are themselves a wasteful type of joining.

Perhaps this all just adds a lot of mud to an already murky pool…


Let’s look at the idea of chunk redistribution when new nodes join, since this seems like a key part of whether or not new nodes are disallowed.

Maybe chunks do not need to be redistributed? This is sorta natural anyhow since if vaults can be full then chunks must be ‘near’ their xor address, not ‘closest’ to it.

We could have nodes be part of the network without them having all the closest chunks. The new node missing a chunk would not be surprising since it happens anyway with full nodes. (I find this idea unappealing but it seems like a necessary consequence of allowing full nodes; I prefer the strictness of all redundant chunks being at the actual closest nodes, not just nearby).

This raises a question of how new nodes might be filled. Some options for how to do the filling:

  • The node doesn’t store any historical chunks, only new chunks, and fills up as new PUTs arrive. There is no redistribution process when a node joins (redistribution only happens when nodes depart or sections split). I’m not sure if this is feasible or not; I’d need to explore it further.
  • Elders give the new node a list of all the chunks close to that node (ie all the chunk names the node is required to store to satisfy the minimum chunk redundancy). The node is responsible for filling itself by doing a normal GET for each of those chunks. Periodic audits ensure the vault has done the work and the level of redundancy is correct (see the sketch after this list).
  • Nearby nodes could push chunks to the new node, rather than have the new node pull them.
  • Maybe some other ways are possible?
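A rough sketch of the second option, the elder-driven pull approach. `ChunkName`, `FillTask` and the closures are made-up names purely for illustration, not anything from sn_node:

```rust
use std::collections::HashSet;

// Placeholder for a chunk's XOR name.
type ChunkName = [u8; 32];

/// The list of chunk names the elders tell a newly joined node it must hold.
struct FillTask {
    required: HashSet<ChunkName>,
}

impl FillTask {
    /// The new node fills itself with ordinary GETs; elders can later audit
    /// how many of the required chunks it actually holds.
    fn run(
        &self,
        get_chunk: impl Fn(&ChunkName) -> Option<Vec<u8>>,
        mut store_chunk: impl FnMut(ChunkName, Vec<u8>),
    ) -> usize {
        let mut stored = 0;
        for name in &self.required {
            if let Some(data) = get_chunk(name) {
                store_chunk(*name, data);
                stored += 1;
            }
        }
        stored
    }
}

fn main() {
    let task = FillTask { required: HashSet::from([[0u8; 32], [1u8; 32]]) };
    // Dummy closures standing in for real network GETs and local disk writes.
    let stored = task.run(|_name| Some(vec![0u8; 1024]), |_name, _data| {});
    println!("filled {} of {} required chunks", stored, task.required.len());
}
```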

Zooming way out, it seems like the disallow rule and join queue etc originate from ‘responsibility for redundancy’. How can the network ensure redundancy and detect failures of redundancy? If joining is too rapid the degree of redundancy becomes unclear and might put data at risk when chunks are poorly distributed (there could be any number of reasons for chunks being poorly distributed - maybe lots of nodes are suddenly departing, maybe some nodes become overloaded and laggy, maybe redirection due to full nodes becomes extreme).

It feels to me like elders are in the best position to manage the redundancy. Maybe that’s not necessarily true, perhaps redundancy can be managed in a less strict or controlled way?

Instead of looking at the potential damage from farming pools and how to avoid that damage, maybe we can ask: if farming pools are a natural consequence of unavoidable network friction, how can we incorporate them into an intentional network mechanism? (Btw I don’t think farming pools are inevitable.)

6 Likes

Don’t attribute this one to me :sweat_smile:. It’s just my current understanding of what the intent was for a defense mechanism based on bits and pieces I picked up here on the forum from dirvine’s descriptions and others. The ‘queue’ might just be nodes at age zero, but I’m just guessing.

1 Like

This is actually how it works now, since earlier this year.

I’m also a bit ambivalent about it. On one hand chunks are not at the closest node; on the other hand there’s no need to sync data when joining.

Wrt the join queue, naively it looks to me like the rate of inflow to the network is controlled, but that does not prevent the queue from being flooded, which means the network is deprived of an inflow of good nodes; almost anything it admits will be an attacker’s node.
I haven’t looked at that area in detail for solutions, though.

6 Likes

The idea of allowing new nodes only when X% of nodes are full seems like it could give us some grief. Filecoin has provided us with a very useful experiment demonstrating why there may be some trouble.

In the first 50 days of the filecoin network ~1 PiB has been stored (source). ~1 EiB is available for storage (source).

There are currently 786 filecoin nodes.

This would give 1.3 TiB storage per node (1024 TiB / 786, assuming the 1 PiB figure includes all redundancy).

745 out of 786 (94.8%) of filecoin nodes have more than 1.3 TiB of storage (there’s a list of node sizes here).

217 out of 786 (27.6%) of filecoin nodes have more than 1 PiB storage and could store the whole of filecoin data.

If we took filecoin storage distribution as it is now and applied Safe Network rules to it, it would take a very long time before any new nodes would be allowed to join.
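For what it’s worth, a quick re-derivation of those figures (numbers rounded as above; 1 PiB = 1024 TiB, 1 EiB = 1024 PiB):

```rust
fn main() {
    let stored_tib = 1024.0;            // ~1 PiB stored in the first 50 days
    let capacity_tib = 1024.0 * 1024.0; // ~1 EiB of committed capacity
    let nodes = 786.0;

    // ~1.3 TiB of stored data per node on average
    println!("avg stored per node: {:.1} TiB", stored_tib / nodes);
    // ~0.1% of the committed capacity is actually used
    println!("utilisation: {:.2}%", 100.0 * stored_tib / capacity_tib);
}
```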

There’s some considerations for this comparison though…

Storage on Safe Network could be cheaper than filecoin so it would fill the spare space faster and reach an equilibrium sooner. This is fine, I accept the reasoning, but filecoin is already usually 20x cheaper than major cloud storage, so why does filecoin see only 0.1% storage utilisation? I’m not convinced that being cheaper means we’ll achieve better distribution.

If the top filecoin nodes broke their single massive nodes into many smaller ones, most of those would not be allowed onto the network. I’m not exactly sure how the logic goes, but with 99.9% unused space virtually all nodes on the network can continue to accept new data for a very long time. Doing a maybe over-simplified analysis: 0.1% full in 50 days means it would take another ~50,000 days to fill the remaining 99.9% of storage (assuming no additional storage came online). That’s about 136 years of spare capacity.
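The same over-simplified estimate in code form (it assumes a constant fill rate and no new capacity coming online):

```rust
fn main() {
    let filled_fraction = 0.001; // ~0.1% full...
    let elapsed_days = 50.0;     // ...after 50 days
    // Time to fill the remaining 99.9% at the same rate.
    let days_to_fill = elapsed_days * (1.0 - filled_fraction) / filled_fraction;
    println!("~{:.0} days, ~{:.0} years", days_to_fill, days_to_fill / 365.25);
}
```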

Filecoin has separate mechanisms for storage space, uploading and pricing, which allows a lot of spare space to come online very quickly. Safe Network doesn’t have this; it links pricing, spare storage space and uploading all together. So I’m not sure how the difference in pricing / storage / utilization functions between these two networks will show itself in the real world.

We’re really lucky that filecoin has shown us the utilization rate. If we’d only seen the uploads of 1 PiB in 50 days we’d say ‘nice work filecoin’, but we can also see the 1 EiB of unused space, which gives us some real head scratching to do. On our network we won’t get to see how much spare space there is, so we have no idea how long it might be until the 50%-full-nodes mark is reached.

My main worry is (to use an exaggerated example) that if we end up with the top 10 filecoin nodes as the first 10 nodes in Safe Network (between 23 PiB and 71 PiB in size), we’ll be waiting a very long time for new nodes to be allowed to enter the network, because it would take a very long time to fill 50% of those nodes.

Should we be worried about the huge amount of spare storage out there hindering growth and node membership? I’m not sure, but filecoin makes me feel we should consider things carefully; they have a really huge amount of spare space.

13 Likes

This data supports what we see in other decentralized storage networks.

There is no demand from end users for such a product. There is no demand from business customers for such a product.

Of course we have a better product and we will show much better results. But there is one serious ‘but’: our results will not change the fact that there is no demand for such a product from millions of users.

This means that we have to prepare for years if not decades before our network replaces the old internet. During these years, the greatest danger will not be someone putting big farms into our network…

The biggest danger will be someone making a copy of our network without us. As Mav said, the unique information in our network is our tokens. Copies of Safe will try to steal the value stored in the tokens.

How?

An easy way is by using part of the inflation in their network to allow specific groups of people to upload for free - YouTube content creators, Spotify copies, etc. This will allow their networks to be used by people and attract new farmers.

If we want to be competitive, we must provide an option to use part of our inflation for free uploading of data in our network so that new farmers can be attracted to us and not to the foreign networks.

2 Likes

The other unique thing is the personal data we all will be keeping about ourselves that cannot be copied to a new network by any copycat.

This is your

  • personal backups
  • App data. Such things as games, assignments, documents, preferences, ledgers, etc.
  • Attached (mounted) drives that are actually Safe Network data. This is useful for using different devices yet having the same “drive” attached/mounted. No longer is it a limited-size USB drive you have to lug around and mount on one device at a time, but one that is open ended in size, limited only by the tokens needed to store more data.

It is this sort of data that APPs (including programs used on PCs/Apple devices now) will be storing in your private area.

This cannot be copied by a 3rd party

7 Likes

Of course, but this information is not expensive and can easily be copied to another network.

Even more so if, as the data show, it is so small in size and therefore cheap for the owner to copy…

2 Likes

Would it help to limit the maximum volume of a vault? In the beginning it could be something like 100GB - just to throw a number out there, I don’t know what would be good in reality. Then it could be increased in proportion to the network size so that we can allow larger nodes later.
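A minimal sketch of that kind of rule, using the 100 GB starting figure from above and a made-up linear growth assumption (both numbers and names are illustrative guesses only):

```rust
/// Maximum vault size allowed, growing linearly with the estimated node count.
/// The 100 GB base and the linear rule are just placeholders for discussion.
fn max_vault_gb(estimated_nodes: u64, baseline_nodes: u64) -> u64 {
    let base_gb = 100;
    base_gb * (estimated_nodes / baseline_nodes).max(1)
}

fn main() {
    // e.g. 100 GB cap while the network is small, 1 TB once it is 10x larger.
    println!("{} GB", max_vault_gb(5_000, 10_000));   // 100 GB
    println!("{} GB", max_vault_gb(100_000, 10_000)); // 1000 GB
}
```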

I’m not sure if this is effective, since storage seems to be so cheap that the price of upload may not be a significant factor in any case. Or maybe it is, but I’d like to see some calculations first. On the other hand, the act of paying for uploading is in itself going to be some degree of a barrier, so if you could make that barrier invisible, it could have a real effect, I suppose. I think free vs. very cheap is in this case a way bigger difference than very cheap vs. very expensive.

Also I think that Safe Network should not be marketed as (perpetual) storage in the beginning. At least I personally would not do it, just because it will take some time to see if this thing flies or not. I definitely would not recommend it to anyone as the “only backup you will ever need”.

From a technical point of view the storage aspect is the main thing, but as a use case for the end user it is not, at least not in the beginning. (Well, OK, the data has to be somewhere, but I hope you know what I mean. I don’t consider the server holding my homepage files to be “storage” in any serious capacity.)

By the way @mav, I agree that those Filecoin numbers sound huge, but I don’t really have anything to compare them with. Would you care to dig up some numbers, maybe something like the volume of torrents, the volume of the whole internet, or the volume of onion sites…

3 Likes

Yes. And that’s why I don’t like this modification at all.

I believe that nodes should not be left to decide on their own whether or not to accept new nodes.

1 Like

Brain-dump incoming :laughing:

I think this is a good point that’s sort of tricky to navigate. It’s also not going away anytime soon, so it’s worth considering. Decentralized systems have a general tendency to cluster around a handful of nodes that sort of act as cornerstones of the net. The internet is one example where this happened: it’s decentralized (although not anonymous), and in a short time it became dominated by only a handful of websites. Safe, which intentionally blocks new nodes from entering the net after a certain number of nodes have entered, actually exacerbates this quality, because, like you said, especially when it’s starting out, it’s “vulnerable” to a very large node joining, dominating the storage supply, and not letting anybody else in for a while.

Thinking in hindsight, there’s also the added danger of “side-channel” attacks on large nodes. Essentially, by monitoring the general activity level of a node, you can deduce something about the chunks it’s holding. For many small, distributed nodes this is less of a problem, because of caching and because you would need to monitor many nodes at once. But with one large node the attack is easier to carry out.

Anyway, whether or not this is tolerable, especially when the net is small, is perhaps up for debate, but I think there are several ways to handle this:

Work with Pools/Super Nodes, Instead of Against Them

Instead of fighting it, arrange for super nodes and farming pools to fill a different “ecological niche” than your average individual node, so they strengthen the net instead of making it more fragile/exclusive.

Quick example: maybe nodes store “cold” and “hot” data (by that I mean seldom-requested and often-requested data, respectively) in proportion to how large the node is. For one, this means very large nodes are not relied on as much day-to-day and don’t bottleneck services. Second, it means diminishing rewards on each additional GB of data provided, since the more you provide, the less likely it is you’re going to make a profit on supplying that data past initial storage (e.g. you will generally have fewer farm attempts per new GB of data). This way, we transform super nodes into a sort of internet cold storage. The data they would hold is redundant and “settled”, there’s no danger in them going offline suddenly, it still allows newcomers the opportunity for successful farm attempts, and it allows nodes which aren’t in a pool to still prosper (as opposed to, say, BTC, where an individual not in a pool has no hope). It would disincentivize, but not eliminate, centralization, while not harming individual nodes. Then we recruit nodes based not just on space available, but on space available for frequent access and space available for cold storage, so as not to bar new nodes from entering as well.
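A toy version of that proportionality, where the fraction of a node’s space earmarked for cold data grows with its size relative to the section median. The formula is just one way to get the shape described, not a proposal from the team:

```rust
/// Fraction of a node's space reserved for "cold" (rarely requested) chunks.
/// A median-sized node keeps ~50% cold; very large nodes approach 100%,
/// so they stop competing for the day-to-day GET traffic and farm attempts.
fn cold_fraction(node_gb: f64, section_median_gb: f64) -> f64 {
    let ratio = node_gb / section_median_gb;
    ratio / (ratio + 1.0)
}

fn main() {
    for size in [50.0, 100.0, 1_000.0, 100_000.0] {
        println!("{:>8} GB node: {:.0}% cold", size, 100.0 * cold_fraction(size, 100.0));
    }
}
```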

Similar divisions could perhaps exist for metadata and BLOBs, etc. Although “hot” and “cold” is a pretty good one, since the data held is still entirely anonymous.

Ignore It

Like I mentioned above, maybe just ignore it. In the early stages especially, it might be inevitable growing pains. When your user base is small, perhaps we can’t afford to be choosy. I don’t necessarily like this approach, but it’s low in complexity, and we can acknowledge the problem without addressing it immediately, saving this battle for a future date. Of course, perhaps by then it’s also too late.

Explicitly Disallow It

It’s not unreasonable for Safe to straight-up cap the node size at a certain percentage of the (estimated) network storage. This would make super nodes impossible, and pools would not be rewarding: given that data is randomly distributed anyway, pooling doesn’t really help the individual members. This is simple(ish) to implement, and allows the amount of storage provided to track the total network storage a little better. The downside is that, if the supply of nodes is far outstripped by demand, this kind of system could exacerbate the problem. It would be a balancing act to manage the per-node cap such that no node holds too much, but there is still enough storage provided on the network. I don’t know about the long-term viability of this, but it’s worth a mention.
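A sketch of that cap, assuming a network-size estimate is available to the section. The names and the 0.1% example are illustrative only:

```rust
/// Largest vault a section would accept, as a fixed share of the estimated
/// total network storage. `max_share` would need careful tuning: too small
/// and supply dries up, too large and super nodes dominate anyway.
fn max_vault_bytes(estimated_network_bytes: u64, max_share: f64) -> u64 {
    (estimated_network_bytes as f64 * max_share) as u64
}

fn main() {
    let network = 1u64 << 50; // assume a ~1 PiB network estimate
    // With a 0.1% cap, no single vault may offer more than ~1 TiB.
    println!("{} bytes max per vault", max_vault_bytes(network, 0.001));
}
```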

2 Likes

Side note here (correct me if I’m wrong, somebody), but my impression is that this isn’t exactly what this patch is doing. Judging from a (very) quick read-through, this is just the facility that routing provides to flip the flag. The patch doesn’t say anything about the consumer’s behavior (e.g. how sn_node is going to use it).

I’m basing that off this quote from maddam in the discussion:

…my impression is that the elders will track all the adults in the section and when they detect the average storage capacity (or some other resource) is too low, they will flip the flag to true so the section starts accepting new nodes.

In essence, only elders (nodes with some degree of proven trustworthiness) will make this decision. Further, if one rogue elder decides to flip this for their own purposes, it doesn’t mean the other nodes in the section will do so, nor are they forced to interact with the newly joined node, since section joins need to be approved by the section iirc. Redundancy aside, the network is fault-tolerant to something like this.
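To make the quoted behaviour concrete, here’s a hypothetical sketch of the elder-side check; the struct, field and threshold names are invented here and are not sn_node’s actual code:

```rust
/// Per-section view the elders might keep of their adults' storage.
struct SectionView {
    adult_free_fraction: Vec<f64>, // free space per adult, 0.0..=1.0
    joins_allowed: bool,           // the flag routing exposes
}

impl SectionView {
    /// Flip the flag only when average free space drops below a threshold,
    /// i.e. when the section genuinely needs new resources.
    fn update_joins_allowed(&mut self, min_avg_free: f64) {
        if self.adult_free_fraction.is_empty() {
            return;
        }
        let avg_free = self.adult_free_fraction.iter().sum::<f64>()
            / self.adult_free_fraction.len() as f64;
        self.joins_allowed = avg_free < min_avg_free;
    }
}

fn main() {
    let mut section = SectionView {
        adult_free_fraction: vec![0.9, 0.8, 0.15, 0.1],
        joins_allowed: false,
    };
    section.update_joins_allowed(0.5);
    println!("joins_allowed = {}", section.joins_allowed);
}
```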

1 Like

But only when they detect that the nodes are getting saturated.

the elders will track all the adults in the section and when they detect the average storage capacity (or some other resource) is too low, they will flip the flag to true so the section starts accepting new nodes

If, as with filecoin, the nodes are too large, it could delay the entry of new nodes and dangerously increase the concentration.

2 Likes

I think it’s useful to do these exercises, and also easy to get caught up in them and start overthinking them. Safe is not just storage, so we shouldn’t make too much of these comparisons.

Even if it was, Safe’s model is radically different from all of them, so again I don’t think comparisons are good predictors, and if they’re not good predictors, they’re not good input into design. I think modelling is better, though also not that reliable because we don’t have good models.

I’m not saying we shouldn’t do these exercises, or that we shouldn’t think about design tweaks; rather, I’m saying it is not that important to make this or that design change now. The thinking helps us come up with our best guess now, and more importantly will help us reason about what’s actually happening after launch, or during late tests.

4 Likes

These competing networks have paid millions and millions of dollars to make working products and show what the state of the market is. I think they are very useful as information and I am grateful to @mav for taking the time to extract the data.

2 Likes

I am not sure here, maybe in some aspects. It takes only a few hours to set up virtualization and pretend to have 1000 small 100 GB vaults instead of one big 100 TB vault.

I think this shouldn’t be done at the network level; apps can pay for the upload so it is free for users.

No data on torrents, but a few interesting numbers:

  • In 2013 Netflix’s complete dataset was 3.14 PB
  • YouTube video storage is over 1 EB (Exabyte)
  • Smart devices (for example, fitness trackers, sensors, Amazon Echo) produce 5 EB of data daily. Only a small fraction of that is stored.
  • People share more than 100 TB of data on Facebook daily. Every minute, users send 31 million messages and view 2.7 million videos.

5 Likes

I think that this is not up to the network; all of us must do something to support content on Safe Network. We can make an app with free uploading of public content, upload content ourselves, or just support the development of some apps. And maybe we will see some business model which will work and fill the free space.
Anyway, with a very cheap PUT price, the value of uploaded content is much bigger when there is no expiration.

4 Likes

I am personally optimistic. The biggest danger was the lack of an ERC20 token to give us access to the DEX world. With such a token, our access to the market cannot be stopped. I would say that uploading data to Safe is not a serious problem; it can be solved on the go. I personally plan to use everything my farms earn to upload to the network. I’m sure I won’t be the only one. This is probably even enough so that we don’t have a problem with the other Safe copies…

:jeremy:

5 Likes

Yes, but I think there are other tasks for a node than just storing the data. I mean verifying signatures, and whatever other processing there might be. That would increase the workload of many nodes vs. a single node with huge storage.

2 Likes