Farming Pools and InfiniteGB Nodes

Yes. And that’s why I don’t like this modification at all.

I believe that nodes should not be left to decide on their own whether or not to accept new nodes.

1 Like

Brain-dump incoming :laughing:

I think this is a good point that’s tricky to navigate. It’s also not going away anytime soon, so it’s worth considering. Decentralized systems have a tendency to cluster around a handful of nodes that act as cornerstones of the net. The internet is one such example of where this happened: it’s decentralized (although not anonymous), and in a short time it became dominated by only a handful of websites. Safe, which intentionally blocks new nodes from entering the net after a certain number of nodes have joined, actually exacerbates this tendency because, like you said, especially when it’s starting out, it’s “vulnerable” to a very large node joining, dominating the storage supply, and not letting anybody else in for a while.

On further thought, there’s also the added danger of “side-channel” attacks on large nodes. Essentially, by monitoring the general activity level of a node, you can deduce something about the chunks it’s holding. For many small, distributed nodes, this is less of a problem, because of caching and because you would need to monitor many nodes at once. But with one large node, the attack is easier to carry out.

Anyway, whether or not this is tolerable, especially when the net is small, is perhaps up for debate, but I think there are several ways to handle this:

Work with Pools/Super Nodes, Instead of Against Them

Instead of fighting it, arrange things so that super nodes and farming pools fill a different “ecological niche” than your average individual node, strengthening the net instead of making it more fragile/exclusive.

Quick example: maybe nodes store “cold” and “hot” data (by that I mean seldom-requested and often-requested data, respectively) in proportion to how large the node is. This means very large nodes aren’t relied on as much day-to-day and don’t bottleneck services, for one. Second, it means diminishing rewards on each additional GB of data provided, since the more you provide, the less likely it is you’ll make a profit on supplying that data past initial storage (e.g. you will generally have fewer farm attempts per new GB of data). This way, we transform super nodes into a sort of internet cold storage. The data they hold is redundant and “settled,” there’s no danger if they go offline suddenly, it still gives newcomers the opportunity for successful farm attempts, and it allows nodes that aren’t in a pool to still prosper (as opposed to, say, BTC, where an individual not in a pool has no hope). It would disincentivize, but not eliminate, centralization, while not harming individual nodes. Then we recruit nodes based not just on space available, but on space available for frequent access and space available for cold storage, so as not to bar new nodes from entering either.

Similar divisions could exist for, say, metadata and BLOBs, etc., although “hot” vs. “cold” is a pretty good one since the data held is still entirely anonymous.
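To make that concrete, here’s a minimal sketch of one way the reward curve could work, assuming a hypothetical rule where a node’s “hot” share shrinks as it grows (the knee value, names, and curve are all invented for illustration, not anything from the actual codebase):

```python
# Illustrative model only: larger nodes are assigned proportionally more
# "cold" (seldom-requested) data, so expected farm attempts per added GB fall.
# KNEE_GB and the 1/(1 + size/knee) curve are arbitrary assumptions.

KNEE_GB = 50.0  # hypothetical size at which a node's hot share has halved

def hot_fraction(node_size_gb: float) -> float:
    """Fraction of a node's space assigned to hot (often-requested) data."""
    return 1.0 / (1.0 + node_size_gb / KNEE_GB)

def expected_farm_attempts(node_size_gb: float) -> float:
    """Farm attempts track hot data only, so marginal GBs earn less."""
    return node_size_gb * hot_fraction(node_size_gb)

for size in (10, 100, 1_000, 100_000):  # small vault up to a 100 TB super node
    print(f"{size:>7} GB -> hot share {hot_fraction(size):6.2%}, "
          f"expected attempts {expected_farm_attempts(size):6.1f}")
```

Under a curve like this, total expected attempts saturate near the knee: a 100 TB node earns barely more day-to-day than a 1 TB one, yet newcomers’ small nodes keep a high hot share.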

Ignore It

Like I mentioned above, maybe just ignore it. In the early stages especially, it might be an inevitable growing pain. When your user base is small, perhaps we can’t afford to be choosy. I don’t necessarily like this approach, but it’s low in complexity, and we can acknowledge the problem without addressing it immediately, saving this battle for a future date. Of course, perhaps by then it’s also too late.

Explicitly Disallow It

It’s not unreasonable for Safe to straight-up cap the node size at a certain percentage of the (estimated) network storage. This would make super nodes impossible, and pools would not be rewarding: given that data is randomly distributed anyway, pooling doesn’t really help the individual members. This is simple(ish) to implement, and it allows the storage offered to track the total storage a little better. The downside is that if demand far outstrips the supply of nodes, this kind of system could exacerbate the problem. It would be a balancing act to manage the per-node cap so that no node holds too much, but there is still enough storage provided on the network. I don’t know about the long-term viability of this, but it’s worth a mention.
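A minimal sketch of the admission check, assuming a hypothetical 1% cap and a naive extrapolation for the network-storage estimate (all names and numbers here are illustrative):

```python
# Hypothetical admission check: reject a node whose offered space would
# exceed a fixed fraction of the network's estimated total storage.

MAX_NODE_FRACTION = 0.01  # invented: no node may hold >1% of estimated storage

def estimated_network_gb(section_storage_gb: float, section_count: int) -> float:
    """Naive estimate: extrapolate this section's storage across all sections."""
    return section_storage_gb * section_count

def may_admit(offered_gb: float, section_storage_gb: float, section_count: int) -> bool:
    cap_gb = MAX_NODE_FRACTION * estimated_network_gb(section_storage_gb, section_count)
    return offered_gb <= cap_gb

# A 100 TB node against an estimated ~2 PB network is turned away:
print(may_admit(100_000, section_storage_gb=1_000, section_count=2_000))  # False
```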

2 Likes

Side note here (somebody correct me if I’m wrong), but my impression is that this isn’t exactly what this patch is doing. Judging from a (very) quick read-through, it just adds the facility in routing to flip this flag. The patch doesn’t say anything about the consumer’s behavior (e.g. how sn_node is going to use it).

I’m basing that off of this quote from maddam from the discussion:

…my impression is that the elders will track all the adults in the section and when they detect the average storage capacity (or some other resource) is too low, they will flip the flag to true so the section starts accepting new nodes.

In essence, only elders (nodes with some degree of proven trustworthiness) will make this decision. Further, if one rogue elder decides to randomly flip this for their own ends, it doesn’t mean the other nodes in the section will do so, nor are they forced to interact with the newly joined node, since section joins need to be approved by the section, iirc. Redundancy aside, the network is fault-tolerant to something like this.
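Piecing that together, the consumer-side behavior might look something like this sketch (a guess at the logic described in the quote, not the actual routing or sn_node code; the watermark and data shapes are invented):

```python
# Guess at how elders might drive the joins-allowed flag: sample the adults'
# reported free space and open the door only when capacity runs low.
# LOW_WATERMARK_GB and the Section shape are purely illustrative.

from dataclasses import dataclass, field

LOW_WATERMARK_GB = 50.0  # hypothetical average free space that triggers joins

@dataclass
class Section:
    adult_free_gb: list[float] = field(default_factory=list)
    joins_allowed: bool = False

    def elder_tick(self) -> None:
        """Run periodically by elders: flip the flag when storage is scarce."""
        avg_free = sum(self.adult_free_gb) / len(self.adult_free_gb)
        self.joins_allowed = avg_free < LOW_WATERMARK_GB

section = Section(adult_free_gb=[120.0, 80.0, 30.0, 10.0])
section.elder_tick()
print(section.joins_allowed)  # average is 60 GB free -> False, door stays shut
```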

1 Like

But only when they detect that the nodes are getting saturated.

the elders will track all the adults in the section and when they detect the average storage capacity (or some other resource) is too low, they will flip the flag to true so the section starts accepting new nodes

If, as in Filecoin, the nodes are too large, it could delay the entry of new nodes and dangerously increase concentration.

2 Likes

I think it’s useful to do these exercises, and also easy to get caught up in them and start overthinking. Safe is not just storage, so we shouldn’t make too much of these comparisons.

Even if it was, Safe’s model is radically different from all of them, so again I don’t think comparisons are good predictors, and if they’re not good predictors, they’re not good input into design. I think modelling is better, though also not that reliable because we don’t have good models.

I’m not saying we shouldn’t do these exercises, or that we shouldn’t think about design tweaks; rather, I’m saying it is not that important to make this or that design change now. The exercise helps us come up with our best guess now and, more importantly, will help us reason about what’s actually happening after launch, or during late tests.

4 Likes

These competing networks have paid millions and millions of dollars to make working products and show what the state of the market is. I think they are very useful as information and I am grateful to @mav for taking the time to extract the data.

2 Likes

I am not sure here; maybe in some aspects. It takes only a few hours to set up virtualization and pretend to have 1,000 small 100 GB vaults instead of one big 100 TB vault.

I think this shouldn’t be done at the network level; apps can pay for the upload so it is free for users.

No data on torrents, but a few interesting numbers:

  • In 2013, Netflix’s complete dataset was 3.14 PB
  • YouTube video storage is over 1 EB (exabyte)
  • Smart devices (for example, fitness trackers, sensors, Amazon Echo) produce 5 EB of data daily. Only a small fraction of that is stored.
  • People share more than 100 TB of data on Facebook daily. Every minute, users send 31 million messages and view 2.7 million videos.
5 Likes

I think this is not up to the network; all of us must do something to support content on the Safe network. We can make an app with free uploading of public content, upload content ourselves, or just support the development of some apps. And maybe we will see some business model that works and fills the free space.
Anyway, with a very cheap PUT price, the value of uploaded content is much bigger when there is no expiration.

4 Likes

I am personally optimistic. The biggest danger was the lack of an ERC20 token to give us access to the DEX world. With such a token, our access to the market cannot be stopped. I would say that uploading data to Safe is not a serious problem. It can be solved on the go. I personally plan to use everything my farms earn to upload to the network. I’m sure I won’t be the only one. This is probably even enough so that we do not have a problem with the other Safe copies…

:jeremy:

5 Likes

Yes, but I think a node has other tasks besides just storing the data: verifying signatures, and whatever other processing there might be. That would increase the workload of running many nodes vs. a single node with huge storage.

2 Likes

So, storage will be very cheap until this changes.

See above.

I don’t see why inflation is needed. If there is loads of uber cheap storage, users will get a great deal.

Moreover, allowing site owners to subsidise uploads has been discussed in the past. If people really want to sell their data privacy and security for the sake of pennies, that option will also be available.

Personally, I pay for Amazon Photos, and for Google Drive to get extra email storage. If I could pay a network directly instead and gain privacy and security, I would gladly do so.

Copying private, encrypted data does not mean that the users will follow it to another network. Unless you have nodes in every section of the network, you aren’t going to get a complete copy either. Obviously, there is no way to identify what belongs to whom, so all the data would need to be copied.

1 Like

Would the owners of huge spare storage want to add it to Safe Network when there was so little demand for it? Would the return on investment be worth it?

If farmers decide to throw petabytes at Safe Network, storage costs will remain super low for a very long time. This seems to be in the interest of users, who will benefit handsomely from super low upload prices. If those farmers up and left, those prices would start to rise until it made sense for farmers to stick around.
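A toy version of that feedback, assuming store cost rises inversely with spare capacity (the curve and constants are invented for illustration, not the network’s actual pricing algorithm):

```python
# Toy price feedback: PUTs get dearer as spare capacity dries up, which in
# turn makes farming attractive again. BASE_COST and the inverse curve are
# invented assumptions.

BASE_COST = 1.0  # hypothetical PUT cost when half the network is full

def store_cost(used_gb: float, total_gb: float) -> float:
    spare_fraction = max(1e-9, 1.0 - used_gb / total_gb)  # avoid divide-by-zero
    return BASE_COST * 0.5 / spare_fraction

print(store_cost(100, 1_000))  # plenty of spare space -> cheap PUTs (~0.56)
print(store_cost(950, 1_000))  # farmers leave / space fills -> ~10x the price
```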

I suppose it is an attack vector though, where some nodes could seek to control so much of the network data that they could take it down. Until the data footprint becomes too large for big players to attempt this, the network will be somewhat vulnerable. If the initial network is seeded with known good nodes to start with (Maidsafe and community), it would make it harder to do this sort of exploit.

3 Likes

I was wondering about this too, actually. Adding to your point, another aspect is that section splits (and thus network size) will no longer be determined by the number of nodes in the section, as ostensibly indicated in the code, but rather by the amount of data being stored by the section. What implications would this change have for network security? Overall, given that @maidsafe is running internal tests, I’m sure they’re on top of these considerations. It would be great to learn more, though.
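To make the distinction concrete, here’s a toy contrast of the two split triggers as I understand them (both thresholds are invented numbers, not values from the code):

```python
# Toy contrast of two possible section-split triggers; MAX_ADULTS and
# MAX_STORED_GB are invented for illustration only.

MAX_ADULTS = 100        # membership-based split: grow when the section is full
MAX_STORED_GB = 10_000  # data-based split: grow when the section holds too much

def should_split_by_count(adult_count: int) -> bool:
    return adult_count >= MAX_ADULTS

def should_split_by_data(stored_gb: float) -> bool:
    return stored_gb >= MAX_STORED_GB

# With data-based splits, a section of a few huge nodes can trigger a split
# long before its membership grows, which changes how the network scales:
print(should_split_by_count(12))        # False
print(should_split_by_data(12_000.0))   # True
```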

2 Likes

Filecoin’s 1 PiB in 50 days is hardly no demand. I’m nitpicking for sure but let’s not completely dismiss it. Even if it’s 90% test data, 100 TiB of non-test-data in 50 days is still a very good result. Wikipedia is 20 GiB (source), so 100 TiB is a significant amount of data.

Top 100 all pirate bay torrents have a total size of 212 GiB

Top 100 music pirate bay torrents have a total size of 49 GiB

Top 100 video pirate bay torrents have a total size of 207 GiB

Here’s a snippet to paste into the dev console to calculate it for other categories:

```javascript
// Sum the sizes listed in the 5th column of each torrent row, normalizing
// everything to GiB. Row 0 is the header row, so start from index 1.
const rows = document.querySelectorAll("#torrents li");
let gibs = 0;
for (let i = 1; i < rows.length; i++) {
  const cell = rows[i].querySelectorAll("span")[4];
  const [value, unit] = cell.textContent.trim().split(/\s+/);
  let size = parseFloat(value);
  if (unit === "MiB") size /= 1024;
  else if (unit === "KiB") size /= 1024 * 1024;
  else if (unit !== "GiB") size /= 1024 * 1024 * 1024; // assume plain bytes
  gibs += size;
}
console.log("Total GiBs: " + gibs);
```

I agree with your point, but it’s a little funny, because using the words ‘big’ and ‘100TB vault’ is a little conservative. More than half the Filecoin nodes (460/794) have more than 100 TB. It’s hard to intuitively grasp the magnitudes at that sort of scale.

I don’t think that’s true for the expected size of nodes and the flow-on effect for membership. Filecoin has shown us how large the spare resource pool is and how those spare resources are distributed (the big nodes are really very big). This won’t show up in internal testnets. I guess it highlights the need for the algorithm to be agnostic to this sort of thing, to be secure against present unknowns and future unknowns.

1 Like

In my opinion, this cannot be defined as a spare resource. This is a resource bought for investment purposes, just as ASIC miners are bought. I remember, 3-4 years ago on the old Sia forum, there were pictures of a whole data center that a private person was building to farm Sia. These private investments seem large, but they are insignificant against the real spare resource on people’s computers…

If you want to see real spare resources, look at Storj or Sia.

2 Likes

I agree

I mean they can scale appropriately; e.g., if a section has 50 GB of space, they could fill it up to only 50 MB (per the current Filecoin ratio) to model the effect and take action accordingly. But I see your point. It’s probably hard to model, and intuitively I think it’s better not to have node count (network growth) depend on the amount of data stored.
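A tiny sketch of that proportional scaling, using the ~0.1% used-to-pledged ratio implied by the 50 MB / 50 GB example (the ratio is an assumption drawn from that example, not a measured figure):

```python
# Scale a test section's fill level to mimic an observed utilization ratio.
# UTILIZATION comes from the 50 MB per 50 GB example above (0.1%), which is
# an assumption here, not a measured Filecoin statistic.

UTILIZATION = 0.05 / 50  # 50 MB of data per 50 GB of space = 0.001

def test_fill_gb(section_space_gb: float) -> float:
    """How much data to load into a test section to model real-world load."""
    return section_space_gb * UTILIZATION

print(test_fill_gb(50.0))   # 0.05 GB, i.e. 50 MB
print(test_fill_gb(500.0))  # 0.5 GB for a ten-times-larger test section
```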

1 PiB is about $50k in hard drives, while the daily trading volume of Filecoin is $250,000,000. The speculative demand is huge compared to the storage demand (the product)… The conclusion I draw is that there are many users who want to speculate with the coin and very few who want to use the product.

1 Like

1 PB costs under £17k in Seagate 16 TB drives, at £270 per drive.
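For a quick sanity check, here’s the back-of-the-envelope arithmetic behind that figure (drive size and price are the ones quoted above; redundancy, enclosures, and power are ignored, so real deployments cost more):

```python
import math

PB_IN_TB = 1_000       # decimal petabyte
DRIVE_TB = 16
DRIVE_PRICE_GBP = 270  # per-drive price quoted above

drives_exact = PB_IN_TB / DRIVE_TB      # 62.5 drives
drives_whole = math.ceil(drives_exact)  # 63 drives in practice

print(f"fractional: £{drives_exact * DRIVE_PRICE_GBP:,.0f}")  # £16,875
print(f"whole drives: {drives_whole} x £{DRIVE_PRICE_GBP} = "
      f"£{drives_whole * DRIVE_PRICE_GBP:,}")                 # £17,010
```

So “under £17k” holds for the fractional figure; rounding up to whole drives lands just over.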

1 Like

It doesn’t matter; if you spread it over the 50 days and compare it with the daily amount of money that flows through this product, you will see that the speculative share is so close to 100% that it can be said with great confidence that demand for the product is effectively zero…

Which is normal; the people in the crypto world are gamblers. Maybe there is more demand outside the crypto world, but when normal, non-crypto people hear “cryptocurrency,” they start shouting “scam” and “pyramid”…

1 Like

If the people you talk to are like that, you’ve sold it to them badly.
I get nothing but enthusiasm when I talk to people about crypto.