Proposal for network growth based on node size rather than node fullness

I strongly believe the current growth mechanism, based on the portion of full nodes, is unsustainable (see farming pools and infinite GB nodes). Instead, we could use the portion of nodes above a target node size. I feel this will lead to a more sustainable network growth rate (see the safe network growth spreadsheet).

Current network growth technique:

  • elders check how many nodes are full
  • if it’s more than a certain threshold (50%), more nodes join the section

Proposed network growth technique:

  • elders check how many nodes are storing more than a certain amount of data (eg 50 GB)
  • if it’s more than a certain threshold (50%), more nodes join the section

There are a bunch of reasons why I think this is better. Using a target node size to govern network growth is superior because:

  • Participation becomes easier over time rather than harder.

  • Allows devs / network engineers to optimize around a predictable set of parameters and behaviors.

  • Problem of large nodes gets worse exponentially, problem of hops gets worse logarithmically.

  • Improved scaling characteristics, ie overhead for nodes (cpu/ram) scales with network size.

  • Simple overall network size calculation, good for marketing.

  • Parallelism always beats sheer size (disk/cpu/ram/network) when it comes to distributed networking.

  • Reduces the unfairness and difficulties caused by asymmetric consumer upload/download speeds.

  • Improves granularity of disk usage, eg can only run a single 2 TB node on a 3 TB drive so 1 TB must be wasted.

I won’t elaborate on each point because it would become very long-winded, but I’m sure there will be doubts about some of these points (I have them myself), so feel free to debate and discuss; I will be happy to explore any of these ideas.

The change is in essence very simple (although in practice there’s a bunch of signalling that would be affected).

sn_node/src/capacity/rate_limit.rs#L51-60:

pub async fn check_network_storage(&self) -> bool {
    info!("Checking network storage");
    let all_nodes = self.elder_state.adults().await.len() as f64;
-   let full_nodes = self.capacity.full_nodes() as f64;
+   // proposed: count adults storing more than a target node size (eg 50 GB);
+   // `nodes_over_size` and `TARGET_NODE_SIZE` are illustrative names, not existing sn_node items
+   let full_nodes = self.capacity.nodes_over_size(TARGET_NODE_SIZE) as f64;
    let usage_ratio = full_nodes / all_nodes;
    info!("Total number of adult nodes: {:?}", all_nodes);
    info!("Number of Full adult nodes: {:?}", full_nodes);
    info!("Section storage usage ratio: {:?}", usage_ratio);
    usage_ratio > MAX_NETWORK_STORAGE_RATIO
}

In reality I would prefer this not to be a fixed X GB target node size. A fixed size is just an easy analogy to get the deeper point across. I would prefer to use a velocity measurement, eg if a section would take longer than X minutes to replace Y nodes then the velocity has dropped too far and we need to get more nodes to join. Velocity is equivalent to a target node size that varies with current bandwidth availability. This is a little more complex to calculate and reach consensus on than a fixed target node size, but it would be much more flexible in the face of technological improvement over time.
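To make the velocity idea a bit more concrete, here’s a minimal sketch of the check the elders could run instead of check_network_storage. None of these names exist in sn_node; the bandwidth estimate and the five minute figure are purely illustrative assumptions:

// Hypothetical velocity-based growth check (illustration only, not sn_node code).
const MAX_REPLACEMENT_SECS: f64 = 300.0; // eg "five minutes to re-replicate a departed node"

// `data_per_node_bytes`: average data held by an adult in this section.
// `section_bandwidth_bytes_per_sec`: estimated aggregate upload bandwidth of the
// remaining adults that would re-replicate a departed node's chunks.
fn needs_more_nodes(data_per_node_bytes: f64, section_bandwidth_bytes_per_sec: f64) -> bool {
    // Time to re-create one departed node's data from the remaining replicas.
    let replacement_secs = data_per_node_bytes / section_bandwidth_bytes_per_sec;
    // If replacement would take too long, velocity has dropped too far: admit more nodes.
    replacement_secs > MAX_REPLACEMENT_SECS
}

The same check automatically tightens or relaxes the effective target node size as bandwidth improves over time.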

18 Likes

What the network really needs to know is how close to capacity the nodes are, isn’t it? In other words, 100 nodes that have each allocated 100 GB for Safe storage and are each currently storing 50 GB is not as critical a situation as 100 nodes that are each storing 50 GB but have only committed 60 GB each for Safe storage.

5 Likes

If I understand correctly, with this mechanism we will be able to have more nodes in the network if we choose a small number of GB, ie. in theory more people will be able to earn some tokens, which is good for the network :dragon:

3 Likes

I would think some function of the two is best to cover edge cases: based primarily on the number of nodes over the target size, but with some checks on node fullness as well.

4 Likes

To echo what @VaCrunch is saying, the elders check to see if nodes are storing more than a certain amount of data, but do they also know how much space has been allocated? If, for example, 50% of nodes are storing 50 GB but each node has allocated 1 TB, does that justify adding nodes to the section?

I’m guessing I’m missing something but I like the benefits you list so would love to hear more details.

Btw, is this a trivial change? Not that that would sway my opinion but I am curious.

3 Likes

@Nigel Better to work with stored. Don’t want to swamp the home user with people who have 50, 100 or even 200 TB

More fair if nodes average out more in data stored. Let those with a lot provide many nodes. Fairer for them really, and fairer for those with “normal amounts”. In my opinion of course

6 Likes

I almost included a comment on what I think you’re saying, but just to be sure: you’re saying the elders would know the allocation of nodes because there is a set limit to allocation?

For example, each node can only provide up to 50 GB so as to allow for more home PC farming, and large storage operators just create more nodes. The elders check to see if more than 50% of nodes are storing more than some threshold of the allowed 50 GB, and add more nodes to the section if so.

Is this what you’re saying with using both approaches or what @mav is saying?

The numbers I used in the first example were intended to be arbitrary, but you make a fair point that using such high numbers would swamp out home farmers.

2 Likes

More the original concept MaidSafe had, where there is an average vault size (used), the larger vaults will most likely be storing more, and they are disadvantaged somehow.

Now I am not suggesting that here, to be sure

What I am suggesting is

Using node allocated storage, those who allocate 100 or 200 TB will swamp anybody supplying 100 GB or 1 TB, and it would work against adding more farmers.

Using stored size, the 100 TB person will likely be storing something like (though more, I’d expect) what the person who allocates 1 TB is storing, say the 50 GB. In this case using node stored size means the 100 TB person does not unfairly affect the 1 TB person. And it encourages the 100 TB person to run multiple nodes, which is an advantage to them and the network.

3 Likes

I think we may be saying the same thing just differently.

If I can attempt to clarify: if there is a limit to how much space a single node can store, something reasonable for a home PC like 50 GB, then a large storage center would have to run more nodes to use all of their available space.

So back to the original question. If the elders choose to add more nodes if 50% of nodes are storing x amount, doesn’t x matter? What should x be?

1 Like

Why not drive enlistment of new vaults from existing vaults refusing to store more data?

2 Likes

The network can know this more accurately if each node is a fixed target size.

We could try to measure spare space, but it seems unnecessary if we move to a fixed target size.

If we have 10 nodes trying to join our section vs 100 nodes, would that be a decent indication of how much spare space is available? 10x fixed target size in the first case, vs 100x fixed target size in the second case.

I feel we can’t / don’t need to measure spare space.

Node operators can communicate spare space to the network via queued / unjoined nodes. If I have 100 GB spare and the current node target size is 50 GB then I would keep trying to join until I had 2 more nodes on the network. If I have 60 GB spare, same process but 1 node instead of 2 nodes.
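Just to spell out the arithmetic (this is purely the operator’s own rule of thumb, nothing network-side):

// How many nodes an operator would keep trying to join, given their spare space
// and the current target node size. 100 GB spare at a 50 GB target -> 2,
// 60 GB spare -> 1 (illustration only).
fn nodes_to_queue(spare_bytes: u64, target_node_size_bytes: u64) -> u64 {
    spare_bytes / target_node_size_bytes
}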

We also need to decide what counts as ‘critical circumstance’. I don’t feel like spare space is a measure of critical circumstance. I feel the replacement speed of any existing resource is a better measure (this is what I’ve called velocity in the OP, not sure how appropriate that term is). We can probably always measure the replacement speed regardless of the specific growth mechanism, but my point is that I feel the replacement speed is what matters for risk, not the spare space. For sure we need spare space to do the replacement, but I feel that’s not as critical as the speed.

For an extreme example, take a node with 100 TB spare space on a 56K dialup connection. That node doesn’t improve a critical circumstance at all; if anything it makes it worse, since it will take a long time to replace a lost node using that spare space.

Yes, this is right. And as you say it’s more than increasing the number of nodes, it’s also increasing the diversity of nodes. If we set a lower limit then it allows a broader range of participants. So it’s both more nodes and more diversity.

My feeling on this is to kill rather than allow full nodes (but I’m open to being convinced otherwise). In my view the complexity for redirecting chunks away from their closest nodes because of full nodes is too risky and unnecessary. I’m not completely sure though.

I say yes for a few reasons. Firstly I don’t feel we should be trying to measure or work with spare space (it can go away any moment, either offline or for some other use). Secondly a 1 TB node takes a very long time to replace if it relocates or departs, which is a risk / inefficiency to the network which goes away with a smaller target node size. Thirdly, that 1 TB node sets a kind of standard for other node operators, if not now then in the future, and I don’t feel that sort of competitive mechanism is healthy or sustainable.

Possibly. Currently it’s a trivial change plus some tidyup as per the code I quoted in the OP, but that code is also currently unused so it’s not too clear what the future holds for this.

Or even 75 PB like with filecoin (I know, very different mechanisms, but these size nodes do exist in the real world and if farming becomes even slightly popular or competitive or professional I would prepare for much larger operations than that).

Yes you’re right. X does matter. A lot!

A basic approach is to look at, say, the Steam hardware survey and say we want 99% of those people to be able to run at least 1 node, which works out to a target node size of around 100 GB.

Maybe we can extend this by allowing the target to increase at a set rate, or by network size, or by vote, or some other way.
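If we did go the survey route, the basic calculation is just a percentile over whatever dataset we trust. A rough sketch, where the survey input and the 99% coverage figure are external assumptions rather than anything the network measures:

// Pick a target node size such that `coverage` (eg 0.99) of surveyed machines
// could run at least one node. `surveyed_free_bytes` is external survey data
// (eg derived from something like the Steam hardware survey); assumes it's non-empty.
fn target_node_size(mut surveyed_free_bytes: Vec<u64>, coverage: f64) -> u64 {
    surveyed_free_bytes.sort_unstable();
    // The machine at the (1 - coverage) percentile sets the target: everyone at or
    // above that point has at least this much space free.
    let idx = (((1.0 - coverage) * surveyed_free_bytes.len() as f64) as usize)
        .min(surveyed_free_bytes.len() - 1);
    surveyed_free_bytes[idx]
}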

I feel like instead, elders should set the target node size by measuring the likely replacement time of nodes. eg if a node departs, how long would it take the remaining nodes to reduplicate the lost data? If it would take a long time (eg more than five minutes), bring more nodes into the section until the replacement time is below the target. Bringing in more nodes a) increases the total speed of the section and b) reduces the data per node. As things change (more data uploaded, more nodes join, better connections are developed etc) the replacement time hopefully remains a fairly consistent measure of risk and is an effective way to decide when to bring in new nodes.
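Continuing the velocity sketch from the OP, the elders could turn that replacement-time target into a concrete section size. Again, every name and number here is a placeholder, not an existing sn_node mechanism:

// How many adults the section needs before re-replicating one departed node's
// data would fit inside the target window (illustration only).
// Assumes `avg_node_bandwidth_bytes_per_sec` > 0.
fn adults_needed(section_data_bytes: f64, current_adults: u32, avg_node_bandwidth_bytes_per_sec: f64) -> u32 {
    const TARGET_REPLACEMENT_SECS: f64 = 300.0; // eg five minutes
    let mut adults = current_adults.max(1);
    loop {
        // More adults means less data per node and more peers sharing the re-replication work.
        let data_per_node = section_data_bytes / adults as f64;
        let replacement_secs = data_per_node / (adults as f64 * avg_node_bandwidth_bytes_per_sec);
        if replacement_secs <= TARGET_REPLACEMENT_SECS {
            return adults;
        }
        adults += 1;
    }
}

If adults_needed comes back higher than the current section size, the difference is how many joining nodes to admit.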

How do we know they’re refusing to store it? Or more precisely, what’s the lag between them really not storing and elders actually finding out about it? Seems risky to me.

9 Likes

@mav

Make writes concurrently to two vaults in the same section to drastically increase the odds that at least one will succeed. Writes will need to be replicated eventually anyway, so even when both succeed (the majority of the time) it wouldn’t be wasteful.
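For what it’s worth, a minimal sketch of what that could look like on the write path. write_chunk is a stand-in for whatever the real client/elder call ends up being, not an existing sn_node API:

use futures::future::join;
use std::net::SocketAddr;

// Write the same chunk to two adults concurrently and accept the write if either
// succeeds; normal replication would catch the other copy up later (sketch only).
async fn write_to_two<F, Fut>(write_chunk: F, chunk: Vec<u8>, a: SocketAddr, b: SocketAddr) -> bool
where
    F: Fn(SocketAddr, Vec<u8>) -> Fut,
    Fut: std::future::Future<Output = Result<(), ()>>,
{
    let (res_a, res_b) = join(write_chunk(a, chunk.clone()), write_chunk(b, chunk)).await;
    res_a.is_ok() || res_b.is_ok()
}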

2 Likes

Is there a mechanism for section shrinking as well as section growth?

I think it would be interesting if a form of natural selection could come into play by introducing some randomness to these numbers and allowing the more successful sections to thrive and grow and the less successful to die - so that the network can adapt to unknown future changes.

Is that in the cards or could it be … or is it all in the too hard box?

1 Like

Can you please clarify this, I’m not sure I understand… to my understanding chunks are already written to multiple nodes as soon as they’re uploaded, right?

https://github.com/maidsafe/sn_node/blob/b0c8e67f40d162fc627be13a42e681f5431c02ca/src/node/elder_duties/data_section/metadata/blob_register.rs#L33

Sections, yes, they can shrink. But there’s no way for sections to shrink so much that they’d need to merge.

Sections can shrink by nodes going offline.

So far there’s no punishment system that I know of (except bls_dkg with complaint and justification), but it will definitely be more prominent later in development.

Yes this could be useful, although it’s probably too early to say how it might work. The testnets are a sort of manual form of this in a way.

Natural selection is a kind of space search, so we’d want to make sure the parameters were pretty clear. In my opinion Perpetual Auction Currency is a good open-ended option for doing this sort of exploratory survival optimization stuff.

2 Likes

Does this mean that mobile users are out of play for the farming game? A lot of people these days are doing all their internet on the phone, especially in “emerging” countries, prime beneficiaries of Safe. To accommodate them, would 25GB be a workable size?

1 Like

I can understand why the replacement speed of a node is important, but if priority is given to faster nodes, won’t we possibly end up with data centers hosting a lot of nodes?
We’re supposed to be inclusive so anyone can join, but surely if we look at speed alone we will end up with something efficient but exclusive.

1 Like

From the beginning, I believe, there has been a loose plan to place a “drag” on uber setups like datacenters to reduce the performance advantage they have.

Also, it has always seemed to me that some kind of random element (doesn’t have to be large) should be included in the algorithm to award tokens to the appropriate farmer(s).

2 Likes

The main issue I see (and I do agree with much/most of your conclusion) is that magic number, the one that says what size is good/correct. As disk sizes are increasing exponentially, that number needs adjusting manually (upgrade issues perhaps). The main point though is: who sets that number?

8 Likes

It would be arrived at through testing. As long as the fixed size can be adjusted (for new nodes only of course) via upgrades, it doesn’t seem to be that critical to get it absolutely right in the beginning.

2 Likes

What I mean is this changes over time, and who is to say we got it right (cf. the blockchain block size debates)? So we can test with early adopters who have a particular “fit”, then the general public don’t use it because the fit is wrong. Then do we force upgrades, and so on?

7 Likes