Fixed target vault size

Yeah, you’re asking the wrong guy :joy: I’ll let someone else take that one :grinning:

4 Likes

I would add that I think it's important to be able to adjust the max size and bandwidth parameters in the future, not just when the vault is created. Maybe I add an HDD and can support a bigger vault, want to install other things and shrink the vault (over time as the network adjusts), or find that 50% bandwidth is too limiting to other local functions and want to play with that parameter to see what works best.

2 Likes

Yes, atm we can cope with different size vaults. When one becomes full we put the data on another, but the full node does not get as much of the farming reward any more, the others do.

When a node is full, the vaults tell routing to give us another vault. That is how the network grows. It also slows growth, which is good in a lot of ways.

This is why we say a data item will “likely” be stored on the nodes in a section xor-closest to it, but not necessarily. There are some side effects and we will get to those very soon (e.g. when a node joins, should it be given all the data close to it, etc.).

9 Likes

Presumably that reduced reward still needs to be more than what a new node earns while filling up, or the temptation will be to restart new nodes for the greater reward.

Greed is not good, but many will be motivated by “what is in it for me”.

3 Likes

Absolutely, it will be much cheaper to stay than to restart, more so as you age. A restart loses 50% of your age, and new nodes start at almost zero again.

8 Likes

My favourite radio station - WIII-FM

1 Like

I tried to find the dominant strategy for this (regarding vault size) but it seems fairly difficult to say with confidence what it might be. I’ve kept my written thought process anyway, since I learned something from it, so maybe others will too, or will spot things I missed or got wrong. It will be fascinating to see how things progress in the testnets.

I still think fixed target vault size is a good idea, better than targeting 50% full vaults, but I think the 50% full vaults idea will be great to try out and will give useful data from the testnets.

Here’s a link to RFC0057 - Section Health which describes the current intended joining / disallow mechanism and how it relates to vault size.

each section will aim to maintain a minimum ratio of 50% good nodes [ie not full]

and what I consider the main economic effect (halving reward for full vaults)

[adjusted age for reward portion] if flagged as full { node's age/2 }


My thought process was like this:

I think there’s an unequal amount of power when the disallow rule is based on full vaults (more power to large vaults), and I think that power difference goes away with fixed target size vaults, or at least is moved from an individual private optimisation (of vaults) into a shared public optimisation (of routing).

Joining

A new large vault has the power to prevent other new vaults joining. When the large new vault joins it tips the ratio of good (not-full) vaults from slightly below 50% to slightly above 50%, which means new vaults are no longer required for this section (at least for a while).

A new small vault has the power to allow another new vault to join. Because it will quickly fill, it almost certainly keeps the ratio of good vaults below 50%, so another vault must also join.
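
To make the disallow rule concrete, here is a minimal sketch of my reading of it (names, types and structure are illustrative only, not the real vault or routing code):

```rust
// Minimal sketch of the RFC0057 disallow rule: a section only needs new
// vaults while fewer than 50% of its current vaults are "good" (not full).
// Names are illustrative, not the real vault/routing API.
struct Vault {
    size_gb: u64,
    is_full: bool,
}

fn allow_new_vault(section: &[Vault]) -> bool {
    let good = section.iter().filter(|v| !v.is_full).count();
    // Only the headcount matters here; vault size plays no part in the vote.
    (good as f64) / (section.len() as f64) < 0.5
}

fn main() {
    // One 10GB not-full vault plus ten 1GB full vaults: the many small full
    // vaults dominate the headcount, so joins keep being allowed even though
    // most of the section's capacity sits in the one large vault.
    let mut section = vec![Vault { size_gb: 10, is_full: false }];
    section.extend((0..10).map(|_| Vault { size_gb: 1, is_full: true }));
    let capacity: u64 = section.iter().map(|v| v.size_gb).sum();
    println!("capacity {}GB, allow new vault: {}", capacity, allow_new_vault(&section));
}
```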

Is one of these actions more powerful than the other? I don’t think it’s immediately obvious one way or the other. I suspect the large vault has more power, since the act of trying to join is available to everyone but the act of being large and blocking new joiners is only available to those already on the network.

Adjustability

An existing large vault can continue to increase its spare space ‘on demand’ and remain not full. It can choose when to do this (eg at 80% full or 90% full or 99% full).

An existing small vault… I’m not sure, can it increase size and begin accepting chunks again? This would be an interesting situation, having a kind of ‘switch’ on the disallow rules.

I think large vaults have more power here since they can respond to the current network situation; they get to decide their amount of overhead (spare space) in a way that small vaults cannot. The ability to be responsive seems like a benefit or power, but maybe it’s neutral in the end?

Small vaults get to vote ‘we need new vaults’ and large vaults get to vote ‘we don’t need new vaults’, so to me vote-by-size seems equally powerful for both. Probably small vaults have an advantage because they get more votes-per-GB than a large vault does (eg ten 1GB full vaults count ten times more in the joining calculation than one 10GB not-full vault).

So I guess from an adjustability perspective it’s also not clear whether small vaults or large vaults have more power on the joining mechanism and overall vault size.

Harassment

If large vaults are harassed (eg ddos) and eventually kicked off the network, the disruption and amount of recovery to the section is significant. Their chunks must be redistributed (and possibly redirected if close vaults are full). A large vault being booted may also mean more joining is required, and if all new vaults are small, a lot of joining may be needed.

If small vaults are harassed the disruption is less since there are fewer chunks to recover. It may mean redirected chunks can return to not being redirected, so overhead on elders may be reduced. (Would elders feel compelled to harass small vaults to reduce the overheads for redirected chunks?)

Again I don’t feel there’s any clear imbalance in power (although there may be one in reality). Perhaps large vault operators harassing small vaults have more power, since a small vault leaving means less chance of a new vault needing to join, so power concentrates further with the existing large vaults.

But conversely, small vaults that harass a large vault off the network may lead to many more small vaults being able to join, which seems very powerful.

Large vaults seem to have more direct benefit from harassing small vaults since it directly concentrates power to that large vault. A small vault only gets indirect benefit from harassing large vaults since the new joining vaults may or may not belong to that operator.

Economic motive

Full vaults are rewarded less than not-full vaults (their reward amount is halved). This is a direct incentive toward large vaults, which could possibly compound over time. Unless small vault operators are happy to accept the reduced rewards, there is little chance of small vaults deliberately exerting force on the average vault size. If you can run a large vault you certainly would try to, because it pays twice as much. The algorithmic need for 50% full vaults suggests that running a large vault will not be achievable for everyone, so who do you think will end up actually running large vaults? Almost certainly the most competitive, knowledgeable and technologically literate operators. And they are the operators getting the most reward, so they are best positioned to maintain their operations.
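
As a minimal sketch of that rule (the age halving is quoted from RFC0057 above; the function name and the idea of using the result as a reward weight are my own illustrative framing):

```rust
// Sketch of the reward adjustment quoted from RFC0057: a vault's age is
// halved for reward purposes while it is flagged as full. The function
// name and surrounding detail are illustrative assumptions.
fn reward_age(age: u64, flagged_full: bool) -> u64 {
    if flagged_full { age / 2 } else { age }
}

fn main() {
    // Two vaults of equal age: the one that never fills is rewarded on
    // twice the weight of the one sitting full.
    let age = 16;
    println!("never full: reward weight {}", reward_age(age, false)); // 16
    println!("full:       reward weight {}", reward_age(age, true));  // 8
}
```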

The economic power seems clearly to lie with large vaults.

Perspective

Would you rather be a vault operator fighting to increase average vault size, or would you rather fight for small vaults? As in, which tribe would have an easier task?

If I was going to try to get large vaults to dominate, I’d be doing this:

  • very actively attempting to join the network with one vault and never letting the vault reach capacity once joined
  • continue joining one large vault at a time until a bottleneck is reached
  • if any vault becomes full, kill it so the remaining vaults can continue to have headroom
  • harass the smallest vaults in the network however possible

If I was going to try to get small vaults to dominate, I’d be doing this:

  • very actively attempting to join the network with the smallest possible vault size and in the most parallel way possible
  • continue joining with more small vaults until a bottleneck is reached
  • once the bottleneck is reached, kill my largest vault and continue trying to join with the smallest possible vault size
  • harass the elders of all sections my vaults are in since they are probably the most stressed anyway (they have a lot more responsibility than adults so are probably closer to capacity than the adults). There may be a smarter strategy for harassing than this.

Of these two strategies, my feeling is the larger vaults have the bigger advantage, mainly because of the larger economic reward but also because of the concentration of power they get when small vaults depart.

I also feel that many operators (hopefully most) will do the right thing and run vaults at the optimum size depending on the latest analysis of network conditions. It’s easy to focus too much on the nefarious operators and the greedy actors, but my personal approach would be to run a vault in the middle ground to hopefully best contribute to a stable and healthy network.

I guess an important question is, would you prefer to run a large vault? Or to rephrase, would you run a small vault if your reward is half that of the large vaults?

7 Likes

A lot of great insight as usual, but I think possibly missing something. I am not sure super small vaults are as useless as we imagine. If they all fill up, the section will split; the large vault cannot stop that (it is one of circa 200).

Also, the section needs to have a split trigger, say 200 (imagine full balance, though we know it’s more than 200 in reality). That split trigger is a hard function, based on the number of vaults and their addresses. However, there needs to be a way for the network to require new vaults, otherwise we have a mass-join attack etc.
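
Roughly, the kind of hard function I mean might look like this toy sketch (illustrative only, not the actual routing logic; the figures are assumed):

```rust
// Toy sketch of a numeric split trigger: a section splits only when both
// halves of its address space (next bit 0 or 1) would each still hold at
// least the minimum number of vaults. Illustrative only, not routing code.
const MIN_SECTION_SIZE: usize = 100; // assumed figure for the sketch

fn should_split(vaults_prefix_0: usize, vaults_prefix_1: usize) -> bool {
    vaults_prefix_0 >= MIN_SECTION_SIZE && vaults_prefix_1 >= MIN_SECTION_SIZE
}

fn main() {
    println!("{}", should_split(120, 95));  // false: one half under strength
    println!("{}", should_split(110, 105)); // true: both halves viable
}
```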

So not answering any point (I am deep in other things as usual, but hope to fix that very soon), just throwing in these thoughts. If we fix vault size it might still all work / be OK etc., but perhaps there is more to it. I am not sure I am adding much, but anyway, some more thoughts. (btw I lean towards fixed for simplicity at launch, something I am very keen to push right now)

8 Likes

Does node size matter if the network, and notionally the data itself, is trying to find balance across all nodes?

Having more nodes of reasonable age is surely more robust and to be preferred.

Rewarding the provision of resources to the network encourages the simple idea that bigger is better. However, if there were some sense of magic balance - like magnetism - where data prefers the emptier node, then that would encourage more nodes, not larger ones.

I wonder whether there are a number of considerations suggesting that more nodes are better than larger ones.

It is how the network makes use of what is available, AND that there is a mixture of node types available, that will help provide a robust environment. Nodes all being the same likely introduces a risk of one form or another.

If all nodes provide an increasing resistance to being full, then nodes that are less full and more responsive can jump in and take that new strain.

My instinct would be to use something natural like 1/r² as a measure of resistance to more data, and another measure acknowledging the node’s age.

So,

space remaining | resistance to new data
0.0             | infinite
0.1             | 100.00
0.2             |  25.00
0.3             |  11.11
0.4             |   6.25
0.5             |   4.00
0.6             |   2.78
0.7             |   2.04
0.8             |   1.56
0.9             |   1.23
1.0             |   1.00

and another for node age, which could just be linear up to a minimum age with all ages above that treated equally… or is there something about elders (that I’ve not read enough about) being a class above based on age? I’m guessing that elders will have some reputation for not failing over a long period (relative to others - so, worst case, the best available are made use of… connectivity always being an issue for any node).

So, that allows the network, in a crisis, to signal additional stress that calls on that resistance OR, more simply, each section of the network works with what is available, and when a new node appears it naturally gets preferred for offering space that sees less resistance… a bit like water flowing into an empty hole.
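
As a toy sketch of that 1/r² resistance (illustrative only; the infinite cut-off for a completely full node is just my choice):

```rust
// Toy sketch of the 1/r^2 resistance curve from the table above: resistance
// to accepting new data as a function of the fraction of space remaining.
fn resistance(space_remaining: f64) -> f64 {
    if space_remaining <= 0.0 {
        f64::INFINITY // a completely full node refuses new data outright
    } else {
        1.0 / (space_remaining * space_remaining)
    }
}

fn main() {
    // Reproduces the table: 0.1 -> 100.00, 0.5 -> 4.00, 1.0 -> 1.00
    for r in [0.0_f64, 0.1, 0.5, 1.0] {
        println!("{:.1} remaining -> resistance {:.2}", r, resistance(r));
    }
}
```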

edit: the other consideration perhaps is that each section needs a fair mix of whatever kinds of nodes exist, or it risks being full of small | large | responsive | other, without all the added value of those other kinds.

edit2: trivially - it’s inevitable that data will fall to larger nodes… bigger, more responsive, larger-bandwidth nodes should be wanted - and lots of them. Encouraging small nodes seems to prefer the short term and risks, longer term, that larger nodes are not available when needed.

4 Likes

Nice presentation @mav. You have further convinced me that all vaults should be the same fixed size.
This size could increase over the long term with network updates, but at any given instant, all vaults in the entire network should have identical storage capacity. Much like the bitcoin “halvening” we could have a vault size “doubling” every so often when certain criteria are met. However, I still contend that a fixed vault size of 1TB would likely be sufficient and performant for the next 10 to 20 years, so the automated doubling might not really be necessary.

3 Likes

All constraints are potentially brittle.

Today’s TB is tomorrow’s GB :thinking:

1TB seems small given that Dropbox’s standard offer is 5TB for 12 Euro/month, which I consider a moderate size.

Node owners to users is unlikely to be 1:1… and there’s no telling what might be practical in future… 5G bandwidth might help support larger nodes etc etc

Encouraging node owners to invest in hardware that is limiting might be OK today but a problem tomorrow.

3 Likes

I would say that 1TB is too ambitious. When the community network was launched, a minimum of 32GiB free disk space was specified, but many vaults didn’t respect this requirement.

3 Likes

Sure, initially some squeeze might help spawn many nodes, but as above, resistance is an option to ensure that new, empty nodes are strongly preferred.

If it works, then it won’t matter. :smiley:

1 Like

I didn’t say forever, like Gates did. The actual size selected at genesis is a design decision and of lesser importance than the decision to enforce that all vaults are the same size.

No, today’s TB is effectively the GB of one to two decades from now. So there is plenty of time after launch to test and propose the next size upgrade 10 years from now. The primary mechanism for increasing network storage capacity should be increasing the number of sections. An efficiency gain from increasing vault size within each section is only needed to reduce inter-section communication overhead.

I agree, but it is likely the most practical given bandwidth limitations. A 10TB requirement at launch, given the current size of 3.5-inch HDDs, is the next logical alternative; however, you would exclude too many people from participating. Anything less than 1TB is too small, and those wanting to offer less (down to sub-GB) could serve as cache vaults. A standard vault size of 1 million chunks (each being 1 million bytes) is easy to visualise. KISS. With the difference in size between 1TiB and 1TB, various metadata could be stored.
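
To spell out that arithmetic, a quick back-of-the-envelope sketch (illustrative figures only):

```rust
// Back-of-the-envelope arithmetic for the 1TB-vs-1TiB headroom mentioned above.
fn main() {
    let chunk_bytes: u64 = 1_000_000;        // 1 million bytes per chunk
    let chunks_per_vault: u64 = 1_000_000;   // 1 million chunks per vault
    let chunk_payload = chunk_bytes * chunks_per_vault; // 10^12 bytes = 1 TB
    let disk_tib: u64 = 1 << 40;             // 1 TiB of raw disk
    let headroom = disk_tib - chunk_payload; // left over for metadata
    println!("chunk payload:     {} bytes", chunk_payload); // 1_000_000_000_000
    println!("1 TiB of disk:     {} bytes", disk_tib);      // 1_099_511_627_776
    println!("metadata headroom: {} bytes (~{} GB)", headroom, headroom / 1_000_000_000);
}
```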

2 Likes

Sounds like guesswork, so you might be right.
My guess would be 80% long-term cold store, so depth is perhaps useful.

My instinct is to avoid assumption and brute force problems with flexibility from the start to see what emerges.

In this case perhaps it doesn’t matter, for if the network takes off then new can replace old… a period of simple stability, with a forced small clone size, could be useful too.

No strong opinions… whatever helps make issues visible is useful and one route to testing is artificially restricting parameters.

3 Likes