Fixed Target for Vault Size
What would be the benefits and drawbacks of having a fixed target for vault size?
This post aims to explore the idea. I’m personally mildly in favour of it over a floating vault size, but am really just looking to explore the topic.
The Idea
Vaults have a fixed target size; I’m going to use 50 GB throughout this post as the example target.
The target is not a strict boundary. It’s used as a yardstick by which other operations are performed, such as how to adjust storecost.
Why
Network size increases proportional to the amount of data stored.
This may not seem important, but consider the reverse of this, where network size has no relation (or very little relation) to the amount of data on it. The amount of overhead (mainly from routing messages) would depend on some other constraint of the network (eg maybe the amount of spare space, maybe the economic algorithm etc). There is probably a very long and complex conversation to be had about this point.
Joining and relocating has a fixed cost (in terms of bandwidth and storage, not necessarily dollars).
This is important because if vault size becomes very large then it restricts who can take part in the network. It could get to the point where joining or relocating is impossible unless you have an industry-grade network connection or disk storage. This might be acceptable or even desirable (mainly for end-user performance reasons) but I think running a vault should be possible for many users. One way to ensure this is by making the startup conditions achievable by setting a fixed size. The difficulty changes of bitcoin mining and constant increases in hash efficiency has made mining ‘industry only’; an unconstrained vault size (probably tending toward large vaults because of demand for ‘efficiency’) would lead to industry domination of farming in a similar way. Again, a very complex conversation could be had here.
It improves the network structure.
I’m not sure about this… Having a fixed vault size, like having a fixed chunk size, allows better understanding of the effect of events that affect the structure of the network. Having very large vaults compared to having very small vaults would require different responses when a relatively infrequent event happens, say, 10% of the network dropping out. Having a fixed vault size allows the potential responses to be better understood and accounted for.
It has a better ‘worst case scenario’.
The worst case scenario if bandwidth and storage continue to grow exponentially but vault size remains fixed is a bit like having a very small fixed vault size today, eg only 100 MB. The consequence is more routing overhead than the ‘optimum’ vault size that suits the current group of vault operators. I think this non-optimal-routing cost is ok, since a too-small-vault-size has the effect of expanding the potential set of vault operators. This also forces vault operators to focus on improving the routing efficiency (which is an open / common / communal problem) rather than vault efficiency (which is a closed / individual / private operator problem).
The consequence would be that individual operators would be required to run multiple vaults per machine. I think that’s better than having a floating vault size that leads to industrialised farming.
Fixed vault size suits the fixed chunk size better.
A fixed chunk size makes assumptions about the current state of networking and computing and storage mediums, but does not account for what may be ‘optimum’ in the future. This is an acceptable compromise, since many small chunks is not seen to be significantly problematic compared to few large chunks (possibly incorrectly). So the question is, if a fixed vault size is not desirable, why does the same not apply for a fixed chunk size?
I know there’s a pretty strong mindset here against magic numbers, and I agree on the most part with that. But I’ve never quite reconciled that with the ‘engineered’ magic numbers of, say, 1 MB chunk or 8 redundant copies or X section size etc. It’s a tricky situation no doubt about it.
It simplifies the economy.
The economy is loosely attached to the concepts of supply and demand of resources such as bandwidth and storage space. The ability to increase supply of network storage when vaults are 1 TB in size is much more constrained than if vaults are 1 GB in size. Fewer people can address the supply shortage for the large vault scenario. Adding flexibility in the supply improves the economics of the network.
It simplifies the membership rules.
Currently the proposed disallow rule for joining the network is based on the portion of full vaults. This is a very coarse measurement and during high stress events may not provide enough buffer. Alternative disallow rules based on the amount of spare space require pretty complex algorithms to measure the spare space. With a fixed vault size both these problems are greatly reduced. This is a bit of a brief explanation and there’s a lot of detail to dive into here, which may end up showing this perspective to be too simplistic. It’d be great to see more thoughts on this particular point.
Operational expectations.
Having many small vaults per operator requires them to approach vaults as ‘cattle’ rather than ‘pets’ (see this technology parable). This requires a certain degree of tooling and failure management from the start. Hopefully proprietary tooling and operations don’t become an advantage to large operators. But if we come to have a mix of pet vaults from small operators and cattle vaults from industry it might mean industry has a big advantage because they’re motivated to have superior tooling and failure modes than the pet level operators. Why not just make everyone work with cattle?
I admit I may be overdoing it here… There will always be areas for improving operational efficiency no matter if the network is pets or cattle or a mix. I think the topic of operations is a risk worth mentioning. It’s easy to take for granted how much work it is to run a bitcoin miner or safe vault. Lots of small vaults makes the operations even harder perhaps, but at least that difficulty is seen for what it is up-front rather than years down the track.
Proposed Mechanism
A network section allows a new node to join at any time, but only one after the other, not multiple at the same time (or maybe this could be some fixed number of simultaneous joins?). Any node trying to join while a new node is being accepted is turned away. This sets an approximate join rate depending on the bandwidth of the joining node (see modelling below). Note that the overall joining rate may be slower if potential new nodes decide not to join the network and the ‘join queue’ is empty.
StoreCost is adjusted every time a new vault is allowed to join, calculated by the difference between the average vault size and the target vault size. If the average is larger than the target size storecost goes up to try slowing down the upload rate. If the average is below the target size storecost is reduced as there is spare space to fill. If the average is close to the target, storecost remains the same. The exact amount of the adjustment is open for debate!
Rewards are set by the rate of nodes being turned away. During the period while a new node is joining, the number of nodes turned away is counted. When the node has finished joining, the rate of join attempts is calculated. If there’s more nodes trying to join now than previously, rewards are slightly decreased. If there’s less nodes trying to join now than previously, rewards are slightly increased. If it’s about the same rate rewards are kept the same. There’s a natural tension here for existing operators: they will naturally want to have more nodes join, but trying to join too aggressively will reduce the reward for their existing nodes. It also gives new operators the ability to be more aggressive in joining than existing operators.
In effect, rewards are intended to control the join queue size and storecost to control the upload rate. This is all sitting within a ‘temporal framing’ of the join rate, which is determined approximately by the average bandwidth of new nodes.
Open Questions
Does having a fixed target vault size affect the security of the network? Is a bigger network (more nodes) inherently more or less or equally secure than a small network? Is there an optimum network size for a given level of technological development?
Is there such a thing as ‘too small’ for vaults? Is this true always or only true once we have a certain level of technology?
What are the primary drivers for how a floating vault size evolves over time? Is it primarily economic? Technological? Evolving toward exclusion or inclusion? Evolving toward high performance or low performance?
What are the costs and benefits of a floating vault size and is it a preferable compared to the costs and benefits of fixed vault size?
Is there a good hardcoded value to use for a fixed vault size?
How does it affect the economic model for safecoin storecost and rewards? My first thoughts would be a) use reward amount to manage the queue size of vaults waiting to join (not too big not too small) b) use storecost to encourage more uploads but not too much that it’s stressful.
Modelling
Network Size
This modelling is based on 50 GB fixed vault size, 100 vaults per section, 8 copies of each chunk.
Small network, 1 PB of data (ie 8 PB of chunks), requires 160K nodes, 1600 sections, maximum 11 hops.
Medium network, 1 EB in size, requires 160M nodes, 1.6M sections, maximum 21 hops.
Large network, 1 ZB in size, requires 160B nodes, 1.6B sections, maximum 31 hops.
Those numbers seem reasonable to me. Maybe slightly on the high side but not a complete show stopper.
Variance
Because of the random distribution of xornames for chunks, not all vaults will store the exact same amount of data.
This is covered in the topic Chunk distribution within sections.
For 50 GB average storage, expecting a reasonable max/min size ratio of 1.5, the worst variation would be about ±10 GB (ie 40 to 60 GB), with the majority of vaults being between about ±4 GB.
Joining Requirements
50 GB download time (a relevant link here is List of countries by Internet connection speeds)
connection | download time |
---|---|
1 Mbps | 5 days |
10 Mbps | 12h |
100 Mbps | 1h |
1 Gbps | 7m |
10 Gbps | 43s |
Growth Rate
Consider a network with 1600 sections × 100 nodes per section × 50 GB per node = 8 PB of chunks or 1 PB of data.
There can be up to 1000 new nodes joining at a time (one per section).
They will all need to download 50 GB which on a, let’s say, 10 Mbps average connection, takes 12h each.
So storecost and reward rate is adjusted approximately twice a day in every section.
The network would grow at 50 GB × 1000 sections = 50 TB every 12h, or 100 TB per day, or 1.25% per day, or nearly 100-fold in a year.
This sounds ok to me. Maybe the growth rate is a little too high, but if the join queue reduces then the growth slows, so fastest growth of 1.25% per day is roughly in line with some ballpark intuition for reality. Would be nice to explore this growth rate under various conditions a bit more deeply.
The larger the network the faster it can grow (more sections means more simultaneous joining possible).
The faster the average network connection the faster the network can grow.
Summary
I think fixed vault size makes a lot of sense and would create a more sustainable network than a floating vault size. The fixed size means it’s easy to understand the joining requirements and it’s less likely to lead to centralization and exclusion due to very large vaults.
There are some additional overheads introduced as a result, but I think the cost is not prohibitive and is outweighed by the likely benefits.
One of the main benefits of a fixed vault size is it would force vault operators to focus on addressing routing efficiency (which is an open / common / communal problem) rather than vault efficiency (which is a closed / individual / private operator problem).
As I said at the start I’m only mildly in favour of this idea so would love to hear your thoughts. This long post makes it seem like a well-formed idea, but really I’m just wanting to discuss it and am open to all responses.