Fixed target vault size

Having a fixed size vault seems to me to increase the barrier of entry too much. If someone wants to participate in the safe network, I think they should be able to with any amount of storage and bandwidth (except if the amount of storage is ridiculously small, like 10 MB or something). But I like the idea of rewarding the optimal size more, so long as the optimal size is within what the average person could afford to set aside for the safe network.

10 Likes

What about making vault size neither free nor completely fixed, but parametrized? I am thinking about 2^x GB, so the allowed vault sizes would be 2 GB, 4 GB, 8 GB, 16 GB, 32 GB, …

In my head this should have less complexity than a floating vault size and may still have some of the advantages of fixed size vaults: all vaults of size x have the same parameters. A vault of size 2^x GB can be replaced with two vaults of size 2^(x-1) GB without too much recalculating, and so on.
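A minimal sketch of how a storage offer could be carved into power-of-two vaults, and how one vault could be swapped for two of the next size down. The function names and the 2 GB lower bound (`min_exp=1`) are assumptions made for illustration, not part of any vault software.

```python
def split_into_vaults(total_gb: int, min_exp: int = 1) -> list[int]:
    """Greedily split a storage offer into vaults of 2**x GB, largest first.

    min_exp is a hypothetical smallest allowed exponent (2**1 = 2 GB here).
    Any remainder smaller than the smallest vault is left unused.
    """
    sizes = []
    exp = total_gb.bit_length() - 1  # largest x with 2**x <= total_gb
    while exp >= min_exp:
        size = 1 << exp
        if total_gb >= size:
            sizes.append(size)
            total_gb -= size
        exp -= 1
    return sizes


def halve(size_gb: int) -> list[int]:
    """Replace one vault of 2**x GB with two vaults of 2**(x-1) GB."""
    return [size_gb // 2, size_gb // 2]


print(split_into_vaults(100))  # vault sizes in GB for a 100 GB offer
print(halve(16))               # two 8 GB vaults
```

Because every allowed size is a power of two, the split is just the binary representation of the offer, which is where the "not too much recalculating" comes from.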

2 Likes

I think we have to be realistic about what will benefit the network too, especially in the short term. Going with useful defaults to get to MVP is fine, imo. Adding extra features to broaden appeal later on is normal.

6 Likes

This…

Let’s not forget what the M in MVP (or MVE, @JimCollinson) stands for.

Minimum to get us up and running, enhancements can come later, once we have demonstrated basic functionality.

7 Likes

I think so. If the network is being constrained by cpu load (mainly for signatures and hashing) then it makes sense to have larger vaults which would reduce the portion of the requirements that are cpu related and return to a more ‘balanced group of bottlenecks’ rather than just cpu. Ah I see later @jlpell says a similar thing.

Technology is also quite variable through time so that means a fixed vault size is maybe not correctly encapsulating the underlying physical reality, or rather it’s about efficiency more than correctness.

The load on the network is also presumably quite variable, both from uploaders and from storage suppliers, so it makes sense that the vault size itself might vary to compensate for this.

But I think these benefits do not justify the risk of powerful operators slowly chipping away at increasing vault size to exclude smaller operators.

I’m not sure what you mean by this; do you mean distributed in xor space or geographically or something else?

A few considerations on this…

Joining requires immediate relocation (chosen by the initial section), so the attack is over the entire network, and is very difficult (impossible?) to target at any individual section.

Throw in some mildly costly activity (maybe proof of resource) for joining and it would seem the cost of this becomes fairly high.

The effect of the attack is to reduce reward, so I’m not really sure how this effect is supposed to help the attacker…? Ah yes, @digipl got there before me “Adding farm bots would only reduce the StoreCost so it does not seem economically beneficial.”

Definitely worth thinking about but I don’t see this as a major obstacle with the current understanding of it.

Thanks for these calcs. I goofed on my initial ones anyhow (typo of 1000 instead of 1600 half way through).

Starting and stopping vaults is more complex for many vaults than one vault.

Logging and monitoring of multiple vaults is more complex.

Certain strategies for load balancing of multiple vaults may provide advantage over other strategies, which a single vault is not subject to.

I’m sure there’s other things too but these are the ones that come to mind.

This would not work. From RFC0058 RMD - General mechanism of delivery

“chooses a neighbouring section closest to the message’s destination - the next hop section”

A can’t pass to B directly unless it is a neighbour and is in the Delivery Group.

Sure, I agree they should be as large as possible, but what actually is that? Not the number or the size, the ‘possible’ aspect. I mean, should it be ‘possible’ for a certain portion of internet users to run vaults? Understanding what is ‘possible’ needs an understanding of the demographics being targeted. It’s sorta painful to me to be so fuzzy about it, but I think it’s really important to appreciate that there are audiences and demographics and circumstances and real life stories happening behind all this. I think that’s mainly what determines ‘possible’ rather than the numbers or the technologies (sorry I have no data to back it up!!).

Yes definitely, no point removing a storage or bandwidth bottleneck by forcing small vaults only to replace it with a (potentially worse) computation bottleneck.

I think there is something, or maybe I misunderstand this. The point of the vault initial download of chunks is to obtain economically useful chunks (ie close to their xor location in the network). And since it’s only a tiny subset of chunks (which the vault can’t know prior to being located) then the large vacuuming of chunks isn’t economically sustainable for that vault. I think the “initial block download” (to use a bitcoin phrase) is extremely important to consider, since it’s the first (and probably hardest) hurdle to overcome for any operator.

I like your distinction here between upload and download bandwidth. It’s a good one to keep track of. Maybe upload bandwidth will have the greater impact on target vault size in the end.

Not sure I can agree with you on this. No ‘normal’ user would tolerate that load on a home internet connection when they also want to watch videos etc.

I agree on an ‘ideal network operations’ capacity but in a real life scenario and diversity of vault operator scenario I can’t see this being feasible.

I know you’re just outlining an idea, so this is not a totally serious question, but where do these numbers come from? “Fixed audit ratio”… Mainly I’m curious about how to optimise this ratio with some engineered method, how much auditing to do and the effect on security when adjusting that number.

I dunno about that. If the network is fairly quiet then it may only require 1 core for 1 in every 10 seconds, ie 0.1 cores.

You frame this nicely here. The engineering is a lot about balancing various hardware capabilities, which are to some degree unknown into the future, so we want to do the engineering with that uncertainty in mind somehow.

But I think another aspect is missing here, which is the social aspect, ie understanding who has access to what and why and how they may use that or not use that on the network. There’s a social / governance / political / economic aspect to this which is not just the technological engineering but the ‘soft’ engineering aimed at aspects such as target audience and inclusion etc. Maybe my politics encroaching slightly too far?

1 TB downloading at 8 Mbps takes 12.7 days to download… so when a vault joins and is expected to take on 1 TB of chunks, that’s two weeks for startup? Am I understanding this correctly?
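A quick check of that arithmetic, as a sketch that assumes the link is fully saturated with no protocol overhead. The 12.7 day figure matches a binary terabyte (2^40 bytes); a decimal terabyte (10^12 bytes) comes out closer to 11.6 days.

```python
def download_days(size_bytes: float, link_mbps: float) -> float:
    """Days needed to move size_bytes over a link of link_mbps megabits/s."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 86_400  # seconds per day


TB = 10**12   # decimal terabyte
TIB = 2**40   # binary terabyte (tebibyte)

print(round(download_days(TIB, 8), 1))  # binary terabyte at 8 Mbps
print(round(download_days(TB, 8), 1))   # decimal terabyte at 8 Mbps
```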

For sure this is quite possible. To be clear, I’m not talking about that in the original post. Variations to the fixed size is sort of a next step type concept, but totally possible. I think varying the vault size over time comes with some issues which confuse the intention of the original post (original post is looking at maybe the first 2-5 years from launch whereas variable size is maybe looking at issues that would begin to arise 10-20 years out in my opinion, sorta like the issues of floppy vs cdrom, that kind of time scale and technological progress). So I just addressed the most basic form.

Yes, this sounds sensible to my intuitions.

I guess the fixed vault size is a bit like the fixed block size used in bitcoin, things will pivot around it but that anchor provides the point upon which to pivot. It creates friction for sure, but without friction at least somewhere you may end up sliding right off the map.

Maybe a way to think of this difference in approaches is 1) fluctuate between two (or more) repelling boundaries and 2) fluctuate around a point of gravity.

Allow more vaults in to distribute chunks more thinly. That’s the only way I know of! And to add more vaults, increase the reward. Or focus on adding only high bandwidth nodes. Or if there’s a disallow rule then relax it. Or if there’s limits to simultaneous joining then increase that limit. Or all of these.

I’m not sure if one type of choice for size is any more or less artificial than any other, be it fixed or floating or algorithmic or genetic or market driven or whatever. They’re all choices and are all equally ‘artificial’.

I agree, participation should be flexible and open. This basically means a fixed vault size must be enforced when the chain of logic is followed far enough.

Data is evenly distributed among all vaults (because of how xornames work it’s about ±20%), eg if the average vault size happens to be 100 GB then all vaults will be within approx 80-120 GB. So it’s not like we’ll ever see 10 GB and 100 GB vaults at the same time on the network.

If we let vaults get very large then only people who can run large vaults can participate because all vaults must be large.

If we ensure vaults are small then all people can participate, with large operators needing to run multiple vaults.


Sorry only got around to replying, hopefully will add more material related to the original post later.

5 Likes

I think we have to think of the V too. If we make decisions that affect the network’s ability to sustain itself, then it’s not viable. The thing needs to float.

From my UXey focused perspective, I’d say, rather than a hard rule (which might say, exclude people with limited resources, and thus result in fewer vaults) we could nudge people in the right direction.

I actually put a little thinking into this with the Vault UIs, here:

When folk are starting up a vault, we take a look at their available space on the target drive, and offer three quick options, with the most desirable size for optimal network health being the pre-selected option.

This flow could be further tuned to include hard stops, or pre-defined increments based on network requirements I guess.

So a nudgy way of doing it, if we don’t want to be hard and fast about it.

Understanding what to nudge users towards, is another question though!

14 Likes

I think both you and @digipl misread what I wrote, and missed the point. It is not about StoreCost or farm bots, it is about attacking Reward through building up join queue, as it is calculated like this:

It is enough to be rejected to cause changes in Rewards. And just being rejected requires far FAR less resources than previously possible attacks (i.e. they never even need to be able to pass resource proofs, which basically all other economy-attacks required). Still need to attack the entire network, but every bot can be basically as simple and cheap as to only manage to do a join request.

So, that’s the difference between this proposal and others: the barriers for sabotage have been lowered by probably orders of magnitude.
Which is worth mentioning. It makes the network vulnerable for a longer time, and requires it to grow bigger before being reasonably safe, than would otherwise be needed.

And an attack doesn’t need to be immediately beneficial for it to happen. Leaving something open for sabotage will almost certainly cause it to happen, and affecting rewards/prices in any type of economy is sure enough desirable for plenty of strong players out there.

9 Likes

I think that there is a distinction to make here. For blockchains, the current state is what matters, so you need the full IBD to complete before you are up and running. For SAFE, I think the vault will be able to start serving data as it receives it, akin to a torrent user being able to upload what it has before the torrent is complete. Vaults could also participate in routing and caching before ICD (initial chunk download™) is complete, which may have associated rewards.

8 Likes

xor space…
In that simple list of suggested interests, it was the most important copy… so, worst case stress in different scenarios might call for different kind of node… fast or large or responsive cf CPU; TBs; RAM; etc

1 Like

From simply a user perspective (and maybe this is longer-term) I would envision something like the following…

I install the vault software and set the maximum storage space I want to dedicate to the network (say 5 TB) and also perhaps what percentage of upload bandwidth I want to devote (say 50%). From there the vault opens a 100 MB space and starts to work. Over time, depending on the network speed and reliability my one vault can grow to as much as 5 TB. However, it can never be more than that. And in fact if my vault ends up at 4 TB, but my connection slows for any length of time, the network could slowly shrink that 4 TB to some thing less to again hit the optimum. Also, whether this is handled internally as one big vault or multiple sub-vaults should be handled internally by the interface app in my opinion. I suspect, though, that a HDD failure will be the biggest cause of data loss so I would assume one big vault is better or at least no worse from a bandwidth/reliability standpoint than multiple small vaults.

Not sure how hard it would be to achieve all of this, but this seems like the ideal user experience. And it allows the network to optimize within the max parameters set by the farmer. I suspect predetermined limits on storage based on bandwidth or other factors will turn out to be short-sighted before long. The network should ideally be self-adapting. Apologies in advance if I missed that point somewhere in the thread above.

8 Likes

That’s my understanding too. If an average user wants to run a BTC node then it’s a long time before they can do a single search on the DB (blockchain). We are talking days to a couple of weeks depending on CPU and link speed.

7 Likes

I’d say that you can still have a user defined storage amount, but it just may be made up of one or more vaults to achieve it. It is important to realise that fixed vault size of X does not mean a user can only share X; it can be any multiple of X.

The important thing is how the network scales with the fixed sizes and how much complexity it saves. Also, the minimum amount should not be so large that it excludes too many home users, imo.

5 Likes

This attack could be mitigated by limiting the number of Join Queue nodes considered in the calculation, and by my proposal to add a Dropout Ratio, which also counts those nodes that refuse to enter. This would make the influence of this type of malicious node almost nil.

However, my great concern about this proposal is that, according to the current design, we can expect immutable data to be distributed mostly uniformly, but this is not the case for mutable data, especially sequenced data, which could grow significantly and create important size differences between the Vaults of a section. We would need some sort of solution to this divergence to make this proposal viable.

4 Likes

I proposed something like this a while back:

Vaults could come in a series of fixed sizes. It would simplify the economy as @mav already mentioned and, if the size depended on the age, it would also lessen the loss if younger, thus less reliable, vaults disappeared or misbehaved.

4 Likes

Rather than a single size, we could tackle the problem of centralised large vaults by setting a cap or a range. This would avoid or limit the exclusion of those only wanting to share smaller amounts, which I think is to be encouraged so long as it is useful to the network.

3 Likes

And perhaps should have from early on (in the live network), to properly incentivise joining and contributing (considering the onset time of a fully loaded vault).

I’ve actually been working on a proposal since New Year, addressing exactly this.

2 Likes

Long post follows, responses to multiple posts all in one, starting with @mav.


The parallelization offered by sections may offset this considerably. For example, consider fixed vault sizes and section vault counts that are known to vary from a minimum of N to a max of 2N. The communications overhead within a section will therefore vary from O(N(N-1)/2) at birth to O(N(2N-1)) at binary split. This communications overhead within a single section will be the same regardless of vault size. The magnitude of the fixed vault size determines the required number of sections to satisfy the total network storage requirements. Presumably inter-section communication is negligible compared to intra-section communications overhead (no parsec between sections, right?). This clearly indicates that it is not worth the effort to manage variable vault sizes within each section. You would only NEED to do this if for some reason the network was limited to a maximum number of sections, which it is not. Instead it’s simpler and easier to play with the rules for section splits and section sizes via elder decisions from a higher level. Keeping fixed vault sizes also makes these section split rules and accounting much easier to rationalize and define. So no real benefit to a floating vault size except for premature nano-optimization for the sake of nano-optimization.
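To put rough numbers on the intra-section overhead, here is a small sketch that counts pairwise links, assuming every vault in a section talks to every other vault. The section sizes used are hypothetical; the point is only that doubling the member count gives about four times as many links, whatever the vault size.

```python
def pairwise_links(members: int) -> int:
    """Number of distinct vault-to-vault pairs in a section of this size."""
    return members * (members - 1) // 2


def overhead_growth(n: int) -> float:
    """How much the link count grows between a section of n and one of 2n."""
    return pairwise_links(2 * n) / pairwise_links(n)


for n in (8, 50):  # hypothetical minimum section sizes
    print(n, pairwise_links(n), pairwise_links(2 * n), round(overhead_growth(n), 2))
```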

If history is any guide, then fixed chunk sizes and vault sizes are the way to go. Storage drives originally had variable sector sizes but this was eclipsed by fixed block architecture. Hard disk manufacturers used a 512 byte fixed sector size from as early as 1979 to 2010 when specs were migrated to 4k sector sizes. Over thirty years of technological advance using the same basic unit of storage. This is a positive indicator for maintaining fixed chunk sizes and fixed vault sizes as basic building blocks of the network. KISS, MVP, KISS, MVP, Launch. In thirty years when we all have 1PB storage drives in our augmented reality contact lenses we can revisit what the next fixed vault and chunk size should be to improve performance.

A simple way to do this is have them create a fixed size data store of 1M chunks. The chunks can be initialized with a seed provided by the section. The hash of that data would need to be returned to the section before they can join as a vault. The section can validate the hash against other candidate responses based on majority rules. It means vaults join 4 at a time, with at least 3 in agreement as to the correct hash.
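A scaled-down sketch of that join check. The chunk derivation (SHAKE-256 over seed plus index), the SHA-256 digest and the function names are all assumptions chosen for illustration; it only shows the seed, hash and majority steps, and is not a guarantee that the data stays on disk afterwards.

```python
import hashlib
from collections import Counter


def store_digest(seed: bytes, chunk_count: int, chunk_size: int) -> str:
    """Fill a store with seed-derived chunks and return one digest over it."""
    outer = hashlib.sha256()
    for index in range(chunk_count):
        # Derive each chunk deterministically from the section's seed
        # (SHAKE-256 is an assumed choice, not taken from the proposal).
        chunk = hashlib.shake_256(seed + index.to_bytes(8, "big")).digest(chunk_size)
        outer.update(chunk)
    return outer.hexdigest()


def accept_candidates(responses: dict[str, str], quorum: int = 3) -> list[str]:
    """Return the candidates whose digest matches the majority answer."""
    if not responses:
        return []
    digest, votes = Counter(responses.values()).most_common(1)[0]
    if votes < quorum:
        return []  # no agreement, nobody joins this round
    return sorted(name for name, d in responses.items() if d == digest)


seed = b"section-seed"                   # hypothetical seed from the section
honest = store_digest(seed, 16, 1024)    # 16 small chunks instead of 1M
responses = {"a": honest, "b": honest, "c": honest, "d": "bogus"}
print(accept_candidates(responses))      # the three agreeing candidates
```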

Yes, and given the cognitive load I would say 1 vault per processor core (softcore in the case of a hyper-thread enabled system) is the max I could stand managing on a single box.

I agree that the barrier to entry should be minimized. It’s a bi-objective min-max problem. Minimize cost and barrier to entry while at the same time maximizing performance/stability/etc. We have to draw the line somewhere. IMO that ideal line is a 1TB standardized vault structure. See below for more discussion on how to use this standard vault size while also lowering the barrier to entry to zero (a 1 chunk vault, anyone?).

As soon as the first chunk is downloaded, it should be able to be served. Facilitating this operation becomes even easier if we use a standardized fixed vault datastructure. I don’t believe there is any need for a large initial block download waiting period. As the vault size increases the number of GETs served will too. So instead the user might see little farming reward earnings until the vault has reached full size.

After all blocks have been downloaded from the section, upload bandwidth is really all that matters. Take a look at internet transit costs to get a real sense of what bandwidth costs. Your ISP could never afford the download bandwidth they quote you for a 24/7 operation.

I think you are forgetting a few things. 1) The ‘normal’ home user is primarily downloading content. A vault will primarily be an uploader, so they might not even notice the extra load. 2) The ‘fixed size vault’ spec demands a fixed size minimum bandwidth requirement that the user must meet. (Ex. 1Mbps 10% @ 24/7 operation) The maximum bandwidth allowed to a vault can be configured by the user based on how much of the resource they want to provide. (Side note: my first source on projected connection speed increases from Cisco at 2X in the next 5 years is at odds with 2X every two years via Nielsen’s Law. Personal experience doesn’t match in my part of the world; we’ve had roughly a 2X increase over 10 years for the same cost.)
3) Based on a world average upload bandwidth of 12 Mbps mobile and 40 Mbps fixed, a standard minimum bandwidth requirement of 1 to 2 Mbps per vault would be less than 10% of a ‘normal’ user’s upload bandwidth capability, including mobile.

This falls under barriers to entry. To be a vault operator you need :

  • A) Access to enough stable electricity that the vault can be powered during the majority of a 24 hour period.
  • B) Access to an internet connection with a minimum upload bandwidth.
  • C) A computing device such as PC, tablet, phone, server, etc.
  • D) A storage medium of minimum size to accommodate a fixed size ‘unit vault’.
  • E) Financing, to purchase items A through D and pay running costs.
  • F) Time, knowledge, and skill to manage the equipment setup and ongoing maintenance.

These barriers to entry are not zero. It is unrealistic to focus on one type of barrier to entry and base the network design on that one characteristic alone. Barriers A through E are largely based on geographic/political/economic realities. Barrier F is minimized by a good UI via @JimCollinson, good tutorials, and a good help forum / community. Every software program has a set of minimum system requirements.

No. There is no reason why vaults would need to fill the whole 1 TB before they begin to serve chunks. It’s just about 12.7 days before their full Safe coin earning capability is reached. For a vault that is expected to be online for years, the 12.7 days to reach full earning capability is negligible.

Agreed. This is an important distinction. When I respond in regard to “variable” or “floating” vault sizes, I am referring to these in the most general sense. This means that it’s ok for any vault operator to boot up their machine, pick a random size from 1 MB to 10000+ TB, and connect to the network on their dial-up modem if they so choose. When I respond to your “fixed” size concept, I am referring to the idea that ALL vaults in the network at the current time have exactly the same properties. (They are all exactly 1 TB with the same mem-map type data structure, and they all have the same minimum CPU, RAM and bandwidth allocation). At some point if vault sizes should increase to gain some performance, then ALL vaults in the network should then upgrade to the same “fixed” size, and all new vaults will need to meet the spec. As you say, if we focus on launch and the next 2-5 years, then 1 TB fixed vault sizes are IMO ideal (1 million chunks of 1 million bytes). I suspect 1 TB vaults could make it at least 20 years.

Yes!

You hit upon the key idea here. With the building blocks kept simple, the devs can focus on the high level section rules and the farming algorithm. Everything higher up the software stack becomes “easier”. This kills the proverbial two birds with one stone.

Yes, exactly. The issue is how small is too small? IMO, anything less than 1 TB is too small for a standard vault unit (1 million chunks of 1 million bytes).

Rewards are set by the rate of nodes being turned away.

As @oetyng very clearly pointed out, this is no good.

Yes, exactly. You’re on to something here.

This is completely unnecessary if all vaults offer the exact same resource spec. Memory and Storage would be fixed to a specific value ( Ex. 1GB RAM, 1 TB Disk). CPU and bandwidth would need to maintain specific minimum constraints (Ex. above a min score on CPU test, 1Mbps sustained upload, latency must be within a standard deviation of the average section latency to join).

So what is that number? A 2 TB HDD costs $49 on newegg. Is this too high a barrier to entry? A few smaller 256 GB disks could be had for free from a dumpster to build a 1 TB RAID. The standard 1 TB vault size I proposed is a pretty low barrier to entry. It probably would be better to go with a 10 TB minimum size for performance but that would exclude too many people. Ahhh, but what about phone farming you say? There is a solution for that below.


Vault Sizes as low as 1 MB through dedicated Cache Vaults
(Slightly exaggerated clickbait at 1MB, the lowest possible vault size needs to be high enough to thwart the obvious attack vector. 10 GB is probably a better lower bound.)
There is an important aspect that we haven’t really discussed yet: caching. The way to get sub 1 TB divisibility on vaults is to consider a standardized/fixed size vault store, and a completely arbitrary size cache store. This lowers the barrier to entry for small players, while also allowing big players to play around with optimizations. Of course this would require that caching is rewarded by the farming algorithm. How this might work:

  1. User starts up vault software and enters the total space allocation offered to the network.
  2. Vault software divides this into a set of N standard vaults (1 TB), and then takes any remaining space and divides it equally to form a cache store for each vault.
  3. Other system parameters are checked such as ram or cpu to make sure they can support N vaults. If not, the number of vaults is decreased to the sustainable level and the cache store sizes increased.

Examples:
*Assuming standard vault unit = (1 CPU min, 1GB RAM minimum, 1 MBps minimum sustained uplink, 1 TB hdd/ssd fixed.)

  • Micro Farmer (1 CPU, 512 MB RAM, 500 kBps, 1.5 TB hdd)
    -Farmer requests 1 TB vault.
    -Infeasible due to RAM and uplink limitations.

  • Small Farmer (2 CPU, 2 GB RAM, 1MBps, 512GB hdd) -
    -Farmer requests 128 GB vault.
    -Farmer gets 1 vault with NULL TB storage and a 128 GB cache.
    -Feasible up to 1 vault for CPU, RAM and uplink requirements.

  • Medium Farmer (8 CPU, 16 GB RAM, 14MBps, 4TB hdd) -
    -Farmer requests 3.75 TB vault.
    -Farmer gets 3 vaults, each with 1 TB storage and a 250 GB cache.
    -Feasible for up to 8 vaults for CPU, RAM and uplink requirements.

  • Large Farmer (64 CPU, 128 GB RAM, 1 GBps, 384 TB hdd) -
    -Farmer requests 300 TB vault.
    -Farmer gets 100 vaults, each with 1 TB storage and a 2 TB cache.
    -Feasible up to 100 vaults based on high performance CPU resource test score, RAM and uplink requirements.
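The three setup steps can be sketched roughly as follows. This is only an illustration: `plan_allocation`, `Plan` and the `max_vaults` input (standing in for whatever the CPU, RAM and uplink checks would conclude) are hypothetical names, and sizes are in decimal GB.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_VAULT_GB = 1000  # the 1 TB standard unit, in decimal GB


@dataclass
class Plan:
    full_vaults: int      # number of standard 1 TB vaults
    cache_gb_each: float  # cache per vault (or for the single cache-only node)


def plan_allocation(requested_gb: int, max_vaults: int) -> Optional[Plan]:
    """Split a storage offer into standard vaults plus equal caches.

    max_vaults is the assumed outcome of the CPU/RAM/uplink check;
    0 means the machine cannot sustain even one vault.
    """
    if max_vaults < 1:
        return None
    full = min(requested_gb // STANDARD_VAULT_GB, max_vaults)
    leftover = requested_gb - full * STANDARD_VAULT_GB
    nodes = max(full, 1)  # with no full vault, one cache-only node takes it all
    return Plan(full, leftover / nodes)


print(plan_allocation(128, 1))   # Small Farmer: cache-only, 128 GB cache
print(plan_allocation(3750, 8))  # Medium Farmer: 3 vaults, 250 GB cache each
print(plan_allocation(1000, 0))  # Micro Farmer: infeasible
```

Capping the vault count first and pushing everything left over into cache is what lets the same rule cover both the cache-only case and the resource-limited large operator.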

To make this work the farming algorithm would need to offer cache rewards. There are a few clever ways to do this so that cache hits are worth much less than original hits while benefits of caching are maintained without depleting the farming algorithm (will need to get into this at some point in another thread). For the case of NULL vaults that are caching only, the reward they get could be increased to make the experience more worthwhile, but not in line with that of a full standard vault. The major benefit I see with this approach is that you could have millions of mobile phone users (with sporadic internet connections that go off and on many times throughout the day) that can offer up a few GB of their internal flash and immediately begin to earn farming rewards as “Cache-Only Farmers”. Depending on the farming reward structure they might also not need to be as concerned about losing node age.


Tiered Vault Sizes

So I just introduced two classes of vaults by recommending Full vault operators with a 1 TB vault store and cache-only completely arbitrary size vaults. If one goes through that trouble why stop there right? Why not just have the option of selecting vault size from a subset of acceptable fixed sizes (small, medium, large)? This would be a middle ground approach. For example the following tiered structure would appear to span the gamut of typical system sizes circa 2020.

  • null - cache only
  • micro - 10 GB
  • small - 100 GB
  • medium - 1 TB
  • large - 10 TB
  • macro - 100 TB
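As a sketch of how a vault might be slotted into such tiers, the snippet below picks the largest tier that fits the space offered. The `pick_tier` name and the rule of rounding down to the nearest tier are assumptions for illustration only.

```python
# Tier sizes from the list above, in decimal GB, largest first.
TIERS_GB = [
    ("macro", 100_000),
    ("large", 10_000),
    ("medium", 1_000),
    ("small", 100),
    ("micro", 10),
]


def pick_tier(offered_gb: float) -> str:
    """Return the largest tier that fits; anything below 10 GB is cache only."""
    for name, size_gb in TIERS_GB:
        if offered_gb >= size_gb:
            return name
    return "null"  # cache only


for offer in (5, 512, 4000):
    print(offer, pick_tier(offer))
```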

Organizing the accounting of these various size structures may or may not add a lot of complexity to network programming. Experts please chime in here. Even if this type of scheme is desirable, the first step to having a multi-tier system is running cache-only nodes and one other fixed standard size. For launch and an MVP all we need is a single fixed vault size, and cache-only node capability with a farming reward algorithm that weights those roles appropriately.


I think you would find that this convenient way to do the UI still works fine if any vault with less capacity than the standard unit is set to only do caching. (Also, remember your gibibytes vs. gigabytes when it comes to your UI: 1 gibibyte (GiB) = 1024 mebibytes (MiB), 1 gigabyte (GB) = 1000 megabytes (MB).)

8 Likes

These threads are always interesting, but there is a reason why building an MVP and then iterating has become common practice. There are simply too many unknowns, with the biggest being whether we can have a working network at all.

On day 1, we don’t need a fancy UI to help a granny create a vault. We don’t need to support granny’s 10 year old computer. We don’t need to support those who can’t run a vault for weeks at a time without interruption. These are all goals for after the network is functional and being used.

Once the network is live, investment and developers will come in. Once the network is stable, power users will move in. Once the network gets hyped, some regular users will try it. Once it is on the news and the papers, granny may ask what it is. Once it is everywhere, grandson or granddaughter can set her up.

If we can launch with a simple fixed size vault, with useful defaults, we can then see how it works out. Fewer moving parts, means less complexity and fewer, more fixable, bugs. Feedback from users can come in, word can spread.

IMO, MVP could simply be a docker container with a fixed size vault and basic docs. If that scares some, then they need to wait.

So, my position is - if fixed size vaults are possible and simplify implementation now, then that is the route that should be taken. We just need to make sure it doesn’t completely close off changing this later on, in a few years, after SAFENetwork is storing 1000s of terabytes.

15 Likes

The same, perhaps, goes for having some minimum requirement, with what the network chooses to make use of relaxing over time.

Devs’ choice as to what they prefer to kick-start…

3 Likes

Exactly…
Can we get consensus on this and return to this topic after launch?
A fascinating discussion, I actually understood quite large bits of it, but one I think we can have after launch.

4 Likes