Next step of safecoin algorithm design


#161

Sorry, I think I need to clarify. I don’t mean that large sections cause centralization in the sense of lots of data centralizing into a single section. I mean that large sections cause centralization in the sense that only a few people are able to run vaults, so consensus becomes more centralized. Large sections pose a risk of consensus and power centralization, not data centralization.

This poses a bit of a conundrum. Extremely cheap prices attract Jane Smith as an uploader but discourage Jane Smith as a vault operator because of the large vault sizes (which are required to have cheap prices). The pricing algorithm has a natural tendency for uploaders to say ‘I’m just going to use these sweet cheap uploads and let the big operators worry about the vaults so they can make my uploads even cheaper by getting even bigger’.

I’m sure some interesting analysis could be done about how hard it would be for a group of ‘communist’ vault operators to combat large vaults and keep sizes relatively small and consensus distributed…


One other change that may be helpful in achieving one of the general aims of rfc-0012: “the farming rate decreases as the network grows”

Currently the farm rate decreases as the section grows, since it depends on the size of TP and TS which are specific to each section.

I think a better way to capture whether the network has grown is to include section prefix length in the calculation of farm rate. That way the overall size of the network can be calculated which better achieves the goal.

To illustrate why this matters, consider two networks with very different sizes but the same farm rate:

10 sections and 100K:90K TP:TS chunks per section (rfc-0012 gives a farm rate 0.1)
vs
1000 sections and 1M:900K TP:TS chunks per section (rfc-0012 gives a farm rate 0.1)

The second network is overall 1000 times larger than the first (100 times more sections and 10 times more chunks per section) but has the exact same farm rate. So farm rate has not ‘decreased as the network grows’.
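A minimal sketch of the comparison above, assuming the farm rate takes the form FR = (TP - TS) / TP (the inverse of the farm divisor discussed later in this thread). The function and variable names are illustrative only.

```python
def farm_rate(tp, ts):
    """Farm rate for one section, using FR = (TP - TS) / TP (assumed form)."""
    return (tp - ts) / tp

# (number of sections, TP per section, TS per section), figures from the post
small_net = (10, 100_000, 90_000)
large_net = (1000, 1_000_000, 900_000)

fr_small = farm_rate(small_net[1], small_net[2])
fr_large = farm_rate(large_net[1], large_net[2])

# Total primary chunks across the whole network
total_small = small_net[0] * small_net[1]
total_large = large_net[0] * large_net[1]
size_ratio = total_large // total_small

print(fr_small, fr_large, size_ratio)
```

Both sections report the same farm rate even though the total chunk count differs by the size ratio, because nothing in the per-section formula sees the number of sections.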

I think it’s a mistake to incentivise increasingly large sections. Including section prefix length would allow farm rate to decrease as the network grows without also needing sections to get large at the same time.


#162

But perhaps my reply would still have some application here. Consider these points.

  • Data being stored is expected to be in a fairly random distribution.
  • Thus sections should get a fairly even spread of chunks.
  • If using rfc-0012
    • then if too many farmers join one section, yes, the price gets really low and farmers will pull out if they end up there.
      • And therein is the expected solution: farmers will pull out and rejoin to get a better section.
      • The problem is expected to solve itself.
    • Else, if we adopt your FD = TP or @JoeSmithJr’s idea, is there still a problem of “centralisation”? The price will not be too small.

For that to be truly successful then the “big” operators have to be in a large majority of sections otherwise any uploads will either be marginally cheaper for large files or randomly cheaper for small files.

Also the large operators are getting small rewards too, and (from the other topic) energy costs alone are not insignificant when trying to operate at scale. Add to that the cost of operations (see the Dropbox figures) and very large operators will want bigger rewards than a home user just to cover costs. So this is a problem for a large supplier of vaults, who would need to cover a large percentage of sections, because they need to recover costs.

Actually this is in the RFC: coin scarcity, though maybe not enough for your purposes. After the initial growth period, when the network starts to mature, it is expected that the number of coins will be increasing, and thus the farming reward success rate falls in proportion to the number of coins existing. Since the number of coins existing is expected to keep increasing, the effective (not actual) farming rate decreases.

Thus compared to the actual farming rate the effective rate is

  • early: 15% of coins exist, EFR = 85% of FR
  • say at 1 year: 20% of coins exist, EFR = 80% of FR
  • say at 5 years: 40% of coins exist, EFR = 60% of FR
  • say at 10 years: 60% of coins exist, EFR = 40% of FR
  • say at 20 years: 80% of coins exist, EFR = 20% of FR
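The list above follows one rule: a reward attempt only succeeds if the coin it lands on does not already exist, so the effective rate is the farm rate scaled by the fraction of coins not yet issued. A small sketch of that rule, with an illustrative function name and the same guessed timeline as the list:

```python
def effective_farm_rate(farm_rate, coins_issued_fraction):
    """EFR = FR * (1 - fraction of coins already in existence)."""
    return farm_rate * (1.0 - coins_issued_fraction)

# Timeline figures are the guesses from the post, not measurements
timeline = [("early", 0.15), ("1 year", 0.20), ("5 years", 0.40),
            ("10 years", 0.60), ("20 years", 0.80)]

for label, issued in timeline:
    efr = effective_farm_rate(1.0, issued)  # FR normalised to 1.0
    print(f"{label}: EFR = {efr:.0%} of FR")
```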

Yes currently the calculations assume that a section is representative of the whole network and thus the figures can come purely from the section.

I am not sure that the section prefix is much better, since again it assumes the section is representative of the whole network. For instance one section prefix may be 20 long, yet others that have not seen anywhere near as much splitting might be 10 long. Maybe these are two sections at the extremes around the average prefix length. Pretty much the same sort of thing happens with other variables of the section.

I am not so sure that a large section will just keep increasing

  • The section size increases due to spare space increasing, since stored data is assumed to be fairly randomly distributed across all sections.
  • If the section grows due to more spare space then FR decreases, discouraging farmers from remaining in the section (ie they just restart).
  • Node churning moves vaults around anyhow, so would you ever get a section remaining so large it’s an issue?
  • Basically the larger the section, the more spare space it has, the lower the FR and thus the lower the desire to remain farming. Thus there is a positive force reducing the section size.

#163

Nice catch! Something that isn’t obvious.


#164

Yeah this is a good point and is one of the variables I neglected.

But effective farm rate only affects coin rewards.

Farm rate is also used to set price, but there’s no ‘effective farm rate’ for pricing, only for farming rewards.

So the statement ‘farm rate decreases as the network grows’ can be reframed as
‘rewards are reduced as the network grows’ (due to effective farm rate)
but not as
‘prices are reduced as the network grows’ because there’s no equivalent ‘effective farm rate’ for price.

Maybe the role of Number Of Clients is intended to serve a similar purpose and create an ‘effective farm rate’ for storecost?


#165

Good point about effective FR (rewards) and about number of clients.

I suppose we need to keep in mind that sometimes simple solutions have some good effects without too many negative edge cases.


#166

These pointers are fantastic. I’m currently working on a kind of ‘vector map’ that shows the various forces at play within the parameter-space of rfc-0012. It will hopefully give some idea of how behaviour could play out and what motives and incentives push and pull in various directions within the economy. So thanks for the points they all help fill in the gaps here and there.


#167

General idea of activities that cause change to the farm rate and coins remaining:

The effect of each type of client and farmer activity can be summed together to give an overall magnitude and direction to the change (image below). This will fluctuate through time as spare storage and GETs and PUTs naturally fluctuate, so the arrow could point in any direction and vary between large/unsteady and small/steady.

I think there’ll be a natural tendency of farmers to always be pushing slightly toward the right, since it’s easier to have less spare storage and they’ll find it more desirable to have faster reward rate.

I think there’ll be a tendency of clients to always be pushing slightly toward the top since it’s easier to browse than to upload.

So I think the overall natural tendency of rfc-0012 will be toward the top right. Does that sound reasonable?

Farmers that are also uploaders are important to the network since they have incentive to push toward the bottom left (cheaper uploads and more coins remaining) which counteracts and balances the ‘natural’ or ‘lazy’ tendencies toward the top right. Hopefully most farmers are uploaders, but I’m sure some will be there just for the economic activity.


#168

This I think will be one of the incentives to farm, to get coin and because they realise the network needs farmers if it is to survive and maintain their data.

Honestly it is only a tendency and not as major as it might seem. Once the network is accepted as operational and safe to store their data then I expect that “Need” will become a driving force. When social media, blogs, forums, etc are being used on SAFE then the desire to PUT will have greater forces at play than the immediate desire to get PUTs as cheap as possible. People who need to upload (their cat vid to social media, their holiday photos for their circle of friends) will change that tendency, and vault size, spare space and PUT cost will not be the immediate concern, but the need to store the data will be. Obviously if the price is outrageous then they won’t. But I expect that the price will tend toward an acceptable cost, because people will tend not to store when the price rises (like in your diagrams). That range of acceptable prices will be reasonably large for the majority, since they store medium amounts, unlike the ones “archiving the internet”.


#169

The ratio of client download vs upload (ie GETs vs PUTs) is significant since it gives an idea of the rate that safecoins are issued and spent.

What is a realistic GET:PUT ratio? I gathered some data to try to get some ballpark understanding.

safenetforum: GET:PUT is 16
youtube: GET:PUT is 70

For safenetforum this is calculated from the 30 latest topics as views/(posts+likes)

Likes is calculated using the average likes per post from the about page multiplied by the number of posts in each topic.

For youtube this is calculated for a popular music playlist of 10 songs as views/(comments+likes+dislikes)

Spreadsheet of data is here: views_per_content.ods

Some data will have a lower GET:PUT ratio, eg periodic backups.

Some data will have a higher GET:PUT ratio, eg pornography.

But it looks like a reasonable range of expectation for most public data would be between 10 and 100 GETs per PUT.
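For anyone wanting to repeat the measurement, the two ratios described above can be written as small functions. This is only a sketch of the stated formulas; the function names are illustrative and the inputs in the example call are made-up numbers, not the spreadsheet data.

```python
def forum_get_put_ratio(views, posts, avg_likes_per_post):
    """views / (posts + likes), with likes estimated from the site-wide average."""
    likes = avg_likes_per_post * posts
    return views / (posts + likes)

def video_get_put_ratio(views, comments, likes, dislikes):
    """views / (comments + likes + dislikes)."""
    return views / (comments + likes + dislikes)

# Hypothetical inputs, chosen only so the functions have something to run on
example_forum = forum_get_put_ratio(views=3200, posts=100, avg_likes_per_post=1.0)
example_video = video_get_put_ratio(views=7000, comments=20, likes=70, dislikes=10)
print(example_forum, example_video)
```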

I imagine this ratio will have significance for the farmrate and thus also for the storecost.

It may also have significance for optimizing vault caching and maximum network size.


#170

Except I will throw a spanner in this one and youtube also will have some of this.

When you view a topic, one or more records are updated on the server to keep track of what you have seen and not seen. So in fact for each forum content GET there is a GET for that usage record and an update of that usage record. So it’s more like 2 or 3 GETs to one PUT.

Youtube also keeps global and personal usage data so it’s not going to be 70 to one but lower.

The point is that even SAFE sites will likely keep some personal usage data so that your experience is better, and since that data is your own (owned by you) and not shared, it’s not an issue for anonymity etc.

Where you can use that data is for files that are uploaded for others to use and do not have specific APPs for. How many files are popular to unpopular, how long are they popular etc. For this your youtube figure might be closer to the mark.

But then youtube automatically plays the next vid so how many devices are just playing vids and no one is really watching them after a while (ie wasted gets) whereas they would not download file after file automatically. So then maybe your youtube ratio is a very very high figure for category of file downloads (incl your suggestion of porn).

Then mix the different types of usage and it’s probably a lot lower than those figures, mainly because good sites will store some info about your site usage in your personal data, so that you can come back to the APP and know what you’ve read and what you have not.


#171

This post is about building some more intuition around farm rate and the RFC-0012 farming algorithm.

Rather than talk about farm rate (FR), I’ll use farm divisor (FD) which is 1/FR, since it’s easier to talk about really big numbers than really small numbers.

The reason why FD matters (especially large FD) is it determines a) the rate of reward for vaults and b) the cost of storage. A large FD means cheap storage. But a large FD also means less frequent rewards. So consumers want the largest possible FD, farmers want the smallest possible FD, and hopefully most (or at least some) farmers double as consumers so they have some motivation to resolve the tension between these forces.

The FD is proposed to be a 64 bit unsigned int.

The full range of FD is between 0 and 18446744073709551615.

FD is calculated as
total_primary_chunks / (total_primary_chunks - total_sacrificial_chunks),
unless there is a surplus of sacrificial chunks, in which case FD is the largest possible value (ie MAX_U64). This is because “we also want to ensure that farming stops if the sacrificial count is greater than or equal to the primary count.” (I think farming should never stop and this should be changed from MAX_U64 to total_primary_chunks).
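A minimal sketch of the FD calculation as described here. It assumes integer (floor) division since FD is proposed as an unsigned integer; the `surplus_value` parameter is hypothetical and only exists so the MAX_U64 behaviour can be compared with the alternative of capping at total_primary_chunks.

```python
MAX_U64 = 2**64 - 1

def farm_divisor(tp, ts, surplus_value=MAX_U64):
    """FD = TP / (TP - TS), or surplus_value when TS >= TP."""
    if ts >= tp:
        # Surplus of sacrificial chunks: the post's description uses MAX_U64
        return surplus_value
    return tp // (tp - ts)  # floor division is an assumption

print(farm_divisor(100_000, 90_000))
print(farm_divisor(100_000, 100_000))
# Alternative suggested in the post: cap at total_primary_chunks instead
print(farm_divisor(100_000, 100_000, surplus_value=100_000))
```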

FD and Storage

To get an idea of how FD might look in real life, consider ‘what would be a realistic very big in-real-life FD?’

Let’s take an existing-but-unrelated very big in-real-life number as an example; bitcoin mining difficulty is currently 923233068449 - that’s about 923 billion.

What would be needed to actually have FD as large as the current bitcoin mining difficulty?

The largest possible FD is to be had when sacrificial chunks are almost (but not exactly) the same as primary chunks, ie TS=TP-1. The formula for maximum FD for a given TP is FD=TP/(TP-(TP-1))=TP

This means to have a really big FD (ie FD = 923B) the section must be storing at least 923B primary chunks and 923B-1 sacrificial chunks.

I’ll assume every chunk is 1MB (not a good assumption but probably true in the end due to cost savings by aggregating several small files into a single full 1MB chunk).

This means the section is storing approx 860 PB of primary chunks (and same again for sacrificial). If the section has 1000 nodes, that’s about 880 TB per vault. Pretty big vaults.

To give an idea how hard it is to have FD=923B, consider if just one less sacrificial chunk is being stored for the entire 860 PB of primary chunks; FD would now be approx 460B, ie half the desired FD. So in reality, storing 860 PB of primary chunks is the bare minimum to get FD=923B, and in reality it would probably require an order of magnitude more storage per section.
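The storage figures above can be reproduced with a few lines, assuming 1 MB means 2^20 bytes and that PB/TB are the binary units (2^50 and 2^40 bytes), which is what the numbers in the post appear to use. Variable names are illustrative.

```python
TARGET_FD = 923_233_068_449   # bitcoin difficulty figure quoted in the post
CHUNK_BYTES = 2**20           # assumption: every chunk is a full 1 MB
VAULTS_PER_SECTION = 1000

# Max FD for a given TP is TP itself (when TS = TP - 1), so TP must be >= FD
tp = TARGET_FD
primary_pib = tp * CHUNK_BYTES / 2**50
per_vault_tib = tp * CHUNK_BYTES / VAULTS_PER_SECTION / 2**40

# One fewer sacrificial chunk (TS = TP - 2) halves the divisor
fd_one_less = tp // (tp - (tp - 2))

print(f"{primary_pib:.0f} PB primary, {per_vault_tib:.0f} TB per vault")
print(fd_one_less)
```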

FD and Rewards

Rewards are given at an average rate of 1 safecoin after a certain number of requests. If FD is 10, one safecoin is given after approximately every 10th request.

Using the previous very big in-real-life number of 923233068449, that means one safecoin every 923B requests.

How often would this be?

We’ll have to assume some numbers, but let’s use an average request rate of 10 Mbps per vault for this example. If each chunk is 1 MB that means there’s about 1.2 chunks being requested each second. This takes 774464030385 seconds to complete the 923B requests, ie 24558 years. So a 10 Mbps connection is presumably not viable for that FD.

Either connections will need to be much much faster than 10 Mbps or FD will need to be much much lower than 923B.
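A sketch of the reward-time arithmetic, assuming one request equals one full 1 MB (2^20 byte) chunk, 10 Mbps means 10,000,000 bits per second, and a year is 365 days. The function name and defaults are illustrative.

```python
def seconds_per_safecoin(fd, mbps=10, chunk_bytes=2**20):
    """Time to serve fd requests if each request is one full chunk."""
    chunks_per_second = mbps * 1_000_000 / 8 / chunk_bytes
    return fd / chunks_per_second

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

secs = seconds_per_safecoin(923_233_068_449)
years = secs / SECONDS_PER_YEAR
print(f"{secs:.0f} seconds, about {years:.0f} years")
```

Changing `mbps` shows how quickly the wait shrinks with a faster connection; it scales linearly.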

FD and Price

Cheapness of storage (permanent storage) is one of the big promises of SAFE.

How cheap would storage be if FD was a very big in-real-life number, such as 923233068449?

Store cost is defined as StoreCost = FR * NC / GROUP_SIZE or to rework it in terms of FD it’s StoreCost = NC / GROUP_SIZE / FD

GROUP_SIZE is currently fixed at 8, giving StoreCost = NC / (8 * FD)

NC is “the total number of client (NC) accounts (active, i.e. have stored data, possibly paid)”

The aim of this calculation in plain English is: “a safecoin will purchase an amount of storage equivalent to the amount of data stored (and active) and the current number of vaults and users on the network.”

The cheapest possible StoreCost is when NC = 1 (since presumably there must be at least one active client).

1/(8*923233068449) gives a Storecost of 0.00000000000013539, ie a rate of 7385864547592 PUTs per safecoin. If every PUT is 1 MB that makes a cost of 6878 PB for 1 safecoin.

Obviously the number of clients will be larger than 1, so if it were 1000 then it would be 6878 TB of storage for 1 safecoin. Or for 1 million clients it would be 6878 GB of storage for 1 safecoin.
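The price figures can be checked with the reworked formula StoreCost = NC / (GROUP_SIZE * FD). This sketch assumes each PUT stores a full 1 MB (2^20 byte) chunk and reports storage in binary PB; the function names are illustrative.

```python
GROUP_SIZE = 8
FD = 923_233_068_449

def store_cost(nc, fd, group_size=GROUP_SIZE):
    """Safecoin per PUT."""
    return nc / (group_size * fd)

def puts_per_safecoin(nc, fd, group_size=GROUP_SIZE):
    """Inverse of store cost: how many PUTs one safecoin buys."""
    return group_size * fd / nc

for nc in (1, 1_000, 1_000_000):
    puts = puts_per_safecoin(nc, FD)
    pib = puts * 2**20 / 2**50   # assumes every PUT is a full 1 MB chunk
    print(f"NC={nc}: cost={store_cost(nc, FD):.5e}, {pib:.4f} PB per safecoin")
```

Storage per safecoin falls in direct proportion to NC, which is why the figure drops by a factor of 1000 at each step.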

I think it’s fair to say that a FD of 923233068449 does indeed make for cheap storage, but I feel it would be too cheap for farmers to support.

Note that GROUP_SIZE may be larger if PARSEC is efficient enough, or may be smaller if PARSEC is secure enough.

A new MAX_FD?

923233068449 fits within 40 bits. Since u32 is the next smallest after u64 it would seem that u64 is indeed a good choice for FD if such high values are needed. But is it possible to fit the largest realistic FD within u32?

Using the same methodology as above, FD of u32 (ie 4294967295) would result in:

Storage: the section is storing approx 4 PB. If the section has 1000 nodes, that’s about 4 TB per vault. That’s definitely too low. I imagine most vaults could store more than that.

Rewards: on a 10 Mbps connection it would take 3602879701 seconds per safecoin, or 114 years - far too slow. Either connections must be faster than 10 Mbps or FD lower than u32.

Price: the cheapest possible price would be 32 PB per safecoin. That seems really cheap, although with 1000 clients it becomes 32 TB and with 1 million clients it’s 32 GB per safecoin. So I’d say u32 is inadequate for FD based on price.
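The three u32 figures come from the same arithmetic as before. A sketch under the same assumptions (1 MB = 2^20 bytes per chunk and per request, 10 Mbps = 10,000,000 bits per second, 1000 vaults per section, GROUP_SIZE of 8, NC of 1, binary PB/TB):

```python
U32_MAX = 2**32 - 1
CHUNK_BYTES = 2**20
GROUP_SIZE = 8

# Storage: TP must be at least FD
section_pib = U32_MAX * CHUNK_BYTES / 2**50
per_vault_tib = U32_MAX * CHUNK_BYTES / 1000 / 2**40

# Rewards: time to serve FD requests at 10 Mbps
chunks_per_second = 10 * 1_000_000 / 8 / CHUNK_BYTES
reward_seconds = U32_MAX / chunks_per_second
reward_years = reward_seconds / (365 * 24 * 60 * 60)

# Price: cheapest case is a single client (NC = 1)
cheapest_pib_per_coin = GROUP_SIZE * U32_MAX * CHUNK_BYTES / 2**50

print(section_pib, per_vault_tib, reward_seconds, reward_years, cheapest_pib_per_coin)
```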

Considerations

Churn is not accounted for. It would reduce the available bandwidth for rewards via GET requests.

Cache is not accounted for. It would reduce the opportunities for vaults storing primary chunks to be rewarded if the chunk is instead served from cache. It would also reduce the cache-provider bandwidth available for serving primary chunks. It would also reduce the storage space available for the cache-provider. I keep saying it, but cache is a really interesting aspect of the network and I think it’ll come out as one of the key aspects of how the network is structured and rewarded.

Gossip is not accounted for, which will take resources away from responding to GET requests and have the effect of reducing the rate of reward.

I’ve assumed the statement “a safecoin will purchase an amount of storage equivalent to the amount of data stored (and active) and the current number of vaults and users on the network” means

  • more data stored = less expensive prices (ie the network is large so storage should be cheap)
  • more clients = more expensive prices (ie demand is high so storage should be expensive)

But that seems like an incorrect interpretation. If someone could do their own calculations of storecost and price for a specific FD, NC and GROUP_SIZE I’d appreciate seeing that for improving my own understanding.

Collisions between newly rewarded coins and the currently issued safecoins are not accounted for, but they would have the effect of making rewards less frequent than calculated in this post.

I wonder if a natural cycle might evolve where farmers alternate between trying to get very cheap storage at one time and then very high reward rates at another time. I don’t know how it would evolve or what time period it would be over or what magnitude the cycle could have, but it would certainly be possible to create a cycle if vaults varied their TP:TS ratio roughly in sync with each other.

Summary

Overall it feels like the reverse approach of what I’ve taken here will be used by farmers to try to ‘set’ a suitable farming divisor.

First they’ll choose a storage cost that’s roughly what they’d like for themselves (ie the highest satisfactory FD).

Then they’ll balance it by choosing a reward rate that’s going to allow them to stay happy (ie the lowest satisfactory FD).

And finally they’ll provision enough storage so they can achieve the required FD (high FD requires lots of storage to achieve).

Would be keen to know your thoughts on the approach taken in this post.


#172

Yes I always said and thought that the u64 for FD was way too small/large for any amount of spare storage.

Also I am not convinced max FD == TP is a good thing either. We need to have another function that is not the inverse law function. Maybe something along the lines of log (exp), so that as spare storage increases the FD does not increase dramatically.

For example, if the base equation with TS = 9x10^5 and TP = 10^6 gives 100, then having TS increase to 10^6 might give FD = 100000 rather than the max value. This would also have the effect that farmers are not driven away so quickly just because a couple of large vaults come online in the section. Under the current equation it is possible for TP to be say 10^5 and TS = 9.9x10^4 ==> FD = 100, and then two 10TB vaults added means TP = 10^5, TS = 1.19x10^5 and FD = MAX_U64 (2^64 - 1).

This is a little misleading I think. We don’t want farming to stop; it’s just that rewards effectively stop, and we want to reduce the storage available.

EDIT: also I think using bitcoin difficulty is not real life either. That’s a fruit-and-meat comparison; there is not really common ground to compare the two.
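The jump described under the current equation can be seen directly. A minimal sketch, assuming the FD rule as described earlier in the thread (TP / (TP - TS) with floor division, and MAX_U64 when TS >= TP); the function name is illustrative.

```python
MAX_U64 = 2**64 - 1

def farm_divisor(tp, ts):
    """Current rule as described in the thread: inverse law with a hard cap."""
    if ts >= tp:
        return MAX_U64
    return tp // (tp - ts)

tp = 10**5
fd_before = farm_divisor(tp, 99_000)    # TS = 9.9x10^4
fd_after = farm_divisor(tp, 119_000)    # TS = 1.19x10^5 after the new vaults

print(fd_before, fd_after, fd_after // fd_before)
```

A comparatively small change in TS moves FD from a modest value straight to the cap, which is the cliff a smoother function would be meant to avoid.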


#173

I am missing your point here. I know you do not mean physically set a price, but I am not clear on how you feel they will “set” a price? Maybe if you can expand on the notion there it will help.


#174

I don’t think this is a good assumption. If SAFE succeeds, most of the data will be small files.

WhatsApp, as the biggest IM, can give us some clues. On New Year’s Eve 2017, WhatsApp users sent 75 billion messages, including 13 billion images and 5 billion videos. As a high percentage of these images or videos are the same ones forwarded over and over again, I think we can consider that more than 95% (being very pessimistic) of the messages are short texts of a few hundred bytes. And this information cannot be aggregated.

And the decrease in reward as the number of farmers increases seems to me the right way. Much easier to implement if, in the end, safecoin is something similar to the last @Fraser RFC.


#175

What I mean is there’s two main ways a vault operator could operate

  1. passively using defaults from the software
  2. actively trying to optimize their return on investment

Active operators have a few areas that may be optimized (eg prefer cheap storage vs prefer maximising safecoin rewards).

The lever that’s available for farmers (for the rfc-0012 algorithm) to enact this preference is the ratio of TS:TP for their vault. Some farmers will passively leave it as whatever the default functionality is. But some will actively tweak it manually (some will have less choice than others if their spare storage isn’t very big).

Once the tweaks are understood well, they’d probably be automated and released as a patched vault (the patched vault might start being used by the passive operators, which possibly leads to a slippery slope).

If I were to sit down and try to optimize my TS:TP ratio, I’d probably do it as per the previous post, although I’ll restate a slightly revised version here (these steps are about building more intuition for the algorithm, not necessarily providing a recipe for success):

  1. Look at the current average vault size and farm rate for the section I’m in.
  2. Decide if I value cheap storage or large rapid rewards, and how FD should change to meet that.
  3. Change my TS:TP ratio to best match my values.

Maybe the effect of changing the TS:TP ratio for my vault won’t be enough to completely achieve my targets, but it’s a push in that direction at least.

Realistically, active vault operators will probably be either 100% TS or 0.001% TS and not much in between. (Could a vault operator be storing >100% TS?).

I admit I’ve been as optimistic as possible about operators being able to adjust the TS:TP ratio, and in reality it may not be so easy to manipulate it. The mechanism for periodic adjustments to FD within the section is not clear to me.

I agree directly comparing makes little sense because they’re different domains but it makes the point I intended - what in real life might FD become? If we’d asked Satoshi back in 2009 what they’d consider to be a high mining difficulty I wonder what they’d have said?! 923B? So rather than pluck a number I used something from real life (for lack of anything more directly comparable).

Yes I agree and your examples are very convincing. But I wonder if people will feel it’s a waste to spend 1 PUT for 1 KB when it could be used for 1000 times that amount of data. Might be a difficult perception to manage. In that case the message of ‘cheap to store’ may be detrimental, and rather it could be conveyed as ‘cheap to participate’.


#176

The vault does not decide the TS or TP values. That is on a per-section basis and includes all vaults.


#177

I think this is the part I am unsure about.

This is key really. It is not well described, but essentially these ratios are agreed by the section, whose members must all vote on a safecoin being farmed. So a single vault not using the correct ratio would not be able to adjust the ratio.

You can think of this as the ratio being defined by what a vault observes in the section.

This will need more detail, as PARSEC or similar will need to enforce consensus on the ratio so that nodes can generate a reward request for a vault that satisfies the criteria for that reward.


#178

Yes thanks for the clarification, I’ve been using the terminology of TS and TP for everything when it should only be for section measurements. I’ll now use TS for total sacrificial (ie the section total) and S for sacrificial (ie an individual vault number of sacrificial chunks).

Each individual vault does have influence on the TS:TP ratio in the direction of their choosing (just maybe not very much). This is demonstrated below.

Initially

There’s a section with 100 vaults and each vault storing 100K primary chunks (P) and 99K sacrificial chunks (S). Numbers from thin air… is 100K:99K realistic? Let’s go with it.

The TP for the section is 100 vaults × 100K P = 10M TP

The TS for the section is 100 vaults × 99K S = 9.9M TS

The initial FD is 10M/(10M-9.9M) = 100

Default behaviour

If one new vault joins that section and just does the normal thing and receives chunks from the section (which evenly distributes existing TP), then it’d have 99,010 P (due to dilution of 10M TP into 101 vaults) and initially 0 S. FD stays the same.

Over time the vault will take on S, I’ll assume they take on 99K S like the other vaults have (this is definitely up for debate!) giving new section totals of 10M TP and 9.999M TS.

In this case new (eventual) FD is 10,000 which is quite different to the FD when the vault joined.

Of course in the time taken for the new vault to acquire 99K S the TP will have changed (and probably also TS) but is the general idea here correct?

Aiming for more reward by deliberately lowering FD

If one new vault joins that section and aims for maximising reward, then it’d have 99,010 P (due to dilution of 10M TP into 101 vaults) and 0 S, giving new section totals of 10M TP and 9.9M TS (ie identical to before joining).

As new chunks come in, the decision not to store any S will start to affect the ratio (but only very slightly).

Is this right? Or does storing no S have no effect?

Also, if this vault can continue taking on new P but declines all (or almost all) S, would it be booted off the network for not having enough storage space?

Aiming for cheaper prices by deliberately increasing FD

If one new vault joins that section and aims for cheapest prices, then it’d have 99,010 P (due to dilution of 10M TP into 101 vaults) and 99,010 S (correct me if wrong here), giving new section totals of 10M TP and 9.99901M TS.

The new FD would be 10,101 which is a big increase from 100 before the vault joined.
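The three scenarios above can be checked with a short script. This is a sketch only: it assumes FD = TP / (TP - TS) with floor division, that existing primary chunks are spread evenly over all vaults, and that section totals are simply the sum of what each vault holds. Names are illustrative.

```python
def farm_divisor(tp, ts):
    """FD = TP / (TP - TS); assumes TS < TP and floor division."""
    return tp // (tp - ts)

VAULTS = 100
tp = VAULTS * 100_000   # 10M primary chunks in the section
ts = VAULTS * 99_000    # 9.9M sacrificial chunks in the section

fd_initial = farm_divisor(tp, ts)

# A 101st vault joins; existing primary chunks are diluted across 101 vaults
new_vault_p = round(tp / (VAULTS + 1))

fd_default = farm_divisor(tp, ts + 99_000)       # takes on 99K S like the others
fd_no_s = farm_divisor(tp, ts)                   # declines all S
fd_max_s = farm_divisor(tp, ts + new_vault_p)    # matches S to its P

print(fd_initial, new_vault_p, fd_default, fd_no_s, fd_max_s)
```

Because TS is already within 1% of TP in this example, even one vault’s worth of sacrificial chunks moves the denominator a long way, which is why the divisor reacts so strongly.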

Generally

I think I’ve probably overstated the impact of individual vaults on TP:TS. But I think it’s still worth some further careful analysis.

Would be good to understand the way new vaults are issued S by the section and how TS is measured through time. If vaults are prevented from misreporting their true quantity of S to the section there should be no problem. But if they can rapidly change how much S they have it could lead to some strange behaviours under some conditions.

Thanks everyone for the clarifications and input so far :) It’s an interesting algorithm to pick apart.


#179

People won’t worry as long as the PUT price is low enough.

Which brings us to a balance problem. In order for this price to go down, there must be a continued surplus of space which, possibly, will only exist if the price of safecoin grows and attracts many farmers. But, despite this possible increase in the price of safecoin, we need the PUTs per $ ratio to increase at a faster pace.

Which leads me to question how they are going to distribute that initial 85% of safecoin which, in some way, is like the rocket fuel that has to project the network to a stationary situation. This is a key point which is not being considered and which should also be included in the equation.


#180

We’re still a bit away from having any details on this I’m afraid, but I agree; it’s critical that individual vaults aren’t able to (significantly) affect any of the inputs used to calculate the farming rate, just in the same way that they shouldn’t be able to affect the generation of rewards in their favour too.