Next step of safecoin algorithm design

Pondering this - asking a friend who is already on the network to throw you a few pennies is also workable. This would give them a foot in the door and likely a friendly introduction to how the network works too.

If people don’t have friends, maybe they could make some on the forums, etc. The amounts we’re talking about to get started should be small enough to only really cost people time.

7 Likes

I just wondered, do you then actually still need account creation?
Let’s say you use the Bitcoin address format as the Safecoin address format. Then you could use your Safecoin (Bitcoin/MAID) address as your account for the SAFE Network.
You then log in with your private key. If that address has a balance, you can use it (for buying PUTs). If it has no balance, then you can just browse.

And if you use Bitcoin addresses, you could maybe even use hardware wallets like Trezor for logging in; no need for an extra password at all.
Also, the MAID->Safecoin conversion is then seamless.
For marketing it gives an opportunity to perform an airdrop on existing Bitcoin addresses. Just a small amount to perform some small uploads. This could create a real buzz and give the SAFE Network a jump start.
Lots of possibilities

The account is the SAFE account which holds all your account info: ID key pairs, which include but are not limited to coin-account IDs, service IDs, etc. The account also holds pointers to your private files. And some other stuff.

Now you may have misunderstood and taken the account to be the coin-account.

Well then the coin-account ID is a key pair that points to a simple record holding your balance and spending options.

But your SAFE account can and probably will hold the IDs for more than one coin-account that you use. E.g. you might have one for your vault, one for savings, and many for anon spending, even if most are zero most of the time.

I expect it would be a negative point to have to remember private keys for many IDs, which is why there is the SAFE account record to do that for you.
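
A purely hypothetical sketch of how those pieces could hang together (the struct and field names are made up for illustration, not the actual MaidSafe data structures):

// Hypothetical sketch only: illustrative names, not the real SAFE types.
struct KeyPair {
    public: [u8; 32],
    secret: [u8; 32],
}

// The simple record a coin-account ID points to.
struct CoinAccountRecord {
    balance: u64,
    spending_options: u8,
}

// The SAFE account record that remembers your IDs so you don't have to.
struct SafeAccount {
    coin_account_ids: Vec<KeyPair>,       // vault, savings, anon spending, ...
    service_ids: Vec<KeyPair>,
    private_file_pointers: Vec<[u8; 32]>, // pointers to your private files
    // ...and some other stuff
}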

1 Like

That SAFE account will still be there, but it will be created when you log in for the first time with your private key. When your Safecoin address has a balance, the background SAFE account will be created. Your coin address will then be mapped to that SAFE account.
After that you might create an alias or so for logging in next time to make it easier.

Well it’s just an idea. Maybe not feasible at all, but on the other hand things are still changing, so there’s still room

4 Likes

This sounds very nice to me.

David has also supported this idea

1 Like

RFC-0012 vs Dropbox

If SAFE had a similar adoption to dropbox, what would the safecoin ecosystem look like when using the algorithms from rfc-0012?

Assumption: Dropbox charges USD 9.99 per month for 1 TB of storage.

How do we compare this to the lifetime storage of SAFE?

Assumption: lifetime storage is equivalent to 50 years of dropbox storage.

So 50 years = 600 months, and USD 9.99 × 600 months = USD 5,994 per TB for lifetime storage.

Data about dropbox suggests 500M users and 400B files.

Assumption: 400B files is equivalent to 400B chunks (1 MB per chunk) on the SAFE network. Lots of details to pick apart in this assumption but let’s go with it.

Assumption: All users run the dropbox software to access network storage, so would likewise all run vault software.

Assumption: Users allocate 1 TB storage for dropbox to use, so users allocate 1 TB for their vault to use.

Storing 400B chunks, with 8 copies of each, spread across 500M users means each user stores 400B × 8 ÷ 500M = 6400 chunks per user.

That’s about 6.4 GB which easily fits within their 1 TB allocated space.

Assumption: Safecoin price is USD 0.22, for today anyhow.

The storecost is specified in rfc-0012 as StoreCost = FR * NC / GROUP_SIZE
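
As a minimal sketch in code (the parameter names are mine):

// Sketch of the rfc-0012 store cost formula quoted above.
fn store_cost(farm_rate: f64, active_clients: f64, group_size: f64) -> f64 {
    farm_rate * active_clients / group_size // safecoin per PUT
}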

The group size is 8. The number of clients is active clients per section, but how many sections are there?

Assumption: There are 100 vaults per section.

This means if there’s one vault per user the number of sections is calculated as 500M vaults ÷ 100 vaults per section = 5M sections.

Assumption: All users are active users for the purpose of calculating storecost.

500M active users spread over 5M sections means 100 active users per section.

The farm rate can now be calculated.

The storecost of dropbox is 5994 usd per TB.

That’s equivalent to 5994 ÷ 0.22 = 27245 safecoin per TB.

Assumption: 1 TB is 1024 × 1024 PUTs.

Converting the price per TB to safecoin per PUT gives

27245 ÷ (1024 × 1024) PUTs = 0.02598 safecoin per PUT.

Farmrate can be calculated from the rfc-0012 formula to be

0.02598 = FR × 100 ÷ 8

Solving for the farmrate gives

0.02598 storecost × 8 groupsize ÷ 100 clients = 0.00208

Sounds reasonable enough. What does this farm rate mean for storage of sacrificial chunks?

Considering each vault would be storing 6400 primary chunks, we can work out how many sacrificial chunks they’d need for this particular farm rate.

From rfc-0012: FR = 1 - (TS / TP)

0.00208 = 1 - (TS / 6400)

Solving for TS gives 6387 sacrificial chunks.

Combining sacrificial and primary chunks gives 12,787 chunks or about 12.5 GB per vault.
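
For anyone who would rather re-run the arithmetic than open a spreadsheet, here is a small sketch that reproduces the figures above under the same assumptions (all variable names are mine):

fn main() {
    // Assumptions from the post above.
    let usd_per_tb_month = 9.99;        // dropbox price per TB per month
    let lifetime_months = 50.0 * 12.0;  // "lifetime" taken as 50 years
    let users = 500e6;                  // dropbox users, one 1 TB vault each
    let files = 400e9;                  // files, taken as 1 MB chunks
    let copies = 8.0;                   // stored copies of each chunk
    let safecoin_usd = 0.22;            // safecoin price today
    let vaults_per_section = 100.0;
    let group_size = 8.0;
    let puts_per_tb = 1024.0 * 1024.0;  // 1 MB per PUT

    let usd_per_tb = usd_per_tb_month * lifetime_months;  // 5994 USD
    let safecoin_per_tb = usd_per_tb / safecoin_usd;       // ~27,245 safecoin
    let storecost = safecoin_per_tb / puts_per_tb;         // ~0.026 safecoin per PUT

    let tp = files * copies / users;                       // 6400 primary chunks per vault
    let sections = users / vaults_per_section;             // 5M sections
    let nc = users / sections;                             // 100 active clients per section

    // StoreCost = FR * NC / GROUP_SIZE  =>  FR = StoreCost * GROUP_SIZE / NC
    let fr = storecost * group_size / nc;                  // ~0.00208

    // FR = 1 - (TS / TP)  =>  TS = TP * (1 - FR)
    let ts = tp * (1.0 - fr);                              // ~6387 sacrificial chunks

    println!("storecost = {:.5} safecoin per PUT", storecost);
    println!("farm rate = {:.5}", fr);
    println!("per vault: {:.0} primary + {:.0} sacrificial chunks (~{:.1} GB)",
             tp, ts, (tp + ts) / 1024.0);
}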

Summary

1 TB of storage would cost 27,245 safecoin at current prices if SAFE operated using dropbox parameters.

Vaults would store 12.5 GB data, almost equally split between primary and sacrificial chunks.

Overall it feels quite reasonable that the current dropbox situation could exist on the SAFE network.

Here’s the spreadsheet used to calculate the figures in this post if you want to tweak anything or change assumptions; I’d be interested to hear your findings.

So I’d have to say rfc-0012 passes the ‘dropbox viability’ test. Did I screw up somewhere? Is this reverse engineering a valid approach?

6 Likes

EDIT:

Haha I should have read this correctly before posting. So feel free to ignore my post if it doesn’t fit in

How is the money that Dropbox makes used?

  • Profits? This is at least 40% since 40% is considered the “break-even” point for business profitability in the tech world. Most businesses whose headline profit is less than 40% usually don’t survive long term.
  • Future infrastructure withholdings. This is usually significant. Maybe 10-15% for a mature company
  • Advertising
  • wages
  • And lastly actual cost of supplying the product
    • electricity
    • data centre rental

I mention the above because SAFE does not have any of that, since the network is designed for vaults to be run on the spare resources of the suppliers. If one uses additional equipment (eg data centre, special setups) then they can expect much lower rewards after costs.

So I wonder if we can really use a commercial storage supplier like dropbox to get a valid estimation of what SAFE should charge.

I suggest that we can use Dropbox as something we should be way under, not something to aspire to.

2 Likes

First, nice detailed presentation as usual.

Maybe. Not sure. You may have things reversed.

I don’t think treating the price as the cost to store for X years up front is a valid premise. Philosophically, the price for a PUT is not based on how long that data will be stored into the future. Instead it is the unit cost required to maintain and secure all current data on the network, in addition to your new PUT, at that instant. Since the quantity of data grows exponentially into the future, the proportion of the current PUT price that can be attributed to storing and securing old data becomes negligible over time.

2 Likes

After re-reading your post, and especially the first line (the premise), my response is that it’s definitely an interesting calculation. A few comments though:

  • If each user supplies 1 TB of storage but is using only 12.5 GB, then is the calculation for the cost of 1 TB valid? The sacrificial chunks would be a lot more, since there is the space for them, and the cost to store would be extremely low. Yes, I understand the reason you did that calculation though. This is a nitpick really.
  • The cost of 1 TB should at least remove the profit margin from the dropbox price, and maybe even the future infrastructure withholdings. This would at least halve the cost for dropbox to less than $5 per month.
  • Can we really make this comparison? Yes, it does show that SAFE can match Dropbox charges and can do better. But is that the real comparison here, that SAFE is better than Dropbox in charges?
1 Like

Dropbox 2018 second quarter results; but putting my wisecrack link aside, I definitely agree with your general point that many costs of dropbox won’t be needed on SAFE so it should be substantially cheaper for users.

I guess the analysis wasn’t so much about would dropbox-style figures exist as about could they exist. And sure, there’s nothing ludicrous about the final numbers. It could happen. But would it? That’s a much harder question.

There are loads of ways to extract indirect meaning from the analysis, mostly by thinking “some particular thing seems wrong here, so what does that wrongness imply?”.

For example, storage that’s orders-of-magnitude cheaper than dropbox would require the farm rate to be even lower than the calculated value, which in turn means sacrificial chunk storage must be even higher than 99.9% - so to me the mechanism of using sacrificial chunks is suspicious. Having extremely cheap storage means having extremely high rates of sacrificial chunk storage. It makes me question the sacrificial chunk design.
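
To put rough numbers on that, here is a small sketch using the same NC = 100 and GROUP_SIZE = 8 as the dropbox calculation (the 'cheaper' factors are only illustrative):

fn main() {
    let (nc, group_size) = (100.0, 8.0);
    let dropbox_storecost = 0.02598; // safecoin per PUT from the calculation above
    for cheaper in [1.0, 10.0, 100.0] {
        let storecost = dropbox_storecost / cheaper;
        let fr = storecost * group_size / nc; // invert StoreCost = FR * NC / GROUP_SIZE
        let ratio = 1.0 - fr;                 // invert FR = 1 - (TS / TP)
        println!("{}x cheaper than dropbox needs TS/TP = {:.5}", cheaper, ratio);
    }
}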

Another example, sacrificial chunk storage cannot be more than primary chunk storage or the equations go negative (well, it’s not that simple, but it gets the point across). This seems like an unnecessary boundary and puts some limit on how cheap storage can become.

Another example, if vaults could somehow store more sacrificial chunks than primary chunks without prices going negative then redundancy could be significantly higher than 8 (although with weaker confidence of the permanence of sacrificial chunks).

I dunno. The analysis is sorta backwards because it asks ‘could’ dropbox-style conditions exist when really it wants to ask ‘would’ dropbox-style conditions exist. But the second one is really hard to answer, so we can instead ‘get some feel for it’ by asking the first one and seeing where it feels a bit wrong.

For me the main wrongness is that the amount of cheapness is limited by the sacrificial chunk design.

Yes, I think you’re right with this point. But hopefully the takeaway is more than just ‘can’ it be better (obviously yes it can). There’s also a bit of ‘why’ is it better and maybe some insight into the structure / characteristics of rfc-0012 that’s hard to get without some point of comparison via dropbox.

Not bad headline profits there

6 months to June 2018
revenue $655.5 million
gross profit $445.4 million

That’s a headline profit of 67.9%, so they got above the 40% mark.

Although I wonder why they have a cost of revenue and then such large operating costs. Sounds like they wanted the headline profit to be sensational (for the shareholders) and then added in true costs that really should be part of the cost of revenue.

Then their operating losses swamped them. Of which SAFE will have none, since it’s the users themselves who pay if they use any more than spare resources.

The effect of user costs is to reduce their rewards.

There is an exception for that in the RFC, setting the farming rate to a very small number, so a negative rate is not allowed but is catered for.

// Runnable sketch of that exception (TP = primary chunks, TS = sacrificial chunks):
fn farm_rate(tp: f64, ts: f64) -> f64 {
    if tp > ts {
        1.0 - (ts / tp)
    } else {
        f64::MIN_POSITIVE // "a very small number", approximately 0
    }
}

First, the price cannot go negative, but nothing stops sacrificial chunks being higher than primary chunks if the code allowed that.

I don’t see it as a necessity to store more than 8 copies as primary chunks. Just because you can does not mean you should. There would be other factors to determine whether more or fewer than 8 should be stored, and the ratio of sacrificial chunks should not be the determining factor. Safety/redundancy versus network traffic/transaction cost should be the greater controlling force in that decision.

Yes that seems more accurate than my one liner.

1 Like

Great post (again)!

Perhaps inflation of USD and potential deflation of SAFECoin should be taken into consideration? Compound inflation over 50 years would be substantial, to say the least.

3 Likes

The 50 years is too much anyway. Just check the approximate size and price of data storage 50 years ago.
If the price of data storage halves every 5 years, then after 35 years it is less than 1% of the original price.
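
A quick check of that compounding (assuming a clean halving every 5 years):

fn main() {
    // Fraction of the original storage price remaining after `years`
    // years, if the price halves every 5 years.
    for years in [5.0, 10.0, 20.0, 35.0, 50.0] {
        let fraction = 0.5f64.powf(years / 5.0);
        println!("after {} years: {:.2}% of the original price", years, fraction * 100.0);
    }
}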

6 Likes

Picking up on this formula a bit and looking further into the farm rate (well, I’ll use farm divisor which is just the inverse of farm rate, since it’s clearer for the explanations below).

At the inception of the network, maidsafe will start a bunch of vaults. Since it’s a private network at this stage, the vaults can have any rules they like, and storage can be free. This allows some initial primary chunks and presumably some initial sacrificial chunks to be present on the network.

Once the network is ‘ready’ the old vaults can be replaced with new vaults running the correct farming rules and pricing will begin in earnest as community vaults join the network.

What farm rate will maidsafe choose for network launch?

How vulnerable is the early farm rate to manipulation by early vault operators?

The two questions are quite important because if early adopters can satisfy the else condition in the code below (from rfc-0012) the benefit is massive: FarmDivisor = u64::MAX means extremely cheap PUTs.

// Runnable sketch of the rfc-0012 rule (TP = primary chunks, TS = sacrificial chunks):
fn farm_divisor(tp: u64, ts: u64) -> u64 {
    if tp > ts {
        tp / (tp - ts)
    } else {
        u64::MAX // the maximum possible value: extremely cheap PUTs
    }
}

To give an idea of just how cheap, look at the farm divisors for different primary vs sacrificial chunks (bigger farm divisor means cheaper storage; I picked 1B chunks to simulate a fairly big section which would be able to offer very cheap storage under ideal conditions):

Primary Sacrificial Farm divisor
     1B      1B     18446744073709551615
     1B      1B-1   1000000000
     1B      1B-2   500000000
     1B      1B-3   333333333

That first row really stands out!

The difference in cheapness between 1-in-a-billion missing sacrificial chunks vs 2-in-a-billion missing sacrificial chunks is quite a lot (double!). But if there’s 0 missing sacrificial chunks the farm divisor becomes extremely large (ie storage is unbelievably cheap).

I think there’s very low chance of sacrificial chunks being 100% available in real life but if they are, storage becomes a bit too cheap. Maybe the else value should be something like FD = TP instead of FD = u64::MAX, ie

// The same rule with the proposed change: cap the else branch at TP.
fn farm_divisor_capped(tp: u64, ts: u64) -> u64 {
    if tp > ts {
        tp / (tp - ts)
    } else {
        tp
    }
}

This would change the table to

Primary Sacrificial Farm divisor
     1B      1B     1000000000
     1B      1B-1   1000000000
     1B      1B-2   500000000
     1B      1B-3   333333333
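
For completeness, a self-contained snippet that reproduces the interesting rows of both tables by applying the two rules side by side:

fn main() {
    let tp: u64 = 1_000_000_000;
    for missing in 0u64..4 {
        let ts = tp - missing;
        // Original rfc-0012 rule vs the proposed FD = TP cap.
        let original = if tp > ts { tp / (tp - ts) } else { u64::MAX };
        let capped = if tp > ts { tp / (tp - ts) } else { tp };
        println!("TS = 1B-{}: original FD = {}, capped FD = {}", missing, original, capped);
    }
}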

Also worth observing: the maximum cheapness is limited by the number of chunks in a section, so larger sections mean the possibility of cheaper storage, but might also create dangerous centralizing forces. There can be extremely cheap storage with extremely large sections, but not extremely cheap storage with small sections.

Storage can’t be extremely cheap in the early days of the network because the farming divisor is naturally limited to a fairly low value (by nature of a fairly low number of chunks). Not a bad characteristic per se but one that I think warrants further consideration.

I think the two questions are quite important (especially if the existing rfc-0012 proposal with u64::MAX is retained)

What farm rate will maidsafe choose for network launch?

How vulnerable is the early farm rate to manipulation by early vault operators?

Yes, very important, and I’ve always said that while spare space is plentiful the store PUT cost will be very low: approx 5x10^-20 * 100 (NC) / 8 = 6.25x10^-19 safecoin per PUT, or roughly 1.6 million tera-chunks per safecoin.
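
Checking that with the store cost formula (the tiny farm rate is the rough assumption stated above):

fn main() {
    let fr = 5e-20;     // assumed farm rate while spare space is plentiful
    let nc = 100.0;     // active clients per section
    let group_size = 8.0;
    let storecost = fr * nc / group_size;              // ~6.25e-19 safecoin per PUT
    let tera_chunks_per_coin = 1.0 / storecost / 1e12;
    println!("storecost = {:e} safecoin per PUT", storecost);
    println!("~{:.1} million tera-chunks per safecoin", tera_chunks_per_coin / 1e6);
}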

But of course, as people store data, it would not take long for that to become drastically more expensive. So it’s a quick way to attract people to store files.

Yea, I’ve suggested in the past that there is a better minimum pricing. You suggest FD = TP, which is good because it’s not a magic number and the minimum price is related to how much storage there already is. Excellent idea, and one that should be suggested when the algo for pricing is being developed.

Since users want to store their data (a sort of random distribution), I doubt there would be a centralising force because of normal user data. There might be an attacker who only stores chunks with XOR addresses that would hit one particular section, but really all the attack does is increase the price, and then there is no difference in pricing compared to other sections.

I’d say that would be left up to whatever algorithm is used. The network will already have data since we’ve been (release candidate) beta testing it.

A lot, if little data is stored. Like adding many vaults with huge combined storage and then removing them. Early on a single user could provide a significant amount of storage.

1 Like

Let me join the fun and have a go at redesigning that formula.

FR = MAX_FR * FR_BASE ^ (-TP/TS)

This gives us the following with FR_BASE=(1.01, 2) and MAX_FR=1B for TP=1B and varying TS:

       TS           FR_BASE=1.01          FR_BASE=2
-----------------------------------------------------------
     infinity      1,000,000,000        1,000,000,000
4,000,000,000        997,515,509          840,896,415
2,000,000,000        995,037,190          707,106,781
1,000,000,003        990,099,009.93       500,000,001.04
1,000,000,002        990,099,009.92       500,000,000.69
1,000,000,001        990,099,009.91       500,000,000.35
1,000,000,000        990,099,009.90       500,000,000.00
  999,999,999        990,099,009.89       499,999,999.65
  999,999,998        990,099,009.88       499,999,999.31
  999,999,997        990,099,009.87       499,999,998.96
  750,000,000        986,820,512          396,850,263
  500,000,000        980,296,049          250,000,000
  333,333,333        970,590,148          125,000,000
  250,000,000        960,980,344           62,500,000
  100,000,000        905,286,955              976,562.5
   50,000,000        819,544,470                  953.6743
   25,000,000        671,653,139                    0.0009094947
   10,000,000        369,711,212                    infinitesimal
    5,000,000        136,686,381
    2,500,000         18,683,167
    1,000,000          47,711.85
      900,000          15,793.33
      800,000           3,965.361
      700,000             670.8244
      600,000              62.76396
      500,000               2.276420
      400,000               0.01572409
      300,000               0.000003939315134
      200,000               infinitesimal
      100,000               infinitesimal
            0               0

This formula doesn’t have exceptions or sudden changes at the point where TS overtakes TP; instead it varies smoothly across the full range of possible TP/TS ratios.

NOTE/EDIT: This formula has different dynamics than the original one. While with the original FR remains relatively low for much of the range and then rises quickly to infinity and beyond, with this one it grows quickly to near the maximum and then stays there. How quickly it does that can be adjusted by the base of the exponent (the higher it is, the slower it grows).
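
A runnable sketch of that proposal (the function name is mine; max_fr and fr_base are the tunable parameters used in the table, e.g. 1e9 with a base of 1.01 or 2):

// Sketch of the smooth farm-rate formula proposed above.
fn smooth_farm_rate(tp: f64, ts: f64, max_fr: f64, fr_base: f64) -> f64 {
    if ts == 0.0 {
        return 0.0; // matches the TS = 0 row of the table
    }
    max_fr * fr_base.powf(-tp / ts)
}

fn main() {
    let tp = 1e9;
    // Reproduce a couple of rows from the table above.
    println!("{}", smooth_farm_rate(tp, 5e8, 1e9, 2.0));  // 250,000,000
    println!("{}", smooth_farm_rate(tp, 1e8, 1e9, 1.01)); // ~905,286,955
}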

3 Likes

Sorry I think I need to clarify. I don’t mean that large sections cause centralization as in lots of data centralizing into a single section. I mean large sections cause centralization as in only a few centralized people being able to run vaults, so consensus becomes more centralized. My meaning was that large sections pose a risk to consensus and power centralization, not data centralization.

This poses a bit of a conundrum. Extremely cheap prices attract Jane Smith as an uploader but discourage Jane Smith as a vault operator because of the large vault sizes (which are required to have cheap prices). The pricing algorithm has a natural tendency for uploaders to say ‘I’m just going to use these sweet cheap uploads and let the big operators worry about the vaults so they can make my uploads even cheaper by getting even bigger’.

I’m sure some interesting analysis could be done about how hard it would be for a group of ‘communist’ vault operators to combat large vaults and keep sizes relatively small and consensus distributed…


One other change that may be helpful in achieving one of the general aims of rfc-0012: “the farming rate decreases as the network grows”

Currently the farm rate decreases as the section grows, since it depends on the size of TP and TS which are specific to each section.

I think a better way to capture whether the network has grown is to include section prefix length in the calculation of farm rate. That way the overall size of the network can be calculated which better achieves the goal.

To illustrate why this matters, consider two networks with very different sizes but the same farm rate:

10 sections and 100K:90K TP:TS chunks per section (rfc-0012 gives a farm rate 0.1)
vs
1000 sections and 1M:900K TP:TS chunks per section (rfc-0012 gives a farm rate 0.1)

The second network is overall 1000 times larger than the first (100 times more sections and 10 times more chunks per section) but has the exact same farm rate. So farm rate has not ‘decreased as the network grows’.

I think it’s a mistake to incentivise increasingly large sections. Including section prefix length would allow farm rate to decrease as the network grows without also needing sections to get large at the same time.
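
A purely hypothetical sketch of one way prefix length could be folded in, just to show the effect (the function name and the choice of dividing the section-local rate by the estimated section count 2^prefix_len are my own illustration, not a worked-out proposal):

fn network_aware_farm_rate(tp: f64, ts: f64, prefix_len: u32) -> f64 {
    let section_rate = if tp > ts { 1.0 - (ts / tp) } else { 0.0 };
    // 2^prefix_len approximates the number of sections in the network.
    section_rate / 2f64.powi(prefix_len as i32)
}

fn main() {
    // The two example networks above: ~10 sections (prefix ~3 bits)
    // vs ~1000 sections (prefix ~10 bits), both with TS/TP = 0.9.
    println!("{}", network_aware_farm_rate(100_000.0, 90_000.0, 3));     // ~0.0125
    println!("{}", network_aware_farm_rate(1_000_000.0, 900_000.0, 10)); // ~0.0000977
}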

3 Likes

But perhaps my reply would still have some application here. Consider these points.

  • Data being stored is expected to be in a fairly random distribution.
  • Thus sections should get a fairly even spread of chunks.
  • if using rfc0012
    • then if too many farmers join one section then yes, the price gets really low and farmers will pull out if they end up there.
      • and therein is the expected solution, farmers will be pulling out and rejoining to get a better section.
      • problem expected to solve itself
    • ELSE if we adopt your FD = TP or @JoeSmithJr’s idea then is there a problem of “centralisation”? The price will not be too small.

For that to be truly successful the “big” operators would have to be in a large majority of sections, otherwise uploads will either be marginally cheaper for large files or randomly cheaper for small files.

Also the large operators are getting small rewards, and from the other topic, energy costs alone are not insignificant when trying to operate at scale. Add to that the cost of operations (see the dropbox figures) and very large operators will want bigger rewards than a home user just to cover costs. So there is a problem for a large supplier of vaults who would cover a large percentage of sections, because they need to recover costs.

Actually this is in the RFC: coin scarcity, though maybe not enough for your purposes. After the initial growth period, when the network starts to mature, the number of coins is expected to increase, and thus the farming reward success rate reduces proportionally to the number of coins existing. And since the number of coins existing is expected to increase, the effective (not actual) farming rate decreases.

Thus, compared to the actual farming rate, the effective rate is (a quick sketch of this scaling follows the list):

  • early: 15% of coins exist, EFR = 85% of FR
  • say 1 year: 20% of coins exist, EFR = 80% of FR
  • say 5 years: 40% of coins exist, EFR = 60% of FR
  • say 10 years: 60% of coins exist, EFR = 40% of FR
  • say 20 years: 80% of coins exist, EFR = 20% of FR
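
A small sketch of that scaling (the function name and the example FR value are mine):

// Effective farm rate = actual rate scaled by the fraction of coins
// that do not yet exist.
fn effective_farm_rate(fr: f64, coins_existing_fraction: f64) -> f64 {
    fr * (1.0 - coins_existing_fraction)
}

fn main() {
    // e.g. at ~5 years with 40% of coins existing, EFR = 60% of FR.
    println!("{}", effective_farm_rate(0.1, 0.40)); // ~0.06
}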

Yes currently the calculations assume that a section is representative of the whole network and thus the figures can come purely from the section.

I am not sure that the section prefix is much better, since again it assumes the section is representative of the whole network. For instance, one section prefix may be 20 long, yet others that have not seen anywhere near as much splitting might be 10 long. Maybe those are two sections at the extremes of the average prefix length. Pretty much the same sort of thing that happens with other variables of the section.

I am not so sure that a large section will just keep increasing

  • the section size increases due to spare space increasing, since the storage of data is assumed to be fairly randomly distributed across all sections.
  • If the section grows due to more spare space then FR decreases, discouraging farmers from remaining in the section (ie they just restart)
  • Node churning moves vaults around anyhow, so would you ever get a section remaining so large that it’s an issue?
  • Basically the larger the section, the more spare space it has, the lower the FR, and thus the lower the desire to remain farming. Thus there is a positive force reducing the section size.
4 Likes

Nice catch! Something that isn’t obvious.

3 Likes