Next step of safecoin algorithm design


#81

Clever solution!

The cheapness of actually storing is compared with the cheapness of plugging the gaps from cheating. The cheapness of cheating depends on the cost to regenerate the required gaps. If that cost involves a signing function then it creates an incentive for users to have fast signing (as an alternative to a lot of storage) which also benefits normal network operations. Could be quite synergistic.

One downside is the tester cannot test for more storage than they have themselves. Well… actually they could since they can discard all non-testable data (assuming they are only testing and never being tested themselves).


#82

Most nodes will not become elders but will still be earning off the vaults. Children and adults will also be running vaults and farming. And infants could be made so that they don’t relocate when becoming children, and then it’s not useless. Also, the filling of the disk can be done with crypto problems that do not use up bandwidth but do require the vault to be filled so it can answer a challenge correctly. That is, if filling the vault with sacrificial chunks is deemed not useful enough.

EG something like this


Also if you ask neighbouring sections their median vault size and incorporate that then you also get each section being reasonably close to what would be a network wide average.

Also it would prevent a section reducing its median because of an unusual number of small vaults being assigned to the section
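As a rough sketch of the neighbour idea: a section takes the median of its own vault sizes, then takes the median again over that value and the medians reported by its neighbours. The function name, the sizes and the "median of medians" way of combining the values are assumptions for illustration, not something settled in this thread.

```python
from statistics import median

def blended_median(own_vault_sizes, neighbour_medians):
    """Median of this section's own median and its neighbours' reported medians."""
    own = median(own_vault_sizes)
    return median([own, *neighbour_medians])

# A section that happened to be assigned many small vaults (sizes in GB, made up).
own_sizes = [10, 12, 15, 20, 500]
neighbours = [480, 510, 495, 505]

section_only = median(own_sizes)                 # what the section sees alone
blended = blended_median(own_sizes, neighbours)  # after asking its neighbours
print(section_only, blended)
```

With these made-up numbers the unusual run of small vaults no longer drags the section's figure down, because the neighbours' values dominate the blended result.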

I’d say that there needs to be a mix of actual chunks and generated chunks. The actual chunks in between force the algorithm to store the others. It’d take some time to work out the method and algorithm, but by supplying at least 1/2 in actual chunks, even if a vault could bypass the algorithm and generate the rest on the fly (computing), it still has to store 1/2. Then it will be found out at some stage and rejected from the network.

And as David said we expect that 2/3 are good nodes reporting correctly because that is what the code does.

Oh well solved by @JoeSmithJr who did the clever thinking. Good on ya.

But what are the benefits from cheating? This is the real issue to solve. I’d suggest the biggest benefit from cheating on space is to disrupt. The network finds out that you cheated when you are asked to do the actual work of storing then retrieving. Cheating can be detected if a simple test is included in the code to request a “random” chunk stored in the least used vaults and check it can be retrieved correctly. I suggest the least used (by measure of GETs done) because the more used vaults are being tested by normal operations.
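A minimal sketch of that test, assuming chunks are named by the SHA-256 hash of their content and that the tester knows which chunk names each vault was assigned. The vault record layout (`id`, `gets`, `assigned`, `stored`) is hypothetical and only serves the example.

```python
import hashlib
import random

def chunk_name(data: bytes) -> str:
    # Assumption: a chunk's name is the SHA-256 hex digest of its content.
    return hashlib.sha256(data).hexdigest()

def audit_least_used(vaults, how_many, rng):
    """Request one random assigned chunk from each of the least-used vaults.

    Returns the ids of vaults that could not return the correct data.
    """
    least_used = sorted(vaults, key=lambda v: v["gets"])[:how_many]
    failed = []
    for vault in least_used:
        if not vault["assigned"]:
            continue
        name = rng.choice(vault["assigned"])
        data = vault["stored"].get(name)
        if data is None or chunk_name(data) != name:
            failed.append(vault["id"])
    return failed

chunk = b"some chunk content"
name = chunk_name(chunk)
vaults = [
    {"id": "honest", "gets": 3, "assigned": [name], "stored": {name: chunk}},
    {"id": "cheat", "gets": 0, "assigned": [name], "stored": {}},
    {"id": "busy", "gets": 900, "assigned": [name], "stored": {name: chunk}},
]
failed = audit_least_used(vaults, how_many=2, rng=random.Random(0))
print(failed)
```

The busy vault is skipped here on purpose, since its normal GET traffic already exercises it.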


#83

Yes unfortunately we have to have a mechanism that works strongly against the spamming of the network with writing chunks and attempting to fill up the network. Price is very much a strong deterrent and a disabling force against this attack. Someone mentioned such an attack (yet again) as a sure fire way SAFE will fail, but ignored the cost of such an attack.

For your photo that ends up with one of its chunks in a “filling up” section then the cost impact will not be that much unless that section was getting very low on spare space. But for an attacker trying to attack a section with chunks then it’d cost them dearly as each chunk they write that goes to that section will be costing them more and more and more till the attacker runs out of funds.
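To make the "more and more and more" concrete, here is a toy model. The pricing rule (cost inversely proportional to the section's spare fraction), the base cost and the section size are all assumptions picked for illustration; the real store cost algorithm is what this topic is still working out.

```python
def store_cost(spare_fraction, base_cost=0.001):
    """Hypothetical price per chunk: rises sharply as spare space shrinks."""
    return base_cost / max(spare_fraction, 1e-9)

def flood_section(capacity_chunks, used_chunks, budget):
    """Write chunks to one section until the budget or the section runs out.

    Returns (chunks_written, coins_spent).
    """
    written, spent = 0, 0.0
    while used_chunks < capacity_chunks:
        spare = (capacity_chunks - used_chunks) / capacity_chunks
        cost = store_cost(spare)
        if spent + cost > budget:
            break  # attacker can no longer afford the next chunk
        spent += cost
        used_chunks += 1
        written += 1
    return written, spent

# A half-full section and an attacker with a fixed budget (made-up numbers).
written, spent = flood_section(capacity_chunks=10_000, used_chunks=5_000, budget=20.0)
print(written, round(spent, 3))
```

Under this toy rule a single honest write into the half-full section costs `store_cost(0.5)`, while the attacker's funds run out before the remaining 5,000 slots are filled.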

Yes I’d hope that the client could have a warning mechanism for when a PUT would be above a certain threshold and allow pausing of the writing of the file/image and maybe the application could apply a (different) compression of the file which means the chunks will be stored in other sections.

While this might be nice I’d say that if you do not know what data you will be storing then there is no feasible way to set aside the space because the network does not know which sections the chunks will be stored in and thus cannot pre-allocate the space. The actual data determines which section each chunk will be stored in.

Also an attack could be done where the attacker waits for the price to be very, very low and pre-allocates 100’s of PBytes of space at a relatively small cost in coin. And then imagine a few do this. When they later store the data, the spare space in a section is of no consequence to them since the space was pre-allocated, and if they only send off generated chunks that are close in XOR space they can attack one, two or three sections with 100’s of PB. If they did it without pre-allocating the space, it might cost them 1 coin per TB part way through the attack, then 1 coin per GB, then 1 coin per 50 chunks and so on. Thus the section survives and lives another day.


#84

That’s a neat idea, but it would be a shame to reserve the space ahead of it actually being needed. Maybe if vaults are going to be some sort of fixed size, it doesn’t matter - you have dedicated that space to the network.

However, if vaults are to be dynamically sized, then it will be a bit wasteful. People could be using that space for other things until the network attempts to use it.

I suppose if vaults were small enough that you could dynamically add new ones when needed, it would provide a hybrid solution. Each vault would need dedicated storage space, but you wouldn’t have to reserve loads of space until the network needs it. This way the host is held to reserve what they committed to, but can commit to more as needed/agreed.


#85

Remember though we are going for no more than 33% bad actors. Other things can happen if there are 50% or more bad actors, but < 1/3 is a good metric to keep in mind in these thought experiments. How we ensure at least 2/3 are good is “probably” best kept to its own discussion. Then it allows us to reason much more about all of the other system components.

Ofc this can be debated quite a bit, but perhaps as a start, if we get the algo (edit, thanks @JPL it was spellcheck, the language of the internet :smiley: ) for safecoin solid with that assumption, we can push the assumption to routing and malice detection/node age etc. to provide us that as a requirement.


#86

aloalgog

Is that Gaelic? :wink:


#87

So this likely means that for penny-stretching users there will be an app that always creates a few slightly different files and checks what kind of offers it will get from the network? Actually this could be a default behaviour of the client, and it would ensure that sections fill quite evenly.


#88

Got an interesting question in the “price and trading” topic about whether the current supply of Safecoins on the network will be known.

Will it be possible to know the current supply, or approximate current supply, of Safecoins on the network?
From earlier discussions there seems to be a way.

But will it be possible with Safecoin as integers instead of as data items? Will that affect the possibility of knowing the current supply?


MaidSafeCoin (MAID) - Price & Trading topic
#89

Using MDs we can only know an estimate by querying a portion of the safecoin address space and projecting from there. This is not accurate, and it can be manipulated by anyone with a significant number of coins since they can select which coins to spend, so the address space no longer follows any mathematical model. How much they can bias the distribution is the question.

Might be as easy as querying the client managers since the network needs to do this anyhow.


#90

Yes good point. The initial design phase can suffer if too much thought is spent just on malice / greed.

Looks like there are three possibly useful ways to measure spare space: sacrificial chunks, median voting, proof of storage prototype… so I think the chance of being able to successfully measure spare space is high enough that I’ll include it in my thinking again (whereas I had originally considered it unfeasible).

I think knowing spare vault capacity mainly affects the joining / disallow rule. Maybe it also means the target vault size idea suggested in the original post can be removed, since the network can simply pick new vaults in a way that best trades off size vs inclusion based on known spare space. Gotta think a bit more about how knowing spare space might change my previous assumptions…


#91

I think that this actually does improve the idea @mav so really good to hear. I also think this can be much more dynamic later on where not all nodes need the same resources, but for now, the assumption they do, at least in a group, will help us get to the optimum safecoin farming rate algorithm etc.


#92

Not really. If you have 2,000,000,000 coins and you spend the half that has IDs that fall into the lower half of the ID range, a uniformly random sample will still pick up your remaining coins with the same probability and give a correct estimate. You can’t manipulate that in any way.
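A small simulation of that argument, scaled down so it runs quickly. It assumes a spent coin stops existing at its ID, and the ID space size, holdings and sample size are made up for the example.

```python
import random

def estimate_supply(existing_ids, id_space, sample_size, rng):
    """Project total supply from a uniformly random sample of coin IDs."""
    hits = sum(
        1 for _ in range(sample_size) if rng.randrange(id_space) in existing_ids
    )
    return hits / sample_size * id_space

rng = random.Random(42)
ID_SPACE = 1_000_000

# Half of the ID space holds a coin; one large holder owns 200,000 of them.
coins = set(rng.sample(range(ID_SPACE), 500_000))
whale = rng.sample(sorted(coins), 200_000)

# The holder spends only the coins whose IDs fall in the lower half of the range.
coins -= {c for c in whale if c < ID_SPACE // 2}

true_supply = len(coins)
estimate = estimate_supply(coins, ID_SPACE, 20_000, rng)
print(true_supply, round(estimate))
```

The distribution of coins is now lopsided across the ID range, yet the estimate stays close to the true count because every ID is equally likely to be sampled. Sampling only one region of the address space and projecting from it would not have this property.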


#93

I agree with @mav.

Indirection is needed, but to solve another problem: prevent vault operators from gaming the reward system by issuing get requests on the chunks they own. The solution would be a constant indirection where the group managing the source id of the chunk re-encrypts the chunk and sends it to another group (the one corresponding to the new id of the chunk).

This was discussed at length in the topic about Data Density Attack. In my mind "constant indirection with encryption" was the final solution found for this problem in this topic, but since then I didn’t see any proposal from @maidsafe about this important security feature. Sorry to hijack this topic but is this going to be implemented?


#94

A chunk that is requested often, wouldn’t that be cached in other groups? So not that many payments for the vaults in the original group of that chunk?
And maybe it would be possible to ‘redirect’ the payment to a vault in another, random group, instead of the chunks?


#95

Without constant indirection with encryption, the attacker would loop over all the chunks of his vault with a slight delay between the get requests so that when it is again the turn of a chunk it has been removed from cache.

I gave some example figures in this post.


#96

Some questions about RFC-0012 Safecoin Implementation, especially around the idea of sacrificial chunks.

This is all brainstorming on the assumption that sacrificial chunks are used, which may be an invalid assumption!

Stopping Farming

The calculation for farming rate will “ensure that farming stops if the sacrificial count is greater than or equal to the primary count”.

Why should the farming stop? Stopping seems really extreme. I know technically it’s just very very low chance, but the word used is “stop” so … why stop?

Optimum Rate

This is the proposed reward schedule based on how many sacrificial chunks are being stored:

Why would anyone store any sacrificial chunks? They only get less reward by storing sacrificial chunks.

[Figure: current_farm_rate]

Old reward rate

The wiki says rewards will be based on a Sigmoid curve and “20% above average” will be the point at which the reward rate will “start to level”.

Is this idea still going to be used?

[Figure: Safecoin_farming_speed]

Removal

@tfa I’ve noticed you have an uncanny memory for the history of features so am tagging you here… do you (or anyone else) remember why sacrificial chunks were removed from the code?

Proposed modification

Vaults receive the optimum reward for storing chunks in a ratio of 5:1 primary:sacrificial (ie 1/6 of their total is consumed with sacrificial chunks for optimum reward).

[Figure: proposed_farm_rate]

This is based on the Kumaraswamy distribution with A=1.365 and B=9 for a peak at approx 1/6; these values need to be engineered properly, and possibly be dynamic to keep the original rfc goal for “the farming rate to drop as the number of chunks increases”.

Old pseudocode

if TP > TS {
    FR = 1 - (TS / TP)
} else {
    FR = approximately 0
}
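For anyone who wants to poke at the numbers, here is the old rule as a runnable sketch. The function and argument names are mine, and "approximately 0" is modelled as exactly 0.0, which is an assumption.

```python
def old_farm_rate(primary_total, sacrificial_total):
    """FR = 1 - TS/TP while TP > TS, otherwise (approximately) zero."""
    if primary_total > sacrificial_total:
        return 1 - sacrificial_total / primary_total
    return 0.0  # assumption: "approximately 0" treated as 0

# Fixed primary count, growing sacrificial count.
for ts in (0, 25, 50, 100):
    print(ts, old_farm_rate(100, ts))
```

The rate only ever goes down as the sacrificial count grows, which is what the questions under Stopping Farming and Optimum Rate are getting at.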

Proposed new pseudocode

FR = KD(x;1.365,9) where x = TS/(TS+TP)
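A runnable sketch of the same formula, assuming KD means the standard Kumaraswamy probability density a·b·x^(a-1)·(1-x^a)^(b-1). The function names are mine. The raw density can exceed 1, so it would still need scaling before being used as an actual rate; that step is left out here.

```python
def kumaraswamy_pdf(x, a=1.365, b=9.0):
    """Standard Kumaraswamy density on [0, 1]: a*b*x^(a-1)*(1-x^a)^(b-1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return a * b * x ** (a - 1) * (1 - x ** a) ** (b - 1)

def proposed_farm_rate(primary_total, sacrificial_total):
    """FR = KD(x; 1.365, 9) where x = TS / (TS + TP)."""
    total = primary_total + sacrificial_total
    if total == 0:
        return 0.0
    return kumaraswamy_pdf(sacrificial_total / total)

# Locate the peak numerically on a fine grid of x values.
grid = [i / 10_000 for i in range(10_001)]
peak_x = max(grid, key=kumaraswamy_pdf)

print(round(peak_x, 3))
print(proposed_farm_rate(5, 1), proposed_farm_rate(1, 5))
```

On this grid the standard density peaks nearer x ≈ 0.08 than 1/6, so if the 5:1 ratio is meant to sit exactly at the peak, the parameters (or the choice of x) may need adjusting as part of engineering the values properly.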

Rate Granularity

Is the farm rate calculated per vault, ie each vault is individually tested for primary vs sacrificial chunks and then given an individual rate?

Or per section, ie the section adds up all primary chunks and all sacrificial chunks in the section, then calculates a farm rate?

Or per neighbourhood ie the section uses stats from neighbouring sections to accumulate stats on primary and sacrificial chunks being stored…

Or per network(!) ie nested neighbours of neighbours…

The process of determining and agreeing how many chunks of each type exist is not clear to me.

Another final disclaimer: I’m ignoring store cost in this design. Originally store cost is tied directly to farm rate, but I’ve ignored it here.


#97

Saved on traffic since safecoin was not implemented. And it was considered that a solution for spare space would be forthcoming. But since that was such a long time ago, many things have changed and I assume that is no longer the reason considered, if indeed there is one.

Under the sacrificial system there was an incentive for farmers to only supply enough storage so that they got decent rewards. If the reward rate was low they would downsize voluntarily or just stop farming since it was not worth it. So then collectively the space would (it was assumed) reach a point where enough people were happy to keep farming.

On that graph (Fig 2), is the “Network Reward Ratio” the rewards vs space provided for your vault (normalised)?

The reason I mention this is that human nature would surely take the sacrificial chunks system and have people size their supplied storage so they get a reward rate that they are happy with. So while it is not the optimal amount, the whole network would tend towards that.

Is your proposal one where the sections only allow a ratio of 5:1?


#98

The intention of the KD is to reduce excessive waste.

The sigmoid curve doesn’t reduce the reward for storing a lot of excess data.

Whereas the KD actively punishes users that store a lot of extra data (by reducing reward).

With the sigmoid there’s virtually no difference in reward between a single 400%-above-average vault and two 200%-above-average vaults and four 100%-above-average vaults. They’re all at about the same 0.99 network reward ratio. I think that’s a bad way to go about it.

Is punishing for excess storage appropriate? I think so. The point of measuring spare space is to know when the network is stressed. There’s a point where supplying more spare space is meaningless to the stress measurement (I’ve used a preliminary estimate of 16% spare space being the point of diminishing returns, but could be anything). Rather than continuing to reward spare space (as per the sigmoid curve) it should be punished for not using those resources in a more efficient way (as per the KD curve).

In reality I’m sure both curves would work well. But the sigmoid concept has almost zero traction anywhere in public maidsafe materials, compared to some (still small) traction of the RFC0012 curve, so I just put all the curves in there for consideration.

The ratio can be changed to whatever works best. The ratio may be static (hardcoded into the network like 10-minute bitcoin blocks, pre-optimized by simulation) or dynamic (fluctuates according to demand like bitcoin difficulty).


#99

But the rewards per supplied space are reduced since the vault is less used. If the vaults fill up with the same number of chunks, the average size vault earns more per unit of resource than the large one. The small one loses because a much larger %age of its space is sacrificial chunks (ie full and 1/2 sacrificial unearning chunks) <---- anyhow that is what I was wondering, if that is what the curve was saying.

I suppose the question was: is the ratio enforced by the section, to be (whatever) ratio the section determines is to be used, or by the vault somehow itself without regard to the section (and that would be penalised, wouldn’t it)?


#100

Yes, this is much better than a straight line approach. I suspect confusion of 2 main connected notions here though (by all of us).

  1. The rate of rewards

  2. The amount of storage a node has

1 is what we are discussing, but 2 was initially proposed as a mechanism where the rewards are influenced by the vault’s storage being at least the group agreed quotient. i.e. smaller vaults get a chance to farm, but not a great one, and larger vaults that store beyond the group’s data (moving to archive capability) have a higher rate, that tends towards stopping.

I mention that as another view of the same issue really.

Group personas were altered in the code to simplify it. Data managers were the indirection layer that knew where chunks were stored and Node Managers could penalise/reward individual vaults. These are both important and likely to come into play much more soon with the data layer in vaults. It was just to remove complexity for the alpha releases that this was removed and also as we did not have a solid RFC. This will be fixed in discussions like this though :wink:

Nice, this does look very promising.

Yes, I agree we should here but not overall as this is the data item we need to balance the resource supply/demand :+1: