Consensus 28/32 or 8/8?


#81

It’s not really a drop, it is just a deceleration. I would say this can be explained because there are less and less sections that are not controlled and so additional bad nodes are more likely to end up in an already controlled section than in an uncontrolled one.


#82

100% agree - misunderstanding :wink: i was talking about the drop that my earlier graph showed for ‘randomly selected group checks’

where when we reach 100% control the probability that was meant to represent probability for [at least one section] suddenly drops to 0 …which is a super strange move …

EDIT/ps:
well … and without thinking too much i said to myself ‘oh maybe it would be enough to get the signature of one elder as proof’

now that raises probably a couple of questions at once … yes if you’re in control of 40% of elders and probably in control of a section … if you pick a random elder you have roughly a 40% of chance of picking one of your own … so if you are in this situation and want to have a group checking on another group … chances are low for the second group to be in your control … but it probably will be disrupted by you … so it’s not really as good of an idea as i thought at first :innocent:

pps: and a big question is as well to me … what will still be able with a disrupted section and what not? if at least data retrieval,split/merges and coin transfer would still be able in kind of an emergency mode with n² communications … that would at least enable a heavily attacked network to let you watch the news of yesterday and let you do the shopping you need to not starve to death because of a disrupted safenet …


#83

Applying similar formulas in Excel I get the same results as @riddim.

Prob1 = Probability of having BlockingMinority or more bad elders in a section
      = ∑ C(SectionSize,p)*p**BadElderProb*(SectionSize-p)**(1-BadElderProb)
where p = BlockingMinority ... SectionSize

Prob2 = Probability of having at least one bad section
      = 1 - Probability of having only good sections
      = 1 - (1 - Prob1) ** Sections

For section size 31:

For section size 64:

Conclusion: 10% or 15% of bad elders able to disrupt one section, that’s not very high. The network needs to grow very quickly at the beginning to make an attack impracticable in terms of absolute number of elders.


#84

Or if disrupted sections could just be “passivated” and some Form of emergency mode would keep them alive and maintain the most basic functionality the rest of the network would still work even if locally sections could be disrupted…


#85

Is that section size 31 or 31 elders?


#86

In theory no. Because if elders in a section are 31 or 64 and you can only add say a high figure of 5 new nodes then you can only add 5 new nodes till that first lot aged. Then another 5. Of course others are trying to join as well and in the early days at a fairly high rate %age wise.

So you need at least 6 lots of adding and with others adding to it could be 9 or more lots of adding nodes network wide. And you would require sections to split more than once to have a real hope of getting many of your nodes through adulthood to elder and have enough to disrupt.

Thus it might require you to add more than 100% of the then current size of the network (# nodes). Maybe even 300% to ensure the multiple splitting of sections to get enough of your bad actors to elders in one of the sections.

Cannot really nuke a group because what happens to the data it controls? More likely a merge is the better way and nuke the bad actors that become apparent when the two sections are acting as one.

I was going to mention this but see that you did already.

I am wondering if you could explain C(SectionSize,p), and Explain how BadElderProb comes about.

Is p = BlockingMinority … SectionSize actually
p = BlockingMinority divided by SectionSize ?

eg p = 11/31

Are the ** in the equation to the power of

I haven’t been following as closely as I should and also others might want to know too.


#87
  • We sum the expression after ∑ with p going from BlockingMinority to SectionSize. For example, for BlockingMinority = 11 and SectionSize = 31, p takes the following values: 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31.
  • C(SectionSize,p) is the number of ways we can select p elders among SectionSize elders.
  • ** is power of
  • BadElderProb is the probability for an elder to be bad
  • p**BadElderProb is probability for p elders to be all bad
  • (1-BadElderProb) is the probability for an elder to be honest,
  • (SectionSize-p)**(1-BadElderProb) is probability for (SectionSize-p) elders to be all honest
  • For the second curve, we replace BlockingMinority with ControlMajority = 21.
  • For the second graph: SectionSize = 64; BlockingMinority = 22; ControlMajority = 43

#88

Hmhmm couldn’t the good nodes rejoin the network proving with their data chain the data they offer is valid?

Hmhmmm - if the logic really stays like this the network looks to me a little bit fragile (especially in the early days) to be honest… Might be beneficial to learn some rust and give it a go … It’s not like servers would cost a lot if you only need them for a limited amount of time… And one of them could host a couple of vaults… (+in the worst case scenario you would grab some of the early coins being farmed and still don’t loose all of your money…)

… You even can test and debug this attack by starting a ‘local’ network with standard vaults +add your malicious nodes until you are strong enough to perform the attack…


#89

What do you base the figures you use on (why the range used is valid?). That was the query and I think some might not see why you used that range other than to get a result.

Thankyou for the explainations though.

How does the neighbour determine this. All it can really determine is that there is disagreement for less than 2/3rds and if the section is controlled then how could it ever find out if the attacker did it 1/2 right. The bad actors in a controlled section would just let the neighbours see that everything is fine and there are a few bad nodes. (the bad nodes in this case would be the good nodes)

If just disruption then the merging of the 2 sections may result in a good section out of the 2. But if the neighbour has itself 16/31 bad actors (maybe acting good at that time) and the disrupted section has 16/31 bad actors. Then the merged section has 32/62 bad actors that could all then act bad and disrupt. Rinse & repeat.

So I doubt this neighbour (or watchers) can solve this problem

The problem is mostly solved by the ageing, limiting new nodes to a very small number, say 3 or less, and having the largest size network in the beginning as possible.


Of course we also need to realise that there might be a few persons/groups trying to attack the network at the same time and the coordination between the groups is simply not there because of competing interests. So even though there might be 11 or more bad actors in a section with 31 elders, they might all act good waiting till their group has a majority.

So in effect by not coordinating they are actually helping the network. But we cannot rely on this.

So detection of bad actors is the better way and having as large as possible network before going live with no chance of reset is absolutely necessary.

As you pointed out the attacker may need to add 100% of network size (nodes) in order to attempt this disruption because of limits on new nodes and ageing.


#90

It’s not like I would have thought that through - tbh I don’t even know what events will be stored in the chain - I just thought that if restarts are supposed to be enabled at some point nuking one section would be like some nodes that join the network after rgeboot


#91

Well surely if we have to restart the network because of bad actors taking over sections then the data would be wiped too. This is reasonable before the network goes live, but unthinkable after it goes live.

True. I often say things for others who may not have thought of it, and making sure we are on the same page or someone can point out how silly I was in thinking a certain way.


#92

This all started with PARSEC announcement: since then it has been mentioned several times that an attacker must control 1/3 of the elders to disrupt the network. This is simply not true and a control of about 12% of the elders should be enough to disrupt one section, which IMO is fatal for the whole network.

So, I first revived an old Excel file to show this. The problem with it is that it shows that on average 12% are needed but it doesn’t give a probability.

Then I found out the R statistical package whose qbirthday function gives the number of elders needed for a given target probability and confirms the result.

But then @riddim did better than that with a Python program that generates a curve giving the probability of disrupting at least one section in function of the fraction of elders controlled by the attacker.

And lastly I did the same with an Excel file. Both of our curves also confirm the result.

In summary, BadElderProb = probability for an elder to be bad = fraction of elders controlled by the attacker = x axis of our curves that demonstrate this lower threshold.


#93

Just noticing this one. PARSEC states that (as with many sync or async consensus protocols) that it is secure with up to 1/3 attackers (colluding nodes). I suspect you are looking at this differently from me right now.

If we consider the network has only elders and only one section, then as long as over 2/3 are honest there is no problem, or are you seeing this part differently (just trying to understand the point here). Be good to get on the same page.


#94

Yes, this is what I did.

The problem is that 1/3 and 2/3 thresholds are valid only within a single section. The network has many more sections and the posts above show that the attacker needs lower fractions of elders to disrupt or control at least one section.

This can be explained with the birthday paradox and this is what I am trying to show with different methods since 5 days in order to correct this naïve misconception about the 1/3 and 2/3 threshold.

For example, in a network with 3226 sections and a fixed section size of 31 elders he only needs to control:

  • 13% of elders to have a 99% probability of disrupting one section
  • 40% of elders to have a 99% probability of controlling one section

(results given by the latest curves in the topic)

For an average section size of 31 elders the figures become respectively 12% and 32%.

(results given by the qbirthday of R statistical package)

Personally, I think this a problem only when the network is still small. When it’s bigger (several million elders) controlling 10% will be impractical.


#95

Not so sure this is naive at all. If we take a network of one section then this holds true. This is the starting point I was trying to get to.

So the statistical thing about the birthday paradox will mean we need to consider more sections and look at it a bit different IMO. We are looking at a situation where there could be lots and lots of nodes trying to join and with the birthday paradox in play then we are saying an attacker who had 13% of all nodes wanting to join. The chance they all join the same section is what we are looking at here, so X sections against where they join. This is where the bday paradox situation arises.

Assuming you agree with the above then we have a few considerations to make. If we ignore infants and adults as well as node age (to take time and effort out of the attack vector) and have lots of sections then you can carry out such statistical analysis. I do agree with that.

So PARSEC is a single section network as described in the paper/rfc etc. This does hold that it is secure against 1/3 bad actors. That is how these papers work AFAIK, assume a single static network with no changes and there you go. PARSEC for an apples to apples comparison of other consensus algorithms does this. It goes a little further to show a common/concrete coin to deal with adversaries trying to stall consensus.

Then back to the bigger question I think you are raising, assuming no node age and a population of nodes to join a network. If we restart the network X times (say 3226 times as per your example) then we probably can say that given an equal chance of joining there may be a 99% probability a 13% holder of nodes could disrupt a section.

I hope I am on track here @tfa Are you in agreement so far?


#96

In my mind PARSEC was a protocol within a section. You made me doubt about this, so I looked for some confirmations and found these ones:

  • first sentence of RFC:

    In this RFC, we propose an algorithm which will allow a section of network nodes to reach consensus on the validity and order of network events by voting on events and using a gossip protocol to disseminate these votes amongst themselves.

  • node states in routing/src/data_chain/node_state.rs which is linked from the topic in dev forum

/// The state of a node in a given section as voted for by the other elders in that section.
#[derive(Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy, Debug)]
pub enum State {
    /// Node has been accepted to this section as an elder.  May have been relocated here, or may
    /// have come back via a merge or via a promotion from adult.
    ElderLive,
    /// Node has disconnected.  Can rejoin here once in order to be relocated.
    ElderOffline,
    /// Node has been accepted to this new section (indicated by `to`) as a result of a section
    /// splitting.
    ElderSplitTo {
        /// The section prefix after splitting.
        to: Prefix,
    },
    /// Node has been demoted to non-elder due to an older node being made `Live` here and
    /// displacing this one.
    ElderDemoted,
    /// Node has been chosen to be relocated to another section.
    ElderRelocated,
    /// Node was marked as `ElderOffline` and has rejoined here.  Only one rejoin attempt is
    /// allowed.
    ElderRejoined,
    /// Node has been accepted here as a non-elder via a relocation, a merge or a demotion.
    NonElderLive,
    /// Node has disconnected.  Can rejoin here once in order to be relocated.
    NonElderOffline,
    /// Node has been accepted to this new section (indicated by `to`) as a result of a section
    /// splitting.
    NonElderSplitTo {
        /// The section prefix after splitting.
        to: Prefix,
    },
    /// Node has been chosen to be relocated to another section.
    NonElderRelocated,
    /// Node was marked as `NonElderOffline` and has rejoined here.  Only one rejoin attempt is
    /// allowed.
    NonElderRejoined,
}

Clearly, the protocol is for reaching consensus within a section and so the birthday paradox is applicable.


#97

It really is crossing a lot of perhaps confusing lines to think/state that. I suspect we are talking at cross-purposes. Our code for sure will consider sections, let’s call that sharding.

So the process we are taking is

  1. White-paper plus RFC for a static permissioned network (i.e. no churn or sections). This is where we are today in papers, but not code as that has progressed much further.

  2. White-paper for dynamic membership of a network (or a single section in a single section network). This is currently being worked on. Hopefully to be published very soon.

  3. White-paper for full sharding (many section networks with dynamic membership and secure message relay). Published last.

You are sort of considering point 3 using the info we have provided in point 1, if that makes sense.

For SAFE we would have to look at point 3 plus node age, penalties, and safecoin.

I really think what you are doing is valuable, but it is just a bit confusing right now. A few years back we considered the birthday paradox quite a bit when nodes restarted direct onto the network with a “random” id, This is the case you are currently looking at, but we do need to factor in all the other parts.

This is less to do with valid quorums and much more to do with the random creation of subsets of a population. It is valid but requires ignoring certain aspects of the network. In saying that it is well worthwhile to do this a bit at a time, a bit like mentioned above (the 3 steps) and add in random relocation and node age etc. IF we can analyse each step like this then it will help everyone to be clear on every step and more importantly why penalties, age, relocation etc. all matter and hopefully we can calculate how much.


#98

Thinking further @tfa I suspect it is the 1/3 2/3 thing that is being confused with a birthday paradox situation. It’s not quorum, that just means you need more tries (days in year in the paradox) as a higher quorum is kinda equivalent to just requiring more birthdays in common.

I suspect that is the confusion.

So if you then look at it like this, we are onto something. So forget quorum % and consider that even with points 1, 2 and 3 in place there exists a birthday paradox if there are no other mechanisms in play. That may be easier. The Byzantine figure t in the papers (1/3) then only affects the calculation of that paradox for a given percentage probability of controlling/disturbing a section.

Sorry, all this is quite short in terms of description, I am trying to take a week or 2 on holiday, so trying to limit my forum/slack etc. time and get some headspace. I will hang around as usual though for interesting things like this and Mav’s other brainstorm topic(s) :wink:


#99

well … but all the dynamic membership and penalties, age and relocation doesn’t really matter imo … an attacker would launch the attack only if he’s certain he will succeed … so he’ll behave good as long as he thinks he has a chance to succeed … and if he’s around from the beginning his chances are much higher to supply a significant amount of elders …

if you constantly keep throwing vaults at the network and keep aging those without misbehaving and just waiting for your chance… it is like throwing dice i would say who in the network is a good guy and who is one of the evil nodes … so here we go with a monte carlo simulation … 20 rounds with 100k sections … 32 elders per section and 35% network share of the attacker

sure - the standard-numpy-random-numbers-generator is probably not super-random … but still it meets the expectation to be successfull once in a while … (but not super-super-mega often)

ps: and sorry for distracting you from your holiday :-/


#100

Agree. Parsec only defines, clearly, the percentages necessary to avoid reaching consensus or to control a section but it does not modify that, in a permissionless network, if the attacker has enough power, and patience, it will end up bothering or controlling a section.

BTW, all your calculations are consistent with those made by @mav months ago