Why this many Elders? I don't grok the Seven

IIRC it comes from routing simulations. Lots of details here:

3 Likes

It is worth remembering the one-time key situation here. So we pay nodes wallet address. If they cash in at any time then they cannot cash any more payments to that wallet. This means that nodes that cash out early, lose any more payment.

This feature shows nodes they are getting payments continuously but they need to wait, otherwise, they lose out. So it allows an early cash-out, but that in turn prevents any more payments. i.e. a kinda automatic cash out in churn if that makes sense.

6 Likes

It’s a bit dated in some ways, e.g. no PARSEC anymore.

The conclusions though don’t seem to justify Seven elders.

Only a ratio of 10% elders to total section size and the larger the section the better. Unless I missed something - I only skimmed the OP and focused on the conclusions.

There are a few reasons for 7. We want the lowest number possible as it’s actually more secure (with node age), which seems weird, but when you think of it, only the most trusted get to vote. You need to gain that over a while. This is what simulations were showing, small elder group and large section size (a lot of Adults) give security. The larger the section then the more secure.

Taking only node age/sybil then a section size of 60 showed that sybil attack was unfeasible. As I say though, that looked narrowly and did not take into account checks/demotions and killing of bad nodes. Therefore we can go less than 60 as long as there is monitoring and penalties, the more monitoring the smaller the section can be. It’s all a balance.

So why 7?

It’s all about supermajority (>2/3) and how many elders can we lose at once without losing agreement capability. So 7 means we need 5 to agree, so we can lose 2 nodes and still get an agreement (to add a new elder etc) . If we were less than 7 then we lose 1 node and get into trouble as losing node 2 means the section fails.

This assumes also we cannot repair broken sections which likely we can and also we should be able to get down to an elder count of 1 and still recover, but it means a fork in the section chain, which we can do. This is not in code yet, but section repair and recovery will need to happen. Perhaps not for launch but soon after.

22 Likes

Just to parrot and expand upon what @dirvine said, seven is quite a magic number, but magic for good reason. Let’s look at some examples if two (2) Elders are lost. For maximum theoretical safeness we need to assume the two Elders which are lost are “good” Elders and the maximum possible number of adversaries remain. If the 2/3 rule yields floating point numbers, we do standard rounding for the probabilities of how many malicious nodes exist in the elder group (2.33 malicious → 2, 2.67 malicious → 3).

Here goes:

  • 6 Elder Group Size with 6/3 = 2 malicious nodes → 4 Elder group size, 2 malicious nodes survive. This yields: 2 out of 4 consensus = 50% < 2/3 … FAILED with no simple majority.
  • 7 Elder Group Size with 7/3 = 2.33 malicious nodes → 5 Elder group size, 2 malicious nodes survive. This yields: 3 out of 5 consensus = 60% <= 2/3 … GOOD with a simple majority close to 2/3.
  • 8 Elder Group Size, 8/3 = 2.67 malicious nodes → 6 Elder group size, 3 malicious nodes survive. This yields: 3 out of 6 consensus = 50% < 2/3 … FAILED with no simple majority.
  • 9 Elder Group Size, 9/3 = 3 malicious nodes → 7 Elder group size, 3 malicious nodes survive. This yields: 4 out of 7 consensus 57% < 2/3, a MODEST but VALID simple majority.
  • 10 Elder Group Size, 10/3 = 3.33 malicious nodes → 8 Elder group size, 3 malicious nodes survive. This yields: 5 out of 8 consensus 62.5% < 2/3, but a STRONG simple majority very close to 2/3.
  • 11 Elder Group Size, 11/3 = 3.67 malicious nodes → 9 Elder group size, 4 malicious nodes survive. This yields: 5 out of 9 consensus 55.55% < 2/3, the LEAST but VALID simple majority.
  • 12 Elder Group Size, 12/3 = 4 malicious nodes → 10 Elder group size, 4 malicious nodes survive. (6 out of 10 consensus 60% < 2/3, same GOOD strength as an elder group size that started with 7, but more communication overhead required to support.

When you look at these examples, group sizes of 7, 9, 10, and 12 stand out as “magic” numbers with good properties. The examples show that:

  • An initial group size of 7 offers the lowest communication overhead and can withstand up to 2 lost elders while maintaining a simple majority.
  • An initial group size of 10 or 12 offers a modest increase in communication overhead, but could handle a loss of up to 3 Elders and still maintain a decent simple majority. (10 with 3 bad nodes → 4/7 = 57%).

Now, that being said, I question the validity of using standard rounding rules for determining the theoretical number of malicious nodes in the Elder group. For example 7/3 malicious nodes = 2.33. To me, this means that there is a finite probability that there could be 3 malicious nodes in the Elder group and consensus would FAIL if two good Elders are lost. (I suppose one could argue though that it already failed if we require 5 out of 7 consensus in the original group… maybe one of the bad guys plays nice and always votes like a good guy until it detects a weakened state). From a different perspective, accommodating only 2 out of 7 malicious elders (2/7 = 28.5%) means that the network isn’t living up to its theoretical maximum possible resilience of 33.33% malicious node handling capability. This “living up to its theoretical potential” perspective is probably a better way to view it…

If we presume a worst case scenario and always round up on the malicious node count probabilities then the outlook changes. Under that scenario an Elder group size of 12 is optimal for both minimal communication overhead and the ability to handle up to 3 good elders being eliminated. ( 12 with 4 malicious → 9 with 4 malicious, thus 5/9 = 55.55% consensus and a valid simple majority in the degraded state).

TLDR; An initial elder group size of 7, 9, 10, and 12 are the only good options if you want to keep the elder counts as low as possible. A size of 7 will work well to prove things out with the lowest communication overhead. Larger sizes may be more optimal long term, if you care about max malicious node handling capability and stability during elder churn.


Final Edit: A second look at this in consideration of the percentage of network failure needed to cause an instant loss of elders and consensus strength. This shows some interesting trade-offs. Consider the natural progression in malicious node handling capability, number of lost Elders the section can handle, percentage of instant network destruction this implies, and the resulting degraded consensus level.
  • 7 initial w/ 2 malicious (28.5% ‘baddies’), lose 2 = 28.5% network failure → 3/5 = 60% consensus
  • 9 initial w/ 3 malicious (33.3% ‘baddies’), lose 2 = 22.2% network failure → 4/7 = 57.1% consensus
  • 12 initial w/ 4 malicious, lose 3 = 25% network failure → 5/9 = 55.5% consensus
  • 15 initial w/ 5 malicious, lose 4 → 26.7% network failure → 6/11 = 54.5% consensus
  • 18 initial w/ 6 malicious, lose 5 → 27.8% network failure → 7/13 = 53.8% consensus
  • 21 initial w/ 7 malicious, lose 6 → 28.5% network failure → 8/15 = 53.3% consensus
  • 24 initial w/ 8 malicious, lose 7 → 29.1% network failure → 9/17 = 52.9% consensus
  • 27 initial w/ 9 malicious, lose 8 → 29.6% network failure → 10/19 52.6% consensus
  • 30 initial w/ 10 malicious, lose 9 → 30.0% network failure → 11/21 = 52.3% consensus
  • 300 initial w/ 100 malicious, lose 99 → 33% network failure → 101/201 = 50.2% consensus

The choice of 7 looks really great here. You get 28.5% malicious node handling capability and can withstand an instantaneous failure of 28.5% of the nodes. You would need 21 elders for 33% malicious nodes and 28.5% instant failure tolerance. The number of nodes required for 33% malicious handling and 33% failure tolerance is staggering, and less secure since we nearly lose consensus when degraded to 101/200 = 50.2%. Based on this quick look, 7, 12, and 21 are the best contenders imo. It may be a toss-up on communication overhead between group sizes 7 and 12 if there is a fair amount of Elder churn. I really like the fact that a group size of 12 ensures that the network can handle up to the theoretical limit of 33.33% malicious nodes, and a 25% instant failure handling is essentially the same as 28.5% if ~1/4 of the planet goes offline. We’ll probably need a restart at that point anyhow.

27 Likes

Excellent breakdown on this @jlpell. Thanks!

9 Likes

Well, that’s my interpretation to date. If it’s wrong, I’m hoping someone from the team will set it straight.

9 Likes

Along the lines of communications the general formula for 2 way communications in a group is n*(n-1) and for channels of communications (take 2 people talking as 1 channel) its n*(n-1)/2

For different number of elders we get (where the numbers are Count-elders 2-way (channels)

  • 5 - 20 (10)
  • 6 - 30 (15)
  • 7 - 42 (21)
  • 8 - 56 (28)
  • 9 - 72 (36)
  • 10 - 90 (45)
  • 11 - 110 (55)
  • 12 - 132 (66)
  • 13 - 156 (78)
  • 14 - 182 (91)
  • 15 - 210 (105)

As we can see there is significant increase in the number of channels between elders as the number of elders increase. Also the number of messages increase at twice that rate.

The communications between elders has to account for any lost messages and as the number of elders increase the chance of lost messages increases quite fast.

This is another reason for keeping the number of elders low, in addition to what @jlpell points out with his good analysis

14 Likes

@dirvine - thanks!
@jlpell - just WOW!
@neo - cool!

Now to read over it all a dozen times to try and grok it … mind is blown.

4 Likes

Thanks for the reminder on the handshake formula. I made a few graphics to help describe the previous posts.Here’s what the objective space looks like for initial group sizes of 4 to 30 elders. The colors indicate the group size.
p1

This is what happens to the feasible options when we demand <= 25% malicious node handling, and <=25% node loss /failure handling. Infeasible candidates are gray.
p2

A 2D slice of the data showing a pareto (non-dominated) front in the upper left corner for maximal malicious node handling and minimal communication overhead. Infeasible candidates are gray.

p4

A 2D slice of the data showing a pareto front in the upper left corner for maximal node loss/failure capability and minimal communication overhead.Infeasible candidates are gray.

p3

If you increase the criteria to 33% malicious node handling and 25% loss ratio, then an initial group size of 12 elders is the feasible candidate with the least communication overhead. This is shown below in black. I find this to be a rather elegant candidate, since it also meets the sybil resistance criteria when 8 to 12 adults per elder.
p5

17 Likes

This doesn’t make sense intuitively. But @dirvine , you have any rationale for this get.

‘The most trusted’ of the text means that the trust of the elders has been obtained for the longest time, and the attackers need a long time to gain trust to attack the network. This means SN obtains security by using time. The security of SN seems to take advantage of the asymmetry that it takes a long time to gain trust but no time to lose trust.

Also, in terms of security, dividing the entire network into several sections has the effect of reducing the risk of the entire network.

I want to think about this.

9 Likes

I thought because of what you said the 7 elders might be because SN uses a Byzantine releasable broadcast (BRB). I read about this in the book below.

FYI

4 Likes

I get confused between rfc-0058 Reliable Message Delivery and Byzantine Reliable Broadcast (jlpell link above). Are they different things?

2 Likes

Very thx, @jlpell I’ll read it carefully…

I’ve only had a superficial read/scan through the document. At first glance they appear to share the same underlying principles and the approaches seem very similar if not identical if you match up analogous features and nomenclature. I won’t have time for a detailed review until the weekend…

TLDR; I think the paper is essentially telling us that Safe will exhibit a secure and high performance communication system where the probabilities for any disruption have already been investigated by academics and shown to be very very low.

4 Likes

Very thanks @mav I read it carefully…
but, in the RFC, it says that BRB uses PARSEC… is it right?

The RFC is for Reliable Message Delivery (RMD) that someone in Maidsafe proposed. I don’t know if that is still used or not (though I think so) but apparently it is quite similar to Byzantine Reliable Broadcast (BRB)? which was something picked up from AT2 and I think adapted to the Conflict-Free Replicated Data Types (CRDTs) and more.

Mind you a lot was gleaned from AT2 like pushing sig aggregation client side, the introduction of BRB, etc but ultimately AT2 is not involved anymore and the direction for the economic system has been towards DBCs, as you are aware.

As for RMD using PARSEC, that would no longer be the case. PARSEC is the deprecated full order consensus mechanism that has been replaced by the likes of BRB, CRDTs, etc. at least to my knowledge.

If I have any of that wrong, anyone please correct me but I think that is pretty close to accurate.

5 Likes

These are different.
BRB is where a node tries to get a SuperMajority agreement. (so 5 of 7)
RMD is where we send a message to minority + 1 (superminority) (so 3 of 7)

RMD uses the assumption that since we say only 2 nodes are byzantine then send to 2 + 1 means we must have sent a message to at least one honest node who will progress the message. That means at least 1 honest node will send on to the next 3 nodes and this repeats until the destination. No parsec involved.

12 Likes

why set elder size to 5 again?

1 Like