Consensus, async: What does it all mean and where does it matter?

The increased tolerance in such a case is fragile, because it could be only a matter of time before the malicious actors find a way to collude and profit. Being malicious to start with is the basic problem. Finding ways to collude (and profit) is to be expected.

But that brings in the probabilistic view of the number of byzantine nodes.

The assumption is that no more than N/3-1 nodes will be malicious.

But I take that to be a global assumption.

A (long held) thought:

The assumed and expected number of malicious nodes cannot(?) be expected to be uniformly distributed.
With a normal distribution, there will be sections where the malicious ratio falls both below and above this threshold. With many sections, it is not unlikely that N/3-1 is exceeded in some section (one of the tails).
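
As a rough illustration of this tail argument, here's a back-of-the-envelope sketch (Python; all parameters are hypothetical) using a crude binomial model of section membership:

```python
# Probability that at least one section exceeds its Byzantine tolerance,
# assuming each of `sections` sections holds `n` nodes drawn independently,
# each malicious with probability `p`. All numbers are hypothetical.
from math import comb

def tail_prob(n: int, p: float, f: int) -> float:
    """P(more than f of n nodes are malicious) under a binomial model."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(f + 1, n + 1))

n = 7                 # elders per section (hypothetical)
f = (n - 1) // 3      # tolerated byzantine nodes (strictly < n/3)
p = 0.10              # assumed global malicious fraction
sections = 10_000     # sections in the network

per_section = tail_prob(n, p, f)
print(f"per-section breach probability: {per_section:.4%}")
print(f"P(some section breached): {1 - (1 - per_section) ** sections:.4%}")
```

Even a small per-section tail probability compounds quickly across many sections, which is the point being made above.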

All that is needed then is for those to realise this and to start colluding.

I don’t know if the reasoning holds, but I’ve thought it interesting to try to get to the bottom of it at some point.

5 Likes

A local majority imho is very likely at some point in the life of a real network… If we manage to get a section checked by another randomly selected section then we should be okay… But maybe that is an issue for an optimization when we have our first stable network…? =D

3 Likes

I should say @Bogard I really enjoyed this interjection. This is what makes us strong. I am not sure if you have completed the critique here, but it has been really helpful in checking all angles and looking under all those stones. We MUST do this kind of critique and continually check and recheck.

Especially in this case, as I believe the industry has fecked up choosing 3f+1 as population sizes. I think some have realised, but not said so or realised the impact.

This is a great one to get many involved as it has some math, but not heavy duty statistics or calculus etc. I think this has been engaging and should help everyone understand the depths we must look at for this network. We need commodity computers to be the keystone here, and that means we need to squeeze every wee benefit we can from logic. It’s way too easy to get complex and say you must buy a huge computer setup to make this work. That’s v bad in my eyes. We need to get the best from what we have and this helps.

An important distinction is being made here as we show we don’t need a supermajority in agreement, but instead a straight majority. It buys more safety, liveness and termination likelihood.

12 Likes

The answer to this question is not relevant for our purpose. Let’s take a step back. The primary concern I was addressing about the main post is whether consensus is needed (yes it is). Once we settle that consensus is needed, then what’s the number of deviant nodes that can be tolerated by the network? That number is f, with N being the total number of nodes in the system in N = 3f+1. That equation describes a BFT world and you can’t add CFT allowance on top of it as was being proposed because it’s already inclusive of it, i.e., CFT is a subset of the superset that is BFT. A simpler way to think about it is that BFT is a galaxy and CFT is a solar system within the BFT galaxy.

In the CFT subset, because a single entity is in control of all the nodes, there is clarity that crashes won’t lead to forks as the controlling entity will resolve the fork given its control over all the nodes to begin with. But in the BFT superset, that assumption can’t be made because for instance crashes may be just the initial volley in an attack cascade that will result in a fork. Even a mild crash case can result in “forks” because a stalled network can mean having to build on a different branch depending on the severity of the crashes. But the CFT case is better protected from take-over attacks because only one entity is in charge anyway (in a practical sense).

This is exactly what I’ve been saying. It’s great that we’re in agreement.

I appreciate that @dirvine. Only mean to help, so I’m glad it did.

7 Likes

This is the way we’ve seen it always so far (afaik).

I think @dirvine has been exploring a possible nuance miss in there.

I’m not sure myself if there is one or not. It’s intriguing for sure.

But about this…:

…I must say that I think that is formulated with great clarity and flexibility of imagination, and it resonates with my thinking.

3 Likes

Can we get a refresher on the requirement for 2/3 super majority during consensus?

For example, consider 12 elders with 4 colluding nodes and 3 faulty/failed nodes. After loss of the 3 failed nodes we end up with 5 good nodes vs. 4 colluding nodes. A 5/9 = 0.555 affirmative vote is still a simple valid/correct majority… isn’t that good enough? To require a 66% super majority after removal of the failed nodes allows the attack nodes to disrupt consensus, no? Please offer some clarity on this.

8 Likes

Good question

With N==12, we still only have a tolerance of f < N/3, i.e. 3 BFT nodes.
We still need 2f+1 (7) as the decision-maker size (this is the key we should always focus on)

Therefore in a broadcast consensus algorithm with N==12 we can lose 3 BFT nodes and 2 CFT nodes before trouble.

Your question then has us look at N larger than 9 to see what happens (interesting)

If we look at various ranges for f

| f | min N (3f+1) | max N | max N as f |
|---|---|---|---|
| 2 | 7 | 9 | 4f+1 |
| 3 | 10 | 12 | 4f |
| 4 | 13 | 15 | 4f-1 |
| 5 | 16 | 18 | 4f-2 |
| 6 | 19 | 21 | 4f-3 |
| 7 | 22 | 24 | 4f-4 |
| 8 | 25 | 27 | 4f-5 |
| 9 | 28 | 30 | 4f-6 |
| 10 | 31 | 33 | 4f-7 |
| 11 | 34 | 36 | 4f-8 |

We can see the pattern. As the min N grows by 3 each time, it constrains the max N, which is no longer 4f+1 once you get past f==2. The max N is 3f+1 + 2 :slight_smile:

So in this table the max N continues as 4f-8, 4f-9 and so on. It's an interesting one to plot out and ask why. Min size is 3f+1, decisions are always 2f+1, and BFT is always strictly < 1/3 of N. These are the constraints, so the max N is capped by the next 3f+1.
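
For anyone who wants to poke at the bounds, here's a minimal sketch (Python, purely illustrative) that reproduces the table from those constraints:

```python
# Reproduce the table above from the constraints: the minimum
# population for tolerance f is 3f+1, decisions need 2f+1 votes,
# and the maximum population before f must grow is 3(f+1)+1 - 1 == 3f+3.
for f in range(2, 12):
    min_n = 3 * f + 1
    max_n = 3 * f + 3
    quorum = 2 * f + 1
    print(f"f={f:2d}  min N={min_n:2d}  max N={max_n:2d}  decision size={quorum:2d}")
```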

It’s 5:30am here. So some math guru can add more explanation of why the min population size is bounded by < N/3 (i.e. bounded in steps of 3).

Great question @jpell I hope this helps. It’s the strictly less than 1/3 that catches us out so many times (hard to write unless expressed as a fraction)

15 Likes

It does help. To rephrase my earlier question in light of your response… why isn’t the min size equal to decision size at 2f+1? Perhaps I’m unclear on your definition of “min size”…

Why can’t you tolerate f=N/3? i.e. 4 BFT nodes in that case? In other words, why does the example I’ve used above: N=12, with 4 BFT and 3 CFT, not work?

I ask because there would appear to be some interesting trends when BFT = N/3 and CFT = BFT-1.

| N | BFT | CFT | BFT/N | CFT/N | N*(N-1)/2 |
|---|---|---|---|---|---|
| 3 | 1 | 0 | 33.33% | 0.00% | 3 |
| 6 | 2 | 1 | 33.33% | 16.67% | 15 |
| 9 | 3 | 2 | 33.33% | 22.22% | 36 |
| 12 | 4 | 3 | 33.33% | 25.00% | 66 |
| 30 | 10 | 9 | 33.33% | 30.00% | 435 |
| 300 | 100 | 99 | 33.33% | 33.00% | 44850 |
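
A minimal sketch (Python, illustrative) that generates the table above under these assumptions (BFT = N/3, CFT = BFT - 1):

```python
# Generate the trend table above: BFT = N/3 (the proposal being
# explored), CFT = BFT - 1, and the last column is N*(N-1)/2.
for n in (3, 6, 9, 12, 30, 300):
    bft = n // 3
    cft = bft - 1
    print(f"N={n:4d}  BFT={bft:3d}  CFT={cft:3d}  "
          f"BFT/N={bft/n:6.2%}  CFT/N={cft/n:6.2%}  {n*(n-1)//2}")
```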

That is the decision size. If we made that the min size then we would need 100% participation in all votes.

Interesting. So you are looking at being able to increase BFT nodes by 1 when N==12. That then sets f==4.

So 2f+1 then goes from 7 to requiring 9 votes, with 4 BFT nodes. In this case we still defend against a double spend, but we do allow BFT nodes to stall the voting, i.e. we need 9 votes but only have 8 honest nodes, so we depend on a BFT node. More votes are required and every completed vote must include a BFT node.

If we stick with the BFT==3 then we need only 7 votes (more liveness) and we have 9 honest nodes to make that decision.
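
To make the stall argument concrete, here's a tiny sketch (Python, illustrative; the helper name is mine) checking whether the honest nodes alone can complete a vote:

```python
# Quick check of the two configurations for N == 12 discussed above.
# With f == 4 the quorum (2f+1 == 9) exceeds the honest population
# (12 - 4 == 8), so byzantine nodes can stall any vote; with f == 3
# the 9 honest nodes alone can reach the 7-vote quorum.
def can_stall(n: int, f: int) -> bool:
    quorum = 2 * f + 1
    honest = n - f
    return honest < quorum   # honest nodes alone cannot complete a vote

for f in (3, 4):
    print(f"N=12, f={f}: quorum={2*f+1}, honest={12-f}, "
          f"stallable={can_stall(12, f)}")
```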

1 Like

Yes. I suspect this offers better guarantees on max BFT and CFT ratios for any max N. But it means that you must require only a simple majority instead of a supermajority during votes.

Does this really stall voting if you only require a simple majority, though?

I don’t understand why this matters. Worst case, all you need are at least 5 honest nodes to outnumber the 4 BFT after the 3 CFT have crashed, no?

Yes, but there will often be BFT nodes in any random voting pool of size 2f+1 regardless of f=3 vs f=4 for N=12.

I see your point. But this is only because your BFT/N ratio is only 25% instead of the 33%. My question is: do you really need 9 honest nodes to make the decision? It’s a question of tradeoffs: what is more important, BFT or CFT?

If BFT is more important you could go with N=12, BFT = 4, CFT = 3 and decision size = 9. On the other hand, if CFT/liveness is more important you could go with N=12, BFT = 3, CFT = 5 and decision size = 7. These choices trade off the ideal 33% BFT/N and 25% CFT/N (f=4, N=12) against 25% BFT/N and 41% CFT/N (f=3, N=12).
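
A sketch (Python, illustrative) of the two N == 12 configurations being compared; note CFT is taken here as N minus the decision size, which is the framing used in this post:

```python
# The two N == 12 configurations above. CFT here is N minus the
# decision size (this post's framing); all values are illustrative.
N = 12
for f in (4, 3):
    decision = 2 * f + 1      # votes needed
    bft = f                   # byzantine nodes tolerated
    cft = N - decision        # crashes tolerable while a quorum remains
    print(f"f={f}: BFT={bft} ({bft/N:.1%}), CFT={cft} ({cft/N:.1%}), "
          f"decision size={decision}")
```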

I think we are missing each other here somewhere :smiley:
I will try and break this into 2 parts

BFT tolerance of <= N/3 with N==12

f == 4
f+1 == 5 (min set that must include an honest node, used in many algorithms (PBFT etc.) as a commit)
2f+1 == 9 (Consensus agreed)

BFT tolerance of <N/3 with N==12

f == 3
f+1 == 4
2f+1 == 7

Are these the figures you are using?

2 Likes

Yes. I’m with you.

4 Likes

Cool, thanks.

So if we look at the f==4 above we can see that 3f+1 (min size for consensus) is 13 (i.e. we need more nodes than we have).

The reason being the consensus requires 9 votes here and we have 4 BFT nodes. We have 12-4 == 8 honest nodes. Therefore the BFT nodes can stall this vote; they have control over whether the vote completes or not.

With f==3 we see that the honest nodes (9) can and will reach consensus (we only need 7), assuming we get messages to them all (as in the work @Mostafa615 and @davidrusu are doing right now).

That ability to stall could prove dangerous, so we should try to avoid it, i.e. an attacker could stall the promotion of any node until we select his pal, or similar.

This is also the liveness property that we need to maintain, which says basically that the protocol will continue in the presence of BFT voters. Also correctness, i.e. an honest node proposing something honest will have it agreed by all other honest voters (termination).

9 Likes

Thank you for that clear explanation @dirvine. I understand your view. We’re getting to the crux of my original question about vote stalling and super-majority votes. Controlling elder vote completion is where my understanding of your recent work falters.

Typically the “3f+1” Byzantine Agreement criterion is a foregone assumption a priori for consensus, thanks to Fischer, Lynch et al. Yes, I know that 3f+1 has many “proofs” out there, but I’m not sure the main assumptions that go into them hold here. For brainstorming purposes (dangerous, I know) I was questioning whether or not a “3f” criterion could suffice given the network scenarios we’ve discussed in the past…

3 Likes

Always good, always.

3 Likes

@dirvine, I’d like to continue this brainstorm. It may be incorrect, but I wanted to finish the thought. The reason why I am taking the time to write out this long post is because I agree with your basic argument in the OP that an interesting perspective on liveness may be obtained when honest crashed nodes are considered separately from dishonest byzantine nodes. This is mostly a theoretical argument, so the degree it applies to a real network will vary. Reading over some of our previous posts it seems to me that the reason why we have been talking past each other is my fast and loose use of the classical ‘f’ term for representing faulty nodes, and by not explicitly discussing byzantine faulty (“bf”) nodes vs. crash faulty nodes (“cf”).

Here goes:

  1. Let us agree that a crash faulty node is an honest node that is lost due to a network communication problem or perhaps simply suffered a power outage. Once a crash faulty node goes down, it is gone forever. A byzantine faulty node is one that is maliciously trying to subvert consensus. It can be dishonest, choose to appear like a crash faulty node to a subset of honest nodes, or it can intentionally crash forever. Some may argue that the crash faulty behavior is a subset or weakest form of byzantine behavior. One could also argue that it’s good if the byzantine nodes decide to become honest nodes that crash forever. :wink:
  2. Above you have described the classical argument. Let’s dig deeper. Imo the “partially synchronous model” of Dwork, Lynch, and Stockmeyer, “Consensus in the Presence of Partial Synchrony” (1988) is a good representation of communications between elder nodes. The partially synchronous model consists of an asynchronous “attack” phase followed by a synchronous “normal” phase. We assume a public key infrastructure so that malicious nodes cannot impersonate the signatures of honest nodes. The model guarantees safety and liveness in the normal phase, and safety during the attack phase. The paper also lays out the specific conditions on synchrony vs. asynchrony under which N >= 2f + 1 suffices compared to when N >= 3f + 1 is required.

The classic proof that you referred to, showing f=N/3 is impossible and thus requiring N>=3f+1, is typically summarized thus for binary agreement:

  • Three nodes A, B, and C need to reach consensus, B is byzantine aka “faulty”. (N=3, f=1).
  • B interacts with A as if it was an honest node passing a 1 message.
  • B interacts with C as if it was an honest node passing a 0 message.

Next comes the sneaky part: the partially asynchronous phase implies/requires that a SECOND ADVERSARY has stalled communications between A and C!

  • From the perspective of A, the node C has crashed or is byzantine.
  • From the perspective of C, the node A has crashed or is byzantine.

Thus:

  • A eventually agrees to 1.
  • C eventually agrees to 0.
  • Consensus between honest nodes has therefore failed.
  3. Based on this result, the classical conclusion is that N >= 3*f + 1, where f is the number of byzantine nodes (f = bf).

  4. TL;DR: It looks like there has been some brushing aside of an important detail here. Partially synchronous communication is approximated by always dealing with two simultaneous adversaries, not one. The second adversary is not usually described in an explicit manner during the conclusion on minimum group size, N. Imo, the classic impossibility proof more correctly applies to the stipulation of (N=3, bf=1, cf=1). This is intuitive since for either case of (N=3, bf=1, cf=0) or (N=3, bf=0, cf=1) consensus, i.e. agreement between honest nodes, is possible since N >= 2f + 1.

With two explicitly defined adversaries, the impossibility proof implies the following three constraints on N that must always hold true:
(EDIT: Fixed error in the original relations…)

  • N >= 2*bf + 1
  • N >= 2*cf + 1
  • N - cf >= 2*bf + 1

Rearranging the third constraint gives:

N >= 2*bf + cf + 1
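
As a quick sanity check on these constraints, here's a minimal sketch (Python; the helper name is hypothetical) encoding all three:

```python
# A minimal checker for the three constraints derived above, treating
# byzantine-faulty (bf) and crash-faulty (cf) nodes separately.
def satisfies(n: int, bf: int, cf: int) -> bool:
    return (n >= 2 * bf + 1 and
            n >= 2 * cf + 1 and
            n - cf >= 2 * bf + 1)   # equivalently: n >= 2*bf + cf + 1

# Classical case bf == cf == f == 1 recovers N >= 3f + 1 == 4:
print(satisfies(3, 1, 1))   # False: the impossibility example (N=3)
print(satisfies(4, 1, 1))   # True:  N == 3f + 1
```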

We are now faced with three scenarios: 1) where bf = cf, 2) where bf < cf, and 3) where cf < bf. I interpret these as follows.

Classical Impossibility Conclusion : (balanced byzantine and crash fault tolerance):
cf = bf = f
N >= 2*bf + cf + 1 → N >= 3*f+1

Alternative Impossibility Conclusion #1 : (strong crash tolerance, weak byzantine tolerance):
cf >= bf + 1
N >= 2*bf + cf + 1 → N >= 3*bf + 2
N >= 2*cf + 1 → N >= 2*bf + 3

Alternative Impossibility Conclusion #2: (strong byzantine tolerance, weak crash tolerance):
bf >= cf + 1
N >= 2*bf + cf + 1 → N >= 3*bf
N >= 2*bf + 1 → N >= 2*cf + 3

We can see that the classical conclusion of the N >= 3*f + 1 rule is dominant for the degenerate case where ( f = bf = cf ).

It offers a rule for equally strong byzantine and crash fault adversaries. The alternative cases offer an example where the resident byzantine adversary is assumed to be slightly weaker than the crash fault adversary or vice versa.

The performance of the alternative rules converges to that of the classical rule for large N. However, for small group sizes these three rules have interesting implications for the crash fault tolerance vs. byzantine fault tolerance of the elder group.

Final note: Perhaps this is the wrong perspective. In the end, the classical argument offering a balanced result is still likely the preferred configuration. Even so, I considered the thought process to be rather relevant to the OP.

4 Likes

I am not sure. I really focus on f being strictly < N/3.

Here’s a note I made yesterday in slack to see if it helps.

What I was saying is that we want

  • f < N/3 for max byzantine tolerance
  • In each vote we want byzantine nodes in the minority, i.e. 2f+1 gives us that.
  • We also want any 2 votes with different members to have an overlap of at least f+1 to ensure at least 1 honest node (this is when we have N == 3f+1 + 2). This overlap means no double voting, i.e. if the byzantine nodes vote differently in 2 votes, can they do damage? (See the overlap sketch below.)

I claim no, as worst case we have N=3f+1+2 (i.e. the “range” of N is 3f+1 <= N <= 3f+1+2)

  • a vote is 2f+1
  • We only then have f nodes who have not yet voted and worst case our f byzantine nodes will vote again
  • The most votes this can be is 2f, so it does not work, i.e. stick with 2f+1 to ensure a majority of honest nodes.
  • f < N/3: in this way the byzantine nodes have no influence over the voting in this group, even if they vote for 2 separate outcomes in the one round.
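
To put numbers on the overlap figures, here is a minimal sketch (Python, illustrative only) of the pigeonhole arithmetic: two votes of size 2f+1 drawn from N nodes must share at least 2(2f+1) - N members:

```python
# Overlap arithmetic for quorums of size 2f+1 drawn from N nodes:
# any two quorums must share at least 2*(2f+1) - N members
# (when that quantity is positive).
def min_overlap(n: int, f: int) -> int:
    quorum = 2 * f + 1
    return max(0, 2 * quorum - n)

f = 3
for n in range(3 * f + 1, 3 * f + 4):   # N in [3f+1, 3f+3]
    # at N == 3f+1 the guaranteed overlap is f+1
    print(f"N={n}: min quorum overlap = {min_overlap(n, f)}")
```

Running it for f = 3 prints the guaranteed overlap at each N in the stated range.
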
4 Likes

A minor follow-up note here:

I found a problem with my previous post and updated the 3 constraint equations for each impossibility result. Your (N=3f+1+2) plan satisfies Alternative #1 above for strong crash tolerance and weak byzantine tolerance. It is equivalent to (N >= 2bf + cf + 1) for bf=2 and cf=4. As you have already pointed out, given the interval between (N=3f+1) and (N=3(f+1)+1), this puts a cap on cf, where (cf <= bf+2). TL;DR: I’m basically just trying to describe what you’ve been saying in terms of the simple (N >= 2*bf + cf + 1) constraint I showed above.

1 Like

Anything brewing in regards to this part 2?

2 Likes

We are doing a ton of work with stateright to confirm a few points. It’s looking very promising for this though.

8 Likes