Update 30 June, 2022

Elders use Distributed Key Generation (DKG) to decide between two possible alternative courses of action. Each elder generates a key-share and sends it with its vote, and if a supermajority (5 out of 7) of these key-shares are received they are aggregated to create the full key required to validate the decision. We developed our own system to do this using BLS cryptography, which works fine most of the time - but not all of the time. Unfortunately when it fails the network can grind to a halt. This week @anselme provides a beginners’ guide to DKG, and he and @davidrusu explain how we are looking to make our DKG implementation more robust.

General Progress

@davidrusu and @joshuef have been digging further into making sn_node single-threaded as far as possible to get rid of the problem caused by locking. This will require a little refactoring, moving Comm - the code that handles messaging between nodes - out of sn_node so it can stand on its own better and potentially have its own thread.

Meanwhile, @bochaco has been working on the minting of the initial genesis DBC by the first network node, the handling of spentbooks so that nodes don’t need to confer with other sections to see if a transaction is valid, and other steps in DBC integration.

@bzee is getting stuck into information dispersal algorithms (IDA), which, as you may remember, is how we are reducing the storage and bandwidth requirements of the network, and @roland is working on some write module tests for DKG and assisting @Anselme with the new sn_sdkg crate (s here meaning synchronous).

Towards a more stable DKG for Safe

Distributed Key Generation (DKG) is necessary for the Safe Network to ensure that each elder in a section has a unique shard of the section key: the key-share. It allows the network to generate a secret key for each section, without ever having a single node know the entire section key. This makes the Safe Network more resilient to Byzantine nodes as it doesn’t need to trust all the elders.

The current DKG implementation has a flaw that’s been bugging at us for many months: The use of timers. DKG requires full participation from all nodes who wish to be able to sign. In times of high network activity, nodes can become backed up and fail to participate in DKG sessions in a timely manner.

With our current implementation, if nodes take too long to contribute to DKG, then the DKG session will fail. We’ve been keen to remove any time-related behaviours in Safe Network, embracing concurrent and asynchronous flows as a default.

We’ve been looking for an alternative to timer based DKG for a while now and we think we finally found something that will work by building off of the Membership and Handover work we’ve done.

DKG for the rest of us

Although DKG can seem like a very technical and complex thing, I feel like its usefulness and basic mechanisms can be explained to a 6-year-old.

Imagine a kingdom with a king. The king has an ivory stamp that no one can fake. Any letter with the king’s seal of approval is seen by the whole kingdom as trustworthy.

ivory stamp == secret key
seal of approval on a letter == letter with the king's signature

One day the king realises he is old. The king has four heirs, and he wants to be fair, so he asks his master craftsman to break his ivory stamp into four equal pieces that can each print out a part of the king’s seal. He then gives out one of these tiny stamps to his four children. Now in order to recreate the king’s seal of approval, the children need to use their tiny ivory stamps and each print out their seal parts next to each other until one can see the full seal again.

4 ivory stamp pieces == multisig
tiny stamp == key-share

After a while the heirs realise this mechanism is slowing the kingdom down because requiring all the seal pieces every time is a tedious operation, some of them are away and travelling at times, so getting all the seal parts can get pretty complicated. So they have an idea: they agree that as long as one can see more than half of the king’s seal on a letter (3/4+ seal), the seal is considered to be valid and carry the kingdom’s authority.

3/4+ seal == threshold cryptography

One day the heirs realise the master craftsman was dishonest and kept a copy of the king’s original ivory stamp and that he was selling them throughout the kingdom to some evil doers. The heirs then decide to create a new stamp for their kingdom. But in order to avoid last time’s mistake, they each hire a craftsman to create a single new piece of a new stamp. Each heir has their own seal that they created on their side. By joining them together they obtain a new royal seal that is unique and that no single stamp can fake. They decree that the union of more than half of these seals represents the kingdom’s authority. Since no one ever saw the full stamp, they can be sure that no one will cheat this time.

separately created stamps that make up a unique seal == DKG

In the Safe Network, DKG is used by elders to generate the section key that carries section authority. In SN terminology, the genesis key (key of the first node in the network) is like the king’s stamp. Once more members join, the genesis node and new nodes perform DKG to create a new section key. They each create their own key-share (tiny stamps), but they have never seen each other’s keys, so no one ever has the entire key in his hands. Every time there is a change of elders (the heirs), the nodes perform DKG to create a new section key. This is pretty safe because one would need to control a supermajority of the elder nodes in order to have section authority (more than half of the kingdom’s seal).

Poanetwork’s DKG

Poanetwork’s hbbft (Honey Badger Byzantine Fault Tolerant consensus algorithm) has a rather straightforward implementation of DKG that does not make use of timers. Their code has been audited by some knowledgeable crypto people, so that’s also a big advantage over ours.

HBBFT’s key generation algorithm in a nutshell:

  • Participants each generate Parts that they broadcast to the others
  • Participants check all the Parts and generate Acks (acknowledgements) for each of them, also broadcasted to the others
  • Finally, they can each generate their own keyshare from the Parts and Acks

There is a catch though! They all need to process the same sets of Parts and Acks!

From their docs:

If Alice’s Ack was handled by Bob but not by Carol, Bob and Carol could receive different public key sets, and secret key shares that don’t match.
One way to ensure this is to commit the messages to a public ledger before handling them.

Although there are no timers involved, this algorithm does require nodes to wait for each other.

Integration with Safe

We need to ensure that all participants have access to the same sets of Parts and Acks. Poanetwork’s docs suggest using a distributed ledger like a blockchain to ensure that all participants have the same sets. We don’t have a blockchain in Safe - but that’s fine. Our team had an idea with broadcast signed messages that doesn’t require full consensus! It works as follows:

  • Send Parts in a signed message
  • Wait until all 7 Parts come in, broadcast the signed set
  • Wait until all 7 identical signed sets of Parts come in
  • Send Acks
  • Wait until all 7 Acks come in, broadcast the signed set
  • Wait until all 7 identical signed sets of Acks come in
  • Once a node receive 7 signed sets with the same Acks, they can Generate the Key Share from the agreed on Parts and Acks

The fact that we have that additional round of all-to-all broadcast to get all nodes to see that all nodes have seen the same Parts and Acks makes this algorithm resilient to a certain amount of Byzantine nodes. It is also easy to prove that a node cheated by showing two different and contradictory messages that they signed.

Although this simple algorithm makes sure nodes agree on the same Parts and Acks, it does not guarantee termination in case of a certain amount of message loss. That’s fine, we embrace it and don’t see it as a failure.

If enough messages are lost, we assume that this set of candidates was not fit to become elders in the first place. At every churn event (node joining or leaving a section), the elders check if the oldest 7 members in the section are the current elders. If that is not the case, they ask the oldest members to start DKG. This process is called Handover.

Churn events can happen at quite a fast pace, so multiple concurrent DKGs are possible. It is a race between multiple DKGs, and whichever finishes first wins the right to go through the handover process. Many DKGs might be left behind either not finished yet or stuck due to message loss, but that’s fine, it’s part of the game. Each node keeps their current DKG statuses in a temporary set that is reset at every Handover.

This algorithm makes DKG a passive process: it is only message handling. There are no timers, no fetching data on other nodes, just a temporary set of ongoing DKGs waiting to be completed or discarded after the next handover.

If for some reason several messages are dropped and a DKG cannot be terminated this is easily handled. We can leave it to Dysfunction to eventually track down inactive/slow nodes and vote them off. This in turn triggers churn, and that churn triggers a new DKG round.

Links


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

57 Likes

Oh yes… first!

If I understand we have two critical bugs:

  • a deadlock which could be fixed by the thread reduction work
  • a stress based failure in the current DKG, being fixed using a modified hbbft algo

As we close in on those working continues to improve efficiency and add the missing features for SNT DBCs.

This is great news. Thank you team and good luck. :clap:t2:

29 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

What city is the top photo from?

16 Likes

Bronze! On the podium! Now to read.

14 Likes

ups, image links fail there… 2 secs and it will all make even more sense :smiley:

edit: sorted!

17 Likes

Nothing wrong with being fifth

13 Likes

or sixth :slight_smile:

Excellent explanation, I feel if I read it another 5-6 times, I could understand it.

14 Likes

Excellent progress! Work really seems to be coming together.

A little disappointed to be honest not seeing anything about sled removal progress for 2 weeks in a row now. Is there any news on that front?

Keep at it guys, thanks for all the hard work! Happy to help with new testnets when the time comes.

18 Likes

Oh how I look forwards to Thursdays. Liking the progress. Nothing to say but have at 'er.

15 Likes

Thanks for the update and all the work again! :beers:

A question:

If no one has seen the full stamp, then how can anyone know what the 3/4 should look like?

13 Likes

Excellent write up. Best dev update in a long while. Thank you to whomever wrote this.

12 Likes

Clue’s in the intro…

9 Likes

Ha, missed that. I went right to the meat and skipped the potatoes :nerd_face:

9 Likes

This was a very simplified explanation. What is available to anybody is a stamp checker that allows to verify that the message was stamped by 3/4 heirs, though the stamp parts are not publicly available.

If i am correct, in threshold cryptography the stamp checker is called a public key set and the stamp parts are called secret key shares.

17 Likes

you are correct. It’s not an easy explanation, but 3/4 of stamps are in fact the same as 4/4 i.e. when you have threshold of enough shares they represent the whole key.

It’s very similar to the IDA thing we have where enough parts make up the whole.

In threshold crypto it’s about pairs of points on a curve meeting at a point (pub key of section) whereas in ida each of the 3/4 shares represents a bit more than 1/3 each, it’s more like they are 1/3 + a bit more and the bit more is enough to create the whole file.

18 Likes

That’s New Orleans :wink:

7 Likes

I like the idea of leaving all the time(r)-based stuff behind, and moving to passive sieve of pure logic. Feels very, very right.

Where are you in terms of implementing this? I’m not asking for timeframe, but is this more like plan for the coming weeks or a description of something that’s well under way already?

9 Likes

In the initial stages. @Anselme is taking this one and has already been doing a good amount of background research and prodding at this. But the coding + integration into the code base is starting now.

12 Likes

Is @AndreasF still at POA? Miss that fella. Was such a good fit at Maidsafe. I think I remember him helping in some capacity with DKG initially for PARSEC while at POA.

7 Likes