Update January 20, 2022

A key area of focus at the moment is membership: how elders keep track of the adults and other elders in their section so they can handle new joins, splits, churn and promotions. This functionality is handled by the sn_membership crate. The algorithms in this crate are currently undergoing rigorous testing before being integrated into the network. This is not a new feature as such, but rather a forensic tightening up of the halfway house we’ve had thus far and a formalisation of the algorithms. Most of the team are currently working on some aspect of membership.

General progress

The sn_membership crate is being handled by @davidrusu. Almost all tests are passing now, so it’s mostly tidying up.

@bochaco is looking at flows within membership: what is the order of events when a new node joins or the oldest adult is promoted?

One aspect covered by membership handling is elder handovers, making sure that newly promoted elders have all the right information, keys and so forth. @anselme has got this to a good place now and it’s pretty much ready to go.

Away from membership, @chriso has been working through niggles with nightly testing and, having finished CLI documentation, is now writing up NRS.

On the data front, @yogesh is focused on testing data replication, and @joshuef has updated qp2p to the latest version of quinn.

Meanwhile @danda is plugging away at DBCs. Integrating Ring CT is pretty much done, and mints are the next step, although more work is needed to get mints working properly with Ring CTs.

@happybeing has made a lovely wee PR updating node logging. This should help save space on any nodes started and make logs easier to follow with commands like tail -f.

Membership

Membership operations cover new nodes joining the section, managing capacity, ejecting misbehaving nodes or reducing their node age, promoting adults to elders and ensuring new elders are properly equipped.

As a reminder, sections have seven elders and (unless future testing shows otherwise) 60-100 adults. Adults churn frequently, with dropouts, new joiners and older nodes arriving after being relocated from other sections. The elders need to keep track of section membership so they know when to allow new adults to join, and also when an elder has churned and an adult needs to be promoted. They retain a list of all current adults and elders in their section.

Section membership is constrained by max section size. When we have such constraints in a distributed system, we often need to resort to consensus to decide between competing options. In our case of membership, elders need to decide which of the (many) nodes waiting to join a section should be allowed in.

We were not using a consensus algorithm up to this point, and the current processes for managing membership sometimes get tripped up when unexpected events occur.

The new sn_membership crate provides a leaderless BFT consensus algorithm providing good performance in an eventually synchronous network model. Following a merciless testing regime, it is now ready to be integrated into the network.

sn_membership works together with anti-entropy (AE) and distributed key generation (DKG) to manage the section membership. Here are some flows.

Node joining

A joining node interacts with an elder, exchanging JoinRequest messages until it’s provisionally accepted. It then receives the resource proof challenge and returns its response to the elder.

Under the old system, once it had passed the test the node would be in, but this was a security risk and could cause blockages (see below). Now the elder uses the sn_membership protocol to send a proposal to add the node to the other elders of the section. sn_membership completes once we have Super-Majority over Super-Majority, that is, once a super-majority of elders see that a super-majority of elders have accepted this proposal. Once sn_membership has started, it’s guaranteed to complete (as long as our eventually-synchronous network assumption is not violated).
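The "Super-Majority over Super-Majority" condition can be sketched in a few lines. This is a minimal Python illustration, not the sn_membership API: the function names, the `seen_votes` structure and the threshold of strictly more than two-thirds are all assumptions made for the example.

```python
from __future__ import annotations


def is_super_majority(count: int, n_elders: int) -> bool:
    """Assumed threshold: strictly more than two-thirds of the elders."""
    return 3 * count > 2 * n_elders


def proposal_decided(seen_votes: dict[str, set[str]], elders: set[str]) -> bool:
    """seen_votes maps each elder to the elders whose acceptance it has seen.

    The proposal is decided once a super-majority of elders have each seen
    a super-majority of acceptances.
    """
    n = len(elders)
    witnesses = [
        elder
        for elder in elders
        if is_super_majority(len(seen_votes.get(elder, set()) & elders), n)
    ]
    return is_super_majority(len(witnesses), n)


# Hypothetical section: five of seven elders have each seen five acceptances.
elders = {f"elder{i}" for i in range(7)}
accepted = {f"elder{i}" for i in range(5)}
views = {elder: set(accepted) for elder in accepted}
print(proposal_decided(views, elders))  # True
```

With seven elders, four witnesses would not be enough under this assumed threshold, even if each of them had seen five acceptances.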

Once the proposal reaches consensus among elders, the elder sends back the approval to the joining node.

Adult promotion and elder handover

If an elder notices that the current elders are not the seven oldest nodes, then it sparks a vote on promoting the oldest adult(s) and demoting the youngest elders to make way.
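The check an elder makes here can be sketched as a comparison between the current elder set and the seven oldest members. This is a hedged Python illustration only: `Node`, `elder_changes` and the tie-break by name are hypothetical, and the real tie-break rule is not described in this update.

```python
from __future__ import annotations

from typing import NamedTuple

ELDER_COUNT = 7  # sections have seven elders


class Node(NamedTuple):
    name: str
    age: int
    is_elder: bool


def elder_changes(members: list[Node]) -> tuple[set[str], set[str]]:
    """Return (to_promote, to_demote) so the elders are the oldest ELDER_COUNT nodes."""
    # Oldest first; ties broken by name (an assumption for this sketch).
    ranked = sorted(members, key=lambda node: (-node.age, node.name))
    target = {node.name for node in ranked[:ELDER_COUNT]}
    current = {node.name for node in members if node.is_elder}
    return target - current, current - target


# Hypothetical section: one adult is older than the youngest elder.
members = [Node(f"elder{i}", 10 + i, True) for i in range(7)]
members.append(Node("adult_a", 20, False))
promote, demote = elder_changes(members)
print(promote, demote)  # {'adult_a'} {'elder0'}
```

If both sets come back empty, the current elders are already the seven oldest nodes and there is nothing to vote on.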

The Elder Handover algorithm controlling this process, which is now ready to be integrated into the sn_membership crate, goes as follows.

An elder receives a supermajority of completed DKG shares to check the current elders are the oldest seven members.

The elder proposes a new set of elders.

The elders follow an sn_membership-style consensus to decide on a single NewElders message. This step is required when we have a complicated chain of events that ends up with multiple groups of nodes racing to complete DKG and become the next elders.

A list of current section members is passed to the new elder, the section authority provider (SAP) is updated and a new block added to the section chain.

The role of consensus

So why is consensus necessary for membership when other parts of the network rely on AE to stay updated?

Here’s an example. Let’s say a section is nearly full. The section size limit is 50 nodes (just for example) and there are currently 49 members. A new node sends a JoinRequest to an elder. Under a system without consensus the elder checks the capacity and sees there is capacity, exchanges AE messages, and provided the new node passes the Resource Proof test, it’s in.

But at this stage the section is ready to split, and when multiple nodes are trying to join, this can lead to conflicting priorities:

Let’s say, in the extreme case, all seven elders receive JoinRequests simultaneously from seven different nodes. All seven elders see that we have room for one more node, and since each of the seven nodes has passed its Resource Proof test, each elder will allow its node to join the section.

But, upon anti-entropy gossip between elders, they find that their fellow elders will not accept each other’s new nodes as it would push the section capacity over the limit. The elders find themselves in a split-brain situation where each elder has a different view of section membership.

To prevent this issue from happening, the elders come to consensus on which nodes will be allowed in. With sn_consensus, each of the seven elders can propose up to one change. This means up to seven changes (join/leave) can be decided in a single round.
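The difference between the two approaches can be shown with a toy simulation of the example above (limit of 50, 49 current members, seven simultaneous joiners). This is a Python sketch under stated assumptions: the function names are hypothetical, and the "lowest name wins" rule is only a placeholder for whatever the elders actually agree on, not the sn_membership algorithm.

```python
from __future__ import annotations

SECTION_LIMIT = 50  # example limit from the text, not the real section size


def admit_independently(members: frozenset, candidates: dict) -> dict:
    """Each elder admits its own candidate if it sees spare capacity."""
    views = {}
    for elder, candidate in candidates.items():
        if len(members) < SECTION_LIMIT:
            views[elder] = members | {candidate}
        else:
            views[elder] = members
    return views


def admit_by_agreement(members: frozenset, candidates: dict) -> dict:
    """Every elder applies the same agreed decision, so all views match."""
    spare = SECTION_LIMIT - len(members)
    # Placeholder selection rule (assumption): lowest names fill the spare slots.
    admitted = sorted(candidates.values())[:spare]
    agreed = members | frozenset(admitted)
    return {elder: agreed for elder in candidates}


members = frozenset(
    [f"elder{i}" for i in range(7)] + [f"adult{i}" for i in range(42)]
)
candidates = {f"elder{i}": f"joiner{i}" for i in range(7)}

split = admit_independently(members, candidates)
agreed = admit_by_agreement(members, candidates)
print(len(set(split.values())))   # 7 different views of the section
print(len(set(agreed.values())))  # 1 shared view
```

In the first case each elder's own view holds 50 members, yet no two elders agree on who the 50th member is. In the second, only one joiner is admitted and every elder records the same one.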

In the case above, sn_membership will mean some extra work compared to the elders acting on their own, but this overhead is only visible when we have many competing choices. sn_membership gracefully scales down to Byzantine Reliable Broadcast (BRB) when elders are in general agreement about the actions to take; we only pay the consensus overhead when elders start having disagreements. sn_membership is a peaceful method to resolve disagreements :slight_smile:


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

70 Likes

First comment muahahahahahaha

22 Likes

FYI, you need to use tail --follow=name followed by the path to the logfile, for example:

tail --follow=name ~/.safe/node/local-node/sn_node.log

The default is to follow the file descriptor but this changes with each log-rotation so you must tell tail to track the log file name.

Having a tiny fragment of my code in each Safe node makes me a very happy being! Next will be to update vdash to handle the changed logfile format.

28 Likes

Great update, team Maidsafe! Any thoughts on when the next official testnet will be? Seems like things are coming together nicely!

20 Likes

Thanks for the update, thanks for all the hard work that went into it.
Congrats to @happybeing on his PR

I’m looking forward to seeing some of these developments in code that we can test ASAP either as an “official” testnet or in the community testnets.

I particularly liked

In the case above, sn_membership will mean some extra work compared to the elders acting on their own, but this overhead is only visible when we have many competing choices. sn_membership gracefully scales down to Byzantine Reliable Broadcast (BRB) when elders are in general agreement about the actions to take, we only pay the consensus overhead when elders start having disagreements. sn_membership is a peaceful method to resolve disagreements

The edge case is handled elegantly, if with extra work, but we only do the extra work in specific circumstances. Normally, joins should be fairly straightforward.

14 Likes

This is exactly what it felt like reading today’s update.

12 Likes

It isn’t clear how elders are chosen after a split. Do you or have you considered a variable elder set that grows as the section grows?

Example: A section starts out with the minimum number of nodes (N=35) that include a minimum number of elders (E=7). This provides an Elder to non-Elder ratio of 1:4. As the section grows, roughly the same ratio of elders is maintained until the section reaches the maximum size of N=70 with E=14. At that point the split occurs and half the elders go left, taking half of the non-elders with them according to XOR addresses. The other half (Elders and Non-Elders) go right. This starts the process over with two sets of 35 nodes, each with 7 well-known elders.

7 Likes

I believe the elder count is fixed at 7!

8 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

10 Likes

My understanding is that when the section gets to its size limit, a supermajority of the seven elders vote for a split. Then the oldest seven adults are promoted to elders based on node age, with some kind of sorting hat if there’s a clash. Then records are exchanged as in Elder Handover mentioned in the OP. After that the split happens depending on name, so seven of the nodes that are closest in XOR terms split off to form a new section taking the closest 50% of the adults with them. I guess some sort of data redistribution would have to happen too to ensure redundancy, but I don’t know if that would be pre- or post-split, and keeping similarly named nodes together should minimise that.

14 Likes

Yeah, that’s what I figured. I suspect/suggest/hypothesize that a continuous elder increase over time from 7 to 14 as the section grows, rather than a step change right when the split is decided, may have some benefits. Everything would be set up and already working properly with abundant resources as one big section, so the split is quick and painless after that. Think mitosis.

6 Likes

That could make the voting process / supermajority considerations very complex though. I think you worked out before 7 is a magic number (in a good way).

5 Likes

It’s been discussed that, for reasons of supermajority plus malice control, 7 elders is the sweet spot.

(I will post here the thread that explains that)

5 Likes

Not really that complicated. Here’s a super-majority lookup table:

5 of 7
6 of 8
6 of 9
7 of 10
8 of 11
8 of 12
9 of 13
10 of 14
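The figures in this table match "at least two-thirds, rounded up", i.e. ceil(2n / 3). A small Python sketch reproduces them; the function name is made up for illustration, and this is an assumption about how the table was derived rather than how the network defines a super-majority. A stricter "more than two-thirds" rule would give 7 of 9 and 9 of 12 instead.

```python
def super_majority(n: int) -> int:
    """Smallest count that is at least two-thirds of n, i.e. ceil(2n / 3)."""
    return (2 * n + 2) // 3


for n in range(7, 15):
    print(f"{super_majority(n)} of {n}")
```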

Also, the magic number depends on various trade-offs. Although 7 looks good, so do others in the range from 7 to 14.

6 Likes

But you’d probably have to change the voting algorithm with each node you add, and would any advantage be that great? The elders don’t store primary data (just a planned cache). All they do is make decisions, and the elder handover ought to be pretty quick and easy, certainly quicker than any data redistribution among adults. What would be gained by increasing their number incrementally as the section grows?

4 Likes

Stability? Security? It’s a hypothesis :wink:

Rather than ask a group of 7 adults to immediately switch roles and become elders, you bring one in at a time. This allows the current set of elders to vet an initiate.

5 Likes

Thank you, @happybeing. That is really appreciated.

I’m full of admiration for everybody involved in the project and capable of handling the complex mathematics of it. You make me feel small, but also proud to be even a tiny member of the group contributing to the common good.

20 Likes

I can feel the maidsafe’s fire!

11 Likes

Isn’t it possible to float these join requests around with a timestamp, collect signatures, and allow in the one that has the lowest timestamp and a super-majority of signatures? After that, would pending requests have to be terminated by broadcasting the final join request, along with the already collected super-majority of signatures, to clear them out of memory?

4 Likes