Update May 12, 2022

Many thanks to @stout77 for another cover image :bowing_man:



One of the simplest but also most fundamental and important features in Safe Network design is Node Age. Essentially, Node Age replaces systems like Proof Of Work in rewarding good behaviour, punishing bad, and making life very difficult for a Sybil attacker. It provides an important measure of the quality and ongoing trustworthiness of every node, and is our featured topic this time around.

General progress

On the back of our groundbreaking work with DBCs, in which @davidrusu and others have taken the ‘digital cash’ concept in a whole new direction making it Byzantine fault-tolerant and thus fit for a decentralised network, we are happy to announce that David Rusu will be heading up a new Safe Labs division. This will be our R&D umbrella for state-of-the-art cryptography, networking and more. Research will be primarily Safe-oriented rather than blue sky, but we want to pull in expertise from wherever it may exist in a more formal and structured way.

@Anselme finalised a PR for checking the SAP on handover, and has started looking into Byzantine behaviours in handover (the process of redistributing data on a churn event).

David Rusu gave a presentation on conflict-free replicated data types (CRDTs) at a Toronto CompSci meetup, mentioning what he’s been doing at MaidSafe (naturally!). Lots of interest in the topic and plenty of contacts to be made. He’s going back for another one on CRDT trees.

@Bochaco has completed a PR to check permission at the client side when performing operations on registers (mutable data) and is also working on the spent book client API.

And @Chriso has been looking at testnet failures caused by the temporary removal of features like max-capacity.

Testing internally, Metricbeat has shown us some nodes creeping up to some verrry high mem usage over a day or so. Diving in, we realised that there appeared to be quite an edge-casey deadlock occurring there (centred around clean-up of connections). We’ve a few fix options here and so are just looking and testing to see what makes the most sense there.

Meanwhile @Qi_ma gave a talk to the team on Node Age.

Node Age

Every node on the network has an address determined by its ID, which is actually a key that’s generated when it joins the network. This node ID is essentially a very large random number. Its first few bits (e.g. 0101101…) determine which section the node will be in and therefore what data it will look after, while the last eight bits (e.g. 00000101) signify its Node Age - in this case 5.

When a node is first accepted into the network it is given a Node Age of 5, so its ID ends …00000101 (the joining node must keep generating ED25519 keys until it gets one with the correct ending and the correct prefix, generally a sub-second process).
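As a rough illustration of that key-grinding loop, here is a minimal Python sketch. It is not the real implementation: random bytes from `secrets.token_bytes` stand in for an ED25519 public key, the 32-byte ID length is an assumption, and the names `matches` and `generate_id` are made up for this example.

```python
import secrets

ID_BYTES = 32  # assumption: node IDs are 256 bits long


def matches(node_id: bytes, prefix_bits: str, age: int) -> bool:
    """True if the ID ends with the age byte and starts with the section prefix."""
    if node_id[-1] != age:
        return False
    as_bits = "".join(f"{byte:08b}" for byte in node_id)
    return as_bits.startswith(prefix_bits)


def generate_id(prefix_bits: str, age: int):
    """Keep drawing random IDs until one fits; returns (node_id, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        # Stand-in for generating a fresh ED25519 key pair.
        candidate = secrets.token_bytes(ID_BYTES)
        if matches(candidate, prefix_bits, age):
            return candidate, attempts
```

With a 2-bit prefix and the 8-bit age suffix, a random ID fits with probability 1 in 2^10, so `generate_id("01", 5)` needs about 1,024 attempts on average. Longer section prefixes double the work for each extra bit.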

The longer the node remains an active participant on the network, the larger its Node Age will grow, up to a highly unlikely maximum of 255. But there are a couple of catches: (1) Its Node Age will only grow if it proves itself reliable at storing data chunks and giving them up when requested over a certain time period. (2) Each time its Node Age is incremented, it must move to another section.

But Safe Network has no concept of time, so how can we track how long the node has been behaving? The answer is we use churn events (section membership changes) as a proxy for time.

Churn ID - The Decider

Each section will contain 7 elders (decision-making nodes) and 60+ adults (storage nodes). Each time a node goes offline or joins the section, which happens frequently with adults, elders vote on what has happened. Each churn event has a 256-bit ID, which is the combined BLS signature of 5 out of the 7 elders. This churn ID is also effectively a random number and cannot be predicted beforehand.

If the new node proves itself to be dysfunctional within the first few churn events it will be ejected and will need to ask to join again. No point in wasting resources on a dead weight.

On the other hand, if our new node performs its duties properly for a few churn events we want to reward it and increase its age by 1, but we don’t want to have to track it and record when it joined etc. So we use the churn ID as a sort of lottery ticket.

The churn ID (a random number, remember) provides two functions so far as nodes are concerned. First it provides a way for nodes to get their Node Age increased, and second, since we don’t want nodes to build their reputation in just one section because of the risk of malicious behaviour, the churn ID also decides which random section the newly promoted node will join.

Chance of promotion

If the churn ID is exactly divisible by 2 to the power of the Node Age (churn ID % 2^age == 0), the node gets promoted. So for our new node of age 5, if the churn ID is divisible by 32 - which will happen on average once every 32 churns - it gets its Node Age bumped up to 6 and is moved to a new section. It will then likely have to wait around another 64 churns in its new section before it gets promoted again - promotion becomes exponentially more difficult the longer it remains. This means that Elders, the oldest 7 nodes in the section, have been around a long time and proved themselves in many different sections before achieving their voting status.
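The arithmetic can be sketched in a few lines of Python. This assumes the churn ID behaves like a uniformly random integer, and it ignores any cap on how many nodes are relocated per churn. The function names are made up for this example.

```python
def is_promotion_churn(churn_id: int, age: int) -> bool:
    """The lottery check: churn ID % 2^age == 0."""
    return churn_id % (2 ** age) == 0


def expected_churns(age: int) -> int:
    """Average number of churns before a node of this age hits a winning churn ID."""
    return 2 ** age


def expected_churns_to_reach(target_age: int, start_age: int = 5) -> int:
    """Sum of the average waits for each step from start_age up to target_age."""
    return sum(expected_churns(age) for age in range(start_age, target_age))
```

For example, `expected_churns_to_reach(7)` is 32 + 64 = 96 churns. In general the total is 2^target - 2^start, so each extra year on the age scale roughly doubles the wait so far.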

How does it work? On every churn event, the elders test the churn ID against each age (churn ID % 2^age == 0), starting with the oldest (255) and working down to the youngest (5). When one of those ages passes the test and matches a set of nodes in our section, we relocate up to elder_count/2 nodes which have that Node Age. There will usually be only one in that age bracket, but in the case of an excess we select the nodes with a node ID closest to the churn ID.
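A minimal Python sketch of that selection might look like the following. Two assumptions are baked in: "closest" is measured as XOR distance between node ID and churn ID, and elder_count/2 is rounded down (3 for 7 elders). The `Node` class and `relocation_candidates` function are hypothetical, and age is stored as a separate field here rather than read from the ID's last byte.

```python
from dataclasses import dataclass

ELDER_COUNT = 7
MIN_AGE, MAX_AGE = 5, 255


@dataclass(frozen=True)
class Node:
    node_id: int
    age: int


def relocation_candidates(nodes, churn_id, elder_count=ELDER_COUNT):
    """Return the nodes to relocate for this churn event (possibly none)."""
    limit = elder_count // 2  # assumption: integer division
    # Work down from the oldest possible age to the youngest.
    for age in range(MAX_AGE, MIN_AGE - 1, -1):
        if churn_id % (2 ** age) != 0:
            continue
        matching = [node for node in nodes if node.age == age]
        if matching:
            # Assumption: closeness is XOR distance to the churn ID.
            matching.sort(key=lambda node: node.node_id ^ churn_id)
            return matching[:limit]
    return []
```

Because a churn ID divisible by 2^6 is also divisible by 2^5, working from the top down means the oldest eligible age bracket that actually has nodes in the section is the one that gets relocated.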

Nodes can also be demoted for dysfunctional behaviour (bad performance in comparison with their peers). In this case, Node Age is halved before they are relocated.

Benefits of Node Age

This scheme has three main benefits. The first is Sybil resistance. In order to control a section, an attacker will need to control at least three elders. The process of becoming an elder is long and hard, and it’s impossible to know which section you’ll end up in. When the network is large, a 7 elders to 60+ adults ratio will make such Sybil attacks extremely difficult. In addition, new nodes are only allowed to join a section when more storage is needed, so attackers cannot flood the network with new joiners.

The second is avoiding undue work. If a node fails, it will likely do so early, so we kick it out before it can progress any further.

The third is general randomisation. Forcing the nodes to hop from section to section to gain trust also has the benefit of distributing capability evenly.

Relocation flow

@Qi_ma has been working on the implementation of Node Age including the messaging flows between the section elders, the candidate for promotion, and the elders in the target section. He gave a talk to the team this week. Here is one of his slides.

Elders in the source section

  • Agree on a churn event (membership change) and sign it (Churn ID)
  • Check if there are any candidates for relocation
  • Pick the oldest candidate(s)
  • Calculate their destination sections from their node ID combined with the Churn ID
  • Increase their age by 1
  • Cast a vote for each one to be relocated
  • When enough vote shares have been gathered, inform each candidate node

Candidate node

  • Receives message from elders
  • Acknowledges relocation process starting
  • Generates a new ID with correct initial bits (section) and trailing bits (its new age)
  • Bootstraps to the new section [it has authority to do so from its original section]

Elders in destination section

  • Check the source section’s knowledge is up to date (the SAP)
  • Update them if not and tell them to resend
  • Check relocation signatures and details are in order
  • Vote on the candidate joining
  • If all goes well, candidate joins new section
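The checks made by the destination elders can be pictured as a small decision function. This Python sketch is only a reading of the steps listed above: the names are hypothetical, the two boolean inputs stand in for real SAP and signature verification, and the explicit `REJECT` outcome is an assumption, since the steps only describe the case where all goes well.

```python
from enum import Enum, auto


class Decision(Enum):
    UPDATE_SOURCE_AND_RESEND = auto()  # source section's SAP knowledge is stale
    REJECT = auto()                    # assumption: invalid relocation details are refused
    VOTE_ON_JOIN = auto()              # checks passed, elders vote on the candidate


def destination_elder_decision(source_sap_current: bool,
                               relocation_details_valid: bool) -> Decision:
    """Apply the destination-section checks in the order listed above."""
    if not source_sap_current:
        return Decision.UPDATE_SOURCE_AND_RESEND
    if not relocation_details_valid:
        return Decision.REJECT
    return Decision.VOTE_ON_JOIN
```

The SAP check comes first because a relocation signed against out-of-date section knowledge cannot be meaningfully verified until the source has been updated and has resent its message.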

Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

58 Likes

boom First !!!
now to read as always well done to the team and looking forward to the next playground :wink:

23 Likes

I suppose I’m number 2 :smirk: Exciting update! Particularly thrilled to read about Safe Labs (looking forward to learning more) and that @davidrusu was able to mention this project at a Toronto CompSci meetup.

24 Likes

Bronze again… I’ll take it

19 Likes

again out of the podium :stuck_out_tongue: no worries though, good to read the updates!

14 Likes

Could evil nodes decide that a fellow evil node gets to age quicker? As in, could they conspire to deliberately assign a divisible-by-256 ID to their outgoing ally? Or, is there some number that is generated by a method requiring blind input from all the elders that makes it impossible to rig?

4 Likes

This is the key. The churn event is signed by threshold of elders and the sig is used as the churnId. It’s not guessable up front.

20 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

13 Likes

It seems that although Elders are super-important to the network, once a node reaches that pinnacle, it ceases to earn tokens via storage. Sort of a variation of the Peter Principle except that instead of “employees rising to their level of incompetence”, they sink to their floor of earning power. What if a node decided it didn’t want to make that trade, wealth for status? Could this be an attack vector?

Example: When a node discovers it is no longer earning tokens maybe it will choose to shut itself off temporarily so it will be “demoted” when it comes back online.

4 Likes

Could/should we reward Elders at say 105% of their last 6 months avg farming income? Or something?

2 Likes

Great update, thank you everyone. :pray:t2:

Will we get more on Safe Labs as it progresses? Good luck @davidrusu. If I can help by reaching out to my contacts (eg security and cryptography experts) let me know, the advent of Safe Labs could be an excuse for me to ping them once priorities become apparent.

Very nice to have Node Age set out like this, thank you @qi_ma. Quite a lot to understand there so my questions are very basic…

I think you would need five for control (allowing you to control the outcome of any vote) while three would allow you to disrupt the section by blocking consensus. Is that correct, and if so are there any incentives for an attacker to do the latter?

14 Likes

How does this churn count work at the inception of the network? If we start with 7 elders and 60 adults, all adults will be @ 5, correct? How is it selecting which adults are being promoted and where is it promoting them to in a “churn” event if there is only one section at the beginning? Is it going to randomly select 7 nodes to promote and they become elders of a new section?

Also, churn events seem very resource intensive (dumping current section data and reacquiring new section data). If the node age starts at 5, how many churn events are we expected to go through to increase the age? What is the optimum age for a node to be promoted to elder? Say, ideally it gets to a point that a node age of 40 is the average to reach elder status. If you are averaging ~3 churns per increase in node age, we are talking about 105 “churns” or dumping of current data and reacquisition. Even in a datacenter with 99.99% uptime, one bad coincidence of system maintenance timed up with a churn event could halve a node age. It just seems like advancing node age is set up to fail.

3 Likes

How many is “few”? 2, 5, 10? Will this be one of these “magic” numbers? Not that I see anything detrimental in making this “few” a fixed number - most probably decided after trial and error to see what makes the network most responsive vs overall security.

2 Likes

I’m prepared to bet a fair amount of MAID that the network will start with several hundred nodes and perhaps up to a dozen sections. These nodes will be DO droplets and “real” nodes from us punters will be added to these sections. As sufficient nodes are created by users then the initial Maidsafe-owned DO droplet nodes can be slowly retired.

2 Likes

Almost every post a question :wink: Let me try

Playing with various standard deviations there for now.

At start age is increasing in section null, we have options, but in any case relocating to the existing section is always possible and would be the case mathematically as it’s the closest section to the churn id. The relocation/age increase is based on the churn ID and closeness of those nodes of the eldest matching age. So we have promotions available to exactly a single node closest to the churn ID. The eldest 7 are Elders.

We don’t dump section data, but we will relocate data in the range of the event to keep replication up.

2^age i.e. 32 for 5, 64 for 6 and so on

Yes :wink:

Just vandalism AFAIK

ATM the payment would all go to the elders and they pay forward to adults. Payments are made in relation to age, where the eldest gets most.

I am sure there will be more questions, but dinner is on and I am rushing :smiley: :smiley:

19 Likes

Excellent. Happy to hear that. To be clear though, the Elders pay everyone, including themselves?

5 Likes

Fair enough. It was my previous understanding that sections were maintaining specific sets of data replicated to certain adults within that section and once a node gets “churned” to a new section, it would be acquiring a brand new set of data from that section. I see that I was incorrect.

3 Likes

Yes.

No that is correct. The data is always held at the 3 closest adults to the address of that data. So when a node churns it will get new data and its current data is copied to new replicants. Sorry, I seem to have misunderstood your point. I thought you meant we dropped all section data.

9 Likes

I always thought that there was to be an absolute minimum of FOUR copies of each chunk. When did this get reduced to three? I’m sure this change would not have been made unless there was solid maths that proved 3 was more efficient but just as resilient as 4, but I cannot remember any such discussion.

4 Likes