Update 25 November, 2021

We’ve all agreed that this week would be a good time to look at what Safe brings to the table in terms of distributed consensus. Actually, a couple of us thought it was a rubbish idea, but a supermajority voted to press ahead, and that’s all that matters :stuck_out_tongue_winking_eye:. So, this week we explain how Safe bridges the gap between blockchain and the protocols that underlie distributed databases like Cassandra and AWS DynamoDB.

General progress

@ChrisO has completed the continuous-release work for sn and sn_api in the safe_network repo. This means we’re back in a place where we can freely merge commits and be confident in the subsequent release. With this in place, @ChrisO is moving on to the CLI, stabilising it before including it in releases.

David Rusu has been putting together a demo on ring signatures and Safe. Expect an update soon (warning: it will be ‘heavy on the math’). :running_man:

@qi_ma, @yogesh and @lionel.faber have been refactoring the DKG crate and safe_network (currently our main source of head-scratchers, as elder promotions are stalling before we reach seven elders) to reduce the number of messages that need to be broadcast and to increase the stability of section startup and splits. Work continues there, but we’re definitely getting at the underlying issues.

There’s also been some work on enabling message-handling priority for nodes, though this is still being evaluated. This should help us ensure that important changes are handled ahead of less important messages. This blocking of low-priority messages was unintentionally present in the older, less concurrent codebase, but was lost as we freed nodes up to take advantage of more of the CPU. So this should hopefully bring a bit of order back to a node’s operations.

Distributed consensus and how Safe bridges the gap

The core of Safe Network, and what makes it uniquely useful for a wide variety of purposes, is how it achieves agreement between distributed nodes that may be unreliable.

In the 1980s, decentralised agreement without reference to a central oracle was thought to be impossible, but then along came Leslie Lamport to prove everyone wrong. He was actually trying to prove that consistency and fault tolerance couldn’t coexist, but instead stumbled upon an answer: promote temporary leaders and ensure a system-wide order of operations.

The resulting algorithm was Paxos, for which Lamport eventually won a Turing Award. Paxos lets a distributed network reach agreement even when some machines are unreliable. It means networks can avoid having a single point of failure: any node, even a leader, can drop out and the system will still work.

Paxos has proved massively successful and spawned hundreds of imitators and variants. It is mostly used for reliably replicating data between distributed machines and underpins cloud services like AWS DynamoDB and Google’s Chubby lock service.

It does this by trying to decide a total order of operations. There are two stages in a Paxos operation: promise and commit. A promise is an agreement to do something, and a commit is the act of doing it. A majority of nodes needs to promise before a mutation of state or a change of permissions can go ahead, and a majority is also needed to move from a promise to a commit, for example a write or a change of ownership.

Because distributed networks are asynchronous, order is maintained by giving each operation a numerical ID, with the IDs increasing over time. In the case of a clash over a promise, the newest (highest ID) proposal generally wins.
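To make that a bit more concrete, here is a minimal, hypothetical Rust sketch of an acceptor handling the prepare/promise step, with proposal IDs deciding clashes as described above. The names and structure are illustrative only, not how DynamoDB or any production Paxos is actually written.

```rust
#[derive(Default)]
struct Acceptor {
    promised_id: u64,                // highest proposal ID promised so far
    accepted: Option<(u64, String)>, // last accepted (id, value), if any
}

impl Acceptor {
    /// Promise only if this proposal ID beats anything seen so far; the
    /// promise carries back any previously accepted value.
    fn prepare(&mut self, proposal_id: u64) -> Option<Option<(u64, String)>> {
        if proposal_id > self.promised_id {
            self.promised_id = proposal_id;
            Some(self.accepted.clone())
        } else {
            None // an older (lower-ID) proposal loses the clash
        }
    }
}

fn main() {
    let mut acceptors = vec![Acceptor::default(), Acceptor::default(), Acceptor::default()];
    let proposal_id = 42;

    // A proposer needs promises from a majority before it can ask for a commit.
    let promises = acceptors
        .iter_mut()
        .filter_map(|a| a.prepare(proposal_id))
        .count();
    let majority = acceptors.len() / 2 + 1;
    println!("promises: {promises}/{} (majority = {majority})", acceptors.len());
}
```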

In practical Paxos deployments there is a leader node. To commit, the leader appends each op to its log and asks the other servers to do the same. Once it has heard back from a majority of the servers in its cluster, it commits the change.
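A hypothetical sketch of that leader-driven commit, under the same simplifying assumptions as above: the leader appends the op to its own log, asks the followers to do the same, and treats the op as committed once a majority of the cluster (itself included) has it.

```rust
struct Server {
    log: Vec<String>,
}

impl Server {
    fn append(&mut self, op: &str) -> bool {
        self.log.push(op.to_string());
        true // acknowledge; a real follower could refuse an inconsistent entry
    }
}

fn replicate(leader: &mut Server, followers: &mut [Server], op: &str) -> bool {
    leader.append(op);
    // Count the leader itself plus every follower that acknowledges the append.
    let acks = 1 + followers
        .iter_mut()
        .map(|f| f.append(op))
        .filter(|&acked| acked)
        .count();
    let cluster_size = followers.len() + 1;
    acks > cluster_size / 2 // committed only once a majority holds the entry
}
```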

A new leader is elected when the current one fails to respond within a certain time. When this happens, any node may put itself forward. To become leader, a candidate must receive a majority of votes. It must then bring its log up to date by syncing with the other nodes. The voting nodes send their logs along with their votes to make this easier, but it can still take a long time, during which the cluster is inactive; that can be problematic.
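For flavour, here is a hypothetical Raft-style voting sketch: each voter grants at most one vote per term, and only to a candidate whose log is at least as up to date as its own, so a stale candidate can’t win. Again, illustrative names only.

```rust
struct Voter {
    voted_in_term: Option<u64>, // term this voter last voted in, if any
    last_log_index: u64,        // how up to date this voter's own log is
}

impl Voter {
    fn request_vote(&mut self, term: u64, candidate_log_index: u64) -> bool {
        let term_is_new = self.voted_in_term.map_or(true, |t| t < term);
        if term_is_new && candidate_log_index >= self.last_log_index {
            self.voted_in_term = Some(term);
            true
        } else {
            false
        }
    }
}

fn elect(voters: &mut [Voter], term: u64, candidate_log_index: u64) -> bool {
    // The candidate votes for itself, then needs a majority of the whole cluster.
    let votes = 1 + voters
        .iter_mut()
        .map(|v| v.request_vote(term, candidate_log_index))
        .filter(|&granted| granted)
        .count();
    votes > (voters.len() + 1) / 2
}
```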

Raft (2014), which is based on Paxos (or more accurately Multi-Paxos, the updated version), simplifies the voting process and other operations, but is essentially a modification of Paxos. Raft has now overtaken Paxos in terms of project activity.

So far so good, but while Paxos/Raft are elegant solutions to distributed consensus, their implementation requires a lot of extras if they are to work as planned.

Cloudflare’s infrastructure relies on Raft, and in 2020 it suffered a major outage when the leader election process went into an endless loop. To guarantee liveness, PreVote and CheckQuorum optimisations are also required, plus plenty more besides. Not so simple now.

Extending the functionality of Paxos and Raft and other variants makes them complex, which in the tech industry means they become proprietary. If only Amazon knows how to reliably set up DynamoDB, Amazon isn’t going to give that secret away in a hurry.

But the major drawback of these algorithms is their assumption that nodes do not collude, lie, or otherwise attempt to subvert the protocol, i.e. that Byzantine failures don’t occur. If you are Amazon you might be able to guarantee that, since you control your own hardware and software, but on mixed public networks there’s no way you can. Which is why you don’t hear much about Paxos et al in the crypto world, even though they underpin the tech Goliaths.

Enter Satoshi

Byzantine fault-tolerant (BFT), decentralised algorithms are dominated by blockchain, which solved this hard problem thanks to the genius of Satoshi. Blockchains are highly ordered and BFT, but that total order is essential to their design, and it blocks or slows down multi-access and concurrency. Blockchains are not general-purpose systems; they are designed specifically for value exchange. They cannot replace Paxos and its derivatives because they are optimised for securing and sharing transactions, not data operations.

Step aside Satoshi, Safe Network’s here

Safe is different. It bridges the gap between the store-forever transactions of blockchains and the distributed data management of Paxos and Raft.

Data is stored forever without heavyweight PoW.

There is no overall leader to attack, not even a temporary one (like the PoW winner in a blockchain). Instead, accountability is distributed across section supermajorities, which consist of constantly changing (but agreed) groups of nodes.

Then we have the issue of fault tolerance. Paxos and Raft use the concept of ‘crash’ fault-tolerance (CFT). But public networks require Byzantine fault-tolerance. The difference seems subtle but it’s not. Crash fault tolerance covers how many nodes can simply fail between maintenance runs, whereas Byzantine fault tolerance covers how many nodes can go rogue at any point in time. CFT is calculable, BFT is not.
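To put rough numbers on that difference (these are the standard textbook bounds, not Safe-specific figures): a crash-fault-tolerant quorum system needs n ≥ 2f + 1 nodes to survive f crashed nodes, while a Byzantine-fault-tolerant one needs n ≥ 3f + 1 to survive f nodes that lie or collude. A toy calculation:

```rust
// Standard quorum bounds: n >= 2f + 1 tolerates f crash faults,
// n >= 3f + 1 tolerates f Byzantine faults.

fn max_crash_faults(n: usize) -> usize {
    (n - 1) / 2
}

fn max_byzantine_faults(n: usize) -> usize {
    (n - 1) / 3
}

fn main() {
    for n in [4usize, 7, 10] {
        println!(
            "{n} nodes: survives {} crashes, but only {} Byzantine nodes",
            max_crash_faults(n),
            max_byzantine_faults(n)
        );
    }
}
```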

Safe is BFT and designed for real-world conditions where things fail and ill-wishers roam. Node age ensures Byzantine nodes are quickly disempowered. By contrast, Paxos/Raft are easily attacked, for instance by promoting a leader that doesn’t reply or that replies too fast. They also rely to an extent on server clocks to manage consensus. They seem simple (as do Kademlia networks), but they need a lot of preconditions to make them work, including trusted hardware and software, no NAT traversal to deal with, and so on.

Safe is fully decentralised. AE (anti-entropy), BRB (Byzantine reliable broadcast) and CRDTs (conflict-free replicated data types) do away with the requirement for network-wide agreement to achieve consistency, and there is no need for total order. Even Amazon admits that for large distributed systems the game is eventual consistency, not total order (the speed of light is a hard limit). Trying to achieve total order at massive scale inevitably leads to low performance.
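As a flavour of why CRDTs remove the need for total order, here is a toy grow-only set (not one of Safe’s actual CRDT types): merge is just set union, so replicas that receive writes in different orders still converge to the same state.

```rust
use std::collections::BTreeSet;

#[derive(Default, Clone, PartialEq, Debug)]
struct GSet {
    items: BTreeSet<String>,
}

impl GSet {
    fn add(&mut self, item: &str) {
        self.items.insert(item.to_string());
    }

    fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().cloned());
    }
}

fn main() {
    let (mut a, mut b) = (GSet::default(), GSet::default());
    a.add("chunk-1"); // concurrent writes land on different replicas…
    b.add("chunk-2");

    let mut ab = a.clone();
    ab.merge(&b);
    let mut ba = b.clone();
    ba.merge(&a);

    assert_eq!(ab, ba); // …yet both merge orders give the same state
}
```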

So, Safe crosses many boundaries: strong eventual consistency on a public network, general-purpose BFT, and permanent data.


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

75 Likes

I am first today yaaaa

Thanks for great contents. :+1:

22 Likes

At least second…early post this time…

15 Likes

There’s a bridge in the pic, it was meant to be

16 Likes

That was a very interesting read! thanks :+1:

Safe’s potential knows no bounds.

I’ve had a good week breaking the back of Tauri and now have Rust holding state and JavaScript/React looking pretty… allowing more focus on what the functionality might be. Given that much of this was a pipe dream a couple of months ago, it’s a nice place to be.

:thinking: Will users be able to log in to two Safe accounts alongside each other?.. or will that be limited initially to one account at a time… rather hoping two active accounts will be possible.

16 Likes

Fifth again, beginning to think there’s a delay on my Internet!!

Well done team now looking forward to reading it.

10 Likes

What you thinking use-case wise here?

12 Likes

Profile manager to limit user error.
… and in any case, users might want to have two apps with different profiles.

9 Likes

We’d be thinking along the lines of users using SafeIDs, and then a profiles layer to manage this, rather than multiple Safes, TBH.

Obviously users can do as they wish, but I think it will be a worse experience overall if we used separate Safes to manage this rather than a layer within.

16 Likes

Yes, that could work equally… it’s just addressing the pain of logging out and in again, and of tracking what users are doing with which profile. So, the same question stands: will users be able to log in to two Safe profiles alongside each other?..

8 Likes

Yeah, that’s the goal.

12 Likes

Just getting a chance to catch up now. Great update, nice to see SAFE compared and contrasted with related work, and nice to see its superiority. An excellent, well-written piece, thank you.
I have to explain SAFE to a couple of academics in the next week or so and I will be relying heavily on this and previous updates to keep it simple but ready to cite other work if required.

14 Likes

Great progress and nice write-up on distributed consensus. :+1::nerd_face:

11 Likes

Nice write up! It illustrates what is at stake here. Once reliable, it is game changing.

15 Likes

Brilliant write up. Well done.

But dying for a stable testnet - hoping we are close. :crossed_fingers::crossed_fingers:

I know we are closer each week, but have internal testnets progressed significantly since the last public testnet?

11 Likes

Really enjoying the Thursday pieces! Every week you guys blow my mind with the complexities you deal with to make the Safe Network happen!

11 Likes

Great update! This is quite insightful even for non-technical people like myself. Thank you Team!

I’m excited we’re at this stage, where we can explain Safe in a way that more people can understand and see the value.

15 Likes

42 posts were split to a new topic: The Nasty Neighbor Attack: DDOS and OOB social vectors

This provides a very good summary of consensus (and thus leaders), eventual consistency and total order in a decentralised environment. I like this article because it gave me the whole picture of consensus and eventual consistency in a distributed environment. Thx @maidsafe team, @dirvine

14 Likes