Setting the scene
With SAFE-Fleming, we’ll be releasing a fully functional, permissionless, decentralised Network based on a combination of features that we’ve developed. As we discussed in ‘What is SAFE-Fleming?’, a critical component of this is PARSEC, which solves one of the toughest challenges that any decentralised Network faces: consensus between peers.
Let’s talk about some of the discussions we’re having around implementation post-Fleming. Whilst these topics don’t translate to functionality you’ll see in Fleming itself, it’s crucial that the work we’re doing today takes account of the future requirements of the Network. So let’s kick off by giving you a high-level view of our approach to each question.
Thinking in layers
Let’s start with a general point: Routing in the SAFE Network (the way that nodes connect to each other) involves many interconnected design aspects. We introduced the concept of ‘layers’ to frame our approach to each design challenge. Considering the impact that any decision might have on other layers in the Network lets us approach each new question with a clear focus and an understanding of its consequences, and so work more efficiently.
Network Upgrade: Handle change
Once live, the Network will need to evolve in order to satisfy user demands and adapt to changes. Like all software, it’s critical that it has the ability to upgrade. However, upgrading a peer-to-peer Network comes with its own set of challenges. For example, the Network needs to be fully functional from the moment of that upgrade. That means all stored information remains intact and all peers can still communicate with each other.
This can be broken down into a number of questions:
- How do you ensure that upgrading peers doesn’t disrupt the Network?
- How do the Network’s peers make the decision to upgrade (or use a newer version of the protocol)?
- Should the Network handle multiple protocol and software versions and, if so, how?
Many other concrete decisions need to be taken as well, for example:
- What data format should peers use for communication to ensure interoperability? Which protocol versions should be used?
- How does the Network distribute upgrades to the peers?
So, in summary: how does the SAFE Network handle all future upgrades smoothly?
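To make the protocol version question a little more concrete, here is a minimal sketch of one common approach: each peer advertises the inclusive range of protocol versions it supports, and two peers talk using the highest version they share. This is purely illustrative and is not the SAFE Network's design; the `VersionRange` type and the `negotiate_version` function are hypothetical names invented for this example.

```python
from typing import Optional, Tuple

# Hypothetical type: inclusive (oldest, newest) protocol versions a peer supports.
VersionRange = Tuple[int, int]


def negotiate_version(ours: VersionRange, theirs: VersionRange) -> Optional[int]:
    """Return the highest protocol version both peers support, or None."""
    lowest_common = max(ours[0], theirs[0])
    highest_common = min(ours[1], theirs[1])
    if lowest_common > highest_common:
        # The two ranges do not overlap, so the peers cannot communicate.
        return None
    return highest_common


if __name__ == "__main__":
    # An upgraded peer (versions 2 to 5) meets one that has not upgraded yet (1 to 3).
    print(negotiate_version((1, 3), (2, 5)))
    # Two peers with no shared version.
    print(negotiate_version((1, 2), (3, 4)))
```

With a scheme like this, an upgrade only stays smooth while old and new peers keep at least one version in common, which is one reason the questions above about supporting multiple versions matter.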
Network Restart: Handle massive failure
The internet is very resilient yet large services still fail on occasion. That may be due to accident or attack, but they tend to recover fairly quickly. Often this is because they can rely on a centralised infrastructure. But a decentralised Network can’t afford this luxury, which presents a unique challenge for any restart.
How can the Network handle and recover from a temporary catastrophic failure or a large number of peers leaving the Network at a similar time?
The peers in each Section of the SAFE Network together hold a specific part of the information of the Network and they are trusted by the rest of the Network to keep it. So as it recovers, peers need to not only communicate with each other, but also ensure that the information in each Section is recovered, trustworthy, and managed by a functional Section.
Also, losing a large number of peers may affect other functionality. For example, peers will join and leave on occasion. If the number of peers grows or shrinks sufficiently, the Network adapts its structure by merging or splitting Sections to maintain its overall integrity. We need a design that prevents unintended consequences when a large proportion of peers leave, only to rejoin after the event.
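The merge and split behaviour can be pictured with a small sketch. The thresholds below are hypothetical values chosen only for illustration, and `section_action` is an invented helper rather than anything from the Network's code; the real rules are more involved.

```python
# Hypothetical thresholds, chosen only for illustration.
MIN_SECTION_SIZE = 8                         # below this, a Section should merge
SPLIT_THRESHOLD = 2 * MIN_SECTION_SIZE + 2   # at or above this, a Section may split


def section_action(peer_count: int) -> str:
    """Decide what a Section does for a given number of peers."""
    if peer_count < MIN_SECTION_SIZE:
        return "merge"
    if peer_count >= SPLIT_THRESHOLD:
        return "split"
    return "stay"


if __name__ == "__main__":
    for count in (5, 12, 18):
        print(count, section_action(count))
```

The gap between the two thresholds is the point of the sketch: a Section that has just split still sits above the merge threshold, so small fluctuations do not cause it to flip back and forth. A mass departure followed by a mass return is exactly the case where such simple rules could trigger a merge and then a split in quick succession, which is the kind of unintended consequence we need to design against.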
Having the ability to recover from a full shutdown is essential for the reliability of the Network. We see the design of this functionality as an opportunity to also support upgrades.
Scalability: Handle growth
The SAFE Network is an infrastructure for everyone to use. It needs to handle a very large number of peers that will provide resources to the Network. As a result, we need to identify and mitigate the limitations of any components that would restrain the Network’s ability to grow.
While designing parts of the Network, we need to constantly consider how each component will scale within the constraints that are required for the Network to operate efficiently.
Consequently, we’re looking at larger Section sizes. This would affect different parts of the system, such as connectivity with other peers and the efficiency of our consensus algorithm, PARSEC. As a result, we need to ensure that all of these components are able to handle such requirements as part of this Fleming work.
Connectivity: Handle Networking constraints
There are certain strict constraints in the way that computers can connect. For instance, many consumer computers and routers can only maintain connections with a limited number of other computers (from tens to a few hundred).
In a peer-to-peer Network like SAFE, each computer needs to communicate with many of its peers. We need to work out the best way forward here so that these limits don’t hinder scaling or undermine our plans for Network upgrade and restart.
There are many ways to deal with the issue and each has its own pros and cons. For example, direct connections can be treated as a scarce resource and used sparingly. In addition, we can exploit certain capabilities of some computers to enable them to connect with more peers. We can also shield some parts of the design from these technical details with the right abstractions. On top of that, we can combine some of these different techniques.
History Pruning
Each peer needs to retain information about what the Network stores and who to trust, along with proof that the current state is the result of valid changes. As the Network changes and provides services, more and more data points are created, each with its own overhead (each data point needs proof of validity, for example), and some older ones start becoming obsolete (data about a long-gone peer, for example).
In the blockchain world, all transactions since the inception of a network would be stored in an ever-growing ledger. Compare that to our data chain approach, which allows nodes to forget information that they no longer need as the Network evolves. SAFE can use the trusted history of a piece of information to establish that the information is valid. In an asynchronous network, where nodes receive information at different times and in a different order, the Network needs to preserve some history in order to be able to convince nodes that are slightly behind in their view of the world. Exactly how much can we prune? Where exactly is the cutoff point? How do we maintain proof that the pruned history is correct? These are some of the questions we’ve looked into.
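As a rough illustration of the pruning questions, the sketch below uses a plain hash chain, which is a deliberate simplification and not the data chain design itself. Each entry is linked to the previous one by a SHA-256 hash; pruning keeps only the most recent entries plus the hash of the last dropped entry, which acts as an anchor. All names here (`link`, `build_chain`, `prune`, `verify`, and the `"genesis"` starting value) are hypothetical.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str]  # (payload, hash linking this entry to the previous one)
GENESIS = "genesis"      # hypothetical starting value for the chain


def link(prev_hash: str, payload: str) -> str:
    """Hash that ties a payload to the entry before it."""
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def build_chain(payloads: List[str]) -> List[Entry]:
    """Build a chain where every entry commits to its predecessor."""
    chain: List[Entry] = []
    prev = GENESIS
    for payload in payloads:
        prev = link(prev, payload)
        chain.append((payload, prev))
    return chain


def prune(chain: List[Entry], keep: int) -> Tuple[str, List[Entry]]:
    """Keep the last `keep` entries; return (anchor hash, remaining entries)."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    cut = max(len(chain) - keep, 0)
    # The anchor is the hash of the last dropped entry, or GENESIS if none dropped.
    anchor = chain[cut - 1][1] if cut > 0 else GENESIS
    return anchor, chain[cut:]


def verify(anchor: str, suffix: List[Entry]) -> bool:
    """Check that the remaining entries still link back to the anchor."""
    prev = anchor
    for payload, entry_hash in suffix:
        if link(prev, payload) != entry_hash:
            return False
        prev = entry_hash
    return True


if __name__ == "__main__":
    chain = build_chain(["a", "b", "c", "d"])
    anchor, recent = prune(chain, keep=2)
    print(len(recent), verify(anchor, recent))
```

In this simplified picture, the cutoff point is the `keep` parameter, and the remaining open question is how a node that is slightly behind comes to trust the anchor in the first place. A bare hash offers no such assurance on its own, so some additional proof of validity would be needed alongside it.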
Assignment of identities to peers
As we integrate the existing design for Node Ageing and Address relocation, we must consider when and how to persist a peer’s identity. For instance, stripping a peer of its identity may be an appropriate punishment for malicious behaviour. It deprives them of their valuable age. However, we also want to incentivise nodes to run the latest version of the software so clearly a node should also maintain its identity after a Network upgrade. When considering such nuances, we need of course to remain mindful of the threat model for any chosen approach.
Opportunities of an open culture
As we answer all of the questions above, we favour simpler solutions that have a clear and well-defined impact on a small number of layers of the Network. It’s worth stressing the point again: where solutions already exist in the wider world, we’re always open to taking inspiration and collaborating rather than reinventing the wheel for the sake of it. We’re keen not to miss any insight from other teams in our space, so we’ve been spending some time analysing how other projects have answered similar challenges.
What’s next?
We’ve now given some context to the questions that we need to take account of on the Road to Fleming. Whilst the answers may not be implemented today, each question needs to be considered now in order to ensure that the system fits together in the most efficient way after Fleming is out in the wild. On that note, next up we’ll be diving into a topic that’s of interest to everyone in the decentralised space: how the SAFE Network defends itself against Sybil attacks.