Here are some of the main things to highlight since the last dev update:
- Happy New Year!
- We are excited to announce that the Dynamic Membership work has been completed and we have today published several Byzantine Reliable Broadcast (BRB) crates on GitHub. All are linked from the central brb crate.
- We’ve been working hard, even over the holidays, to improve our errors and comms layer, including creating a new
- We are in the process of preparing some minor fixes for a new patch release of the CLI.
- We are also close to having the CLI and authd built with musl, meaning compatibility with a larger range of platforms.
- We have a new stress testing tool for routing with the PR currently under review. This new tool is designed to discover the limits of routing in terms of how it handles membership changes (churn) and has already brought some issues to our attention.
Thanks to everyone who has taken the time to try out the testnet code released before Christmas. With all of the MaidSafe team now back at their desks, we are continuing to work through some issues we identified before release, and others you highlighted for us. Once we are satisfied that these issues have been resolved we will announce another iteration, with the intention of hosting a public testnet that everyone who can is welcome to connect to.
Safe Client, Nodes and qp2p
We’ve been working to improve our Error story throughout our libs, and have transitioned to using thiserror throughout node/data_types/client/transfers to provide a better error chain and some greater encapsulation of functionality. Previously, we were using a lot of mixed errors, pulling a lot from sn_data_types into other libs. Now we have specific errors in each lib, for that lib, and only propagate errors from lower libs as another version of the current lib’s errors.
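As a rough illustration of the pattern described above, the sketch below wraps a lower-level library's error in a variant of the current library's own error type. It uses only the standard library; thiserror's derive macro (with its `#[error("...")]` and `#[from]` attributes) generates equivalent `Display`, `source` and `From` implementations. The names `DataError`, `NodeError`, `read_entry` and `handle_read` are hypothetical and are not taken from the Safe Network crates.

```rust
use std::fmt;

// Hypothetical error type owned by a lower-level library.
#[derive(Debug)]
pub enum DataError {
    InvalidOwner,
    NoSuchEntry(u64),
}

impl fmt::Display for DataError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataError::InvalidOwner => write!(f, "invalid owner"),
            DataError::NoSuchEntry(i) => write!(f, "no entry at index {}", i),
        }
    }
}

impl std::error::Error for DataError {}

// Hypothetical error type owned by the higher-level library.
// Lower-level errors only appear wrapped in one of its variants.
#[derive(Debug)]
pub enum NodeError {
    NotEnoughSpace,
    Data(DataError),
}

impl fmt::Display for NodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NodeError::NotEnoughSpace => write!(f, "not enough space"),
            NodeError::Data(e) => write!(f, "data error: {}", e),
        }
    }
}

impl std::error::Error for NodeError {
    // Exposes the wrapped error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            NodeError::Data(e) => Some(e),
            NodeError::NotEnoughSpace => None,
        }
    }
}

// Lets `?` convert a DataError into a NodeError automatically.
impl From<DataError> for NodeError {
    fn from(e: DataError) -> Self {
        NodeError::Data(e)
    }
}

// Stand-in for a call into the lower-level library.
fn read_entry(index: u64) -> Result<u64, DataError> {
    if index < 3 {
        Ok(index * 10)
    } else {
        Err(DataError::NoSuchEntry(index))
    }
}

// Higher-level API: callers only ever see NodeError.
pub fn handle_read(index: u64) -> Result<u64, NodeError> {
    let value = read_entry(index)?;
    Ok(value)
}
```

Calling `handle_read(7)` yields a `NodeError::Data` whose `source()` is the original `DataError`, so the full chain stays available without the lower library's type leaking into the public signature.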
On top of this, we’ve extracted sn_data_types into its own crate in order to separate out our comms layer, as well as errors that we’ll be sending to/from other nodes and clients. This is a small step towards more clearly defining a network ‘API’ of messages and errors and it provides cleaner separation of errors from internal libraries to the client.
As part of this effort, we are exploring different serialisation types, with the end goal of having one which is programming language agnostic. At the moment we are focusing on a simple JSON serialisation (as opposed to the currently used bincode), but we are also playing around with Msgpack.
The knock-on effect of all this has been some cleaner code, and much clearer error flows throughout all the involved libs, which is great.
In tandem, we’ve been removing client “challenges” from the node/client bootstrap flow. These were previously used to verify a client was holding keys, in order to prevent message replay attacks. But with idempotency coming from AT2 and CRDT data types, this will be handled there. This is yet more simplification for both the client and the node, and it further clarifies network operations as signed messages only.
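To illustrate why idempotent operations make a replayed message harmless, here is a minimal sketch in which each sender's operations carry a sequence counter and a replica applies each one at most once. This is an assumption-laden toy, not the AT2 implementation: `Transfer`, `Replica` and the counter scheme are hypothetical, and signature checks are omitted entirely.

```rust
use std::collections::HashMap;

// Hypothetical operation: the counter is a per-sender sequence number starting at 1.
#[derive(Clone, Debug, PartialEq)]
pub struct Transfer {
    pub sender: u32,
    pub counter: u64,
    pub amount: u64,
}

// Hypothetical replica state: the last counter applied for each sender.
#[derive(Default)]
pub struct Replica {
    last_applied: HashMap<u32, u64>,
    pub total_moved: u64,
}

impl Replica {
    /// Applies the transfer only if it is the next one expected from its sender.
    /// Returns false (and changes nothing) for replays and out-of-order messages.
    pub fn apply(&mut self, t: &Transfer) -> bool {
        let last = self.last_applied.get(&t.sender).copied().unwrap_or(0);
        if t.counter != last + 1 {
            return false;
        }
        self.last_applied.insert(t.sender, t.counter);
        self.total_moved += t.amount;
        true
    }
}
```

Delivering the same `Transfer` twice changes the state only once, so an attacker replaying a captured message gains nothing and no separate challenge round is needed to guard against it.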
Previously, to prevent key-selling attacks on the network, we removed all SecretKey exposing APIs from sn_routing and contained them only within their crate. However, we found that there were multiple complications down the dependency tree caused by this removal, and so agreed to bring back those APIs to allow us to move ahead quickly with the testnet during the holidays. We have decided to tackle this problem head-on right away, and have started refactoring the sn_node crates, where we hold and use those SecretKeys outside of
The signature-aggregate work carried out by sn_node when exchanging messages among KeySection and DataSection possibly duplicates routing’s consensus accumulation work, as both are actually being undertaken by elder nodes. We are investigating this and carrying out some refactoring work to try to remove this part from the sn_node crate, and to trust the consensus messages from
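For readers unfamiliar with accumulation, the following sketch shows only the bookkeeping side of the idea: collect one share per elder for a given message and report once a threshold of distinct elders has been reached. In a real system each share would be a cryptographic signature share that is verified and combined; here an elder index stands in for a share, and the `Accumulator` type and its methods are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical accumulator: tracks which elders have contributed a share per message.
pub struct Accumulator {
    threshold: usize,
    shares: HashMap<Vec<u8>, BTreeSet<usize>>,
}

impl Accumulator {
    pub fn new(threshold: usize) -> Self {
        Accumulator {
            threshold,
            shares: HashMap::new(),
        }
    }

    /// Records a share from `elder_index` for `msg`.
    /// Returns the sorted contributing elders exactly once, at the moment
    /// the number of distinct contributors first reaches the threshold.
    pub fn add(&mut self, msg: &[u8], elder_index: usize) -> Option<Vec<usize>> {
        let signers = self.shares.entry(msg.to_vec()).or_default();
        let newly_added = signers.insert(elder_index);
        if newly_added && signers.len() == self.threshold {
            Some(signers.iter().copied().collect())
        } else {
            None
        }
    }
}
```

If two layers each run this kind of accumulation over the same elders' shares, the work is done twice, which is the sort of duplication the refactoring aims to remove.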
And a final wee bit of work is underway removing stream management from nodes. This was put in place to maintain comms with clients, but with recent qp2p changes we can rely on connection pooling there to handle this for us, and so remove a lot of complexity from the node’s client handling. We are also in the process of refactoring the qp2p examples into separate parts to demonstrate the echo_service and messaging systems clearly and distinctly. We are doing trial runs with these examples with manual port forwarding to potentially support routers not compatible with IGD in further testnets.
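The general idea of connection pooling can be sketched as a map from peer address to an already-open connection, so that callers ask the pool for a connection rather than tracking streams themselves. This is not qp2p's API: `ConnectionPool` and `connection_to` are hypothetical, and a plain numeric id stands in for a real connection handle.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// Hypothetical pool: maps each peer address to the id of its open connection.
#[derive(Default)]
pub struct ConnectionPool {
    open: HashMap<SocketAddr, u64>,
    opened_total: u64,
}

impl ConnectionPool {
    /// Returns the existing connection id for `peer`, or "opens" a new one.
    pub fn connection_to(&mut self, peer: SocketAddr) -> u64 {
        if let Some(id) = self.open.get(&peer) {
            return *id; // reuse: no new connection is created
        }
        self.opened_total += 1;
        self.open.insert(peer, self.opened_total);
        self.opened_total
    }

    /// Number of connections opened so far (reuses are not counted).
    pub fn opened_total(&self) -> u64 {
        self.opened_total
    }
}
```

With the pool owning this state, the node's client-handling code no longer needs its own record of which stream belongs to which client.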
API and CLI
We have been focusing on changes and improvements on the network side. However, we have still been taking care of some minor bugs reported by the community while using the testnet, and so we are in the process of preparing some minor fixes for a new patch release of the CLI.
Also, we are trying to get our next release of CLI and authd to be built with musl, which as we know will allow us to run these applications on many more platforms using the same released artifacts. We were able to build them manually already (thanks a lot to @mav and @tfa for their input and contributions to this), so we are now looking to get this into our CI in the coming days.
BRB - Byzantine Reliable Broadcast
We are excited to announce that the Dynamic Membership work has been completed and we have today published several Byzantine Reliable Broadcast (BRB) crates on GitHub. All are linked from the central brb crate.
The BRB system consists of:
1. The core BRB broadcast protocol for members of a quorum to replicate data in BFT fashion.
2. The dynamic membership protocol for nodes to dynamically join and leave an active quorum.
3. Data type wrappers that encapsulate compatible data types (e.g. CRDTs) for transfer via BRB.
4. Comprehensive tests to verify correctness.
5. brb_node_qp2p: an example CLI app/node for manually invoking BRB functionality.
For those interested in digging into the details, slides are available and provide further insight into the system and protocols.
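To give a feel for how a quorum and its membership interact, the sketch below checks whether a set of signers forms a supermajority of the current members. It assumes the common BFT rule of strictly more than two-thirds, which is an assumption here and not taken from the brb crates; the `Quorum` type is hypothetical, and membership is mutated directly, whereas a dynamic membership protocol would agree on such changes among the members first.

```rust
use std::collections::BTreeSet;

// Hypothetical quorum: just the set of current member ids.
#[derive(Default)]
pub struct Quorum {
    members: BTreeSet<u32>,
}

impl Quorum {
    pub fn join(&mut self, id: u32) {
        self.members.insert(id);
    }

    pub fn leave(&mut self, id: u32) {
        self.members.remove(&id);
    }

    /// True if strictly more than two-thirds of the current members signed.
    /// Signers that are not current members are ignored.
    pub fn has_supermajority(&self, signers: &BTreeSet<u32>) -> bool {
        let valid = signers.intersection(&self.members).count();
        3 * valid > 2 * self.members.len()
    }
}
```

With four members, three signatures are enough and two are not; after one member leaves, all three remaining members are needed, which shows why membership changes and the broadcast protocol have to stay in agreement.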
As we all know, relocation is good for the network, facilitating node ageing amongst other things. However, we observed that in some situations we were over-relocating. For example, we were relocating even when we did not have enough elders due to churn, and also relocating nodes when they had only just joined. To resolve this, we set up some criteria to avoid over-relocation, aiming to keep the network stable in these scenarios.
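The two situations mentioned above can be expressed as a simple guard, sketched here under stated assumptions. The constants `REQUIRED_ELDERS` and `MIN_CHURN_EVENTS_BEFORE_RELOCATION`, their values, and the function `may_relocate` are all hypothetical; the actual criteria in routing may differ.

```rust
// Hypothetical thresholds, chosen only for illustration.
pub const REQUIRED_ELDERS: usize = 5;
pub const MIN_CHURN_EVENTS_BEFORE_RELOCATION: u32 = 2;

/// Returns true only if the section can afford a relocation and the
/// candidate node is not a newcomer.
pub fn may_relocate(elder_count: usize, churn_events_since_join: u32) -> bool {
    // Do not relocate while the section is short of elders.
    let enough_elders = elder_count >= REQUIRED_ELDERS;
    // Do not relocate a node that has only just joined.
    let settled = churn_events_since_join >= MIN_CHURN_EVENTS_BEFORE_RELOCATION;
    enough_elders && settled
}
```

Both conditions must hold, so a section that has lost elders to churn keeps its remaining nodes in place until it has recovered.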
An API change was also undertaken to return section info for a specified target name. This is mainly for sn_node usage in the upcoming refactoring work.
We put together a stress test for routing (PR under review). It’s a little tool designed to discover the limits of routing in terms of how it handles membership changes (churn). It generates random churn according to a configurable schedule. It then periodically outputs various useful information about the network, and measures the network health. This tool will be very useful for the upcoming work of integrating the new dynamic membership solution. Running it on the current version of routing, it has already discovered some issues we have around relocations and splits, which we will look into soon. This is actually good, because the first step of fixing a problem is knowing about the problem. Here is a little screenshot of the tool’s output:
Feel free to reply below with links to translations of this dev update and moderators will add them here:
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!