Here are some of the main things to highlight since the last dev update:
- The testnet release is still on hold as we work to complete some message flow refactoring, which is blocking rewards from progressing.
- Message flow refactoring work is making good progress, with a draft PR in place. This will result in a cleaner, simpler, and more efficient message flow.
- We’ve migrated our testnet deployment/take down scripts to use terraform, resulting in a drastic improvement in time taken to create testnets of any size for internal testing/external deployment.
- Spending some time working with a no-rewards setup has allowed us to catch and squash some bugs that would otherwise have remained hidden until rewards were fully implemented.
- A new $ safe networks set subcommand is being implemented in the CLI, which will allow users to connect to networks more easily by simply using their bootstrapping IP:port address. The corresponding PR is going through review now.
- We believe we’ve come up with a solution for section chain forking in sn_routing. This solution is currently being implemented, and we believe it will help make testnets stable enough to cope with community probing.
- Community code contributions keep on coming in!
Testnet status - on hold
Again, we were aiming to include rewards in a public testnet this week. However, after we switched to doing message accumulation only in sn_routing, the related work to adjust the message flows, which affects all parts of sn_node, has turned out to be more time-consuming than we anticipated. This work currently blocks rewards from progressing.
Earlier in the week we decided to focus some energy on the alternative of having a testnet with no rewards, i.e. stripping out some of the rewards flow, something we made a start on last week. We made some good progress here, but hit several blocking issues along the way, which we have had to work around with temporary “hacks” until rewards and the related functionality are in place. The resulting no-rewards networks that we have been spinning up have lacked stability, reliability and consistency, so as it stands we don’t believe there is value in putting a testnet up without rewards. This alternative approach did have some benefits this week, in that it allowed us to make some progress with multi-section testing, which led to us identifying and fixing a few issues that we would otherwise not have seen until rewards and the related functionality were in place.
We still expect to host a testnet asap, with focus shifting back to moving forward with rewards and releasing a reliable testnet with that in place. Potentially we will do a little more testing using the no-rewards work to see if we can discover anything else lurking for us down the line.
Testnet prep and testing
Towards the end of last week, we were attempting to publish large testnets, and it quickly became apparent that our bash script for this was not up to the job - taking 30 mins to launch 20 nodes or so, and taking a hell of a lot longer to launch 100 nodes! As such, we’ve been migrating across to terraform for managing droplet and node deployment/destruction, and that is muuuch better. We can now launch 40 odd nodes in a few minutes. We’ve been using this pretty heavily to iterate, and have it set up now to allow us to deploy custom node builds too, which has proved very handy on the iteration front. The PR for this switch to terraform is in place and pretty thoroughly tested now, with some tidy up work intended before merging.
At one stage through the week we were fairly regularly seeing internal testnets wanting to split, but failing to do so. We started trying to debug with smaller sections (for example, 3 elders, a 5 node section size) to trigger more splits, but this didn’t help. It turned out that we were not seeing splits as our code was depending on Elders moving sections, but this is not actually required (as things stand). As with all things probability, even those unlikely events seemed to happen reasonably often…and so it was that all our elders were falling in one half of the section, and so forming a new section unto themselves, with no key-change needed, and none of our code being hit.
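To get a feel for why this “unlikely” case kept turning up, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each Elder lands in either half of a splitting section independently with probability 1/2; real node naming and Elder selection are not that simple, and the function names are made up for this example.

```rust
/// Probability that all `elders` fall in the same half of a splitting
/// section, assuming each lands in either half independently with
/// probability 1/2 (an illustrative assumption, not the routing rules).
fn prob_all_elders_same_half(elders: u32) -> f64 {
    assert!(elders >= 1, "need at least one elder");
    // Two possible halves, each reached with probability (1/2)^elders.
    2.0 * 0.5_f64.powi(elders as i32)
}

/// Brute-force cross-check: enumerate every assignment of elders to
/// halves (one bit per elder) and count the all-zeros / all-ones cases.
fn prob_by_enumeration(elders: u32) -> f64 {
    assert!(elders >= 1 && elders < 64, "elders must be in 1..64");
    let total = 1u64 << elders;
    let same_half = (0..total)
        .filter(|bits| *bits == 0 || *bits == total - 1)
        .count();
    same_half as f64 / total as f64
}
```

Under these assumptions, a section with 3 Elders would see all of them end up together in one split out of four, so hitting it regularly on small test sections is not that surprising.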
With more bug fixes in place there, along with removing some rewards functionality that is being reworked, we’ve squashed an issue that was occurring at network startup: on every churn in the genesis section, the newly promoted Elder would re-propose a genesis payout, effectively making the rest of the Elders await another genesis payout that was supposed to happen only once, at the start. With that nailed, we’ve also squashed a related bug where accumulation of the genesis payment proof wasn’t happening (we were storing all payment events except the validation ones), which helped get things moving.
After that, we also came across some looping in sn_routing, which was observed to cause some high memory usage, potentially causing nodes to die. We know where the issue is, and we’ve put a temporary measure in place to prevent it from happening for now. A more permanent fix will come in due course.
With all the above rolled out with the no-rewards branch, we finally got to the stage where we could see the majority of client tests passing, with the fails highlighting a few other issues such as occasionally hitting some code we shouldn’t be able to reach (some error handling required for that). We’re now whittling down an issue with client wallets too, getting the client test suite just a little bit greener in preparation for when the rewards flow is fully integrated again.
Safe Client, Nodes and qp2p
Over the last week, we have tested the new qp2p API with all our crates. There were some issues initially but those are all ironed out now and we are in the final review and merge phase across the board. We will be including these changes during end-to-end testing and they will be a part of the next testnet release.
Recently we moved over to accumulating messages in sn_routing only, having previously also done this in sn_node. To finish this refactor, a lot of code has been touched, and about 1350 lines have been removed.
The result is a simpler, cleaner, and more efficient message flow.
Work to do this is currently ongoing, with a draft PR tracking progress. This will help us move forward with the message flows in general and, very importantly right now, with rewards, which have been held back a bit by these updates.
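For anyone wondering what “accumulation” means here, the rough idea is that a message only counts once enough distinct senders have vouched for it. Below is a toy sketch of that idea only; it is not the sn_routing implementation, which deals with signatures, and the Accumulator type, its threshold field and the plain integer sender ids are all assumptions made for this example.

```rust
use std::collections::{HashMap, HashSet};

/// Toy accumulator: a message counts as "accumulated" once `threshold`
/// distinct senders have vouched for it. Plain sender ids stand in for
/// whatever proof the real network uses (an assumption of this sketch).
struct Accumulator {
    threshold: usize,
    votes: HashMap<String, HashSet<u64>>,
}

impl Accumulator {
    fn new(threshold: usize) -> Self {
        Self {
            threshold,
            votes: HashMap::new(),
        }
    }

    /// Records a vote for `msg` from `sender`. Returns true exactly once
    /// per message: at the moment the threshold is first reached.
    /// Duplicate votes from the same sender are ignored.
    fn add(&mut self, msg: &str, sender: u64) -> bool {
        let senders = self.votes.entry(msg.to_string()).or_default();
        let newly_added = senders.insert(sender);
        newly_added && senders.len() == self.threshold
    }
}
```

Doing this in one place means the layers above only ever see a message once it has already accumulated, which is where the simpler flow comes from.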
API and CLI
A piece of technical debt we had in our sn_api crate was to make our Error type/s implement the std::error::Error trait. This is something we completed and merged this past week with the help of the thiserror crate. We’ve also changed the CLI codebase to make use of anyhow, so all functions now return anyhow::Result and error handling is made much easier without losing information or context about the root cause of each of the errors propagated.
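As a rough illustration of what implementing std::error::Error involves, here is a hand-written sketch using only the standard library. The ApiError type, its variants and parse_port are made up for this example and are not the real sn_api types; thiserror generates Display and Error impls of this kind from attributes, and the boxed error return is only a std stand-in for the role anyhow::Result plays in the CLI.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type, for illustration only.
#[derive(Debug)]
enum ApiError {
    InvalidInput(String),
    BadPort(ParseIntError),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::InvalidInput(msg) => write!(f, "invalid input: {}", msg),
            ApiError::BadPort(_) => write!(f, "port is not a valid number"),
        }
    }
}

impl Error for ApiError {
    // Exposing the underlying cause keeps the root error reachable.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ApiError::BadPort(err) => Some(err),
            ApiError::InvalidInput(_) => None,
        }
    }
}

impl From<ParseIntError> for ApiError {
    fn from(err: ParseIntError) -> Self {
        ApiError::BadPort(err)
    }
}

/// Because ApiError implements Error, `?` can convert it into a boxed
/// error without losing the original cause.
fn parse_port(text: &str) -> Result<u16, Box<dyn Error>> {
    if text.is_empty() {
        return Err(Box::new(ApiError::InvalidInput("empty port".into())));
    }
    let port = text.parse::<u16>().map_err(ApiError::from)?;
    Ok(port)
}
```

A caller can print the top-level message and still walk source() to reach the underlying parse error, which is the “without losing context” part.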
A new $ safe networks set subcommand is being implemented in the CLI, which will allow users to connect to networks more easily by simply using their bootstrapping IP:port address. The corresponding PR includes an update to the User Guide, so anyone interested in providing some early feedback about this command, please go ahead and take a look at the description here.
Community contributions kept coming this past week. There is a work in progress effort by @bzee to make the nodejs package compatible with the latest version of sn_api, as well as a fix in the CLI to remove a flag name that was causing a conflict between two different commands (Pull Request #708 in maidsafe/sn_api, “fix: remove short option used for dry run”, by b-zee). A PR was also raised and merged to remove the logging implementation from sn_client, as this should be left to the applications or binaries that use the library, making sure applications do not get unexpected output on stdout or stderr.
Since the majority of dependencies of our crates use tiny-keccak v2.0.2, @mav has been sending PRs to update all our crates to depend on this same version. We all very much appreciate the effort from everyone who gets involved in whatever way they can.
BRB - Byzantine Reliable Broadcast
We did some additional work on brb_node_qp2p to get it working with bi-directional streams and the new (coming soon) qp2p API. This enables each node to send and receive from the same port, instead of opening new connections over a separate random port for outgoing packets. In the process, we contributed a couple of small PRs to qp2p. One in particular makes it easier to share a qp2p Endpoint between threads, which should be a win for anyone building a p2p app with the library.
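The PR itself isn’t reproduced here, but the general Rust pattern for sharing one handle between threads is to wrap it in an Arc and give each thread a clone. The sketch below uses a stand-in Endpoint struct that is not the qp2p type; its field and method are invented for this example, and nothing is actually sent over the network.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a network endpoint; NOT the qp2p type.
struct Endpoint {
    local_port: u16,
}

impl Endpoint {
    /// Pretend "send": just reports which local port would be used.
    fn describe_send(&self, peer: &str) -> String {
        format!("port {} -> {}", self.local_port, peer)
    }
}

/// Each thread gets a cheap clone of the Arc, so all of them go through
/// the same endpoint (and therefore the same local port).
fn send_from_threads(endpoint: Arc<Endpoint>, peers: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = peers
        .into_iter()
        .map(|peer| {
            let endpoint = Arc::clone(&endpoint);
            thread::spawn(move || endpoint.describe_send(&peer))
        })
        .collect();
    // Join in spawn order so results line up with the input peers.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("thread panicked"))
        .collect()
}
```

This only works if the shared type is safe to use from several threads at once, which is presumably the kind of thing a change like that has to take care of.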
We have a design to simplify the Policy/Permissions logic governing access to network data, which is currently going through internal review. We also have a PoC for a new Chain CRDT which might prove to be a better underlying CRDT for our Sequence data type. This came out of the issue @mav raised concerning our Sequence data type.
This week we were exploring a promising approach to fork resolution. To recapitulate: we have something called a “section chain” which is a list of section keys linked together with signatures. It can be used to prove that a piece of data was signed by a group of nodes that were once valid members of a section, even after those nodes are long gone. Currently this chain requires that the section agrees on which key to append to it next. If there is a disagreement on that, the chain can fork into two (or more) mutually incompatible chains which could currently break the section. This can happen, for example, at times of intense churn. We were hoping we could get away without tackling this for a bit longer, i.e. until the testnet was out, but it turns out we are sometimes seeing forks even in relatively small test networks.
So we were busy discussing how best to attack this problem and we came up with a couple of promising ideas. One such idea is currently being implemented which we hope will help make testnets stable enough to cope with community probing. There are still some potential concerns about security and possible attack vectors, but those will be addressed later. Right now the focus is stability. Baby steps.
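To make the section chain and the fork problem a bit more concrete, here is a toy sketch. It is not the sn_routing data structure: keys are plain integers, and the “signature” is just a hash of the parent key and the new key, which is NOT cryptography and only stands in for the parent section signing the next key. All names (SectionChain, Link, toy_sign, have_forked) are assumptions made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Key = u64;

/// Toy "signature": a hash of signer and payload. NOT real cryptography.
fn toy_sign(signer: Key, payload: Key) -> u64 {
    let mut hasher = DefaultHasher::new();
    (signer, payload).hash(&mut hasher);
    hasher.finish()
}

/// A key together with the parent's "signature" over it.
struct Link {
    key: Key,
    parent_sig: u64,
}

struct SectionChain {
    genesis: Key,
    links: Vec<Link>,
}

impl SectionChain {
    fn new(genesis: Key) -> Self {
        Self { genesis, links: Vec::new() }
    }

    fn last_key(&self) -> Key {
        self.links.last().map_or(self.genesis, |link| link.key)
    }

    /// Appends `new_key`, "signed" by the current last key.
    fn push(&mut self, new_key: Key) {
        let parent_sig = toy_sign(self.last_key(), new_key);
        self.links.push(Link { key: new_key, parent_sig });
    }

    /// Walks the chain from genesis, checking every link against its parent.
    fn is_valid(&self) -> bool {
        let mut parent = self.genesis;
        for link in &self.links {
            if link.parent_sig != toy_sign(parent, link.key) {
                return false;
            }
            parent = link.key;
        }
        true
    }

    /// True if the chain is valid and `key` was once a section key in it.
    fn contains(&self, key: Key) -> bool {
        self.is_valid() && (key == self.genesis || self.links.iter().any(|l| l.key == key))
    }
}

/// Two chains have forked if they start from the same genesis key but
/// disagree on a key at some position they both have.
fn have_forked(a: &SectionChain, b: &SectionChain) -> bool {
    a.genesis == b.genesis && a.links.iter().zip(&b.links).any(|(x, y)| x.key != y.key)
}
```

In this toy version, two groups appending different keys after the same parent both produce perfectly valid chains, and that is exactly the situation a fork resolution scheme has to sort out.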
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!