Update 01 December, 2022

We know you’re itching to try the network out again, and having ground through some gnarly issues with the code, we’re close to being able to offer formal testnets once more. With the entire team now focused on that goal, @joshuef explains what we’re up to and what to expect. So don’t worry, that itch will soon be scratched!

General progress

The team is working on ways to get testnets out more regularly to the community. It may seem that recently we’ve been a bit bogged down in the theoretical realms of consensus algorithms. In fact this is far from the only area we’re working on, and these efforts are, of course, tested internally, but not always on a full testnet environment and not always in a way that’s easy to share. However, @chriso has been toiling away on improving the release process so that we can roll out testnets more easily, and the rest of the team are now focused on ensuring all their work is testnet-ready, in the spirit of agile development.

Mostafa has now completed his implementation of simplified ABBA, the coin-flip consensus protocol we spoke about last week in the context of how elders come to agreement on membership matters.

Through the process of implementing ABBA, we’ve realised that the coin-flip protocol is not necessary when you have a preference towards a result. For example, ABBA is used to decide whether an elder has proposed a membership change. If anyone sees a proposal from that elder, they vote YES; otherwise they vote NO. A split vote therefore means that at least one node voted YES. Crucially, every YES vote comes with a justification: cryptographic proof that the elder in question did in fact propose something.

So if the question we are asking is “Did an elder propose a membership change?”, then a split vote means that yes, the elder did propose a change, and so we can resolve the split vote with YES.

In the original ABBA protocol, there was no preference between YES and NO, hence the coin flip. Since we have a bias towards YES, we no longer need a coin flip to resolve these splits.
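
To make the tie-break concrete, here is a minimal Python sketch of how a split vote might be resolved with and without the YES bias. It is not the MaidSafe implementation: the names `Vote`, `has_valid_justification` and `resolve` are hypothetical, the justification is a plain string standing in for a real cryptographic proof, and rounds, thresholds and faulty-node handling are left out entirely.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Vote:
    voter: str
    yes: bool
    # Hypothetical stand-in for the cryptographic proof attached to a YES vote.
    justification: Optional[str] = None


def has_valid_justification(vote: Vote) -> bool:
    # Assumption: real code would verify the elder's signature on the proposal.
    # Here, any non-empty string counts as proof.
    return bool(vote.justification)


def resolve(votes: List[Vote], biased: bool = True) -> bool:
    """Decide YES (True) or NO (False) from one set of votes."""
    # YES votes without a justification are discarded; NO votes need none.
    valid = [v for v in votes if not v.yes or has_valid_justification(v)]
    yes_count = sum(1 for v in valid if v.yes)
    no_count = len(valid) - yes_count

    if yes_count and no_count:
        # Split vote. With the bias, a justified YES proves the proposal
        # exists, so YES wins. Without it, fall back to a coin flip.
        return True if biased else random.choice([True, False])
    return yes_count > 0
```

The point of the sketch is the split branch: because a valid YES can only exist if the proposal exists, choosing YES there is safe and the randomness is no longer needed.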

Mostafa and @davidrusu are now putting the biased ABBA protocol through its paces. The next step will be to integrate VCBC with ABBA to arrive at the full MVBA (Multi-Value Byzantine Agreement) consensus protocol.

And @joshuef and @oetyng are looking into network knowledge issues which can occur after a section split when there’s a data query at elders. It seems to be due to a lack of knowledge sharing between the two new sections at the handover stage.

Testnets testnets testnets

After a few months getting deep into various network topics (membership, node state-locks, communication layers and responses), we’re looking forward to getting the code in the community’s hands once more.

We know that there have been sporadic comnets (and previously very frequent ones), and some community members may well be familiar with our testing tools in that regard. But here we’d like to go over what we have, so that anyone who wants to can have a go at setting up their own testnets.

The testnet tool

Our testnet tool is a collection of scripts and Terraform for setting up testnets. (Examples of commands are available in the readme file.)

It allows us to easily spin up Digital Ocean droplets and run nodes on them. This is the basis of our WAN testing.

You have the ./up script, which allows for creation of a testnet of any size. It uses one droplet per node (the size is easily configurable in the provider.tf files).

If you want to enable heaptrack on the nodes, then we have a ./build script which spins up a separate droplet to build the sn_node code and safe bin (the node code with debug mode enabled so heaptrack can hook in).

You can then use these custom builds in the ./up script.

Lastly, ./down removes a testnet once you are done with it.

Easy peasy?

Okay, so I have a testnet up…

Once a network is running, we have several tools to help us.

A client droplet

The Terraform setup can also create a client droplet (instance). This allows us to easily loop client tests, for example, and see how nodes hold up (./loop_client_tests.sh).

We also have a test-data folder which is pulled down to the client from AWS. We’re aiming to put this on the network at the start of any testnet. And this gives us a simple enough way to test for data integrity over the lifetime of a testnet.
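
One simple way to run such an integrity check is to compare checksums of the original test-data files against what comes back from the network. The Python sketch below shows the idea; it is an assumption about how this could be done, not a description of our scripts, and the function names (`sha256_of`, `build_manifest`, `find_mismatches`) are made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def find_mismatches(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return the files that are missing or whose content has changed."""
    return sorted(
        name for name, digest in expected.items() if actual.get(name) != digest
    )
```

Building the manifest once at upload time and re-checking downloads against it later would flag any file that went missing or changed during the life of the testnet.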

Monitoring

We use Kibana and ELK to monitor the nodes. We have a (currently private) dashboard where we can see any memory or CPU issues, which helps guide any debugging efforts. For example, the dashboard shows our current blocker: memory rising over time. This appears to be related to connection management. We have one potential solution that seems to solve this, but we’re looking for something neater.

Logs!

The last (and most cryptic) tool in our arsenal is pulling down client logs; ./scripts/logs does that for us. We can then search those with a tool like ripgrep, for example for specific MsgIds, to track what’s been going on in the nodes.
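
If you would rather script the search than run ripgrep by hand, something like the following Python sketch does a similar job: it walks a folder of downloaded logs and reports every line that mentions a given string. The function name `find_in_logs` and the log layout are assumptions for illustration; the real log format and MsgId representation may differ.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_in_logs(log_dir: Path, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each log line containing needle."""
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains stray bytes.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.name, number, line.rstrip("\n")
```

Looping over `find_in_logs(Path("logs"), some_msg_id)` and printing each hit gives a per-file trail of where that message showed up.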

And so…

That’s just a small overview of how to use and assess a testnet. We’re hopeful that if we can make this easier (we’re trying) and more public (soon!), we’ll be able to get more folk to monitor and check nodes, and speed up debugging once more.

So by all means, have a dive about the testnet tool. PRs are very welcome. There’s a lot of bash scripting just now, which may be up some folks’ alleys more than others… But at the very least, this hopefully gives you all an overview of how we’re testing just now. And maybe sparks some other ideas on how to improve such things!


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian; :germany: German; :spain: Spanish; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

60 Likes

Oh my! Can’t wait to try it out!

21 Likes

What an update! Can’t wait to have a go. Well done to all the team :slight_smile:

21 Likes

Testnets coming?? Great to hear! :grinning:

Looks like things are coming together for a major iteration of the network.

Image by stable diffusion

Great work Maidsafe team. This is a super Christmas present for everyone.

15 Likes

Thx 4 the update Maidsafe devs…

Sweeeeeet can’t wait 2 play :exploding_head:

Keep hacking super ants :crazy_face:

19 Likes

Is it Christmas already?? :gift: :gift:

20 Likes

It… just sounds like music to my ears!!! yay!

13 Likes

Feels like the show is back on the road :rocket:

16 Likes

Thank you devs, it’s all coming together.

Can’t wait to pull it apart << runs away very quickly >>

13 Likes

:clap:

So the elder that proposed the change does not actively participate in the round?

13 Likes

ABBA also made some OK music in the 70s… Seriously, best update! It got my testnet juices flowing again. I will take a VR dev break to join in the fun once more. Cheers!

12 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse: Looking forward to a new testnet! :racehorse:

14 Likes

Just noticed this…
feat(client): cap the number of concurrent chunks to be uploaded/retrieved for a file by bochaco · Pull Request #1826 · maidsafe/safe_network · GitHub

Wondering if this may go some way to solving the problems we have seen with uploading large files?

12 Likes

Great news! So glad to hear you did away with the coin flip.

12 Likes

Great to see Testnet :star_struck:

12 Likes

Hell… it’s about time.

6 Likes

Does this mean you will be joining in the testnets with gusto and alacrity?
Or do I have to message them separately?

7 Likes

This is pretty exciting news.

10 Likes

Wondering if this may go some way to solving the problems we have seen with uploading large files?

May well. @bochaco is trying to get the concurrency to be stable, and larger files are larger problems on this front (more concurrency). We still see some adults failing to respond within the timeout (which is 3s atm). That (imo) should be heaps of time, since in general they respond sub-second. So there’s something going on under there. Just digging about to see if we can find what that is… Meanwhile something like that PR may offer a workaround and keep larger files running in a stable fashion.
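
For anyone wondering what capping concurrency looks like in practice, here’s a rough Python sketch of the general technique (not the code in that PR): a semaphore limits how many chunk uploads are in flight at once. The names `run_capped`, `demo` and `fake_upload` are made up for illustration, and the upload is just a short sleep.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_capped(jobs: List[Callable[[], Awaitable[int]]], limit: int) -> List[int]:
    """Run all jobs, but never more than `limit` of them at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    # gather returns results in the same order as the jobs were given.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(chunk_count: int = 20, limit: int = 4) -> Tuple[List[int], int]:
    """Pretend to upload chunks and record the highest number in flight."""
    in_flight = 0
    peak = 0

    async def fake_upload(index: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # stand-in for the network round trip
        in_flight -= 1
        return index

    jobs = [lambda i=i: fake_upload(i) for i in range(chunk_count)]
    results = await run_capped(jobs, limit)
    return results, peak
```

Running `asyncio.run(demo())` returns the chunk indices in order along with the peak number of simultaneous uploads, which should never exceed the limit.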

17 Likes