Safe Network Dev Update - October 1, 2020

Summary

Here are some of the main things to highlight since the last dev update:

  • The Safe Network filesystem POC took a major step towards being made public: it has now been merged to master and is going through the final stage of testing.
  • We renamed the Safe Client Libs repository / safe_core crate to sn_client.
  • We’ve laid the foundations for chaos testing in sn_node.
  • Props to @Scorch for 2 async contributions to sn_node, PR1105 and PR1090, in the last week. :clap:
  • A Continuous Delivery Pipeline has now been rolled out to just about all of our Rust crates, meaning more frequent releases for you!

CRDT

sn_fs

This week a pull request was raised and merged to push the local filesystem prototype into the new sn_fs repo. Once the rest of the team has had a chance to kick the tires a little, we plan to make the repo public so that community members running Linux can try it out, keeping in mind that this is only a proof-of-concept. Details should be forthcoming next week.

Safe Client (previously Safe Client Libs/safe_core), Nodes and qp2p

Safe Network Transfers Project Plan
Safe Client Project Plan
Safe Network Node Project Plan

Another repository and crate name change to announce this week - Safe Client Libs has now been renamed to sn_client. These libs have gone through a huge transformation over the past months with a massive refactoring and simplification push by the team, as documented in these updates, and we’re down to only the safe_core crate remaining. We’ve renamed both the repo and the crate to sn_client to bring it in line with what it now represents, and with the other renamed repos and crates. With this, the renaming task is now more or less complete, with the only outstanding actions being to publish a few of the renamed crates, namely sn_routing, sn_client and sn_node, which we’re holding off on until they are considered stable enough for a new version release.

We are progressing further with Node/Client integration this week, taking down bugs as we re-enable e2e tests one by one to increase code coverage. We found a couple of blockers in Replica Management that were preventing clients from paying for data writes, so we set out to streamline the AT2 Transfer tests, focusing mainly on the Transfer Operations and removing any discrepancies within the section. Once that is cleared, the next steps are to cover the data layer operations and introduce chaos testing to strengthen overall integrity. We’ve laid the foundations for this in PR#1124, which introduces a new chaos feature flag in the crate to enable chaos testing. Once enabled, it reads the chaos level from the SAFE_CHAOS_LEVEL environment variable, which dictates the probability with which chaos is induced in the network by randomly dropping messages or not performing operations.
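The idea of a chaos level that sets the probability of dropping messages can be sketched as below. This is a minimal, dependency-free sketch, not the code from PR#1124: it assumes SAFE_CHAOS_LEVEL holds an integer percentage from 0 to 100, and the names chaos_level, should_induce_chaos and handle_message are hypothetical. A tiny xorshift generator stands in for the rand crate, and in a real crate these checks would sit behind the chaos Cargo feature rather than being compiled unconditionally.

```rust
use std::env;

/// Hypothetical: read the chaos level as a percentage (0..=100) from
/// SAFE_CHAOS_LEVEL, defaulting to 0 (no chaos) if unset or unparsable.
fn chaos_level() -> u32 {
    env::var("SAFE_CHAOS_LEVEL")
        .ok()
        .and_then(|v| v.trim().parse::<u32>().ok())
        .map(|v| v.min(100))
        .unwrap_or(0)
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns true with (roughly) `level` percent probability.
fn should_induce_chaos(level: u32, rng: &mut XorShift) -> bool {
    (rng.next() % 100) < u64::from(level)
}

/// Hypothetical message handler: drops the message when chaos strikes.
/// In a real crate this branch would only exist with the `chaos` feature on.
fn handle_message(msg: &str, level: u32, rng: &mut XorShift) -> Option<String> {
    if should_induce_chaos(level, rng) {
        None // simulate a dropped message
    } else {
        Some(format!("handled: {}", msg))
    }
}

fn main() {
    let level = chaos_level();
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    let delivered = (0..1000)
        .filter(|i| handle_message(&i.to_string(), level, &mut rng).is_some())
        .count();
    println!("chaos level {}%: delivered {} of 1000 messages", level, delivered);
}
```

Running it with, for example, SAFE_CHAOS_LEVEL=10 should deliver roughly 90% of the messages; with the variable unset, every message is handled.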

On another front, we have been adding some features to qp2p following the async/await paradigm. During the migration to async/await across all our crates, we only migrated the modules that were immediately required. Once that was complete, we saw good results and it integrated well with the other layers. Now that we have started testing the end-to-end system again, we will soon need features such as IGD protocol support and the echo service for automatic port forwarding, which is an important component when running testnets. These features have been implemented and their integration with the other layers is currently being tested.

Shoutout to @Scorch for PR1105 which not only optimises but also makes the chunk store module follow the async paradigm in sn_node. Not content with this, @Scorch also had a PR merged this week making the statedb file I/O async :clap:

It’s always a huge shot in the arm for team morale when we see contributions like this and realise how much this project means to us all :muscle:

Routing

Project Plan

In Routing this week, we have merged the work tackling message resend and lost peer detection, part of the effort to get routing working under the new async architecture. There has also been related work to remove the message accumulator, which we expect to be merged soon.
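One common way to combine message resend with lost peer detection is to track unacknowledged messages per peer and declare a peer lost after a fixed number of failed resends. The sketch below assumes that scheme; it is not sn_routing's implementation, and ResendTracker, Action, MAX_RESENDS and the u64 message ids are all hypothetical.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical limit: after this many failed resends the peer is treated as lost.
const MAX_RESENDS: u32 = 3;

#[derive(Debug, PartialEq)]
enum Action {
    Resend,
    PeerLost(SocketAddr),
}

#[derive(Default)]
struct ResendTracker {
    /// (peer, message id) -> number of resend attempts made so far
    pending: HashMap<(SocketAddr, u64), u32>,
}

impl ResendTracker {
    /// Record a freshly sent message awaiting acknowledgement.
    fn sent(&mut self, peer: SocketAddr, msg_id: u64) {
        self.pending.insert((peer, msg_id), 0);
    }

    /// An acknowledgement arrived: stop tracking the message.
    fn acked(&mut self, peer: SocketAddr, msg_id: u64) {
        self.pending.remove(&(peer, msg_id));
    }

    /// A send to `peer` failed or timed out. Returns None for unknown messages.
    fn failed(&mut self, peer: SocketAddr, msg_id: u64) -> Option<Action> {
        let attempts = self.pending.get_mut(&(peer, msg_id))?;
        if *attempts < MAX_RESENDS {
            *attempts += 1;
            Some(Action::Resend)
        } else {
            // Give up: forget everything pending for this peer and report it lost.
            self.pending.retain(|(p, _), _| *p != peer);
            Some(Action::PeerLost(peer))
        }
    }
}

fn main() {
    let peer: SocketAddr = "127.0.0.1:12000".parse().expect("valid address");
    let mut tracker = ResendTracker::default();
    tracker.sent(peer, 1);
    tracker.sent(peer, 2);
    tracker.acked(peer, 2);
    for _ in 0..=MAX_RESENDS {
        println!("{:?}", tracker.failed(peer, 1));
    }
}
```

Keying the pending map by (peer, message id) lets a single lost peer be dropped in one pass, which is the point where a routing layer would typically start whatever relocation or membership handling follows.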

Alongside the above, there is also ongoing work to simplify the codebase by removing the id.rs file completely, and to make our nodes keep a Keypair and SocketAddr for peers. The first stage of this was to get rid of the id traits within the bls_dkg crate - this has now been merged. The routing part of this refactoring is ongoing with a PR expected to be raised soon.

The effort to move to CRDTs continued in parallel with all this, and we now have a very basic, isolated PoC implementation. In this first step we are trying to validate whether such an implementation can work as the container for all the information a node keeps in sync with the other peers in its section, e.g. the list of Adults and Elders. These sync operations are treated as CRDT operations, so nodes don’t need to worry about the order in which they are received: they all converge to the same state. We are now working on a good set of tests to prove this approach is solid before we start trying to integrate it into sn_routing.
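To illustrate the order-independence property described above, here is a minimal state-based sketch. It assumes section membership is modelled as a map from peer name to a (version, state) pair, where applying an operation keeps the maximum pair; this last-writer-wins-by-version scheme and all the names in it are illustrative assumptions, not the PoC's actual design.

```rust
use std::collections::BTreeMap;

/// Hypothetical membership states; the derived Ord breaks ties between
/// operations carrying the same version, so merges stay deterministic.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum MemberState {
    Adult,
    Elder,
    Left,
}

/// An operation: "peer `name` is in `state` as of `version`".
#[derive(Clone, Debug)]
struct Op {
    name: String,
    version: u64,
    state: MemberState,
}

#[derive(Default, Debug, PartialEq)]
struct SectionMembers {
    members: BTreeMap<String, (u64, MemberState)>,
}

impl SectionMembers {
    /// Applying an op takes the max over (version, state). Max is commutative,
    /// associative and idempotent, so any delivery order (including duplicates)
    /// converges to the same state.
    fn apply(&mut self, op: &Op) {
        let incoming = (op.version, op.state);
        let entry = self.members.entry(op.name.clone()).or_insert(incoming);
        if incoming > *entry {
            *entry = incoming;
        }
    }
}

fn main() {
    let ops = vec![
        Op { name: "a".into(), version: 1, state: MemberState::Adult },
        Op { name: "a".into(), version: 2, state: MemberState::Elder },
        Op { name: "b".into(), version: 1, state: MemberState::Adult },
        Op { name: "b".into(), version: 3, state: MemberState::Left },
    ];

    // Two nodes receive the same ops in opposite orders.
    let mut node_x = SectionMembers::default();
    ops.iter().for_each(|op| node_x.apply(op));
    let mut node_y = SectionMembers::default();
    ops.iter().rev().for_each(|op| node_y.apply(op));

    assert_eq!(node_x, node_y);
    println!("{:?}", node_x.members);
}
```

A property test in this spirit would shuffle and duplicate the operation list at random and assert that every replica ends up equal.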

Continuous Delivery

Back at the end of July we implemented a Continuous Delivery (CD) pipeline in one of our Rust repos for the first time. After some initial tweaking, we were satisfied that the pipeline was solid and of great benefit to the project. This week we completed the CD rollout to all but one of our Rust repositories :tada:

Wondering what this CD pipeline process means? When any pull request is merged to master in a repository with CD enabled, our CD pipeline automation kicks in. The CD generates a changelog containing all the valid updates added since the last release, works out what version the crate should be updated to, and generates a PR updating to that version. It auto-merges that PR, then creates a matching GitHub tag and Release. Finally, the crate itself is published to crates.io. All automated, with no human intervention required. This means you will begin to see much more frequent releases, shortening the feedback loop with the community as much as possible.
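The step that works out what version the crate should be updated to can be sketched as a semver bump driven by commit messages. The update does not say how the pipeline decides, so this sketch assumes Conventional Commits-style prefixes (feat, fix, and a trailing ! or a BREAKING CHANGE note for breaking changes), and for pre-1.0 crates it assumes a breaking change bumps the minor version and anything else the patch. Bump, Version, classify and next_version are hypothetical names.

```rust
/// Ordered so that the largest bump across all commits wins.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Bump {
    None,
    Patch,
    Minor,
    Major,
}

#[derive(Debug, PartialEq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Classify one commit message (assumed Conventional Commits style).
fn classify(msg: &str) -> Bump {
    let header = msg.lines().next().unwrap_or("");
    let prefix = header.split(':').next().unwrap_or("");
    if msg.contains("BREAKING CHANGE") || prefix.ends_with('!') {
        Bump::Major
    } else if prefix.starts_with("feat") {
        Bump::Minor
    } else if prefix.starts_with("fix") {
        Bump::Patch
    } else {
        Bump::None // chore, docs, ci, ... do not trigger a release
    }
}

/// Returns the next version, or None if no commit warrants a release.
fn next_version(c: &Version, commits: &[&str]) -> Option<Version> {
    let bump = commits.iter().map(|m| classify(m)).max().unwrap_or(Bump::None);
    let next = match (bump, c.major) {
        (Bump::None, _) => return None,
        // Assumed pre-1.0 policy: breaking -> minor, everything else -> patch.
        (Bump::Major, 0) => Version { major: 0, minor: c.minor + 1, patch: 0 },
        (_, 0) => Version { major: 0, minor: c.minor, patch: c.patch + 1 },
        (Bump::Major, _) => Version { major: c.major + 1, minor: 0, patch: 0 },
        (Bump::Minor, _) => Version { major: c.major, minor: c.minor + 1, patch: 0 },
        (Bump::Patch, _) => Version { major: c.major, minor: c.minor, patch: c.patch + 1 },
    };
    Some(next)
}

fn main() {
    let current = Version { major: 0, minor: 4, patch: 1 };
    let commits = ["fix: handle empty payload", "chore: update CI"];
    println!("{:?}", next_version(&current, &commits));
}
```

Taking the maximum bump over all commits since the last release means one breaking commit outweighs any number of fixes, which keeps the computed version consistent with the changelog generated in the same run.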

sn_api is the only exception for now, due to the complication of multiple crates in the workspace; we’ll come back to that down the line.

Note that we have not merged the CD pipeline PRs for sn_routing, sn_client and sn_node just yet, preferring to hold these back until these repositories, which are currently in a period of flux, are considered stable and able to begin automated releases.

Useful Links


Feel free to reply below with links to translations of this dev update and moderators will add them here:

:bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!


First :grin:


Sounds great! Good work team, it continues. We are getting closer!


Thx Maidsafe devs,

Wow this is great news, can’t wait…


Thanks as always for your hard work and for these detailed, transparent updates - they’re the best in the industry imo! :innocent:


local filesystem will be fun

:+1:


Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:


:heart_eyes: :heart_eyes: :heart_eyes: :heart_eyes: :heart_eyes: :heart_eyes: :heart_eyes:

:heart_eyes:


Thanks for the update SN team!

It’s shocking to me how much optimization is going on - speaks to the overall complexity of a project like this that you aren’t down to a couple of lines of code by now with all the new ways devised to simplify the code base. From an engineering perspective I suppose the best part is no part if you can get away with it.

re: CD-pipeline - was that developed in-house or adapted from other projects? Seems like SN team is innovating the space for a small team of coders to produce efficiently on a large scale.

@Scorch you are fire! :wink: May none ever pour cold water on your flame!


Wow, thanks for the shoutout @maidsafe; I’m glad I was able to be of some help! This whole project is inspiring on a lot of levels, and it’s awesome that the community gets to be such a big part of the development effort!

And by development effort, I don’t mean strictly as it pertains to code and open-source licenses. I’ve never seen a community so engaged in everything from app development to marketing, and even just people hanging out here in the forums and sharing their enthusiasm. As mentioned, it creates a positive feedback loop where community and team members feed off each other’s excitement and energy and then give back the same in kind.

Anyway, great update overall. While big, shiny feature updates are nice, testing, tweaking, and continuous delivery updates make me feel like the network is maturing. It seems like good things are on the horizon :smile:


It’s fantastic that you’ve dived in and are contributing to the core codebase @scorch, very inspiring to see. Thanks very much for your work.


After an exceptionally tiring and generally shitty evening shift, it was a real pleasure to read this solid update.
Thanks to all, but especially @scorch. He’s hot https://www.youtube.com/watch?v=mV9q_KdtQfc


Seriously long wait now, very technical updates… But hoping the finish line is at least appearing on the horizon?


Gogo maidsafe! It’s over there! :slight_smile:


Our CD pipeline was developed in-house, yes. There are of course other projects that have moved to a CD model, but we couldn’t find an open source project matching our GitHub Actions/Rust setup that we could use as a template. We pieced together some widely used general CI steps and combined them with a bit of internal innovation, such as the rust-version-bump-branch-creator repository, which was our first in-house GitHub Action.

@joshuef was the real mastermind behind it :clap:


Another nugget for the ‘birth of the Safe Network’ biography? (code-ography?) book! :wink:


Thank you for the heavy work team MaidSafe! There is little time left until we reach the moon!

I have added the Bulgarian translation to the first post :dragon:


I wonder if we are going too far with this paradigm, using it for pieces of code so small that it may generate too much asynchrony. Asynchrony is very difficult to master and is the source of many bugs. Rust is the language of choice for asynchronous programming, but it still doesn’t detect every bug in this area.

For example, in sn_routing, getting a simple property of a node such as its name uses await. I am not sure that all the consequences of using it in so many narrow scopes are under control.

One instance where they are not is the test_section_bootstrapping test case. I observe that it passes very rarely because one of the nodes isn’t properly initialized. It can be any node (6th, 4th, 5th, …), or none, in which case the test passes. This is typical of an asynchronous programming bug.

I would like the team to investigate this test case and explain what happens. I may not be able to understand all the asynchronous intricacies, but I would like to get an idea of how this domain is mastered (or whether it’s simply the test case that is wrong and not the core routing code, which would reassure me a lot).

For reference, sample outputs running this test case several times:

$ cargo test --test bootstrap test_section_bootstrapping
    Finished test [unoptimized + debuginfo] target(s) in 12.84s
     Running target/debug/deps/bootstrap-cbc56fc3275e21a8

running 1 test
test test_section_bootstrapping ... FAILED

failures:

---- test_section_bootstrapping stdout ----
OK for acbfa2..
OK for 222960..
OK for a98ec2..
OK for 5135c6..
OK for 9cb98b..
NOK for caa52e..
Error: The node is not in a state to handle the action.
thread 'test_section_bootstrapping' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/ubuntu0/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:191:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_section_bootstrapping

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out

error: test failed, to rerun pass '--test bootstrap'
$ cargo test --test bootstrap test_section_bootstrapping
    Finished test [unoptimized + debuginfo] target(s) in 0.12s
     Running target/debug/deps/bootstrap-cbc56fc3275e21a8

running 1 test
test test_section_bootstrapping ... FAILED

failures:

---- test_section_bootstrapping stdout ----
OK for 455181..
OK for 99cc5e..
OK for 1bd155..
NOK for 674612..
Error: The node is not in a state to handle the action.
thread 'test_section_bootstrapping' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/ubuntu0/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:191:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_section_bootstrapping

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out

error: test failed, to rerun pass '--test bootstrap'
$ cargo test --test bootstrap test_section_bootstrapping
    Finished test [unoptimized + debuginfo] target(s) in 0.12s
     Running target/debug/deps/bootstrap-cbc56fc3275e21a8

running 1 test
test test_section_bootstrapping ... FAILED

failures:

---- test_section_bootstrapping stdout ----
OK for 42651a..
OK for 26b579..
OK for 007355..
OK for 2f9c97..
NOK for 42dbf3..
Error: The node is not in a state to handle the action.
thread 'test_section_bootstrapping' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/ubuntu0/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:191:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test_section_bootstrapping

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 2 filtered out

error: test failed, to rerun pass '--test bootstrap'

The code modification to get these outputs:

--- a/tests/bootstrap.rs
+++ b/tests/bootstrap.rs
@@ -126,6 +126,11 @@ async fn test_section_bootstrapping() -> Result<()> {
     for result in nodes {
         let node = result?;
         let name = node.name().await;
+        if node.our_prefix().await.is_some() {
+            println!("OK for {}", name)
+        } else {
+            println!("NOK for {}", name)
+        }
 
         // assert names of nodes joined match
         let found = joined_nodes.iter().find(|n| **n == name);

I see async as a simple paradigm: only await functions that take time, so I/O, some big calculations and so on. Then there are border cases, perhaps even sorting/deduping a vec, where it can be a problem to create a task, exit the method, get the result and re-enter the method.

i.e. now that async is in place, we may have edge cases where not using it is faster. This is a good place to be though: with the whole thing running, we can benchmark and optimise. However, it is a good discussion to always have: “does that method need to be async, is it slow?” and so on. You look like you have found one such case.

For example in sn_routing getting a simple property of a node like its name is using await.
There is a refactor this week to cleanly define Node, Peer, Section and Network. This will make the logic extremely visible, and also makes the first two (Node and Peer) immutable. Reading from an immutable struct will not benefit from async, so we should see that clearly.

Nice feedback @tfa, it helps us a lot. Let’s see what this week brings. I have been looking forward to this refactor (and it is a refactor, no logic change) for a long time, as I find routing too hard to comprehend easily. I hope this will give you much more visibility of the logic, and cleaning up these cases will be simple (there is also no mock, so a refactor can be as simple as pressing F2 to rename a symbol and pressing enter). We should be at even higher velocity now.

The big push now is the section/network data types (CRDT compliant), and testing the hell out of those. Last week @adam altered messages to follow a pattern similar to that suggested by @oetyng, which allows messages to be tested independently. That again will give us clear visibility. We will push property testing in these areas to cover as many strategies as possible. So: much smaller, simpler and more testable modules. It is a step change in direction and one that has been missing for at least 3 years. No longer though :wink:


When I think of the people in the world who are inspirational for their vision, for what they do, and for their apparent integrity and total, relentless perseverance, David Irvine is always going to be close to the very top of my list, and that will hold regardless of how SAFE works out.

There are apparent qualities that I’ve noted from a long way off over the last 7 years or so of visiting this site (I have no personal acquaintance or mutual connection, I’ve never owned any MAID, and I am not even remotely equipped to actually evaluate the project). The man is staggeringly open-minded: given his other traits, including expertise and competence, it seems almost unthinkable that he is so willing to routinely and humbly give all the suggestions and ideas that arise apparently real consideration. He is also flexible enough to reverse course at great cost on work already acknowledged as the highest quality (literally sunk cost be damned!) if it is the right thing to do, and has demonstrated a willingness to do this as many times as necessary, because he’s apparently totally uncompromising everywhere it matters. His outlook is unwaveringly optimistic, yet also always stoically realistic. And he always comes across as reasonable, level-headed and kind. Countless trolls have tried to provoke him, as evidenced by the record on this site alone (some of them heads of other projects), but he always responded charitably, committed to the higher good, and has been generally tolerant of a lot of strongly opinionated posters. Even where the responses came quickly, they were always even-keeled. I looked at my stats yesterday: since the site started tracking, I have read about 3600 posts over 6 years, and I can think of only one instance where he expressed something like measured but quite appropriate anger. It seemed so apt I can’t even remember what it was.

The word ‘leader’ is toxic for me, but if we had to stomach the notion of it as a reality, David Irvine would be an example, at least in my opinion. This man is true and faithful, and although again I don’t know him, it is surely also the opinion of those who do; that is the strong impression I get anyway. There must be “an invisible sun” that sustains the human world, and at least in part it’s through people like David. People like David are hope in the best sense.
