Safe Network Dev Update - September 10, 2020

Summary

Here are some of the main things to highlight since the last dev update:

  • The main parsec removal refactoring PR has been merged into the upstream routing master branch. :tada:
  • The safe_fs FUSE integration is progressing well with us now having implemented a proof-of-concept in-memory (local) filesystem utilising crdt_tree that supports regular directory and file operations, plus symlinks and hard links.
  • The switch-over of the safe_vault crate to the async/await paradigm is well underway, with the porting pretty much done and integration with the other async crates progressing well.

CRDT

safe-fs

Over the last week, the team did some additional review of the crdt_tree code and made some minor improvements, such as inlining functions, logging, and comments.

Perhaps more exciting, work is progressing on the safe_fs FUSE integration. We were able to implement a proof-of-concept in-memory (local) filesystem utilising crdt_tree that supports regular directory and file operations, plus symlinks and hard links. Much work remains, but it is a nice milestone to be able to actually mount the filesystem and interact with it.
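To make the idea a little more concrete, here is a purely illustrative sketch - hypothetical types and methods, not the actual safe_fs or crdt_tree API - of how a filesystem hierarchy maps onto a tree of nodes, where a rename or move is simply re-parenting a single node, which is exactly the kind of operation a tree CRDT can resolve deterministically under concurrent edits.

// Purely illustrative: each inode is a node in a tree keyed by id,
// and mv() is a single "move" (re-parent) of one node.
use std::collections::HashMap;

#[derive(Debug)]
enum Kind {
    Dir,
    File { size: u64 },
    Symlink { target: String },
}

#[derive(Debug)]
struct Inode {
    parent: u64,
    name: String,
    kind: Kind,
}

#[derive(Default)]
struct FsTree {
    nodes: HashMap<u64, Inode>,
}

impl FsTree {
    fn add(&mut self, id: u64, parent: u64, name: &str, kind: Kind) {
        self.nodes.insert(id, Inode { parent, name: name.into(), kind });
    }

    // Renaming /a/report.txt to /b/report.txt is just one re-parenting step.
    fn mv(&mut self, id: u64, new_parent: u64) {
        if let Some(node) = self.nodes.get_mut(&id) {
            node.parent = new_parent;
        }
    }
}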

Safe Client Libs and qp2p

Safe Transfers Project plan
Safe Client Libs Project plan
Safe Vault Project plan

This last week has seen further progress in quic-p2p (now: qp2p), with streams being exposed to enable listening to network events. The reuse of streams for multiple messages that was recently added required the length of the message to be known ahead of time. To facilitate this requirement, along with other metadata such as message flags, we have now formalised a message header which will be sent over the wire before sending the message itself. It is of the following format:

Version | Message length | User message flag | Reserved
2 bytes | 4 bytes        | 1 byte            | 2 bytes

This standardises how messages are sent over the wire, while also allowing us to introduce backward compatibility with the help of the version field.
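As a rough illustration only - assuming big-endian field encoding and the widths above, not necessarily the exact qp2p implementation - the 9-byte header could be written out like this:

// Hypothetical sketch of encoding the message header described above.
struct MsgHeader {
    version: u16,       // protocol version, enables backward compatibility
    msg_len: u32,       // length of the message body that follows
    user_msg_flag: u8,  // marks the payload as a user message
    reserved: [u8; 2],  // unused for now, kept for future needs
}

impl MsgHeader {
    fn to_bytes(&self) -> [u8; 9] {
        let mut buf = [0u8; 9];
        buf[0..2].copy_from_slice(&self.version.to_be_bytes());
        buf[2..6].copy_from_slice(&self.msg_len.to_be_bytes());
        buf[6] = self.user_msg_flag;
        buf[7..9].copy_from_slice(&self.reserved);
        buf
    }
}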

The process of integrating this into routing, vaults and clients is well underway. We now have small sections working again, are able to bootstrap clients to them, and can perform queries once more. So as we get event listening set up in clients, we're getting closer to a holistic network with all the changes in.

In parallel to this, we've tied up some loose ends in terms of Continuous Delivery, catching a Changelog generation issue in Safe-ND (now: sn_data_types) and fixing the GitHub release generation. That more or less completes the first CD process for one of our Rust repos, so we'll be looking to roll it out to others as it really does simplify things.

With the first iteration of async routing just about wrapped up last week, up next is the safe_vault crate. We are working to make it follow the async/await paradigm, and with the porting pretty much done we are now integrating and debugging all of the async crates in unison to spin up a section and start running real-network e2e tests (no mocks). This will help us identify real-world connectivity issues and hiccups, and resolve them before we put the changes out to the community.

Routing

Project Plan

This week we merged the main parsec removal refactoring work into the master branch. :tada:

There is still some ongoing work to resolve a few remaining parsec removal issues, but we are confident these are a formality. Meanwhile, Adam is also working on improvements to the DKG process which should resolve some of the issues we are seeing after removing parsec. To summarise: there is never a need for more than one DKG session per node; there should only ever be the most current one (corresponding to the most recent churn). More details to follow in the PR, which should be raised in the coming days.
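As a purely illustrative sketch of that idea (hypothetical types, not the actual routing code): a node keeps at most one DKG session, always the one tied to the latest churn, and simply drops anything older.

struct DkgSession; // placeholder for the real session state

#[derive(Default)]
struct DkgState {
    // (churn generation, session) - at most one, always the newest
    current: Option<(u64, DkgSession)>,
}

impl DkgState {
    fn start(&mut self, generation: u64, session: DkgSession) {
        match &self.current {
            // Ignore anything older than (or equal to) what we already have.
            Some((current_gen, _)) if *current_gen >= generation => {}
            // A newer churn supersedes the previous session entirely.
            _ => self.current = Some((generation, session)),
        }
    }
}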

In parallel, the work to expose an async API is now based on top of the latest master branch. Once some final minor issues are addressed, this PR will be reviewed and merged. It is our intention that any further failing routing test investigation/resolution work will be based on the parsec removal and async API work. This new async API is already being used for testing vaults and clients, proving to us we are on the right track.

We have also discussed and started to investigate using real routing subcomponents for some network tests. Expanding the current minimal example will probably be a good starting point.

Adapting the routing automated tests to the new async API is another task we've just started this week. We will focus on splitting the existing tests into two categories: one suite for the most basic and general use cases, and a second suite where edge cases and harder-to-reproduce scenarios are tested with the help of mocks and/or simulators for some subcomponents.

Standardisation

We mentioned a couple of weeks ago that we were undertaking some repository and crate renaming tasks in order to standardise. Anyone keeping an eye on our GitHub activity will have noticed this has taken a big step forward over the last week, with the majority of our repositories and Rust crates now updated to snake_case, prefixed with sn_ where Safe Network specific, and a couple (so far) renamed to be more accurate (safe-network-signature-aggregator → bls_signature_aggregator and safe-nd → sn_data_types).

The remaining repositories and crates will be updated in the coming days, assuming we can agree on what to name them :smiley:

Useful Links


Feel free to reply below with links to translations of this dev update and moderators will add them here:

:bulgaria: Bulgarian

As an open source project, we're always looking for feedback, comments and community contributions - so don't be shy, join in and let's create the Safe Network together!

:love:

72 Likes

The fiesta one
Go Maid go!!

19 Likes

Thx for the update Maidsafe devs and keep up the good work

place #2 :stuck_out_tongue:

17 Likes

Storms a brewin! Cut the fat, full on async, heavy clean up, real network testing AND a file system PoC working? We'll be in the eye of the storm within a few weeks I reckon :captainsafe:

26 Likes

Sounds great, what's the expected time frame for another community test network?

12 Likes

omg :open_mouth:

This sort of talk about basic relatable features is so exciting.
Can you imagine being able to mount Safe Network as a drive!!

Could do with more of the same detailing the ambition for "streams"

:+1:

:davidpbrown:

20 Likes

Another excellent week's progress. How many weeks in a row have you people just been knocking tasks down? I remember a much hazier vibe several years ago, but the last six months have felt like you all have just been machines. It's got to feel good.

15 Likes

Thanks again for all the hard work everyone! I think @Nigel may be a little optimistic with the timeframe, but who knows?! It's gonna happen one of these days!

8 Likes

:smile: I'm an overt optimist for sure. Weakness and a strength at different times but I was really thinking more like a test net or something for the community to play with if that didn't come across. I'm def not suspecting Fleming or MVE. There's a decent runway till then but I bet once that comes in sight the team could surprise us yet again with their speed and efficiency.

19 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

12 Likes

Keep up the good work :+1:t3:

10 Likes

Shout out to @Scorch for his PR!

19 Likes

:100: It was terrific to see that happening. Gives us all faith in open source and it's also less maidsafe's network this way. The more the merrier

20 Likes

Is it possible to get a really basic explanation of why this new async paradigm is important? There's a lot of talk about switching over to it and it seems to be taking a lot of time and work and focus over the last several updates, but I don't recall having seen a simple normal-person explanation about why this is worth doing. What benefit does it bring to the core codebase and developers? How will it affect app developers? What benefit does it bring to end users?


Really great to see this.

Some related info about how versions, protocol upgrades and signalling have evolved over time in bitcoin. We have 10 years of history and experience to guide us with this feature.

BIP-0034 - Block v2, Height in Coinbase

Clarify and exercise the mechanism whereby the bitcoin network collectively consents to upgrade transaction or block binary structures, rules and behaviors.

BIP-0009 - Version bits with timeout and delay

a proposed change to the semantics of the 'version' field in Bitcoin blocks, allowing multiple backward-compatible changes (further called "soft forks") to be deployed in parallel.

BIP-0008 - Version bits with lock-in by height

an alternative to BIP9 that corrects for a number of perceived mistakes. Block heights are used for start and timeout rather than POSIX timestamps. It additionally introduces an additional activation parameter to guarantee activation of backward-compatible changes

BIP-0068 - Relative lock-time using consensus-enforced sequence numbers

The change described by this BIP repurposes the sequence number [within bitcoin transactions] for new use cases without breaking existing functionality. It also leaves room for future expansion and other use cases.

Interesting to see an unused field finding an alternative use in the future. Sometimes a bit of slack is handy to build in, and sometimes it's hard to know which features will be useful and which won't be. In this case the unused feature turned out to be a handy substitute for a new feature.

BIP-0135 - Generalized version bits voting

a generalized signaling scheme which allows each signaling bit to have its own configurable threshold, window size (number of blocks over which it is tallied) and a configurable lock-in period.

BIP-0320 - nVersion bits for general purpose use

reserves 16 bits of the block header nVersion field for general purpose use and removes their meaning for the purpose of version bits soft-fork signalling.

16 Likes

Simplicity of code is a driver. Where we have recursive code or callback-type things, async cleans that up. Making the libs async makes the code more readable - a bit like how JavaScript callback hell is made much cleaner with promises.

Also, with fs/io read/write you get re-entrant functions/methods, so you can have stuff like:

async fn something() {
    get_web_page().await;
    render_page().await;
    do_a_quick_change();
    store_page().await;
}

So you end up with synchronous-looking code blocks, but actually there is a load of awaits. Re-entering the method when each await returns, like above, means the code looks much nicer than, say, a load of loops waiting for returns - or even worse, callbacks.

[edit: While the await "waits", the processor moves on and executes whatever other code it can - so think of threads as being there to share work, and async/futures to share tasks in code]
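A minimal sketch of that point, assuming the tokio runtime with its macros and time features enabled (any async executor would do): while one future is awaiting, the executor makes progress on the other.

use std::time::Duration;
use tokio::time::sleep;

async fn fetch(name: &str) -> String {
    sleep(Duration::from_millis(100)).await; // stand-in for real I/O
    format!("{} done", name)
}

#[tokio::main]
async fn main() {
    // Both futures make progress concurrently; total time is ~100ms, not ~200ms.
    let (a, b) = tokio::join!(fetch("page"), fetch("chunk"));
    println!("{}, {}", a, b);
}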

18 Likes

Ok cool, so with the async feature

  • core devs can understand, maintain and extend the core network codebase and features more quickly and reliably
  • app devs will have a simpler time reading the API docs and writing code for their Safe Network apps, but will need at least some understanding of the async way things happen on the network
  • end users will see features arrive sooner and more reliably, with fewer bugs and less breakage of parts unrelated to that feature, and the network will be faster overall

Is that about the right degree of impact for this change? Maybe I'm trying to over-analyse or over-simplify the impact here; I just felt that when my friends or non-tech people read the update, I want them to understand the reason for async happening.

17 Likes

Yes, also the code should be more efficient. By that I mean rather than us coding loops etc., async takes care of that work in the language itself, so it is implemented very efficiently.

An example that's good to show as an addition is this. In routing we wanted to try to send a message up to 3 times over 30 seconds (say). So we had to change qp2p to take a token (a u64 - it should have been a u8, anyway). qp2p took that token and routing set it at zero; if the message failed then we got it back, waited 10 secs and sent it again with a token of 1 … and so on. So we had to break an API, put something in a network lib that should not have been there, and so on. Now all we need to do is:

async fn routing_send_msg(msg: &[u8]) {
    for _ in 0..3 {
        if qp2p_send(msg).await.is_ok() {
            return;
        }
        sleep(Duration::from_secs(10)).await;
    }
    vote_node_offline().await;
    send_to_other_target_or_whatever();
}

Pseudo code obviously, but you see the point. So here it allows us to do more without passing stuff around APIs as well.

But your points are all correct. A client could await 100000 times for chunks and so on and it will all be Ok.

[EDIT: With the above problem that's now simple with async, we previously also had to create a Timer.rs, a time.rs and a fake clock, none of which are now required - so we get better performance, much less code and a more efficient and understandable codebase]

16 Likes

Not sure how relevant it is to this work, particularly given that it is in Rust, but I'll add that async is generally lower overhead and less error prone than multi-threading.

10 Likes

For me it's this, about 1000x. Having recently started diving into the core libs (vaults/scl), I was having a hard time there. The shift to async drastically simplifies things. I could finally see what was going on. And when you can see what's going on, it's much easier to reason about things and so make the jump to cutting out big, complex chunks of code (see the recent scl refactor chopping out 18k lines, or the qp2p refactor itself).

It also simplifies a lot of multi-threading (maybe it could have been done otherwise, but I could not see how; then again, I am still relatively new to Rust). We had a lot of specific structures and indirection to manage things like qp2p, and an event loop driving the whole of SCL, which caused a lot of complexity. With qp2p going async, and SCL too, we were able to remove that and simplify the core structs massively. It is so much cleaner now. It should be much easier for folk to come in and look at the code, see what's going on, suggest improvements etc.

IMO it's a very healthy thing for us to be doing and has already proved itself worthwhile in terms of enabling us to move forwards faster.

19 Likes

Just want to add that using async in Rust is very nice indeed, and makes doing multi threaded code easy compared to the rocket science it is without it. The result is much less code, far fewer bugs, easier debugging and maintenance, and greater efficiency.

I've not done much yet, but the concurrency needed for my logterm-dash app was a breeze because of this. Getting the concurrent threads coded and working was literally fifteen minutes, instead of probably an hour reading and who knows how long writing and fixing the code. I was literally shocked at how easy it was.

Using Rust is in general an incredible experience, because the compiler won't let you write unsafe (buggy) code. So here I am, with a lifetime's experience of C, C++ in particular, learning how to write bullet proof code because I'm forced to think it through at a whole new level. Having debugged compiled code at the assembly language level I thought I understood this, and I did in part - what gets put on the stack and the heap, and how each variable is stored and accessed in memory. But I'd never thought about 'borrowing' in this way, even though it is fundamental to writing solid multi threaded code. I suspect that I developed a way of coding that avoided these issues to some extent, but would then end up having to spend literally days, sometimes weeks tracking down and fixing tricky bugs.
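A tiny, self-contained example of the kind of thing being described (standard library only, nothing Safe-specific): the compiler simply refuses to let two threads mutate the same Vec unsynchronised, and the version it accepts makes the sharing explicit with Arc<Mutex<_>>.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // A plain `&mut Vec` moved into the spawned thread would not compile;
    // wrapping the data makes the shared mutability explicit and safe.
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let data_for_thread = Arc::clone(&data);

    let handle = thread::spawn(move || {
        data_for_thread.lock().unwrap().push(4);
    });

    data.lock().unwrap().push(5);
    handle.join().unwrap();

    println!("{:?}", data.lock().unwrap()); // e.g. [1, 2, 3, 5, 4]
}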

And with a compiler that can suggest cut and paste fixes for my basic errors, life for a newcomer is made much easier. I still struggle with borrowing, but each time I am learning a bit more how to go directly to the solution, and I'm being taught why my first attempt was buggy.

Using Rust is a really nice experience because I love to learn, and it is teaching an old dog new tricks!

23 Likes