Safe Network Dev Update - February 25, 2021

Summary

Here are some of the main things to highlight since the last dev update:

  • There will be a Community Safe Chat virtual meeting on Friday 26th February (tomorrow) at 9PM GMT. Full details here.
  • Bug identification and squashing continue, along with several efficiency improvements, as we make significant progress this week on the path to a public testnet.
  • There have been some significant sn_routing PRs merged this week, specifically PR #2323, PR #2328, and PR #2336. Full details below.
  • Two further significant sn_routing PRs, #2335 and #2336, both critical for a stable testnet, are now raised and should be reviewed and merged in the coming days.
  • @scorch has submitted a PR to (finally!) resolve "that issue" where the versions of the CLI itself and the external binaries such as sn_node or sn_authd were being confused.
  • Check out @bzee's excellent digging and analysis here as he looks to improve and update the bindings to Node.js.

Community Safe Chat: Friday 26th February, 9PM GMT

Community-driven marketing efforts continue thanks to the work of @sotros25, @m3data, @piluso and others. Tireless efforts, we must say!

There have been discussions and strategising on the forum, and in small Zoom meetings over the past few weeks, including this one where we discussed some of @sotros25's market research on adjacent projects.

We are delighted to have another virtual hang-out this Friday, with an open invitation for all community members to participate.

The first 45 minutes of the conversation will be about the community marketing strategy. After that, we'll open up the floor for broader discussion on the Safe Network. The aim is to help define the marketing strategy, and to also use content from these discussions as video to build awareness and engagement.

Please be aware that this call will be recorded, streamed, and rebroadcast, so those whose schedules or time zones don't quite work won't miss out.

Full details and the link to join will be made available here.

Safe Client, Nodes and qp2p

Safe Network Transfers Project Plan
Safe Client Project Plan
Safe Network Node Project Plan

Work here this week has all been about stabilising networks. We started the week investigating some odd behaviour, only seen with certain logs enabled, which eventually led us to a wee snippet in routing where we were holding a lock for the duration of a long async call (sending a client message), causing a deadlock in responses. It was a tricky one to figure out, but once we realised that the lock was acquired as part of the if statement, we just had to move it out and ensure the lock was dropped once we had what we needed. This got the client responses coming through again.
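
For illustration, here's a minimal sketch of the pattern in Rust, with hypothetical types standing in for the actual sn_routing code. With `if let`, temporaries created while evaluating the scrutinee, including a `MutexGuard`, live for the entire block, so the lock stays held across the long `.await`:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

struct State {
    pending: Vec<String>, // queued client messages (illustrative)
}

// Before: the MutexGuard created while evaluating the `if let` scrutinee
// lives for the whole block, so the lock is still held across the long
// network send, and the task that delivers the response deadlocks on it.
async fn broken(state: Arc<Mutex<State>>) {
    if let Some(msg) = state.lock().await.pending.pop() {
        send_client_message(msg).await; // lock still held here
    }
}

// After: take what we need in its own statement so the guard is dropped
// before awaiting the long call.
async fn fixed(state: Arc<Mutex<State>>) {
    let msg = state.lock().await.pending.pop(); // guard dropped at end of statement
    if let Some(msg) = msg {
        send_client_message(msg).await; // lock is free for other tasks
    }
}

async fn send_client_message(_msg: String) {
    // stand-in for the long-running async send
}
```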

With the changes to messaging infrastructure completed last week, nodes no longer include any logic for aggregating signatures, relying fully on routing's ability to do so. Previously we only had aggregation at source, where routing aggregated the signatures at the Elders before sending the message to the target, resulting in the message with the aggregated signature being sent to the destination multiple times, with duplicates filtered out at the target node. By moving the signature aggregation to the destination, we reduce some load on the Elders and significantly reduce the number of messages being exchanged. We added support for destination accumulation in routing and used it in sn_node for the communication between a section and its chunk-holding Adults.

With the above two fixes, we now have all client tests passing against a single section, with massively simplified node code. However, a follow-up PR is needed to cover an additional use case: comms between Elders in one section and Elders in another section, part of the rewards flow (as section funds are managed by one section, but held/verified in another). This is being covered as we speak, and should be merged before the end of the week.
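
As a rough sketch of destination accumulation, assuming hypothetical types rather than the actual sn_routing API: each Elder sends its signature share once, and the target collects shares until the threshold is met, so no duplicate aggregated messages cross the wire:

```rust
use std::collections::HashMap;

type MsgHash = u64;      // stand-in for a message digest
type SigShare = Vec<u8>; // stand-in for a BLS signature share

// Accumulates signature shares at the destination (illustrative only).
struct Accumulator {
    threshold: usize,                                  // shares needed for a valid aggregate
    pending: HashMap<MsgHash, Vec<(usize, SigShare)>>, // digest -> (Elder index, share)
}

impl Accumulator {
    // Returns the collected shares exactly once, when the threshold is met;
    // until then the message is simply held, so the target never processes
    // duplicates and the Elders never resend full aggregated messages.
    fn add(&mut self, digest: MsgHash, elder: usize, share: SigShare)
        -> Option<Vec<(usize, SigShare)>>
    {
        let shares = self.pending.entry(digest).or_default();
        shares.push((elder, share));
        if shares.len() >= self.threshold {
            self.pending.remove(&digest) // combine into an aggregated signature here
        } else {
            None
        }
    }
}
```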

On that front, with a little update to some routing events, we now get our sibling section's PublicKey on split, so we know where to send tokens to populate the resulting child section wallets. After some further flow debugging, that appears to be going through, and we're now debugging a wee routing loop on split, where section info is not being detected properly and we're repeatedly passing the same (wrong) info to a new node.

We are also refactoring the communication pattern among clients, nodes and their sections, where previously outdated section keys were causing bugs in the rewards and transfer flows. We'll therefore now enforce PublicKey knowledge checking and updating with every message sent across the network, keeping all peers up to date with the latest knowledge of the network.
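
A minimal sketch of what that enforcement could look like, with stand-in types (not the actual sn_routing interface): every message carries the sender's view of the destination's section key, and a recipient with newer knowledge replies with the latest key instead of processing the message.

```rust
type PublicKey = u64; // stand-in for a BLS section key

struct Envelope<M> {
    dst_section_key: PublicKey, // the sender's view of our section key
    msg: M,
}

enum Action<M> {
    Process(M),              // sender is up to date: handle the message
    UpdateSender(PublicKey), // sender is stale: reply with our latest key
}

fn handle<M>(our_section_key: PublicKey, env: Envelope<M>) -> Action<M> {
    if env.dst_section_key == our_section_key {
        Action::Process(env.msg)
    } else {
        // The sender acted on outdated section knowledge (e.g. across a
        // split); send back the current key so it can update and resend.
        Action::UpdateSender(our_section_key)
    }
}
```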

To start with, the split of the section funds will be one transfer chained after another, as we still don't have one-to-many transfers, and chaining was a trivial task that works well enough for a testnet. However, with the refactor of TransferAgreementProof a couple of months ago into a signed debit and a signed credit, we can now relatively easily implement one-to-many transfers by including a set of credits. A goodie for later :).
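
To illustrate, with hypothetical type shapes rather than the actual sn_transfers definitions: once TransferAgreementProof is split into a signed debit and a signed credit, a one-to-many transfer is simply one debit matched by a set of credits.

```rust
type PublicKey = u64; // stand-in for a wallet key
type Token = u64;     // stand-in for an amount

struct SignedDebit  { from: PublicKey, amount: Token /* + signature */ }
struct SignedCredit { to: PublicKey, amount: Token   /* + signature */ }

// One debit from the parent section wallet, one credit per child section
// wallet; the invariant is that the credits sum to the debited amount.
struct OneToManyTransfer {
    debit: SignedDebit,
    credits: Vec<SignedCredit>,
}

impl OneToManyTransfer {
    fn is_balanced(&self) -> bool {
        self.debit.amount == self.credits.iter().map(|c| c.amount).sum::<Token>()
    }
}
```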

As a lower priority task, and in parallel to the above, we started preparing for the upgrade to a new Quinn version that allows us to finally move to the stable Tokio v1. We've given it a try and are preparing the PRs so this migration can happen as soon as the Quinn v0.7.0 release is published.

Another improvement which made it in this week concerned the deletion of private data. Before the newly merged changes in self_encryption and sn_client, deleting a private blob meant deleting only the root blob, which was the data map of the actual data that is self-encrypted and stored on the network. Our latest addition to the team, @kanav, has implemented a recursive deletion approach that deletes the individual chunks, along with the chunks that store the data map(s), achieving deletion in a true sense.
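
Here is a simplified sketch of the recursive idea, using a local map in place of the network and hypothetical types throughout:

```rust
use std::collections::HashMap;

type XorName = u64; // stand-in for a chunk's network address

enum Chunk {
    DataMap(Vec<XorName>), // references further chunks (possibly nested data maps)
    Content(Vec<u8>),      // actual self-encrypted bytes
}

struct Store {
    chunks: HashMap<XorName, Chunk>,
}

impl Store {
    // Deletion in the true sense: removing the root data map alone would
    // leave the referenced chunks orphaned on the network, so we walk the
    // data-map tree and delete every chunk it reaches.
    fn delete_blob(&mut self, root: XorName) {
        if let Some(Chunk::DataMap(children)) = self.chunks.remove(&root) {
            for child in children {
                self.delete_blob(child); // recurse into nested data maps
            }
        }
        // A Content chunk is fully removed by the `remove` call itself.
    }
}
```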

API and CLI

@scorch has submitted a PR to remove the -V option from CLI subcommands to avoid confusion between the version of the CLI itself and the version of external binaries such as sn_node or sn_authd. This change also adds a bin-version subcommand to $ safe node and $ safe auth to fetch the version of the external binaries, making the semantics clear and the distinction between the CLI version and the sn_node or sn_authd versions explicit.

Currently, the qjsonrpc lib implements the JSON-RPC 2.0 standard. However, certain error codes defined in the spec were not exposed by the crate, meaning consumers needed to redefine the same constants themselves, which isn't necessary since they are in some sense part of the implementation. For this reason @scorch also submitted a PR to expose these error codes as constants from qjsonrpc.
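
The codes in question are fixed by the JSON-RPC 2.0 spec itself, so the exposed constants would look something like this (the actual identifiers in qjsonrpc may differ):

```rust
// Error codes defined by the JSON-RPC 2.0 specification.
pub const PARSE_ERROR: isize = -32700;      // invalid JSON received
pub const INVALID_REQUEST: isize = -32600;  // JSON is not a valid request object
pub const METHOD_NOT_FOUND: isize = -32601; // method does not exist
pub const INVALID_PARAMS: isize = -32602;   // invalid method parameters
pub const INTERNAL_ERROR: isize = -32603;   // internal JSON-RPC error
```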

As mentioned in the previous section, we've also been getting ready to upgrade to Tokio v1, so we have been preparing the CLI and authd crates for such an upgrade with some preliminary tests.

CRDT

We've been iterating on the CRDT underlying the Sequence type in sn_data_types. Previously, Sequence was implemented with LSeq. We tried out a simpler List to resolve some panics with deep inserts, and then moved to GList to support the grow-only use case. On analysis, none of these CRDTs model versioning as we'd like: they try to linearise the order of documents, when in reality a document's history forms a DAG. We have a design for a Merkle-DAG Register CRDT which would allow us to model document history faithfully and to read the most up-to-date versions.
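
Roughly, the Register idea looks like this (hypothetical names and a toy hash type; the real design would use content hashes, hence Merkle-DAG): each write points at the version(s) it supersedes, and a read returns the current heads of the DAG.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64; // stand-in for a content hash

struct Entry<T> {
    value: T,
    parents: Vec<Hash>, // the version(s) this write supersedes
}

struct Register<T> {
    entries: HashMap<Hash, Entry<T>>,
}

impl<T> Register<T> {
    // Concurrent writes may share parents; nothing is linearised.
    fn write(&mut self, hash: Hash, value: T, parents: Vec<Hash>) {
        self.entries.insert(hash, Entry { value, parents });
    }

    // The most up-to-date versions are the DAG heads: entries that no
    // other entry lists as a parent.
    fn read(&self) -> Vec<&T> {
        let superseded: HashSet<Hash> = self
            .entries
            .values()
            .flat_map(|e| e.parents.iter().copied())
            .collect();
        self.entries
            .iter()
            .filter(|(hash, _)| !superseded.contains(*hash))
            .map(|(_, entry)| &entry.value)
            .collect()
    }
}
```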

We have also started removing the mutability of Policies from our mutable data types, i.e. from our CRDT-based data types like Sequence. In our current implementation we've been trying to resolve all the types of conflicts that concurrent Policy mutations can create on CRDT data. This has proven to make things quite complicated while still not covering all the possible scenarios for conflict resolution. Therefore, we have decided to move to a different approach where Policies become immutable once they have been defined at the creation of a piece of content. Changing a Policy will then mean cloning the content onto a new address with the new Policy; some mechanism for linking these different instances can eventually be created and used on a case-by-case basis by applications.
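
In sketch form, with hypothetical types: since the Policy is fixed at creation, "changing" it amounts to cloning the data under a new Policy at a new address.

```rust
#[derive(Clone)]
struct Policy { /* owners, permissions, ... */ }

#[derive(Clone)]
struct Sequence {
    address: u64,   // stand-in for a network address
    policy: Policy, // immutable once the content is created
    data: Vec<u8>,
}

// There is no `set_policy(&mut self, ...)`: a new Policy means new content
// at a new address, optionally linked back to the original by the app.
fn with_new_policy(seq: &Sequence, new_policy: Policy, new_address: u64) -> Sequence {
    Sequence {
        address: new_address,
        policy: new_policy,
        data: seq.data.clone(),
    }
}
```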

BRB - Byzantine Reliable Broadcast

Our attempt to integrate the sn_fs filesystem prototype with BRB has exposed a couple of rough edges. The reason is that sn_fs receives operations from the operating system kernel faster than BRB can apply them. To this end, we've come up with a couple of related solutions: 1) bypass the network layer when sending an operation to self, and 2) keep track of when peers have received ProofOfAgreement so we can avoid sending the next operation until 2/3 of peers have applied the current op. This is necessary to meet the source-ordering requirement of BRB. Source-ordering means that operations coming from the same source (actor) must be sequentially ordered, while operations from many different actors may be processed concurrently.
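
A toy sketch of the second fix (the real BRB bookkeeping is richer than this): the source holds back op n+1 until at least 2/3 of peers have acknowledged applying op n.

```rust
struct SourceWindow {
    peers: usize,            // total peers in the delivery group
    acks_for_current: usize, // peers that have applied the in-flight op
}

impl SourceWindow {
    // Source-ordering: ops from one actor are strictly sequential, so we
    // only release the next op once at least 2/3 of peers applied this one.
    fn can_send_next(&self) -> bool {
        3 * self.acks_for_current >= 2 * self.peers
    }

    fn record_ack(&mut self) {
        self.acks_for_current += 1;
    }
}
```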

Also as part of the sn_fs integration, we modified the brb_dt_tree crate to support sending multiple tree operations within a single BRB op. This effectively gives us an atomic transaction property, applying logically-related CRDT ops in an all-or-nothing fashion. We intend to use this same pattern in other BRB data types.
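
In outline, with hypothetical type names: instead of one CRDT op per BRB op, a BRB op now carries a batch that the data type applies atomically.

```rust
// A single CRDT tree operation (illustrative).
enum TreeOp {
    Insert { parent: u64, node: u64 },
    Move { node: u64, new_parent: u64 },
    Remove { node: u64 },
}

// One BRB op now wraps a batch of logically-related tree ops; BRB agrees
// on and delivers the batch as a unit, so the ops apply all-or-nothing.
struct BrbTreeOp {
    ops: Vec<TreeOp>,
}
```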

Routing

Project Plan

This week we merged a PR changing the way client messages are handled, so they can now be routed through the network the same way as node messages. This means that a client can send a request outside of its section, and receive a response back, even when the recipient(s) of the request cannot connect to the client directly due to restrictive NAT or similar issues.

As detailed in the Nodes section above, we also implemented message signature accumulation at destination, which means the users of routing no longer have to implement this flow manually, resulting in simpler code.

Finally, the fork resolution PR is now up and undergoing review. During the work on this PR we discovered a few additional bugs that were not related to fork handling. Throughout the week we have been busy debugging them, and as of today it looks like we have mostly fixed them all. The internal stress testing results look very promising, plus we managed to run a localhost network with 111(!) nodes on a single machine and everything went smoothly. A PR with those fixes is currently up in draft status, and should be ready for review soon.

Useful Links


Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French ; :bulgaria: Bulgarian

As an open source project, we're always looking for feedback, comments and community contributions - so don't be shy, join in and let's create the Safe Network together!

first… muhahahahhahaha

one SECOND too late

third#1!!!

Thanks so much to the entire Maidsafe team for all of your hard work! Remember not to work too hard! :racehorse:

Here's the tweet supporting this update so you can like, retweet & comment!

If there are any topics in particular that you want to cover during this week's Safe Chat, please DM them to me.

hell yes. that sounds like a very productive week. a couple more of those and who knows… we might have our testnet :slight_smile:
special thanks to @scorch and @bzee for their continued work

Here is a meme for all of the Maidsafe developers who are working just a little too hard! :racehorse:

Bugs killed, efficiency rising, tests passing, community contributing, rewards accruing. Nice work SAFE.

Thanks for publishing the market research video and for all your hard work @Sotros25 @piluso @m3data and @JimCollinson. It was a very interesting watch and even though you didn't come to many conclusions, you all seem to be basically in agreement and I know there's a lot of value in simply hearing the same ideas in different words because that's where the 'aha' moments tend to arise. I feel confident we've got some great professional minds on the job and it's just a matter of time. Oh, and what's that about a messaging app? Are you going to surprise us Jim? :wink:

I thought Matt's point about projecting hope was very pertinent. Encapsulating that hope in a few words in a way that's arresting, original and appealing to the audience is the hard part - and why companies spend millions on advertising agencies. Also he was spot on about the branding - from here on in there needs to be consistency.

Blockstack is an interesting case study. They have millions of dollars at their disposal, a slick website, lots of devs, and the whole Ivy League thing, but while looking into it, I was left with a feeling of 'Yes, but what's it for?', because there's no clear vision for people to get passionate about. I'd bet anything that those street murals were paid for!

This sounds nice indeed. Welcome @kanav!

So pleasing to hear about more pieces of the puzzle fitting together, and fewer bugs getting in the way. Thanks to all the devs and contributors for the work!

Just watched the video and trying to catch up with the marketing efforts. Not my specialist area, but blown away by the research by @Sotros25, and the general sense that folk are going to make it happen!

Really great to see this being solved. Tricky bugs are satisfying to see squashed, congrats to the people who dived deep on this one.

Well, that was a lot! Quite technical and thus a bit difficult for me to grasp. But it looks like the thing is getting (better)².

@kanav, welcome!

Sounds incredible! That single machine is some better-than-average desktop machine, I suppose? I'm curious about the details: Why that particular number? Why not 110 or 112? Were there many sections, or only one?

@mav do you think this helps with the Node performance issue you have been looking into?

Just a laptop AFAIK :smiley: These are routing nodes though, not full-blown Safe nodes, but it still would have been inconceivable previously.

Looking good

I had as much trouble understanding your post as I do the Weekly Updates

Could be. I'll try testing it out sometime. Sounds promising.

I don't know how to use the network, but it sounds great
