Safe Network Dev Update - October 22, 2020

Summary

Here are some of the main things to highlight since the last dev update:

  • Internal testnets continue to help us run end to end tests and pinpoint issues.
  • Dynamic StoreCost metrics have been just about nailed down after our initial experimentation.
  • Some major Routing refactoring PRs here and here were merged to master this week, making the crate easier to read and understand, and therefore easier to debug, onboard onto and contribute to.
  • We are re-hiring a CRDT consultant to help us achieve our objective of developing a permissionless network of autonomous agents collaboratively hosting CRDT data.

Safe Client, Nodes and qp2p

Safe Network Transfers Project Plan
Safe Client Project Plan
Safe Network Node Project Plan

We’ve continued down the refactor road for our keypair types in the API. We have the basics in place now to use either Ed25519 keys (which would be the default) or BLS keys for clients/wallets, etc. The focus there this week has been on updating the code base and tests. We’re now looking at some finer points around how the various keys allow or disallow cloning, and will be looking to refactor things in general to prevent the need for cloning entirely (which should be safer code-wise), though that will likely come in a follow-up PR.

We are moving forward this week with more internal testing. Testnet results are getting more and more consistent, with various bugs being hunted down and fixed. One of the major fixes in progress concerns inter-section communication. Elders can send messages to other Sections either as individual nodes or as a part of their Section to prove Network Authority. A few messages do not need to be accumulated, for example client messages that need to be forwarded to their data Section, so sn_node had its own layer of messaging on top of sn_routing to handle this. But this left us at a disadvantage, as we could no longer validate the Authority of the forwarding intermediary. The upcoming change therefore brings in tighter and more secure messaging done solely within sn_routing, while also removing an extra layer of messaging at sn_node.

StoreCost metrics have also been more or less nailed down in the last week with the testnets seeing stable and reasonable fluctuations which should be good for us to start with. Up next, we’ll begin to chaos test these dynamics alongside data operations to optimise the metrics based on observations. This allows us to simulate the effects of running a public testnet where randomness cannot be bounded, helping to prepare us for what to expect when a testnet is released, and to potentially adjust the metrics accordingly.

Routing

Project Plan

This week we first merged the major refactoring work that removes SharedState and introduces the Node, Section and Network modules. This simplifies the Routing crate’s structure, making it easier to read and understand, and therefore easier to debug, onboard onto and contribute to.

There was then further cleanup work merged which removes the remaining state machine in Routing. This removed the Bootstrapping and Joining states. The bootstrapping process now happens inside Routing::new, so when that function returns, the node is already connected to a section.

Meanwhile, the broken minimal example was fixed this week in this PR. Having this running again has already helped us to reproduce some potential misbehaviours that were being observed by upper layer tests using Routing. We are investigating those closely now.

There was also a mysterious crate dependency CI failure which was fixed by no longer using serde macro derive. We suspect the recent Rust update meant our previous macro derive shortcut was no longer supported, so we will now use a safer import-on-use approach.

More testing and experimentation has been done on the CRDT PoC, this time experimenting with different types of messaging mechanisms when an Elder is being voted for removal. We are trying to use proptest to help us validate that these mechanisms work as intended in edge cases.

This week we also spent some time working on a separate PoC for dynamic membership in a section using distributed secure broadcast (from the AT2 paper) to provide Byzantine fault tolerance. Our bft-crdts implementation already supports reaching consensus over adding a peer, so the task at hand is to add support for peer removal. A peer may be removed voluntarily or forcefully if detected to be faulty. Both cases require a round of voting to reach consensus, but the latter is more complex as each voting peer must also detect that the peer is faulty. We have some encouraging preliminary results, but this remains a work in progress.
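The voting round for peer removal described above can be sketched roughly as follows. This is only an illustration and not the bft-crdts API: the names `RemovalRound`, `Outcome` and `PeerId` are hypothetical, and the threshold (strictly more than two thirds of the current members) is an assumption rather than something stated in this update.

```rust
use std::collections::BTreeSet;

/// Hypothetical peer identifier; a real network would use public keys.
type PeerId = u8;

/// Result of checking a removal round after a vote arrives.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Enough members agreed; the candidate can be dropped.
    Removed,
    /// Not enough votes yet.
    Pending,
}

/// Hypothetical tally for a single "remove this peer" proposal.
struct RemovalRound {
    members: BTreeSet<PeerId>,
    candidate: PeerId,
    votes: BTreeSet<PeerId>,
}

impl RemovalRound {
    fn new(members: BTreeSet<PeerId>, candidate: PeerId) -> Self {
        Self { members, candidate, votes: BTreeSet::new() }
    }

    /// Record a vote and report the current outcome.
    /// Votes from non-members and from the candidate itself are ignored,
    /// and a repeated vote from the same member counts only once.
    fn vote(&mut self, voter: PeerId) -> Outcome {
        if self.members.contains(&voter) && voter != self.candidate {
            self.votes.insert(voter);
        }
        self.outcome()
    }

    fn outcome(&self) -> Outcome {
        // Assumed threshold: strictly more than 2/3 of the current members.
        if self.votes.len() * 3 > self.members.len() * 2 {
            Outcome::Removed
        } else {
            Outcome::Pending
        }
    }
}
```

With seven members, this sketch needs five distinct votes before the candidate is removed. The harder part mentioned above, each voter independently detecting that the peer is faulty, is deliberately left out.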

And in other news…

Some good news for the project this week - we have agreed to again hire a CRDT consultant (same person we hired recently, at the beginning of our CRDT investigations) to help us achieve our objective of developing a permissionless network of autonomous agents collaboratively hosting CRDT data.

We’ve settled on 3 changes we need to make to rust-crdt in order to achieve this goal:

  1. All Causal CRDTs need to be modified to reject Ops that have been delivered out of “causal” order AND report back a summary of the missing Ops required to apply the given Op.
  2. Remove internal buffering of out-of-order Ops in ORSWOT and Map.
  3. Introduce a Causality Barrier to bring back the buffering behaviour for users who rely on the automatic buffering done in ORSWOT and Map.
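To make the first change concrete, here is a minimal sketch of a replica that refuses an Op delivered out of causal order and reports what is missing. It is not the rust-crdt API: `Replica`, `Op`, `Rejected` and the per-actor counter scheme (counters starting at 1) are assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical actor identifier.
type Actor = char;

/// An operation tagged with the actor and per-actor counter that produced it.
#[derive(Debug, Clone, PartialEq)]
struct Op {
    actor: Actor,
    counter: u64,
    value: i64,
}

/// Why an Op was not applied.
#[derive(Debug, PartialEq)]
enum Rejected {
    /// The Op was already applied.
    Duplicate,
    /// Counters `first..=last` from `actor` must be applied before this Op.
    Missing { actor: Actor, first: u64, last: u64 },
}

/// Toy causal replica: a running total plus the latest counter applied per actor.
#[derive(Default)]
struct Replica {
    clock: BTreeMap<Actor, u64>,
    total: i64,
}

impl Replica {
    /// Apply an Op only if it is the next one expected from its actor.
    /// Nothing is buffered: an early Op is rejected with a fixed-size summary
    /// of the gap, so the sender can supply the missing Ops.
    fn apply(&mut self, op: Op) -> Result<(), Rejected> {
        let seen = self.clock.get(&op.actor).copied().unwrap_or(0);
        if op.counter <= seen {
            return Err(Rejected::Duplicate);
        }
        if op.counter > seen + 1 {
            return Err(Rejected::Missing {
                actor: op.actor,
                first: seen + 1,
                last: op.counter - 1,
            });
        }
        self.clock.insert(op.actor, op.counter);
        self.total += op.value;
        Ok(())
    }
}
```

Reporting the gap as a range rather than a list keeps the reply small even if a sender claims a very large counter.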

The CRDT consultant is someone who is very familiar to us, and us to them, and who we trust to help us deliver. We expect that they will be onboard with us for a few weeks, in which time we hope to achieve the following:

  1. All Causal CRDTs present in rust-crdt will be hardened to reject out-of-order Ops.
  2. All Causal CRDTs present in rust-crdt will be modified to respond with a summary of dependent ops that are missing before an Op may be applied.
  3. ORSWOT and Map will be modified to remove internal Buffering.
  4. A CausalityBarrier will be implemented to optionally add back buffering.
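As a rough picture of what an optional buffering layer could look like, the sketch below holds early Ops from a single sender and releases them in order once the gap is filled. It is an assumption-laden toy, not the planned CausalityBarrier: the `Barrier` type, the `ingest` method and the use of plain sequence numbers starting at 0 are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical barrier that re-orders one sender's Ops by sequence number.
struct Barrier<T> {
    /// Next sequence number that may be delivered.
    next: u64,
    /// Ops that arrived early, keyed by sequence number.
    pending: BTreeMap<u64, T>,
}

impl<T> Barrier<T> {
    fn new() -> Self {
        Self { next: 0, pending: BTreeMap::new() }
    }

    /// Accept an Op and return every Op that is now deliverable, in order.
    /// Sequence numbers that were already delivered are dropped.
    fn ingest(&mut self, seq: u64, op: T) -> Vec<T> {
        if seq >= self.next {
            // Keep the first copy if the same sequence number arrives twice.
            self.pending.entry(seq).or_insert(op);
        }
        let mut ready = Vec::new();
        while let Some(op) = self.pending.remove(&self.next) {
            ready.push(op);
            self.next += 1;
        }
        ready
    }

    /// Number of Ops currently held back.
    fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```

The buffer in this sketch is unbounded, which is exactly the kind of queue that can be filled by a misbehaving sender, so a real barrier would presumably need a cap or be used only where senders are trusted.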

Note - please respect everyone’s right to confidentiality and do not speculate on names. :slightly_smiling_face:

Useful Links


Feel free to reply below with links to translations of this dev update and moderators will add them here:

:bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

65 Likes

The team is busy driving the Safenetwork onto the launch platform.

27 Likes

Another week, another impressive amount of complexity and progress. It’s all this hammering and robust iteration that gives us confidence the final product will be sound. Look forward to seeing testnets.

:+1:

19 Likes

What @davidpbrown said above - steady progress, stuff getting simplified and strengthened and moving towards a testnet that we can now be pretty confident will Just Work for the vast majority of us. The learning from the testnets will be around confirming that the work done in these last few months performs at scale.
This is why the UI is so important: it must be dead easy, super simple for us ALL to dive in and test. Individually we are unlikely to uncover many bugs, but the team will learn from the at-scale use of the concepts developed - well that’s my gut feeling anyway.

Thanks as always to everyone in and around the team that has contributed.

16 Likes

This is great news, it seems clear that his previous involvement has been very helpful.

Does the decision to make these 3 changes stem from internal test results or has it been in the pipeline? Should we curb our community test expectations until after this work is done?

Exciting progress!!

20 Likes

Another exciting update that definitely feels like we’re in the last mile here. Looking forward to seeing things come together en route to the next public testnet. Certainly feels like the next testnet and Fleming will provide confirmation of the viability of Safe at scale. :blush:

19 Likes

These changes are for 2 purposes really.

  1. So we don’t queue out of order (ooo) operations. That could be an attack. Our pattern is to return an error for out-of-order ops and let the sender fill in any info we are missing (such as a missed message/operation).
  2. Signed operations and causal order. This is an anti fraud measure to secure a CRDT for use in a dynamic permissionless network. It was always a requirement, but to get things running we could do without it till now. We could not launch without it though.

So really tidying up and ensuring the CRDTs can be used in hostile environments. It’s quite novel, but then again … Safe is novel :smiley:

26 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! And good luck to the new CRDT consultant! :racehorse:

14 Likes

Not had time or headspace to keep up day to day lately but getting a real sense of solidity and quiet confidence from these updates. Thanks everyone!

15 Likes

Thx Maidsafe devs for all your continued hard work.

Really super excited to see this in action

Dear consultant you will come with Super Saiyan power out of the Maidsafe “gravity room”. :muscle:

Keep up the good job great team :stuck_out_tongue:
Steve Ballmer developers excitement mode on: Testnet testnet testnet

13 Likes

@zeroflaw I took your meme and added the text to the bottom. Here is what it looks like. :racehorse:


15 Likes

Space geek pedant alert!!!

That vehicle transporter is empty and looks like it is returning from the pad…

#JustSayin cos I’m a pedantic PITA

10 Likes

Or it’s on its way to collect :wink:

10 Likes

I feel that way too, but I think the key here is that it is just feeling and seeming. We don’t have any guarantees that the current path is going to give us a functioning network. PARSEC seemed to be the solution on paper, but when implemented, it had performance issues to the point of failure. As far as I see it, we might run into performance issues (or some other crucial detail) with CRDT-based solutions as well. And because we don’t know what it takes to make this thing work, we really don’t have any knowledge of the distance to the finish line. I haven’t heard anyone using any other “metrics” for the distance except feeling.

By the way I still don’t know why exactly PARSEC didn’t work? Do we know for sure that it was just sheer size of the gossip graph, or was it some bug causing memory issues?

Whatever it was, I think the use of CRDTs makes a lot of sense. But reading in the last dev update about bringing in a CRDT specialist because of order-related things makes me wonder if we will eventually go back to using PARSEC somehow in combination with the benefits of CRDTs.

And if CRDT alone is the answer, I am a bit scared there may be another project lurking somewhere in the dark. The tech has been around for some time.

My thoughts here don’t stem from any technical knowledge, but these are the kind of speculations going on in my head, preventing me from buying and driving the price up for my own part.

This could be the case, but I don’t understand why. I sincerely mean that I don’t understand why a smaller market should equal a lower price. It seems to be the case, because after we lost Poloniex the price dropped immediately… but after a while it recovered with no major reasons from the project itself.

What is the thinking behind the idea that bigger markets would automatically lead to higher prices? I get the idea that shrinking markets are bad news and growing markets are good news, but when the market size is past the “news” stage, why would it matter? It probably does, but I just don’t understand why.

(Just like I don’t understand why table salt is so cheap. The cheapest salt at my local shop costs about 1€/kg and the most expensive about 23€/kg. 1 kg is maybe somewhere between 6-12 months of household use. No one would give a damn if the cheapest salt were 2€/kg. So why is it so cheap? It seems to me you could double the price without any impact on sales. You never see any advertisement saying “Hey we have really cheap salt here.”)

7 Likes

The order issue is not one of getting order; most CRDTs are ordered via partial order to get to strong consistency. The order part here is a small issue and exists in every system. I will try to explain.

We have a counter, a simple monotonic counter.

A sends operations to B
A -> B 0
A -> B 1
A -> B 3 ---- Here we cannot apply this so we wait
A -> B 2 ---- Here we have the missing count

With the wait above we have a counter 0,1,2,3

However we did not need to wait for 2 above. We could have said back to A “hey I cannot apply 3 you are missing 2”. Then A could send us 2 & 3 at the same time.

This is the order issue we are fixing. If B cached out-of-order ops, that would open an attack. A can send it millions of out-of-order ticks and never send 2. That way our queues fill up, plus we have perhaps missed a message and will never ask for it!
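The exchange between A and B above can be sketched in a few lines. This is a toy model only: `Receiver`, `ResendFrom` and `send` are hypothetical names, and delivery is modelled as a direct function call rather than real network messages.

```rust
/// Receiver B: remembers only the next count it expects and queues nothing.
struct Receiver {
    expected: u64,
}

/// B's reply when a count arrives early: "resend starting from this count".
#[derive(Debug, PartialEq)]
struct ResendFrom(u64);

impl Receiver {
    fn receive(&mut self, count: u64) -> Result<(), ResendFrom> {
        if count > self.expected {
            // Cannot apply yet; tell the sender what is missing instead of waiting.
            return Err(ResendFrom(self.expected));
        }
        if count == self.expected {
            self.expected += 1;
        }
        // A count below `expected` is a duplicate and is simply ignored.
        Ok(())
    }
}

/// Sender A: send `count`; if B reports a gap, resend everything from the
/// requested count up to `count`. Returns how many messages were sent.
fn send(receiver: &mut Receiver, count: u64) -> u64 {
    let mut sent = 1;
    if let Err(ResendFrom(from)) = receiver.receive(count) {
        for c in from..=count {
            receiver
                .receive(c)
                .expect("an in-order resend should be accepted");
            sent += 1;
        }
    }
    sent
}
```

Sending 0, 1 and then 3 to a fresh `Receiver { expected: 0 }` makes B answer the third message with `ResendFrom(2)`, after which A resends 2 and 3 and B ends up expecting 4, without B ever holding an unapplied message.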

Parsec in conjunction with routing worked in such a way that missed messages were a problem. Some would even kill the network. Yes, incredible, I know; however, when I took over the CTO stuff it was one of the first things to fix: make messages guaranteed to deliver with probability close to 1, but also, and more importantly, allow nodes to see a message was missing and ask for it. Even then Parsec still could not work, purely down to never completing tasks as it took too long. Parsec was made (much to my disapproval) production ready; it was months of work, but tested using “mock” objects. So production ready but never used in production. I won’t say more about that :wink:

We call this lazy messaging, and as we have partial order in most data we can spot out-of-order or missing data and request it. It’s a very simple thing and obvious when you look. So very basic Engineering really, no calculus, no fancy papers, no great Greek letters and names, so maybe not sexy enough for some Engineers to spot. But these basic parts of Safe were not obvious to some members of the team for a long time. Now these points are second nature to everyone in the team. It’s easier that way :wink:

22 Likes

Wow, really good explanation, addressing all the things I have been thinking, but not daring to ask. :wink:

(Mods: maybe this should be copied / moved / linked to dev update thread?)

7 Likes

Solid update through and through! I’m really interested to see what comes of bringing on a CRDT consultant.

There are so many potential applications in a distributed network for CRDTs. In my mind that implies a lot of room for further innovation and exploration, which only adds further evidence of both feasibility and commercial viability from the perspective of those looking in. Not to mention the excitement just involved with chasing those novel applications once the concepting is solidified :slight_smile:

13 Likes

Agreed, with this work we will have fraud-resistant CRDTs (by having Actors sign ops and replicas sign causal order), and with the deterministic secure broadcast mechanism we show how the Initiator (Actor/Client) can gather the majority votes (consensus that an operation is valid) and provide that. So this means the Actor can say here is a signed operation and here is Authority (NetworkAuthority) that I can do this. Now you use your secured CRDTs to provably get strong consistency with your ever changing network of millions and millions of data items that all follow this pattern.

So this simplifies consistency in a hostile network to some incredibly simple rules.

The end of this mini 3 week project cements all of this in place. To me the possibilities for massively concurrent apps are a game changer. An important stepping stone here. I have a next step, but well after launch.

18 Likes

Yes… :money_mouth_face:!!

3 Likes

So is the goal to launch a testnet after all this CRDT work is done? As much as I like the excitement in the last few weeks of updates, it’s getting difficult to track what goal leads to a hands-on community network (even if it only ran for a day). safenetwork.tech has been out of date for months; it still references PARSEC. The GitHub project plans also reference PARSEC.

The weekly updates sound great, but they are becoming littered with out-of-date links and materials. Now you have a much more concrete grasp on the outstanding requirements. Having the confidence to state 3 CRDT requirements seems like a huge leap towards a clear path to what’s left to build.

I feel like it’s time to be brave, and clearly state what’s left until the community is able to help with testing.

2 Likes