Safe Network Dev Update - September 17, 2020

Summary

Here are some of the main things to highlight since the last dev update:

  • We were able to git clone the crdt_tree repo and build it with cargo build in the safe_fs filesystem prototype, another encouraging stage reached in its evolution.
  • Our newly async components are interacting more each day, with Safe Client now able to receive events and errors from the network, and vaults now serving events.
  • A fully asynchronous vault binary took a huge step closer this week with this PR being raised and now going through our review process.
  • A significant routing bug fix PR was raised today, resolving almost all of the parsec removal bugs.

CRDT

safe_fs

Work has progressed on the (local only) safe_fs filesystem prototype this week to make it more usable. Support for OS-specific (POSIX, Windows) inode attributes was added, e.g. uid, gid and mode on POSIX, and hidden, readonly, compressed and encrypted on Windows. These should make safe_fs more useful as a backup/restore solution for local files.
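As a rough illustration of how OS-specific attributes could sit alongside common inode metadata, here is a minimal Rust sketch. The type names (`OsAttrs`, `InodeMeta`) and the `is_readonly` helper are hypothetical and are not taken from safe_fs; only the attribute names come from the description above.

```rust
/// Hypothetical sketch: not the actual safe_fs types.
#[derive(Debug, Clone, PartialEq)]
pub enum OsAttrs {
    /// Attributes found on POSIX systems.
    Posix { uid: u32, gid: u32, mode: u32 },
    /// Flags found on Windows.
    Windows {
        hidden: bool,
        readonly: bool,
        compressed: bool,
        encrypted: bool,
    },
}

/// Common metadata plus an optional OS-specific part.
#[derive(Debug, Clone, PartialEq)]
pub struct InodeMeta {
    pub name: String,
    pub size: u64,
    pub os_attrs: Option<OsAttrs>,
}

impl InodeMeta {
    /// Returns true if the entry should be treated as read-only on restore.
    pub fn is_readonly(&self) -> bool {
        match &self.os_attrs {
            // On POSIX, treat a missing owner-write bit (0o200) as read-only.
            Some(OsAttrs::Posix { mode, .. }) => mode & 0o200 == 0,
            Some(OsAttrs::Windows { readonly, .. }) => *readonly,
            // No OS attributes recorded: assume writable.
            None => false,
        }
    }
}
```

Keeping the OS-specific part in an enum means a backup taken on one platform can still be read on another, with the restore step deciding what to do with attributes it cannot apply.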

We set a goal this week to be able to git clone the crdt_tree repo and build it with cargo build, which exercises the filesystem substantially. Initially, memory usage increased rapidly because file contents were stored inside the inode (in crdt_tree metadata), and the filesystem was unusably slow because large sparse files were repeatedly copied in memory. We switched to a pass-through type of system, where file content (only) is stored in the underlying disk filesystem. This performs well enough that we were finally able to complete a full build, which is encouraging. We also started to run some filesystem testing tools against it to identify any further problem areas.
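The pass-through idea can be sketched roughly as follows. This is an assumption-laden toy, not the safe_fs code: `PassThroughStore`, its methods and the `<inode>.bin` naming scheme are all hypothetical, and a plain `HashMap` stands in for the tree metadata.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical pass-through store: metadata stays in memory (standing in
/// for the tree), while file content lives in ordinary files on disk.
pub struct PassThroughStore {
    backing_dir: PathBuf,
    // inode id -> file name; content is deliberately NOT kept here.
    names: HashMap<u64, String>,
}

impl PassThroughStore {
    pub fn new(backing_dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&backing_dir)?;
        Ok(Self { backing_dir, names: HashMap::new() })
    }

    /// Location of an inode's content in the underlying filesystem.
    fn content_path(&self, ino: u64) -> PathBuf {
        self.backing_dir.join(format!("{}.bin", ino))
    }

    /// Records the name in metadata and writes the bytes straight to disk.
    pub fn write(&mut self, ino: u64, name: &str, data: &[u8]) -> io::Result<()> {
        self.names.insert(ino, name.to_string());
        fs::write(self.content_path(ino), data)
    }

    /// Reads content back from disk; fails if the inode has no content file.
    pub fn read(&self, ino: u64) -> io::Result<Vec<u8>> {
        fs::read(self.content_path(ino))
    }

    pub fn name(&self, ino: u64) -> Option<&str> {
        self.names.get(&ino).map(String::as_str)
    }
}
```

Because the metadata map never holds file bytes, its memory footprint depends on the number of entries rather than on the size of the files being built.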

Safe Client Libs, Vaults and qp2p

Safe Network Transfers Project plan
Safe Client Libs Project plan
Safe Vault Project plan

The Safe Client can now listen to the network using the qp2p stream that’s initialised on network bootstrap. Network queries will still use their own streams to keep response handling nice and clean. But this gives us the ability to receive events (e.g. TransactionValidated) or errors (e.g. InvalidTransaction) from the network.
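To give a feel for the listener pattern, here is a minimal sketch that uses a standard library channel as a stand-in for the long-lived qp2p stream. It does not use the qp2p or Safe Client Libs APIs; `NetworkMsg`, `listen` and `simulate_network` are hypothetical names, and only the two example message names come from the text above.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread;

/// Hypothetical message type for things the network pushes to a client.
#[derive(Debug, Clone, PartialEq)]
pub enum NetworkMsg {
    Event(String), // e.g. "TransactionValidated"
    Error(String), // e.g. "InvalidTransaction"
}

/// Drains the listener channel until the sending side is dropped,
/// separating what arrived into events and errors.
pub fn listen(rx: Receiver<NetworkMsg>) -> (Vec<String>, Vec<String>) {
    let mut events = Vec::new();
    let mut errors = Vec::new();
    for msg in rx {
        match msg {
            NetworkMsg::Event(e) => events.push(e),
            NetworkMsg::Error(e) => errors.push(e),
        }
    }
    (events, errors)
}

/// Simulates the network pushing unsolicited messages to the client.
pub fn simulate_network(tx: Sender<NetworkMsg>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        tx.send(NetworkMsg::Event("TransactionValidated".into())).unwrap();
        tx.send(NetworkMsg::Error("InvalidTransaction".into())).unwrap();
        // `tx` is dropped here, which ends the listener loop.
    })
}
```

In this sketch a client would create the channel once at startup, hand the sender to the connection and keep the receiver for the listener, while ordinary queries keep using their own request/response path.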

Vaults are also now set up to serve events. There’s some rebasing to be done and a bit of refactoring to get this into master, but with the listener and event sending we’re in a very good position with regard to getting the full AT2-ready client going and Safe Client Libs tests back up and running.

Work on async vaults has also progressed well alongside the other modules undergoing the async revolution this week, with this PR being a leap towards a fully asynchronous vault binary. The introduction of async/await across all modules in the vault has also allowed us to refactor the vault test environment so we can set up and test a complete network with the asynchronous vaults. Fully-fledged sections are getting more and more stable as we integrate, debug and fix them with the new async routing module. With the listeners/responders set up at SCL and vaults, client tests are the last stop in the pipeline for us to finish building a prototype farming network.

Routing

Project Plan

This week, alongside the manual testing we are doing where we run vaults and clients using the new async API across the crates, we also started creating a new test suite in routing to validate some of these scenarios. This new test suite doesn’t make use of any mock or fake components, instead launching real routing nodes on localhost and verifying that the routing nodes can bootstrap, send messages to each other, and of course report the events as expected on their API, which is what the vault consumes.
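The shape of such a no-mocks test can be sketched with plain UDP sockets on localhost. This is only an analogy: it does not use the routing or qp2p crates, and the `Node` type and its methods are hypothetical.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

/// Hypothetical stand-in for a node: a real socket bound on localhost.
pub struct Node {
    socket: UdpSocket,
}

impl Node {
    /// Binds to an OS-assigned port on 127.0.0.1 so tests never clash.
    pub fn start() -> io::Result<Self> {
        let socket = UdpSocket::bind("127.0.0.1:0")?;
        // Fail the test instead of hanging if nothing arrives.
        socket.set_read_timeout(Some(Duration::from_secs(2)))?;
        Ok(Self { socket })
    }

    pub fn addr(&self) -> io::Result<SocketAddr> {
        self.socket.local_addr()
    }

    pub fn send(&self, to: SocketAddr, msg: &[u8]) -> io::Result<usize> {
        self.socket.send_to(msg, to)
    }

    /// Returns the next datagram and the address it came from.
    pub fn recv(&self) -> io::Result<(Vec<u8>, SocketAddr)> {
        let mut buf = [0u8; 1024];
        let (n, from) = self.socket.recv_from(&mut buf)?;
        Ok((buf[..n].to_vec(), from))
    }
}
```

A test would start two nodes, send from one to the other, and assert on what the receiving side reports, exercising the real network stack rather than a fake.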

We are also doing some research around how to make the data kept by a routing node more CRDT-compliant, which will allow us to test routing data mutation in isolation to verify it can handle network messages properly, regardless of the order in which they arrive. Additionally, we expect this to give us more confidence in the correctness of the main business logic of a routing node.
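The order-independence property can be illustrated with one of the simplest CRDTs, a grow-only set. This is purely an example of the property being tested for; which CRDTs routing will actually use is not stated here, and `GSet` and `replay` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// A grow-only set (G-Set): state only ever grows, and merging is set union.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct GSet {
    items: BTreeSet<String>,
}

impl GSet {
    /// Applies one incoming message; re-applying the same message is a no-op.
    pub fn apply(&mut self, msg: &str) {
        self.items.insert(msg.to_string());
    }

    /// Merges another replica's state. Union is commutative, associative
    /// and idempotent, which is what makes replicas converge.
    pub fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().cloned());
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Builds a state by applying messages in the order given.
pub fn replay(msgs: &[&str]) -> GSet {
    let mut state = GSet::default();
    for m in msgs {
        state.apply(m);
    }
    state
}
```

A test in this style replays the same messages in different orders, with duplicates, and asserts that the resulting states are equal, with no network involved.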

We created a PR fixing some of the bugs that remained after the removal of parsec. It overhauls the DKG module, fixing some edge cases and making it simpler, while also fixing some issues around relocations. The test failures went down quite a bit with this PR, but they are not at zero yet. There is more work to do but it seems to tie nicely into the CRDT work we started.


Feel free to reply below with links to translations of this dev update and moderators will add them here:

:bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

Woohooooooo yeah

Thank you for your hard work! :pray:t2: The world is in critical need of Safe Network.

:partying_face:

:+1: Good job keep at it… looking forward to having something more to play with.

Sounds like CRDT everything :smile: This is unrelated to the semi lattice of shared state, correct?

Great work everyone @maidsafe. All the different working parts sound like they are getting fine-tuned, starting to get tied together and showing some promise. A farming-ready prototype network sounds fuuuuun :grin:

Thanks so much to the entire Maidsafe team for all of your hard work! Keep the enthusiasm going! :racehorse:

[Image: meme]

wow - I think that’s my favourite meme so far… simple before and after, is powerful!

Thx for your continued hard work, Maidsafe devs…

Love seeing you super ants make progress. Earth needs this baby up and running.

Hyper excited for another testnet or livenet.

Cheers and keep up the good work

Great progress again, thank you!

I read the linked PR below and got some questions:

In Github it says in regards to this PR:

Unsolved issues:

The above mechanism can sometimes lead to more than one successful DKG outcome (usually during heavy churn). There is currently a simple measure in place which blocks voting for a section update if another one is in progress, but it is not completely sufficient because the DKG results can arrive in any order, which can cause the section to stall. This is quite rare but still happens and needs to be solved.

You know I don’t know programming at all, so my thoughts here may be really out of place. But I have been thinking about this change away from parsec, away from the total order “God” algorithm, and I wonder if it is possible? Or will there always remain some edge cases where “God” would be the solution? Or some other outside observer. Like, would it be possible to ask another section to guide us through our section splitting? Would that solve some problems related to the decision maker collapsing during the process of choosing a new decision maker? Could “God” be mostly resting in peace and be asked to stop by only when needed?

Does my thinking make any sense? I just have this vague uneasy feeling that there may be some difficult paradox in “renewing the decision maker”, if you know what I mean?

CRDT algorithms provably enable all replicas to arrive at the same state (data+order) eventually. So yes, it is possible. And they do not have to wait for each other, which speeds things up. Our challenge now lies in adapting older code/data/logic to this model.

Thank you for your persistent updates and work MaidSafe Team! It’s great to keep track of the progress in this way.

My concern was not so much about the actual technical solution, but about the abstract principle itself. If the valid “decision maker” is a board of elders, it seems that when it splits / disintegrates, there is going to be a moment when it doesn’t exist, so what carries “the spirit” over this gap of the “body” not existing? (Jeez, I’m so sorry about the language I use here, not being able to put this in more concrete terms…) Even if CRDT guarantees that all replicas arrive at the same state, it sounds to me that the state every replica arrives at is decided by one replica (one elder). So it sounds like the principle of “group decision” is temporarily switched off in some situations, where a new section is chosen?

I guess my question is how it is guaranteed that the set of new elders is always chosen by consensus of other elders, so that there is not going to be a moment of chaos in the room, when someone with a gun takes the lead?

I hope I’m not distracting you guys too much from actual work.

I’m not sure of the mechanism, but maybe I can suggest ways that the issues you raise (which are good points IMO) could be addressed, and may be the way it’s done.

Firstly, there will probably be several things that need to be able to carry on over the time when a section splits, not necessarily just selection of new elders. So how can we move from one functioning section to two new ones?

Secondly, I believe we will be able to anticipate that a split is coming and elect enough elders before it happens, so that there are sufficient elders in the two new sections to function immediately when the split occurs. This can happen over time as the section size grows.

In which case the question becomes what happens to an operation that is unlucky enough to be in progress as the section decides to split?

The best way to cope with this kind of issue is to cater for an operation being initiated but not completed, because there are all sorts of reasons that might happen. So assuming that is baked into the design, I think it will be able to cope with a section split causing an in-progress operation to fail.

I don’t know the detail of how incomplete operations recover, but I imagine it will be up to the client to detect this, and decide what to do - such as try again.

I don’t know if my description is accurate, but hopefully it gives some idea as to how the issue can be handled, even if it isn’t exactly how it will be implemented.

Thanks, I knew there would be a simple solution like this. :wink: And maybe we can have a couple extra new elders just in case one of them goes offline during the split.

I’ve been away from the details for a while and I may be missing some things, but isn’t it fairly simple?

The idea would be to always build for twice the # of elders necessary to run a section, perhaps a bit more than that, spread as evenly as possible across the section’s address space. All are doing the whole section’s business so have the necessary data and connections. When a split is needed, then, the designation of who goes in which section is either side of the split point. Bob’s your uncle. :grin:

I’m sure it’s much more involved, design-wise, but isn’t that the basic simplicity?

I may have missed a lot of updates, but why are we removing parsec? Wasn’t it supposed to be an amazing new way of doing things?

A forum search for “parsec remove” → SAFE Network Dev Update - May 28, 2020 - #28 by dirvine, which details why the current approach is better.

Simple (minded), non-developer-type US investor here. I enjoy reading these dev updates, but many times I feel like the pet dog who can’t understand 99% of the language spoken here, except when I hear “Bacon treat!” and the tail begins to wag.

“A significant routing bug fix [PR] was raised today, resolving almost all of the parsec removal bugs.”

Thanks for the treat. :dog:

Cheers

It’s a good point. Who reads the Dev Updates and how much do they understand of them? As a non-techie I would like some more “plain English” translations. The updates are translated into Bulgarian and Greek, to name the ones I know of. My knowledge of both is a bit rusty, but translating “The Safe Client can now listen to the network using the qp2p stream that’s initialised on network bootstrap” is hardly in the same league as “In which cupboard does your uncle keep his groceries?”.

IMO, it would be great to state what is left to do and the progress towards it in the updates.

The detail is interesting, but many folk will just want to know how things are progressing.