So this week had a few surprises in store for us, some good and some not so good, in terms of Test-12c. Firstly, we completed the routing patches (that we mentioned in the last dev update) on Friday, and early this week the routing soak test also ran about 1000 iterations successfully. These are tests that use the mock crust feature to simulate different types of network setups and specific cases we’d like to test. The soak test passed all the way through, which was great news.
We then moved on to testing the production vaults in droplets with internal tests. This was where the not so great news occurred. We hit a few issues very early on that we patched; however, we continued to see a few more issues through the week. It’s worth noting that each issue that needs to get debugged and patched takes a bit of time, since this involves collecting about 1800 log files and spotting the source of the issue (fun times). The Routing team went at it hard (so hard that we’ve made Andreas and Diggory sick) to try and get these resolved ASAP. This was further complicated by the new Crust version we were using in the live network, and involved us running two internal tests to confirm Crust wasn’t the variable here (it certainly wasn’t, so looking good from that side, more good news from Crust coming up).
Now for some more good news: we’ve had an internal test network running since yesterday afternoon that has gone through about 2100 churn events. That translates to about 3500 nodes joining and leaving the network. The internal tests also cover some particular corner cases, such as lots of merges/splits happening in the network at the same time as nodes joining/leaving. As of writing this dev update, that network is retaining data while the invariant is holding, which is certainly great news. All is not completely done yet though. We have spotted one other issue: when a candidate (a new node going through the resource proof check) is dropped by a section after it splits, an edge case can prevent the split section from ever re-connecting to that candidate. The bug meant that the section did not get rid of the candidate properly after the split, which prevented it from even re-establishing the connection to that candidate (who by that time might have been promoted to a full node). This is certainly an issue and @qi_ma is patching this today. We’ve not run into this edge case in the network within the last 24 hours.
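To give a feel for the kind of cleanup involved, here is a minimal sketch in Rust. It is not the routing library’s code: `Section`, `PeerState`, `split_keep` and `accepts_connection` are hypothetical names, peer names are modelled as plain bit strings, and the rule that a leftover entry blocks a fresh connection is an assumption made purely for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical peer states; the real routing library tracks far more detail.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PeerState {
    Candidate, // still going through the resource proof check
    Member,    // fully joined node
}

/// Hypothetical, heavily simplified section: a binary prefix plus known peers.
struct Section {
    prefix: String,
    peers: HashMap<String, PeerState>, // keyed by peer name as a bit string
}

impl Section {
    /// Assumption for this sketch: a leftover entry blocks a fresh connection.
    fn accepts_connection(&self, name: &str) -> bool {
        !self.peers.contains_key(name)
    }

    /// Keep only the half of the section matching `new_prefix` and forget
    /// every other peer, candidates included, so no stale state survives.
    fn split_keep(&mut self, new_prefix: &str) {
        self.peers.retain(|name, _| name.starts_with(new_prefix));
        self.prefix = new_prefix.to_string();
    }
}
```

In this toy model, a candidate named `11` that is known before the split is forgotten completely once the section keeps prefix `0`, so a later connection attempt from that node (possibly a full node by then) is accepted again.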
Once we have the patch for this in place, we intend to keep repeating the internal tests until we can confirm a bug-free network before progressing to the most elaborate test of user-run vaults (Test-12c). From the results this week we’re hoping that we’re not far away, but please do take that with a grain of salt, as this situation is very fluid, as you can imagine.
With all this happening on one side this week, we’ve got some really good news to share from some of the other teams, specifically the progress being made on protocol support in Crust for uTP, and the various hole punching tests that we’ve tried in combination (all this will increase the number of people able to join the network, rather than relying on TCP alone). With frontend APIs getting more reliable, @ustulation and team are also starting their work in Vault to support (yep, you guessed it!) mutable data. This will enable the test networks to move on from the SD/AD data types and introduce the new data type MD, which will then facilitate the Authenticator paradigm in place of Launcher, which would get deprecated at that stage. Just to note, this isn’t a patch job that we expect to be done with in a week, but we’re certainly in motion there to get vaults to support this data type. We’ve also got the frontend guys going hard at their applications and docs, which we intend to share more of with you this week. Please check the individual module sections for more information on all these aspects.
As you already know, we have some of the team in Asia, which is great, but as a small team it does hurt dev progress a little back home. So this balance is a fine one that we choose to make, even though it does add a lot of time pressure. We feel sure though that getting the SAFE name out and solidifying relationships with our Asian partners will benefit the SAFE Network and all that it stands for in the long run.
So in summary, it’s been a pretty hectic week all around: a few people got ill (looks like they’ll survive just fine), some of us are in Asia spreading the SAFE Network concepts, and the rest have been patching network bugs from internal tests and progressing with the MD implementation, the Crust updates for protocol support and the doc updates for the frontend APIs.
SAFE Authenticator & API
Last week, @happybeing (again we must thank him for his amazing contributions to this project) hosted the API documentation for safe_app_nodejs for easier access. We have now published the documentation for safe_app_nodejs’ master branch, which we will keep up-to-date with the latest changes.
We’re currently improving the build process for the Node.js library and also the safe_browser. The library will download the native dependency at run time, so users won’t have to install Rust and build locally to get the native libraries. @lightyear has created a dependency downloader, a postinstall tool for safe_app_nodejs that downloads the native library for the architecture and platform it is being installed on.
@lightyear and @shankar have been working on packaging the example app – which depends on safe_app_nodejs – using Electron Forge. This is tested on macOS and we are currently setting it up for use on Windows. We hope to have it integrated soon and test the same across all three platforms. Minor feature updates to the example app have been implemented by @shankar.
Once we have these tested across platforms, it will make the installation of the safe_app_nodejs library seamless. The browser build process will also be simplified with this WIP PR getting merged.
@bochaco has been working on fixing minor issues in the API along with continued testing. At the same time, @bochaco is working on identifying the helper functions which can be moved to the Rust FFI layer so that the integration with FFI will be easier across languages.
SAFE Client Libs & Crust
Quite a few good things are planned for Crust and Rust networking in general. Carl, having solved the issues with the internals of Mio, is planning to work on async uTP support on top of Mio. We had given him a few pointers to where the protocol itself lives (it is implemented in C++ as libutp) and had asked if he could come up with a list of TODOs. He has given us a rough idea of what API he plans to expose in this new crate. We recently discussed having a coded multiplexer in the uTP listener logic that functions on just one socket vs spawning multiple sockets per accept. The general consensus is to spawn a new socket per accept instead of using a multiplexer. Although a multiplexer would have helped us keep the number of used descriptors under control, it would have a significant overhead in terms of CPU cycles and could potentially affect performance. Once we have uTP support, we will certainly be much better off in terms of reaching peers. UDP hole punching trumps TCP hole punching by a good factor, and uTP wraps a UDP socket with reliability and congestion control layers added on top of it.
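For context, the multiplexer option we decided against would look roughly like the sketch below: a single socket receives every datagram, and each one is routed to a per-connection queue by the connection ID in the uTP header (a big-endian 16-bit value at bytes 2..4 of the 20-byte header, per BEP 29). This is only an illustration and not the API of the planned crate: `Multiplexer`, `dispatch` and `connection_id` are hypothetical names, no real sockets are involved, and the handling of uTP’s separate send/receive IDs is left out.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// A uTP (BEP 29) header is 20 bytes long.
const UTP_HEADER_LEN: usize = 20;

/// Read the connection ID (big-endian u16 at bytes 2..4 of the header).
/// Returns `None` if the datagram is too short to hold a uTP header.
fn connection_id(packet: &[u8]) -> Option<u16> {
    if packet.len() < UTP_HEADER_LEN {
        return None;
    }
    Some(u16::from_be_bytes([packet[2], packet[3]]))
}

/// Hypothetical multiplexer: one shared socket, many logical streams.
#[derive(Default)]
struct Multiplexer {
    /// Payloads queued per (remote address, connection ID).
    streams: HashMap<(SocketAddr, u16), Vec<Vec<u8>>>,
}

impl Multiplexer {
    /// Route one received datagram to its stream; returns false if malformed.
    /// This parse-and-lookup step runs for every single packet, which is the
    /// per-packet work a socket-per-accept design leaves to the OS.
    fn dispatch(&mut self, from: SocketAddr, packet: &[u8]) -> bool {
        match connection_id(packet) {
            Some(id) => {
                let payload = packet[UTP_HEADER_LEN..].to_vec();
                self.streams.entry((from, id)).or_default().push(payload);
                true
            }
            None => false,
        }
    }
}
```

With a socket per accept, the kernel does this routing for us by port, at the cost of one descriptor per connection.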
@nbaksalyar in the meantime has been writing more test cases for Crust and looking into the challenges of setting up a virtual network within a machine to test external reachability and other parts of the code. This necessitates peers on non-loopback addresses, which is where the challenge lies. While there are ways to do this on *NIX platforms, Windows as usual needs more looking into, and out of the box support seems bleak there. If he does figure it out, it might prove highly useful for tests in Crust, because they would be closer to the real world and would help us iron out any deficiencies observed.
@adam started working on the implementation of MutableData in vaults. We already have this supported in safe_client_libs via mock vault. Because the vault library has to handle churn, which the mock doesn’t cater for, we might have to optimise or revisit certain components of this implementation. MaidManagers should be easier to deal with than DataManagers just now, and that’s where we have started. Every successful mutation request would now deduct from the account balance (previously POST was free and went straight to the DataManagers). Also, the final cap is based on the total number of mutations, not just PUTs as was previously the case. See this post for more info.
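The accounting change can be sketched as follows. This is a simplified illustration under our own assumptions, not the MaidManager code: `Account`, `MutationKind`, `charge_mutation` and the error name are hypothetical, and the cap used in any example is arbitrary.

```rust
/// Hypothetical request kinds; in this sketch every kind costs the same.
#[derive(Debug, Clone, Copy)]
enum MutationKind {
    Put,
    Post,
    Delete,
}

#[derive(Debug, PartialEq)]
enum MutationError {
    /// The account has used up its total mutation allowance.
    LowBalance,
}

/// Hypothetical per-client account kept by a MaidManager.
struct Account {
    mutations_done: u64,
    mutations_available: u64,
}

impl Account {
    /// `cap` is the total number of mutations allowed (arbitrary here).
    fn new(cap: u64) -> Self {
        Account { mutations_done: 0, mutations_available: cap }
    }

    /// Charge one successful mutation of any kind. The cap counts all
    /// mutations together, so a POST is no longer free.
    fn charge_mutation(&mut self, _kind: MutationKind) -> Result<(), MutationError> {
        if self.mutations_available == 0 {
            return Err(MutationError::LowBalance);
        }
        self.mutations_done += 1;
        self.mutations_available -= 1;
        Ok(())
    }
}
```

For example, with `Account::new(2)`, a PUT followed by a POST both succeed, and a third mutation of any kind is rejected with `LowBalance`.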
@ustulation has intermittently done some NAT-traversal testing with different kinds of routers, including Symmetric NATs, and observed some expectedly unfriendly router behavior. NAT traversal is a cat and mouse game in which we need to find out about the weird behavior of routers and try to trick our way through it. All the observations so far are documented in detail, with workarounds for successfully reaching peers, in this repository. This is meant to be an independent pluggable crate for robust p2p communication. It is a work in progress, and the current design and other details (which aim to explain all observations and techniques thoroughly) can be found in the crate docs. Though still in its early stages, a somewhat working chat-engine is already written and tested here.
Routing & Vault
As mentioned in the introduction, this week has mainly involved us patching and updating the routing library based on internal tests. Some of the changes that have been applied this week include:
- #1357 adds a test with an increased level of churn.
- #1363 makes the log messages a bit more concise.
- #1370 fixes problems that occur if a candidate (a joining node doing the resource proof) joins a section that is currently merging with another one.
- #1371 addresses another section merge bug.