Updates from MaidSafe HQ

Note: Non-Update Posts Will Be Deleted! :dragon:

We think this topic works best as a list of updates without replies or discussions, so please do not post here. If you want to say something or discuss anything, please use the Pre-Dev-Update Thread! Yay! :smiley: topic or start a new topic for that specific discussion.


Privacy. Security. Freedom

Update from MaidSafe HQ

No testnet v5 this week everyone.

I know I’ve said this a few times now, but Anti Entropy is almost there! There are still a couple of stubborn issues remaining, and the team are hard at work resolving them.

While digging into these issues in detail over the last few days, we’ve also found several inefficiencies and other small things to fix, so you’ll be seeing a few other merges go in while we work on the bigger fixes.

Thanks everyone.

Update from MaidSafe HQ

:bell: Fleming testnet v5 release will be tomorrow morning :bell:

However, we’ve decided to hold Anti Entropy (AE) back. As you know, we’ve been resolving AE bugs for the last few days, but the more we test it, the more we find.

There are several other new features & improvements that we are eager to have you use and give us feedback on, so we’ve decided to go ahead and release those.

Most of the team have finished for the day, so let’s start fresh tomorrow with a full day of support available.

Watch out for the announcement in the morning (BST)!

Update from MaidSafe HQ

No further testnet iterations this week.

Our engineers are working on three fronts at the moment:

  • First of all, we are investigating and implementing several further improvements to network messages, including looking at parallel processing in sn_node. We believe this will help resolve the node out-of-memory (OOM) issue seen on testnet v5 after a few hours of running.

  • Secondly, we continue to test the Anti Entropy implementation, now merged upstream. Once we’ve ironed out a few more glitches, we will hopefully be confident enough to declare this feature finished. We also believe that Anti Entropy may help ease some of the stress on the nodes that caused yesterday’s OOM issue (to be confirmed).

  • Finally, work continues in the background on DBCs and their integration into our crates.

Have a Safe weekend everyone!

Update from MaidSafe HQ

No testnet iteration this week.

Our engineers continue to work on the same three fronts as in the previous update: the OOM bug, Anti Entropy, and DBCs. We are not expecting any of these to be resolved or completed this week.

A Bamboo Garden Fund update is coming today and will be posted in a new topic. :bamboo:

Thanks everyone

Update from MaidSafe HQ

We’ve made good progress in narrowing down the main cause of the OOM issue seen with v5 and are testing out fixes. We have realised that we are unnecessarily re-serialising the whole message every time we update its destination (dst), and this is causing a huge memory spike in the nodes. We are in the process of changing this.

You may also have noticed several other message efficiency improvements over the past few days. We’re working to improve node throughput so the memory in question is not held for as long, and to reduce the number of messages across the network as a whole, with a simplified routing-layer filter being our target there. We should add that even once all this is in place, we don’t believe we are finished with improvements in this area just yet, as various other changes are being debated amongst the team, but we hope these changes will result in more robust testnet iterations in the meantime.
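To illustrate the kind of waste described above, here is a minimal sketch, assuming a message made of a large payload plus a small per-destination header. All names (`Msg`, `serialise_payload`, `send_naive`, `send_shared`) are hypothetical and not taken from sn_node, and cloning the payload simply stands in for a real serialiser. The naive path re-serialises the whole message for every destination; the shared path serialises once and hands out cheap `Arc` clones of the same bytes.

```rust
use std::sync::Arc;

/// Hypothetical message with a large payload (not an sn_node type).
pub struct Msg {
    pub payload: Vec<u8>,
}

/// Stand-in for an expensive serialisation step; counts how often it runs
/// and allocates a fresh copy of the bytes each time.
pub fn serialise_payload(msg: &Msg, counter: &mut usize) -> Vec<u8> {
    *counter += 1;
    msg.payload.clone()
}

/// Naive approach: re-serialise the whole message for every destination,
/// so N destinations means N serialisations and N copies of the body.
pub fn send_naive(msg: &Msg, dsts: &[u64], counter: &mut usize) -> Vec<(Vec<u8>, Vec<u8>)> {
    dsts.iter()
        .map(|dst| (dst.to_be_bytes().to_vec(), serialise_payload(msg, counter)))
        .collect()
}

/// Improved approach: serialise the body once, then pair a small
/// per-destination header with a shared reference to the same bytes.
pub fn send_shared(msg: &Msg, dsts: &[u64], counter: &mut usize) -> Vec<(Vec<u8>, Arc<Vec<u8>>)> {
    let body = Arc::new(serialise_payload(msg, counter));
    dsts.iter()
        .map(|dst| (dst.to_be_bytes().to_vec(), Arc::clone(&body)))
        .collect()
}
```

With four destinations, the naive path runs the serialiser four times and holds four copies of the payload, while the shared path runs it once and holds one copy; the saving grows with the number of recipients.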

With Anti Entropy, we believe we’re just about there, with what we think are the last couple of bug and test fixes being finalised.

Once the OOM and Anti Entropy fixes are in place, we intend to do some intensive testing in house. If this goes well, we may be in with a chance of a testnet v6 later tomorrow (Thursday), but there are a lot of variables there and it only takes one bug to set that back.

As always, we’ll do our best to keep you up to date.
Thanks everyone

Update from MaidSafe HQ

No new testnet iteration this week.

A few key team members were absent today, which has meant we’ve been unable to round off a couple of remaining tasks before beginning internal testing.

We wouldn’t want to release on a Friday evening (GMT), so even if we managed to spend all of tomorrow (Friday) testing, we are ruling out a testnet release this week.

This week’s progress update therefore remains as per the post above. We believe we’ve found the main cause of the OOM issue and are trying out various changes to see how much difference they make, while also implementing several other changes to improve network message efficiency. We are also finalising our Anti Entropy implementation, with what we believe are the final fixes expected to be merged tomorrow, allowing us to test the full flow.

Note that we are not intending to include DBCs in Fleming testnet v6. Progress with replacing transfers with DBCs in sn_node is going well (see the draft PR here), but we estimate we’re still a few working days away from finishing it, and the extra testing time needed after that would likely delay the release of the OOM and AE fixes.

Wee update on our AT2/DBC work (GitHub - maidsafe/sn_dbc: Safe Network DBCs)

Our currently unoptimised mint (no batch signatures etc.) shows us at circa 19,000 tps. We think for a data network this is not too shabby at all :slight_smile: This is a good start and should clear the way for micropayments at a significant scale. On launch, we will of course have a mint per section, so this value will scale with the network, which feels natural.
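As a back-of-the-envelope illustration of "a mint per section", here is a tiny sketch under a loudly stated assumption: every section runs its own mint at the same throughput and mints never contend with each other, so totals add linearly. The function name is hypothetical, and only the 19,000 tps figure comes from the update.

```rust
/// Single-mint figure quoted in the update (circa, unoptimised).
pub const MEASURED_TPS_PER_MINT: u64 = 19_000;

/// Hypothetical estimate only: assumes one independent mint per section,
/// each sustaining the same throughput, so the total scales linearly.
/// Returns None if the product would overflow u64.
pub fn estimated_network_tps(sections: u64, tps_per_mint: u64) -> Option<u64> {
    sections.checked_mul(tps_per_mint)
}
```

Under these assumptions, 100 sections would give 1,900,000 tps. The linear model ignores any cross-section coordination, so treat it as an optimistic illustration of the scaling idea rather than a prediction.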

@danda has been working on a stand-alone mint, which looks great and will allow this side of things to be tested independently. It’s also great for everyone to play with. We may set up a DO mint, which would obviously be untrusted, but it will show the process and start allowing devs to poke around the mechanics of the economy. He will likely post the mint code on the forum soon (if yer sneaky you might find it in his repo, GitHub - dan-da/sn_dbc_mint :wink:, but hang on, it needs a couple of tweaks). It’s all starting to look very user-friendly and efficient.

In terms of memory consumption, we have had a few breakthroughs. The number of messages was way, way too high and that is being addressed, with some great progress from @Josh there today, and @qi_ma is gathering a PR to cement some fixes. @bochaco has started a great move to pull all messages out of routing. @oetyng is on DBC integration and a routing re-org (for readability), @lionel.faber is hammering concurrency support, and @yogesh, almost well again, is completing AE. A good week here and lots to smile about.

Update from MaidSafe HQ

No testnet today, everyone. This afternoon, our internal testing highlighted a bug where CLI commands intermittently hang and have to be manually killed. We’ll need some more time to fully investigate and resolve this.

We will update with progress tomorrow.

Thanks for your patience.

Update from MaidSafe HQ

We’ve postponed Fleming testnet v6 release until next week.

We believe we’ve identified and now fixed the hanging issue, which was being caused by connections between nodes being dropped unnecessarily. We’re currently engaged in some final testing in house to fully verify this as resolved.

However, given that it’s Friday and many of the team have already finished for the week, we’ll spend the rest of the day testing thoroughly and preparing for a release next week.

In the meantime, we also intend to make a couple of other improvements. They aren’t blockers to v6, but now that we have some extra time, we’ll use it to fix minor issues and inconsistencies we’ve spotted along the way.

Have a great weekend everyone!

Update from MaidSafe HQ

Yesterday we reproduced a “hanging” issue again, so while Friday’s “fix” may have resolved something, this proved that there was still something else at large.

Through last week’s investigations, we also merged a fix (here) for a potential cache deadlock/hang, which has probably been causing intermittent hanging in some scenarios.

Today we also found and fixed (here) another bug that has been causing hangs. This switch to a write lock, which makes this code path single-threaded, has certainly resolved a load of issues caused by nodes not being in sync. A follow-up PR to fix concurrency, currently going through peer review here, should improve this further.
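Here is a minimal sketch of why taking an exclusive write lock for a whole check-then-update sequence keeps shared state consistent. It assumes the fix amounts to holding an `RwLock` write guard across the entire update; `SectionState`, `apply_update` and `run_concurrently` are hypothetical names, not sn_node code, and the sketch uses `std::sync::RwLock` with OS threads purely to stay self-contained.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical shared state: a version counter that must stay in step
/// with the list of known members.
#[derive(Default)]
pub struct SectionState {
    pub version: u64,
    pub members: Vec<u64>,
}

/// Hold the write lock for the whole check-then-modify sequence, so
/// concurrent callers run one at a time and cannot interleave between
/// checking membership and bumping the version.
pub fn apply_update(state: &RwLock<SectionState>, member: u64) {
    let mut guard = state.write().expect("lock poisoned");
    if !guard.members.contains(&member) {
        guard.members.push(member);
        guard.version += 1;
    }
}

/// Apply (possibly duplicated) updates from several threads at once and
/// return the final (version, member count).
pub fn run_concurrently(updates: Vec<u64>) -> (u64, usize) {
    let state = Arc::new(RwLock::new(SectionState::default()));
    let handles: Vec<_> = updates
        .into_iter()
        .map(|member| {
            let state = Arc::clone(&state);
            thread::spawn(move || apply_update(&state, member))
        })
        .collect();
    for handle in handles {
        handle.join().expect("thread panicked");
    }
    let s = state.read().expect("lock poisoned");
    (s.version, s.members.len())
}
```

However the threads are scheduled, duplicates are skipped and the version always equals the number of distinct members. The trade-off is reduced parallelism on that path, which is presumably what a follow-up concurrency fix would aim to win back.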

So are all the hangs resolved?
Well, we are not 100% sure! Only testing will confirm. We suspect there is still an underlying issue lurking somewhere in qp2p that has also been causing hangs, but we think the fixes and changes made over the last few days should have made things more reliable.

We’re testing these latest changes in-house now, and if we feel they are reliable enough, we will push for a v6 testnet release (probably tomorrow at the earliest).
In parallel, we’ve also been working on only accepting nodes as Adults if they are reachable. This is in the final stages of development, and v6 will now wait for it to be added. Again, we feel this will make the network much more stable.
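The "only accept reachable nodes as Adults" policy can be sketched as a simple gate at join time. This is a hypothetical illustration, not the sn_node implementation: `JoinDecision` and `decide_join` are invented names, and `probe` stands in for whatever check a section would use to confirm it can connect back to the candidate's advertised address. It is injected as a closure so the policy can be shown without real networking.

```rust
use std::net::SocketAddr;

/// Hypothetical outcome of a join request.
#[derive(Debug, PartialEq)]
pub enum JoinDecision {
    AcceptAsAdult,
    RejectUnreachable,
}

/// Promote a joining node to Adult only if the reachability probe
/// succeeds against the address it advertised.
pub fn decide_join<F>(advertised: SocketAddr, probe: F) -> JoinDecision
where
    F: Fn(SocketAddr) -> bool,
{
    if probe(advertised) {
        JoinDecision::AcceptAsAdult
    } else {
        JoinDecision::RejectUnreachable
    }
}
```

Rejecting at join time means an unreachable node is never counted among the Adults the section relies on, rather than being discovered as unresponsive later.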

Thanks everyone.

Update from MaidSafe HQ

Good news - we seem to have resolved all major hanging issues that have been pestering us for this last week :tada:

We’ve been thoroughly testing the latest fixes for the majority of the day now and have had a clean bill of health. We can’t guarantee that there won’t be other “hanging” bugs yet to be discovered, particularly once we get a larger number of people connected to it and connecting/disconnecting their nodes, but it seems that standard usage of the testnet is looking great.

No testnet today, though, as it seems the fixes and improvements we’ve added over the last few days have introduced a few other, thankfully relatively minor, issues. For example, the 3-minute loop that retries connecting a node to the testnet when the network is not accepting new nodes is currently not working. We’ve also found a bug in chunk replication, with PRs to resolve it (here and here) currently under peer review.
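For readers curious what such a retry loop involves, here is a minimal sketch, assuming a join attempt either succeeds or reports that the network is not accepting nodes. `JoinResponse`, `join_with_retry` and the `max_attempts` cap are hypothetical; only the 3-minute interval comes from the update. The join attempt and the sleep are injected as closures so the loop can be exercised without a network or real waiting.

```rust
use std::time::Duration;

/// Hypothetical result of a single join attempt.
#[derive(Debug, PartialEq)]
pub enum JoinResponse {
    Joined,
    NotAcceptingNodes,
}

/// Retry interval mentioned in the update (3 minutes).
pub const RETRY_INTERVAL: Duration = Duration::from_secs(180);

/// Try to join until accepted or `max_attempts` is reached, waiting
/// RETRY_INTERVAL between attempts. Returns the attempt number that
/// succeeded, or None if every attempt was turned away.
pub fn join_with_retry<J, S>(mut try_join: J, mut sleep: S, max_attempts: u32) -> Option<u32>
where
    J: FnMut() -> JoinResponse,
    S: FnMut(Duration),
{
    for attempt in 1..=max_attempts {
        if try_join() == JoinResponse::Joined {
            return Some(attempt);
        }
        // No point waiting after the final failed attempt.
        if attempt < max_attempts {
            sleep(RETRY_INTERVAL);
        }
    }
    None
}
```

In a real node, `sleep` would be something like `std::thread::sleep` (or an async timer), and `try_join` would send the actual join request.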

We’ll be sweeping through these minor issues and checking to see if anything else needs attention.

Thanks everyone!

Yip bummer, but lots happening as usual. I see this is the longest gap between testnets, though, and I know how everyone feels. It’s not a lot to fix, and we did kill some nice bugs.

DBC work is progressing now, though. There is also a very important re-org of message types and data flows. That’s a nice place to be now, as we turn each flow into a story that anyone can read. Simplicity is happening.

Update from MaidSafe HQ

Quick update today: we’ve been fighting a few last-minute issues that are holding us back from releasing a testnet today. Time has run away from us now, so we’ll continue with the fixes tomorrow and see if we can get testnet v6 out to the community.

Apologies for the delay everyone, we’re very close!

Update from MaidSafe HQ

We’ve decided to skip the proposed 6.3 testnet for now, which would’ve contained a few minor fixes atop 6.2.

We’ve applied fixes for nodes repeatedly opening qp2p ports, and community members who were seeing illegal operations from the new blsttc BLS implementation are now able to build newer nodes, so things are looking good there.

Meanwhile, we’re powering ahead with some larger changes: shifting to the tracing library for logging; getting DBCs in (and transfers out); upload concurrency; and streamlining the network’s messaging interface.

Folk should be able to test things out with newer versions of the safe_network crate if they’re keen. Otherwise, we’ll be looking to do a new testnet atop some of these larger changes as soon as that makes sense! :+1: