Update 3 November, 2022

Recent updates to how churn events are processed have led to the realisation that the custom process we had in place for the genesis section of a network is no longer required. Less complexity means fewer code paths, which brings multiple benefits. @davidrusu explains in more depth below.

General Progress

This week, we’ve made further progress with bidirectional stream use in nodes, so the ACK message flow now makes a complete round trip: from client to elder to adult to elder and back to the client. That is to say, ACKs now only come in after the data has been written (whereas on main the ACK was sent on receipt of the message at the elder; adults were not involved). This neatly avoids a whole class of errors during tests and gives us more confidence in what we’re seeing during data storage at the client.
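
As a rough illustration of that flow (made-up types only, not the real sn_node API): the elder forwards the chunk to an adult over the bidirectional stream and only sends the ACK back down the client’s stream once the adult reports a successful write.

```rust
// Illustrative sketch of the new ACK flow (hypothetical types, not the sn_node API):
// the ACK travels client -> elder -> adult -> elder -> client, and is only
// produced after the adult has actually written the data.

#[derive(Debug)]
struct Chunk(Vec<u8>);

#[derive(Debug)]
enum Ack {
    DataStored,      // returned to the client only after the adult's write succeeded
    Error(String),
}

// Adult: persists the chunk, then confirms back over the same stream.
fn adult_store(chunk: &Chunk) -> Result<(), String> {
    if chunk.0.is_empty() {
        return Err("empty chunk".into());
    }
    // ... write to disk here ...
    Ok(())
}

// Elder: forwards to an adult and waits for the write confirmation before
// acknowledging to the client (previously the ACK was sent on receipt).
fn elder_handle_put(chunk: Chunk) -> Ack {
    match adult_store(&chunk) {
        Ok(()) => Ack::DataStored,
        Err(e) => Ack::Error(e),
    }
}

fn main() {
    // Client -> elder -> adult -> elder -> client round trip.
    let ack = elder_handle_put(Chunk(b"hello".to_vec()));
    println!("client received: {ack:?}");
}
```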

We’ve also been working hard to refactor away more code complexity. A PR from @anselme tidied up some more DKG work, @roland has cleaned up more test code, and @bzee is hard at work updating to the latest quinn crate and the changes around stream usage therein.

Making the Genesis Section less special

There are a few things that make the very first Safe Network section special; for example, it’s the only section that doesn’t have a parent section (obviously). But when we build complex systems, special is not something we want. It’s one more case to think about.

Prior to this week, the way nodes joined the genesis section had a quirk whereby node ages were artificially inflated: nodes joining early on started with a high age, and each subsequent joiner was assigned a progressively lower age.

i.e.

  1. Node A attempts to join with the default node age of 4.
  2. The network responds with Retry(age=97).
  3. Node A starts the join process again with age 97.
  4. The network accepts it.
  5. Node B attempts to join with the default node age of 4.
  6. The network responds with Retry(age=96) (each subsequent node age is stepped lower).

In a stampeding herd situation, you could have many nodes attempting to join at once, forcing lots of age synchronization (a rough sketch of the stepping and the herd case follows this list):

  1. Nodes A, B, C and D concurrently attempt to join with the default node age of 4.
  2. The network responds with Retry(age=97) to all of them.
  3. Nodes A, B, C and D start the join process again with age 97.
  4. Say the network accepts Node A.
  5. Nodes B, C and D are still attempting to join with age 97, so they need to run through the age synchronization logic again.
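
To make the stepping and the herd problem concrete, here’s a minimal sketch with made-up types and numbers (illustrative only, not the actual sn_node join handler): the section tracks the next expected age, tells mismatching joiners to retry with it, and steps it down after each acceptance, so concurrent candidates keep chasing a moving target.

```rust
// Hypothetical sketch of the old genesis-section age stepping (not the real
// sn_node code). Ok(age) means the join was accepted, Err(age) means Retry(age).

const DEFAULT_JOIN_AGE: u8 = 4;

struct GenesisSection {
    next_age: u8, // started high (e.g. 97) and stepped down per accepted join
}

impl GenesisSection {
    fn handle_join(&mut self, requested_age: u8) -> Result<u8, u8> {
        if requested_age == self.next_age {
            let accepted = self.next_age;
            self.next_age -= 1; // the target moves for everyone still waiting
            Ok(accepted)
        } else {
            Err(self.next_age) // "Retry with this age"
        }
    }
}

fn main() {
    let mut section = GenesisSection { next_age: 97 };

    // Node A: default age 4 -> Retry(97) -> rejoin with 97 -> accepted.
    assert_eq!(section.handle_join(DEFAULT_JOIN_AGE), Err(97));
    assert_eq!(section.handle_join(97), Ok(97));

    // Stampeding herd: B, C and D were also told Retry(97), but A's acceptance
    // moved the target to 96, so their retries fail and they must resync again.
    assert_eq!(section.handle_join(97), Err(96));
    assert_eq!(section.handle_join(97), Err(96));
    assert_eq!(section.handle_join(96), Ok(96));
}
```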

The reason we were doing this was to avoid excessive relocations early on in the network. If you recall, nodes are chosen randomly to be relocated to other sections when a churn event happens, and the younger a node is, the more likely it is to be chosen. To avoid having 80% of a section relocated at once, we introduced this age-stepping behaviour to reduce the likelihood of a relocation occurring.

At some point we changed how churn events are processed to limit the number of nodes that can be relocated at once so that sections can maintain a healthy number of adults.
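
Hand-waving a little, and with made-up names rather than the actual sn_node relocation code, the combination of those two ideas looks roughly like this: an age-weighted check decides who is a relocation candidate on a churn event (young nodes match much more easily than old ones), and a cap limits how many can actually be moved at once.

```rust
// Rough, hypothetical illustration (not the actual sn_node relocation rule):
// younger nodes are more likely to be picked on a churn event, and the number
// relocated at once is capped so a section keeps a healthy number of adults.

const MAX_RELOCATIONS_PER_CHURN: usize = 1; // assumed cap, for illustration only

#[derive(Debug, Clone)]
struct Node {
    name: u64,
    age: u8,
}

// A node is a candidate when the churn value "matches" its age: the older the
// node, the harder the match, so old nodes relocate rarely and young ones often.
fn is_candidate(node: &Node, churn_value: u64) -> bool {
    (node.name ^ churn_value).trailing_zeros() >= node.age as u32
}

fn relocation_candidates(nodes: &[Node], churn_value: u64) -> Vec<Node> {
    let mut candidates: Vec<Node> = nodes
        .iter()
        .filter(|n| is_candidate(n, churn_value))
        .cloned()
        .collect();
    // Cap how many nodes may move in a single churn event.
    candidates.truncate(MAX_RELOCATIONS_PER_CHURN);
    candidates
}

fn main() {
    let nodes = vec![
        Node { name: 0b1011_0000, age: 4 },
        Node { name: 0b0100_1000, age: 5 },
        Node { name: 0b1010_1111, age: 7 },
    ];
    let relocating = relocation_candidates(&nodes, 0b0100_0000);
    println!("relocating this churn: {relocating:?}");
}
```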

So now that the reasons behind the age stepping no longer hold, we’ve been able to remove the age synchronization protocol when nodes join the genesis section. This makes the first section behave much more like subsequent sections, with no special code paths dedicated to it! It should also make node joins a bit more reliable and faster, since we’ve removed one network round trip to synchronize the join age.
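
For contrast with the earlier sketch, here’s a minimal sketch of the effect of the change (again hypothetical types, not the real join handler): the genesis section now just takes the default join age like any other section, so the extra Retry round trip disappears.

```rust
// Hypothetical sketch of the simplified behaviour (not the real sn_node code):
// no genesis-specific age stepping, so a joiner's requested default age is
// taken as-is and no "Retry with this age" round trip is needed.

const DEFAULT_JOIN_AGE: u8 = 4;

struct Section;

impl Section {
    fn handle_join(&self, requested_age: u8) -> u8 {
        // Same path as any other section: accept the age the node asked for.
        requested_age
    }
}

fn main() {
    let section = Section;
    assert_eq!(section.handle_join(DEFAULT_JOIN_AGE), DEFAULT_JOIN_AGE);
    println!("joined with age {DEFAULT_JOIN_AGE}, no age sync round trip");
}
```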


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

51 Likes

giphy

23 Likes

Silver bells! That update came out early!

20 Likes

Bronze as normal

3rd-place-bronze

PS Thanks everyone at Maidsafe for working so hard to make this dream crystallise!

20 Likes

Since the update arrived so early, I think even the fourth place is worth a mention!

16 Likes

I’ll claim 5th then
:fireworks:

Always encouraged reading these updates… thanks to all involved for the hard work.

17 Likes

Are you guys thinking what I’m thinking?
:airplane: :airplane: :alarm_clock:

14 Likes

I found it, pondered whether I should take it, went and found the gif, and it was still available.
So I figured it must be mine :sweat_smile:

13 Likes

Nice update team! Short and sweet. The shorter the code gets the sweeter too I suppose :wink:


Image by Stable Diffusion: “Decentralized Safe Network”

Thanks for your efforts team. This next major testnet is going to be really great I bet.

14 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

13 Likes

Thanks for the early update and the simplification.

Now if I do something daft like starting baby-fleming with a large number of nodes, I see one node at age 255 and everything else at age 5 initially. No more 255 followed by a countdown from 96, stepping down by 2 to 86, and then age 5.

willie@gagarin:~/projects/maidsafe/safe_network$ safe node run-baby-fleming --nodes=53
Creating '/home/willie/.safe/node/baby-fleming-nodes' folder
Storing nodes' generated data at /home/willie/.safe/node/baby-fleming-nodes
Starting nodes to join the Safe network...
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-genesis"
Node PID: 171252, prefix: Prefix(), name: b0fd37(10110000).., age: 255, connection info:
"127.0.0.1:44265"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-2"
Node PID: 171268, prefix: Prefix(), name: 48aa4d(01001000).., age: 5, connection info:
"127.0.0.1:59061"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-3"
Node PID: 171291, prefix: Prefix(), name: afa8db(10101111).., age: 5, connection info:
"127.0.0.1:59409"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-4"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-5"
Node PID: 171310, prefix: Prefix(), name: 6cff25(01101100).., age: 5, connection info:
"127.0.0.1:54847"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-6"
Node PID: 171326, prefix: Prefix(), name: 07d7a8(00000111).., age: 5, connection info:
"127.0.0.1:48688"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-7"
Node PID: 171347, prefix: Prefix(), name: b14673(10110001).., age: 5, connection info:
"127.0.0.1:49101"
(PID: 171307): Encountered a timeout while trying to join the network. Retrying after 5 seconds.
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-8"
Node PID: 171363, prefix: Prefix(), name: 14a0a1(00010100).., age: 5, connection info:
"127.0.0.1:48190"
Starting logging to directory: "/home/willie/.safe/node/baby-fleming-nodes/sn-node-9"
Node PID: 171380, prefix: Prefix(), name: 7722c3(01110111).., age: 5, connection info:
"127.0.0.1:59944"
Node PID: 171307, prefix: Prefix(), name: 24103d(00100100).., age: 5, connection info:
"127.0.0.1:47451"

and examining the sections I get

willie@gagarin:~$ safe networks sections
Network sections information for default network:
Read from: /home/willie/.safe/network_contacts/default

Genesis Key: PublicKey(093b..9669)

Sections:

Prefix '0'
----------------------------------
Section key: PublicKey(0511..c8cb)
Section keys chain: [(PublicKey(093b..9669), 18446744073709551615), (PublicKey(00d5..e951), 4), (PublicKey(1169..4311), 1), (PublicKey(0511..c8cb), 2), (PublicKey(04f8..5b26), 5), (PublicKey(0810..93ee), 6), (PublicKey(02e9..db35), 7), (PublicKey(08a9..10b0), 0)]

Elders:
| XorName  | Age | Address         |
| 089c62.. |   5 | 127.0.0.1:35926 |
| 12bb62.. |   5 | 127.0.0.1:36096 |
| 1cfc59.. |   5 | 127.0.0.1:56296 |
| 3dfa0e.. |   5 | 127.0.0.1:38074 |
| 48aa4d.. |   5 | 127.0.0.1:59061 |
| 6cff25.. |   5 | 127.0.0.1:54847 |
| 7722c3.. |   5 | 127.0.0.1:59944 |

Prefix '1'
----------------------------------
Section key: PublicKey(01dc..cfc9)
Section keys chain: [(PublicKey(093b..9669), 18446744073709551615), (PublicKey(00d5..e951), 4), (PublicKey(1169..4311), 1), (PublicKey(01dc..cfc9), 2), (PublicKey(04f8..5b26), 5), (PublicKey(0810..93ee), 6), (PublicKey(02e9..db35), 7), (PublicKey(08a9..10b0), 0)]

Elders:
| XorName  | Age | Address         |
| 8b4ba3.. |   5 | 127.0.0.1:50408 |
| 9dc5e3.. |   5 | 127.0.0.1:36687 |
| a874c8.. |   5 | 127.0.0.1:47205 |
| afa8db.. |   5 | 127.0.0.1:59409 |
| b14673.. |   5 | 127.0.0.1:49101 |
| c6125d.. |   5 | 127.0.0.1:52349 |
| b0fd37.. | 255 | 127.0.0.1:44265 |

I am of course still trying to break this (cos someone has to) and I’m looking at a possible memory leak. The memory usage is trending upwards right now. This graph of CPU and memory shows the network being initialised, then a spike to 60% as I put BegBlag (as is tradition) and then a much larger spike to 90% as I put a 200MB file of random content.

EDIT: I have no firm ideas what the semi-regular spikes to ~30% total CPU usage represent. These spikes are very roughly 45secs apart.

This may all mean absolutely nothing though. No-one IRL is going to start a 53-node baby-fleming network on a single box.

11 Likes

I think we do have a memory leak here.
This graph shows network initialisation, BegBlag, a single 200MB test file, and then a dir of ~300 files from 2k up to 3MB.
Even when that put job is finished and the network is “idling”, the memory usage continues to rise.

13 Likes

It’s a good update, you’re saying? And when should we take our luggage to the airplane?

7 Likes

Did you all conspire against me this week for an early bird special?

Great update, love that more simplification is taking place. Can’t wait for the next release so we can hammer out a testnet :slight_smile:

15 Likes

Any update on the revisions of the White Paper? ETA?

7 Likes

Yes we did.

Good job, guys! :beers:

7 Likes

Forever? Or is it bounded?

4 Likes

I was working outside for a while there so no further interaction. Memory usage plateaued at ~88%.
Then I tried a couple of large video files. 230MB was fine, but 592MB of Rockers1978.mp4 killed it.

I just noticed I had vdash running as a quick way to confirm just how many nodes I had running. I’ll try the same again without vdash - unless there is something else you want me to try?

2 Likes

We were going to do an update on that, but there are several agreements and decisions involved from regulators, banks, lawyers and the charities commission, so it would have got messy. We’ve postponed that for a week or so. As for an ETA, it should be this year, maybe only a few weeks away, but there is always another round of checks from supervisory authorities etc.

Hard for most of us to keep up with all the moving parts there.

16 Likes