Update 08 September, 2022

You set up a test network, you upload some data, something goes wrong. What happened and where exactly did it occur? Tracking down where problems arise in networks is a tricky challenge, particularly in decentralised networks where each node is an individual. This week @davidrusu walks us through statemaps, a diagnostic tool that shows us exactly which state each node is in at any point in time. It’s a god’s eye view into the network which will undoubtedly make bug squashing a lot easier.

Thanks as always to everyone experimenting with local networks and comnets. We’re reasonably convinced that some of the issues with uploading large files are at the API layer, and we’re looking at that now.

General progress

@bochaco continues to refine the error reporting process to provide more meaningful messages to clients.

@anselme is looking at AE and gossip and how one can be a fallback for the other in case of communication failures.

On documentation, @jimcollinson is finalising the main whitepaper required by the Swiss authority FINMA. It’s an overview rather than a technical deep dive, so probably nothing new for most folks here, but ticking off those legal checkboxes ready for launch nonetheless.

@Chriso and @bochaco are tidying up what happens when a DBC is submitted for reissue, and in checking that process it’s been found that some spent proofs were signed with a section key that the section processing the reissue request is not aware of.

Statemaps

In a highly concurrent system, it can be very difficult to see what’s going on. Nodes move through states incredibly quickly and trying to correlate messages across nodes can feel like you’re trying to recover a shredded document.

Statemaps let us recreate a partial picture of what happened in a network after-the-fact. They’ve been a very useful tool in understanding where nodes are spending their time.

We’ve instrumented the sn_node code base to log when it enters a state and again when it leaves a state. We can then process those logs to generate a statemap like the one below:
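As a rough illustration of the enter/leave logging described above, the sketch below pairs such events into time intervals, one interval per rectangle. The `Event`, `Kind` and `Interval` types and the `event` and `pair_events` functions are hypothetical names chosen for this example; the actual sn_node log format and tooling may well differ. It also assumes the events arrive sorted by time.

```rust
use std::collections::HashMap;

/// Hypothetical event kinds: a node either enters or leaves a state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Enter,
    Leave,
}

/// One hypothetical log record (not the real sn_node log format).
#[derive(Debug, Clone, PartialEq)]
struct Event {
    time_ns: u64,
    node: String,
    state: String,
    kind: Kind,
}

/// A closed interval during which `node` was in `state`.
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    node: String,
    state: String,
    start_ns: u64,
    end_ns: u64,
}

/// Convenience constructor for an `Event`.
fn event(time_ns: u64, node: &str, state: &str, kind: Kind) -> Event {
    Event { time_ns, node: node.to_string(), state: state.to_string(), kind }
}

/// Pairs each Leave with the most recent Enter for the same (node, state).
/// Assumes `events` is sorted by time. A Leave with no matching Enter is
/// ignored, and an Enter that is never closed produces no interval.
fn pair_events(events: &[Event]) -> Vec<Interval> {
    let mut open: HashMap<(String, String), u64> = HashMap::new();
    let mut intervals = Vec::new();
    for e in events {
        let key = (e.node.clone(), e.state.clone());
        match e.kind {
            Kind::Enter => {
                open.insert(key, e.time_ns);
            }
            Kind::Leave => {
                if let Some(start_ns) = open.remove(&key) {
                    intervals.push(Interval {
                        node: e.node.clone(),
                        state: e.state.clone(),
                        start_ns,
                        end_ns: e.time_ns,
                    });
                }
            }
        }
    }
    intervals
}
```

Keying the open intervals by node and state means unrelated nodes never interfere with each other, which is the property that makes the per-node rows of a statemap possible.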


Each row corresponds to a node, with time on the x axis. The rectangles on each row correspond to the state that node was in during that time interval.
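To make the layout concrete, here is a minimal text-only sketch of the same idea: one row per node, time running left to right, and one character per state. The `Rect` type and `render` function are hypothetical and only mimic the layout; the real statemap tool produces an SVG and works differently.

```rust
/// One rectangle: the node on `row` was in the state drawn as `symbol`
/// from `start` up to `end` (same time unit as `t0` and `t1` below).
#[derive(Debug, Clone, PartialEq)]
struct Rect {
    row: usize,
    start: u64,
    end: u64,
    symbol: char,
}

/// Draws `rows` lines of `width` characters covering the window [t0, t1).
/// Columns not covered by any rectangle are left as '.'.
fn render(rects: &[Rect], rows: usize, t0: u64, t1: u64, width: usize) -> Vec<String> {
    let mut grid = vec![vec!['.'; width]; rows];
    if t1 > t0 && width > 0 {
        let span = (t1 - t0) as u128;
        let w = width as u128;
        for r in rects {
            // Skip rectangles on unknown rows or entirely outside the window.
            if r.row >= rows || r.end <= t0 || r.start >= t1 {
                continue;
            }
            // Clamp to the window, then scale time offsets to column indices.
            let s = (r.start.max(t0) - t0) as u128;
            let e = (r.end.min(t1) - t0) as u128;
            let first = (s * w / span) as usize;
            let last = ((e * w + span - 1) / span) as usize; // round up
            for col in first..last.min(width) {
                grid[r.row][col] = r.symbol;
            }
        }
    }
    grid.into_iter().map(|line| line.into_iter().collect()).collect()
}
```

Clamping each rectangle to the window is what makes the chosen time bounds matter: a window that does not overlap the recorded data leaves every row empty.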

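Each state is assigned a colour. In the example discussed here, membership voting is salmon, DKG is orange, AntiEntropy is light blue and Handover is dark blue.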

By analysing this statemap, you can begin to understand what happened here. Let’s label the various phases and talk through them.

  1. We can see the map starts with 6 Elders voting on membership (salmon)
  2. After membership completes, they immediately kick off DKG (orange). This should be a hint that there will be a change of Elders.
  3. Meanwhile, we see that a 7th node comes online. He receives an AntiEntropy (light blue) update letting him know that he’s been accepted in the network, and then joins in on the DKG (orange). This would suggest that this new node that just joined is being promoted to Elder status and that this is why the original 6 Elders started DKG.
  4. Now we see DKG has stalled. This is because DKG requires total participation to complete: the existing 6 nodes have all contributed their parts, but they need the 7th node to put in its share to complete the section key.
  5. Eventually the 7th node catches up and DKG completes. Next step is to have the old 6 Elders verify the new section key is valid and to Handover (dark blue) control to the new 7 Elders.
  6. After Handover completes, we see a burst of AntiEntropy being sent out, presumably with the new SAP, showing that the new Elders have taken control of the section.
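The stall in step 4 follows directly from the total-participation rule. The sketch below models only that rule: a round completes once every expected participant has contributed. `DkgRound` and its methods are hypothetical names for illustration, and nothing here reflects the actual DKG implementation or its cryptography.

```rust
use std::collections::BTreeSet;

/// Toy model of a DKG round that needs a contribution from every participant.
struct DkgRound {
    participants: BTreeSet<u8>,
    contributed: BTreeSet<u8>,
}

impl DkgRound {
    /// Starts a round with the given participant ids.
    fn new(participants: impl IntoIterator<Item = u8>) -> Self {
        DkgRound {
            participants: participants.into_iter().collect(),
            contributed: BTreeSet::new(),
        }
    }

    /// Records a contribution. Returns false if the node is not a
    /// participant or has already contributed.
    fn contribute(&mut self, node: u8) -> bool {
        self.participants.contains(&node) && self.contributed.insert(node)
    }

    /// Participants the round is still waiting for, in ascending order.
    fn missing(&self) -> Vec<u8> {
        self.participants.difference(&self.contributed).copied().collect()
    }

    /// True only once every participant has contributed.
    fn is_complete(&self) -> bool {
        self.missing().is_empty()
    }
}
```

With seven participants and contributions from only the first six, `is_complete()` stays false and `missing()` names the seventh, which corresponds to the long orange stretch in the statemap.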

We’ve developed a bit of tooling around these statemaps, and the Safe Network README has instructions for generating your own.

To ease development, we’ve also configured CI to automatically generate and upload statemaps for each PR.

We hope you find these maps enlightening, happy splunking!


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

54 Likes

One two three first

Back in the game now to read :slight_smile:

15 Likes

Am I second for the first time?

16 Likes

I am bumping my head down here!

Would NODE_COUNT=15 RUST_LOG=sn_node=trace cargo run --features statemap --release --bin testnet be the way to start a local testnet?
I have ripgrep and statemap installed, but my safe_states.out remains empty no matter what I try.

13 Likes

I am bumping my head down here!

We hit our heads on this one a few times as well! :slight_smile: The statemap logs come from sn_interface, so try changing your RUST_LOG filter to:

RUST_LOG="sn_node=trace,sn_interface=trace"

13 Likes
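For anyone wondering why the narrower filter left safe_states.out empty, here is a simplified model of how a `target=level` filter decides what gets logged. It assumes plain prefix matching on the log target and ignores the richer syntax real logging crates accept; `Level`, `parse_filter` and `is_enabled` are hypothetical names for this sketch, and the target `sn_interface::statemap` used in the note below is made up for illustration.

```rust
/// Log levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses a level name, case-insensitively.
fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Parses "a=trace,b=info" into (target prefix, max level) pairs.
/// Directives that do not look like `target=level` are skipped.
fn parse_filter(spec: &str) -> Vec<(String, Level)> {
    spec.split(',')
        .filter_map(|directive| {
            let (target, level) = directive.trim().split_once('=')?;
            Some((target.to_string(), parse_level(level)?))
        })
        .collect()
}

/// A record is enabled if the longest matching prefix allows its level.
/// Targets matched by no directive are disabled in this simplified model.
fn is_enabled(filter: &[(String, Level)], target: &str, level: Level) -> bool {
    filter
        .iter()
        .filter(|(prefix, _)| target.starts_with(prefix.as_str()))
        .max_by_key(|(prefix, _)| prefix.len())
        .map_or(false, |(_, max)| level <= *max)
}
```

Under this model, a filter of `sn_node=trace` rejects a trace record whose target is `sn_interface::statemap`, while `sn_node=trace,sn_interface=trace` accepts it.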

Bingo! :sunglasses: armed and dangerous I am going back in!

@davidrusu
definitely progress safe_states.out: 145204 records processed, 145189 rectangles
but…

I have not put anything or used the network but am guessing that I should still see activity?

15 Likes

What was the statemap command you used to generate the svg? Make sure to copy and paste the fresh statemap command from the preprocess script.

The -b <start> -e <end> time span is important and changes with each run

Alternatively run the preprocess script with --run-statemap to automatically generate the SVG with the correct time bounds.
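To see why the time span changes with each run, here is a minimal sketch that derives a begin and end value from a set of record timestamps. It assumes the bounds are simply the earliest and latest timestamps seen, which may not be exactly how the preprocess script computes them; `time_bounds` is a hypothetical helper.

```rust
/// Returns (begin, end) as the earliest and latest timestamp,
/// or None when there are no records at all.
fn time_bounds(timestamps_ns: &[u64]) -> Option<(u64, u64)> {
    let begin = *timestamps_ns.iter().min()?;
    let end = *timestamps_ns.iter().max()?;
    Some((begin, end))
}
```

Since every run produces different timestamps, bounds copied from an earlier run can easily miss the new data altogether and leave the SVG blank.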

14 Likes

nailed it two in a row, thanking you sir!

16 Likes

Don’t fancy caves, nor the evil in them that never sleeps. So, I will pass on the splunking. Good luck to the dauntless fellowship of the nodes (volunteer Comnet and Baby-Fleming testers)!

14 Likes

Thanks to all for the update.

Statemaps looks like being an incredibly useful tool. Thanks to @davidrusu and @Josh for asking and answering the Qs I would have had.
I would say I will get torn into this right away but I have been invited to a Smiths-themed party tonight. It was meant to be at a Korean restaurant in Glasgow but we had to cancel as they don’t serve corgis. Excessive drinking and jollification will ensue in any case.
It’s been quite a week - first Celtic hump the Queens XI 4-0 on Saturday, then we played Real Madrid and only got beat 3-0, then the Rangers got humiliated 4-0 again off Ajax last night and then the Queen died.
Any one of these would be cause for celebration.

I might get back to SAFE things later

14 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

10 Likes

I just learned the queen died here from reading your comment Southside. I’m thrilled to be so out of the loop, and that it was you that broke the news to me.

I will now proceed to continue my evening as if nothing important happened, as I continue to await something important - the Safe Network. Another great update, there’s been some very enjoyable ones recently…

I think the word might be out amongst the bugs that the heat is on, I can smell bug-cooking!

15 Likes

I understand that reading Southside’s comment was the cause of her death!

7 Likes

This is going to be so useful! I’m really fond of visual information, much more tangible and easy to see order than logs :sunglasses:

11 Likes

Nice to see a graphical representation like that (the Statemap).

12 Likes

The story up here is that the cause of death was seeing the Rangers (aka The Queens XI) get beat 4-0 off Celtic on Saturday and then 4-0 off Ajax on Tuesday night.

5 Likes

Thx 4 the update Maidsafe devs

Just when I think I understand 1% of the SAFE Network, there is something like statemaps, reducing my understanding to 0.001%. My clueless jaw drops, seeing the descriptions of what the nodes do. These will become college topics…

Thank you Maidsafe devs for the extreme care you take to deliver this network, this is a true labor of love.

:clap: :clap: :clap: To the community members who stubbornly keep testing the testnets, providing valuable feedback

Do keep hacking super ants :stuck_out_tongue_closed_eyes:

18 Likes

Thank you for the heavy work team MaidSafe! I add the translations in the first post :dragon:


Privacy. Security. Freedom

12 Likes

Nice. Still checking in after a long summer vacation and seeing progress, and dev and QA tools open to the community. Keep on it team.

10 Likes