Profiling node performance

mav · March 25, 2021, 12:37am

There was no big ‘aha’ moment looking at the timeline, but I’ll record what I saw (this is with no bls signatures or verification, so it’s showing just the data flow rather than the overhead for doing cryptographic operations).

Main observations:

The client is doing 2.0-2.5 seconds work very early in the upload process, and the nodes are idle while that’s happening. 2s is a long time considered in isolation as raw cpu-time, and is also a long time relative to the overall upload time (between 5s-25s).
The nodes get slower to process each upload. Average of 9.92% slower for each upload. First upload is ~4.8 seconds, last upload is ~24.4s.

Client taking 2s at the start of upload

This timeline shows upload #2 3 and 4. Upload 1 is slightly different so I’ve left it out to focus on the repeated upload pattern.

Nodes #1-11 are displayed from top to bottom. The cli begin/end events are at the bottom with the red lines. There’s a 3 second pause between each upload to try to visually separate them (Gap A and Gap C), so the red lines indicate where the client is active.

Gaps B and D are not expected.

The start of Gap B has a small flurry of activity from 3 nodes. This begins with nodes doing
TransferQuery::GetHistory
and ends with
QueryResponse::GetHistory(Success)
so it seems entirely history related.

Then there’s Gap B that lasts 2.347 seconds. Is the client doing 2+ seconds of work here? Seems hard to believe. It would be good to look into what’s happening on the client during these gaps. For now it’s quite informative to just see the nodes idle.

Then the first activity from the nodes after Gap B is
ClientMessageReceived ... cmd: SequenceWrite::NewSequence, payment: TransferAgreementProof
So it seems during Gap B the client is doing stuff for the TransferAgreementProof.

Another interesting point is the client considers the upload finished just before Gap C but there’s still about 150ms of node activity following it, which is all to do with writing sequence data. Nice to see async doing its thing here.

Gap D is the exact same story as Gap B.

Main takeaway: why is the client taking 2s between gethistory and transferagreementproof? Why does this happen on upload 3, 4, 5+ but not upload 2?

Nodes slow down

Looking at all 20 uploads it’s clear the nodes are getting slower on each upload.

The Gap B style pauses are quite consistent at around 2.0-2.5 seconds, but the overall time for each upload increases because the node processing time getting longer, not with any particular gaps just the whole process is gradually stretching out.

My feeling is this is to do with Sequence data (which is due for replacement by Register data type). In the past I’ve noticed I could get faster performance by minimizing the amount of modification to sequences:

One Client and One Sequence With Many Appends - Very slow
One Client and Many New Sequences - Less slow
Many New Clients and Many New Sequences - Least slow

I’m very interested to test the replacement for Sequence Data when it arrives.

Topic		Replies	Views
SAFE Network Dev Update - July 30, 2020 Updates	34	2565	August 6, 2020
SAFE Network Dev Update - February 20, 2020 Updates	65	3693	April 23, 2020
Testnet Vault Specs / Performance? Development	2	1005	March 3, 2016
SAFE Network - Test 16 Updates	186	14239	April 27, 2017
SAFE Network Dev Update - May 7, 2020 Updates	64	4446	May 14, 2020

Profiling node performance

Related Topics