Profiling node performance

There was no big ‘aha’ moment looking at the timeline, but I’ll record what I saw (this is with no bls signatures or verification, so it’s showing just the data flow rather than the overhead for doing cryptographic operations).

Main observations:

  • The client is doing 2.0-2.5 seconds work very early in the upload process, and the nodes are idle while that’s happening. 2s is a long time considered in isolation as raw cpu-time, and is also a long time relative to the overall upload time (between 5s-25s).
  • The nodes get slower to process each upload. Average of 9.92% slower for each upload. First upload is ~4.8 seconds, last upload is ~24.4s.

Client taking 2s at the start of upload

This timeline shows upload #2 3 and 4. Upload 1 is slightly different so I’ve left it out to focus on the repeated upload pattern.

Nodes #1-11 are displayed from top to bottom. The cli begin/end events are at the bottom with the red lines. There’s a 3 second pause between each upload to try to visually separate them (Gap A and Gap C), so the red lines indicate where the client is active.

Gaps B and D are not expected.

The start of Gap B has a small flurry of activity from 3 nodes. This begins with nodes doing
TransferQuery::GetHistory
and ends with
QueryResponse::GetHistory(Success)
so it seems entirely history related.

Then there’s Gap B that lasts 2.347 seconds. Is the client doing 2+ seconds of work here? Seems hard to believe. It would be good to look into what’s happening on the client during these gaps. For now it’s quite informative to just see the nodes idle.

Then the first activity from the nodes after Gap B is
ClientMessageReceived ... cmd: SequenceWrite::NewSequence, payment: TransferAgreementProof
So it seems during Gap B the client is doing stuff for the TransferAgreementProof.

Another interesting point is the client considers the upload finished just before Gap C but there’s still about 150ms of node activity following it, which is all to do with writing sequence data. Nice to see async doing its thing here.

Gap D is the exact same story as Gap B.

Main takeaway: why is the client taking 2s between gethistory and transferagreementproof? Why does this happen on upload 3, 4, 5+ but not upload 2?


Nodes slow down

Looking at all 20 uploads it’s clear the nodes are getting slower on each upload.

The Gap B style pauses are quite consistent at around 2.0-2.5 seconds, but the overall time for each upload increases because the node processing time getting longer, not with any particular gaps just the whole process is gradually stretching out.

My feeling is this is to do with Sequence data (which is due for replacement by Register data type). In the past I’ve noticed I could get faster performance by minimizing the amount of modification to sequences:

  • One Client and One Sequence With Many Appends - Very slow
  • One Client and Many New Sequences - Less slow
  • Many New Clients and Many New Sequences - Least slow

I’m very interested to test the replacement for Sequence Data when it arrives.

9 Likes