Profiling node performance

This post is about the variability and repeatability of upload times on the test network.

Software Versions

Vault 0.11.0
Routing 0.23.4
Launcher 0.8.0
DemoApp 0.6.0
SafeCore 0.19.0

Changes from default operation

group size: 3
quorum size: 2
upload / storage limits: extremely large
remove one-vault-per-LAN restriction

Methodology

  • Load and start 28 vaults on a network of 7 pine64s.
  • Create an account using random password / secret.
  • Upload a 655 MiB file (ubuntu-16.04-server-amd64.iso) via the demo app.
  • Record the timing of the upload (a rough harness sketch follows this list).
  • Stop and delete the vaults.
  • Reboot the pine64s and repeat, for a total of ten identical tests.
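
Each iteration roughly follows the loop sketched below. This is only an illustrative sketch: start_vaults, create_account, upload_file and teardown are hypothetical placeholders for the real vault, launcher and demo-app steps, not part of any actual tooling.

```python
import csv
import time

# Hypothetical placeholders for the real steps; they only mark where each
# part of the procedure above happens.
def start_vaults():
    """Load and start the 28 vaults across the 7 pine64s."""

def create_account():
    """Create an account using a random password / secret."""

def upload_file(path):
    """Upload the file via the demo app."""

def teardown():
    """Stop and delete the vaults, then reboot the pine64s."""

def run_test(test_number, path="ubuntu-16.04-server-amd64.iso"):
    start_vaults()
    create_account()
    started = time.monotonic()
    upload_file(path)                          # the only step that is timed
    minutes = (time.monotonic() - started) / 60
    teardown()
    with open("results.csv", "a", newline="") as f:
        csv.writer(f).writerow([test_number, round(minutes, 1)])
    return minutes

if __name__ == "__main__":
    for n in range(1, 11):                     # ten identical tests
        print(f"test {n}: {run_test(n):.1f} minutes")
```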

Results

Test  Time (m)
   1      59.5
   2      59.6
   3      54.7
   4      55.5
   5      54.6
   6      52.5
   7      65.6
   8      55.4
   9      51.6
  10      59.9

Min: 51.6
Max: 65.6
Average: 56.9
Median: 55.4
Standard Deviation: 4.2
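
For reference, these summary figures can be recomputed from the table with a few lines of Python; the 4.2 value matches the sample (n-1) standard deviation, while the population standard deviation would be about 4.0.

```python
import statistics

times = [59.5, 59.6, 54.7, 55.5, 54.6, 52.5, 65.6, 55.4, 51.6, 59.9]

print(f"Min: {min(times)}")                       # 51.6
print(f"Max: {max(times)}")                       # 65.6
print(f"Average: {statistics.mean(times):.1f}")   # 56.9
print(f"Median: {statistics.median(times)}")      # 55.45 (mean of the two middle values)
print(f"Std dev: {statistics.stdev(times):.1f}")  # 4.2 (sample); pstdev gives ~4.0
```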


I was quite surprised by the degree of variation, considering the network is a completely isolated / controlled environment. Factors that may contribute to this variation are

  • arrangement of nodes relative to each other due to the randomized naming process
  • entry point to the network for the client due to the randomized login credentials
  • message routes and message queue lengths, and thus processing demand and delays due to blocking
  • processing load, depending on whether the ‘heavy’ processing nodes land on the same or on different pine64s
  • the Edimax ES-5800M V2 switch in use has three different priorities depending on which physical port a device is connected to.

Factors that probably do not contribute to variability are

  • RAM vs swap - the 2 GB of RAM per pine64 is never fully consumed
  • disk speed - all devices use the same brand / model of microSD card
  • network speed - all network cables are the same length and brand of Cat 6
  • churn - there should be no network churn during the upload, since vault names are the same at the start and end of each test
  • other running processes - the devices are dedicated to this test, with no other processes running (except those needed to keep the OS running, of course!)

It’s a little puzzling why there is so much variation. I assume it’s mainly due to differences in the vault names, and thus the topology that messages must traverse, but it’s hard to know without measuring.

The main takeaway for me is that the effect of changes to the codebase should be measured using averages over multiple tests, since the error on a single test can be quite significant (much larger than I initially thought).
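
To put rough numbers on that: a single run typically lands within about one standard deviation (~4 minutes) of the mean, and occasionally much further, whereas the standard error of a ten-run average is only about 4.2 / √10 ≈ 1.3 minutes. A small sketch of that arithmetic, using the timings above:

```python
import math
import statistics

times = [59.5, 59.6, 54.7, 55.5, 54.6, 52.5, 65.6, 55.4, 51.6, 59.9]

mean = statistics.mean(times)       # ~56.9 minutes
sd = statistics.stdev(times)        # ~4.2 minutes (sample standard deviation)
sem = sd / math.sqrt(len(times))    # ~1.3 minutes: standard error of the 10-run mean

# Roughly 95% of ten-run averages would land within ~2 * SEM of the true mean
# (ignoring the small-sample t correction), versus ~2 * SD for a single run.
print(f"ten-run average: {mean:.1f} ± {2 * sem:.1f} minutes")
print(f"single run:      {mean:.1f} ± {2 * sd:.1f} minutes")
```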


I did a second test in which the file was uploaded, deleted, then reuploaded multiple times. These runs also showed an unusual amount of variation. Here the vault and client names are identical between uploads, so the messaging patterns between vaults should be very close, if not identical, yet there was still significant variation.

In summary, there’s much less consistency in upload time than I would have expected, which must be considered when measuring the effect of changes to the codebase.
