Profiling node performance

I wanted to see if nodes get in each other's way when running 11 nodes on 8 cores, so I fired up some 16-core AWS virtual machines, which should leave some cores free (although now that the node is multithreaded, that's probably not true?!)

In summary, yes, nodes do tend to get in the way of each other. Running 1 node per vCPU seems to be about the best ratio.

To clarify some terminology, a vCPU is not necessarily one CPU core. For example, my laptop has 4 physical cores, but with hyperthreading I get 2 vCPUs per physical core, so that's a total of 8 vCPUs. These show as 8 individual CPU graphs in Task Manager.
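For reference, the vCPU count an instance reports can be checked on Linux with something like:

```bash
# Number of vCPUs (logical processors) visible to the OS
nproc

# Sockets, cores per socket and threads per core
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'
```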

The basic test is (rough commands sketched below):

- Run baby fleming with node v0.49.8 and 11 nodes
- Upload a 10 MiB file using sn_cli v0.29.2
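Something along these lines; the exact sn_cli subcommands and flags vary between versions, so treat this as a sketch rather than the precise invocation:

```bash
# Create a 10 MiB test file
dd if=/dev/urandom of=test-10mib.bin bs=1M count=10

# Start a local baby-fleming network (11 nodes in this setup)
safe node run-baby-fleming

# Time the upload of the test file
time safe files put ./test-10mib.bin
```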

A1 ARM Processors

Firstly, the A1 series of processors. From AWS instance types: "A1 instances are the first EC2 instances powered by AWS Graviton Processors that feature 64-bit Arm Neoverse cores and custom silicon designed by AWS."

Since MaidSafe does not put out releases for the ARM architecture, I built the code on the first VM I started, then copied those binaries to each of the other VMs for the rest of the tests.
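The build-and-copy step was roughly the following (hostnames, key path and binary names are placeholders for whatever your instances actually use):

```bash
# On the first ARM VM: build release binaries from source
cargo build --release

# Copy the node binary to each of the other ARM VMs
# ($ARM_HOSTS and the key path are placeholders)
for host in $ARM_HOSTS; do
  scp -i ~/.ssh/aws-key.pem target/release/sn_node "ubuntu@$host:~/"
done
```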

| Type | Time (s) | vCPUs | RAM (GB) |
|---|---|---|---|
| a1.medium | >600 | 1 | 2 |
| a1.large | >600 | 2 | 4 |
| a1.xlarge | 82.705 | 4 | 8 |
| a1.2xlarge | 42.559 | 8 | 16 |
| a1.4xlarge | 31.138 | 16 | 32 |
| a1.metal | 29.645 | 16 | 32 |

A tangential observation: I could not build with musl on ARM. The ring crate was throwing an error; I didn't dig into it, but maybe one to look into later.

The command to try to build for aarch64 musl was:

```bash
cargo build --release --target aarch64-unknown-linux-musl
```
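For anyone wanting to retry this, my understanding is that the rough setup is adding the Rust target and pointing cargo at a musl-capable C cross compiler (ring builds C/assembly, which is probably where it fell over); the compiler name below is indicative only and depends on which cross toolchain is installed:

```bash
# Add the aarch64 musl target to the Rust toolchain
rustup target add aarch64-unknown-linux-musl

# Point the ring build and the linker at a musl cross compiler
# (aarch64-linux-musl-gcc is a placeholder for whatever musl toolchain you have)
export CC_aarch64_unknown_linux_musl=aarch64-linux-musl-gcc
export CARGO_TARGET_AARCH64_UNKNOWN_LINUX_MUSL_LINKER=aarch64-linux-musl-gcc

cargo build --release --target aarch64-unknown-linux-musl
```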

M6G ARM Processors

From AWS, M6g instances "deliver up to 40% better price/performance over current generation M5 instances and offer a balance of compute, memory, and networking resources for a broad set of workloads."

| Type | Time (s) | vCPUs | RAM (GB) |
|---|---|---|---|
| m6g.medium | >600 | 1 | 4 |
| m6g.large | >600 | 2 | 8 |
| m6g.xlarge | 63.871 | 4 | 16 |
| m6g.2xlarge | 35.680 | 8 | 32 |
| m6g.4xlarge | 25.345 | 16 | 64 |
| m6g.8xlarge | 25.091 | 32 | 128 |
| m6g.12xlarge | 25.315 | 48 | 192 |
| m6g.16xlarge | 25.512 | 64 | 256 |
| m6g.metal | ? | 64 | 256 |

I couldn't SSH into m6g.metal for some reason, so there's no result for it.

Once we get to 16+ vCPUs the time stays pretty stable, which shows that 11 nodes on 8 or fewer vCPUs is hitting some CPU bottlenecks.
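One way to see this directly while a test is running is to watch per-process CPU usage for the node processes (this assumes the node binary is called sn_node; pidstat comes from the sysstat package):

```bash
# Per-process CPU usage for all node processes, sampled every second
pidstat -u -C sn_node 1
```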

M5 X86 Processors

From AWS: "the latest generation of General Purpose Instances powered by Intel Xeon® Platinum 8175M processors. This family provides a balance of compute, memory, and network resources, and is a good choice for many applications."

| Type | Time (s) | vCPUs | RAM (GB) |
|---|---|---|---|
| m5.large | 113.684 | 2 | 8 |
| m5.xlarge | 57.282 | 4 | 16 |
| m5.2xlarge | 30.285 | 8 | 32 |
| m5.4xlarge | 17.201 | 16 | 64 |
| m5.8xlarge | 14.951 | 32 | 128 |
| m5.12xlarge | 13.844 | 48 | 192 |
| m5.16xlarge | 13.952 | 64 | 256 |
| m5.24xlarge | 13.719 | 96 | 384 |
| m5.metal | 14.125 | 96 | 384 |

And yes, all 96 cores are used, shown here:

Improved BLS lib

Following on from this post, which says "3. Integrate a faster threshold_crypto", I thought I'd see what that's like on the fastest platform, the m5.24xlarge.

On m5.24xlarge the new lib gives 8.633s vs 13.719s for the old one, quite a lot faster.

I also happened to test m5.metal, which gave 12.422s for the new lib vs 14.125s for the old one.
