I wanted to see if nodes get in each other's way when running 11 nodes on 8 cores, so I fired up some 16-core AWS virtual machines, which should leave some cores free (although now that node is multithreaded that's probably not true?!)
In summary, yes, nodes do tend to get in each other's way. Running one node per vCPU seems to be about the best ratio.
To clarify some terminology: a vCPU is not necessarily one CPU core. For example, on my laptop I have 4 physical cores, but with hyperthreading I get 2 vCPUs per physical core, for a total of 8 vCPUs. This shows as 8 individual CPU graphs in task manager.
The basic test is:
Run baby fleming with node v0.49.8 and 11 nodes
Upload 10 MiB file using sn_cli v0.29.2
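For reference, the test looked roughly like this. The exact sn_cli subcommands varied between versions, so treat the `safe` invocations as illustrative rather than exact:

```shell
# Start a local baby-fleming network (11 nodes at the time).
# Subcommand name is from memory for sn_cli of this era.
safe node run-baby-fleming

# Create a 10 MiB file of random data to upload.
head -c 10485760 /dev/urandom > test-10mib.bin

# Time the upload; the "Time (s)" columns below are this duration.
time safe files put ./test-10mib.bin
```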
A1 ARM Processors
First, the A1 series of processors. From the AWS instance types page: "A1 instances are the first EC2 instances powered by AWS Graviton Processors that feature 64-bit Arm Neoverse cores and custom silicon designed by AWS."
Since MaidSafe does not publish releases for the ARM architecture, I built the code on the first VM I started, then copied those binaries to each of the other VMs for the rest of the tests.
Type | Time (s) | vCPUs | RAM (GB) |
---|---|---|---|
a1.medium | >600 | 1 | 2 |
a1.large | >600 | 2 | 4 |
a1.xlarge | 82.705 | 4 | 8 |
a1.2xlarge | 42.559 | 8 | 16 |
a1.4xlarge | 31.138 | 16 | 32 |
a1.metal | 29.645 | 16 | 32 |
A tangential observation: I could not build with musl on ARM. The ring crate was throwing an error; I didn't dig into it, though. Maybe one to look into later.
The command to try to build for aarch64 musl was:

```shell
cargo build --release --target aarch64-unknown-linux-musl
```
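For completeness, the musl target has to be available before that build command will work. Assuming the toolchain is managed with rustup, adding it looks like:

```shell
# Install the aarch64 musl cross-target for the current toolchain.
rustup target add aarch64-unknown-linux-musl
```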
M6G ARM Processors
From AWS: “deliver up to 40% better price/performance over current generation M5 instances and offer a balance of compute, memory, and networking resources for a broad set of workloads.”
Type | Time (s) | vCPUs | RAM (GB) |
---|---|---|---|
m6g.medium | >600 | 1 | 4 |
m6g.large | >600 | 2 | 8 |
m6g.xlarge | 63.871 | 4 | 16 |
m6g.2xlarge | 35.680 | 8 | 32 |
m6g.4xlarge | 25.345 | 16 | 64 |
m6g.8xlarge | 25.091 | 32 | 128 |
m6g.12xlarge | 25.315 | 48 | 192 |
m6g.16xlarge | 25.512 | 64 | 256 |
m6g.metal | ? | 64 | 256 |
I couldn't SSH into m6g.metal for some reason, so there's no result.
Once we get to 16+ vCPUs the time stays pretty stable, which shows that running 11 nodes on 8 or fewer vCPUs hits a CPU bottleneck.
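The plateau is easier to see as speedup ratios. This snippet just recomputes them from the m6g table above (relative to the 4-vCPU result):

```shell
# Upload times (s) from the m6g table, by vCPU count.
# Speedup is relative to m6g.xlarge (4 vCPUs).
awk 'BEGIN {
  base = 63.871
  split("4 8 16 32 48 64", v, " ")
  split("63.871 35.680 25.345 25.091 25.315 25.512", t, " ")
  for (i = 1; i <= 6; i++)
    printf "%2d vCPUs: %7.3f s  %.2fx\n", v[i], t[i], base / t[i]
}'
```

Going from 4 to 16 vCPUs is roughly a 2.5x speedup, but beyond 16 the ratio barely moves.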
M5 X86 Processors
From AWS: "the latest generation of General Purpose Instances powered by Intel Xeon® Platinum 8175M processors. This family provides a balance of compute, memory, and network resources, and is a good choice for many applications."
Type | Time (s) | vCPUs | RAM (GB) |
---|---|---|---|
m5.large | 113.684 | 2 | 8 |
m5.xlarge | 57.282 | 4 | 16 |
m5.2xlarge | 30.285 | 8 | 32 |
m5.4xlarge | 17.201 | 16 | 64 |
m5.8xlarge | 14.951 | 32 | 128 |
m5.12xlarge | 13.844 | 48 | 192 |
m5.16xlarge | 13.952 | 64 | 256 |
m5.24xlarge | 13.719 | 96 | 384 |
m5.metal | 14.125 | 96 | 384 |
And yes, all 96 cores are used, as shown here:
Improved bls lib
Following on from this post, which says "3. Integrate a faster threshold_crypto", I thought I'd see what that's like on the fastest platform, the m5.24xlarge.
The m5.24xlarge gives 8.633 s for the new lib vs 13.719 s for the old one, quite a lot faster.
I also happened to test m5.metal which gave 12.422s for the new lib vs 14.125s for the old one.
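To put those pairs of timings in relative terms, here is the same arithmetic spelled out (using only the numbers above):

```shell
# Relative improvement of the new BLS lib over the old one.
awk 'BEGIN {
  printf "m5.24xlarge: %.2fx (%.0f%% faster)\n", 13.719/8.633,  (1 - 8.633/13.719)  * 100
  printf "m5.metal:    %.2fx (%.0f%% faster)\n", 14.125/12.422, (1 - 12.422/14.125) * 100
}'
# → m5.24xlarge: 1.59x (37% faster)
# → m5.metal:    1.14x (12% faster)
```

Interesting that the gain on m5.metal is much smaller than on m5.24xlarge; I don't have an explanation for that from these tests.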