Ed25519 vs bls performance

SAFE network uses two types of asymmetric keys, ED25519 keys and BLS keys.

Sometimes the choice may be due to functionality, eg multisig is built in to bls whereas ed25519 requires custom code to do multisig.

Sometimes the choice may be due to performance (otherwise why not just use bls everywhere?).

Out of curiosity I had a look at how much difference there is between the two for signing and verifying.

All values are in microseconds per signature/verification, and is the average of 1000 samples. BLS is using a single keypair, ie not requiring any combining of signature shares, as a standard safecoin wallet would be doing.

BLS ED25519 BLS/ED25519
Sign 32 bytes 2220.78 19.36 114.72
Sign 1 MiB 5822.96 5826.58 1.00
Verify 32 bytes sig 5983.90 50.52 118.45
Verify 1 MiB sig 9478.69 2624.92 3.61

Signing small messages, BLS is about 100 times slower.

Signing large messages, they are identical, which was a big surprise to me.

Verifying signatures of small messages BLS is about 100 times slower.

Verify signatures of large messages BLS is about 3 times slower.

The info is I suppose not produced with any intention in mind, but I thought it was interesting to see the difference so I’ll put the figures here.

30 Likes

Me too, I had thought bls always slower. BTW @mav were these done using the libs we are using? I know some BLS impl’s are quite a bit slower again.

13 Likes

Yes.

Code is here: GitHub - iancoleman/ed25519_vs_bls_key_benchmarks: Compare performance of ed25519_dalek and threshold_crypto bls keys.

ed25519-dalek = "1.0.0"
threshold_crypto = "0.4.0"

Not sure if there’s any significant difference between ed25519-dalek 1.0.0 vs 1.0.0-pre.3 as used in maidsafe/routing, but I assume not.

Would be interested in others results.

14 Likes

Here are my results (similar to yours):

BLS ED25519 BLS/ED25519
Sign 32B (µs) 3428.03 27.91 122.84
Sign 1MiB (µs) 8354.38 7677.05 1.09
Verify 32B (µs) 8492.67 73.78 115.11
Verify 1MiB (µs) 13616.37 3924.91 3.47
15 Likes

As a random question, what does this mean for the ultimate performance of SAFE? How many verify operations are actually carried out? With a naiive reading, I see 8ms/verify32 = ~ 120 verifies/second per core? How many is a vault actually performing?

6 Likes

Would be good to know. For any individual vault it depends on

  • is elder or not
  • how many other nodes it’s connected to, ie network size and section size
  • the current rate of GET / PUT from clients
  • the amount of churn happening

So it’s a complex equation.

I’d be interested to see this info logged at some point so we can get some idea of it in testnets.

8 Likes

Hmm… If this is going to be some kind of a bottleneck, is it possible to make ASICs for these signature processes?

2 Likes

IMO This shows we should consider BLS a lot more. For multisig it’s great, however when not multisig then we should be using ed keys. Right now clients are all BLS, but that may need to change. Not a huge deal (it’s just another algo) so Signature for us may need to become

pub enum Signature {
  ED25519(EDSig), 
  BLS(BlsSig),
}

So not a showstopper in any sense. It’s things like these checks @mav and others do that allow us to be significantly more efficient. We knew bls is much slower, but seeing it laid out is a compelling picture and stops us from ignoring that fact. Many client actions will be small messages so it makes sense, if a data item is individually owned then we use ed keys, otherwise, we use BLS keys. So all BLS use will be multisig and never individual key use, which is actually nice.

21 Likes

I thought that its typically the hash of a message youre signing, not the message itself, because of the slowness of pubkey crypto? But BLS seem not to be that bad for large msgs.

3 Likes

Yes, good point.

bls threshold_crypto uses sha3-256 for signatures

ed25519-dalek uses sha2-512 for signatures

https://github.com/dalek-cryptography/ed25519-dalek/blob/master/src/secret.rs#L406

I added these two hash functions to the test, these are the results

SHA2-512 SHA3-256 SHA2 / ED25519 Sign SHA3 / BLS Sign
32B (µs) 0.358 0.494 0.0194 0.0002
1 MiB (µs) 2607.408 3477.901 0.5081 0.6141

For small messages signature hashing is a tiny portion of the total time, less than 2%.

For large messages signature hashing is around 50-60% of the total time.

Added to the benchmarking tool in commit 4d64af4.

5 Likes

Should you be comparing sha2-512 to sha3-512?

The idea is to see what portion of the signing time is spent doing the hash, so it must be using the underlying hash in the implementations.

For bls (GitHub - poanetwork/threshold_crypto: A pairing-based threshold cryptosystem for collaborative decryption and signatures used in HoneybadgerBFT implementation) this is sha3_256

For ed25519 (GitHub - dalek-cryptography/ed25519-dalek: Fast and efficient ed25519 signing and verification in Rust.) this is sha2-512

So the comparison, despite being different lengths, is valid since it allows us to see the portion of time spent during each signature on the hashing.

This made me wonder though if we could remove the dependency on sha2-512 and use only sha3-256, since ed25519-dalek has a function sign_prehashed. But then I saw within that functin sha2-512 is called anyhow so there’s no avoiding it. Nice little comment to go with it:

10 Likes

There is this FPGA implementation for BLS12-381 by Ben Devlin

In the architecture docs they describe performance characteristics:

performance was benchmarked vs the Rust bls12_381 crate on a 32GB, 3.7GHz i5-9600K CPU. FPGA is running at 200MHz. … see a 2.9x speedup in the final ate pairing.

I would have thought more than 2.9x speedup. They list 4 points for Future Optimizations, but anyway, this was an interesting result to me, I would have expected greater speed improvements.

Video presentation on the library on youtube: Ben Devlin | Blockchain Acceleration Using FPGAs.

4 Likes

supranational have a bls implementation with rust wrappers, it’s “a BLS12-381 signature library focused on performance and security. It is written in C and assembly.”

https://crates.io/crates/blst

I added this to the benchmarks (commit d71b596), it’s fast.

(TC is threshold_crypto currently used by sn_node etc, BLST is supranational blst crate)

Test TC BLST TC/BLST
Sign 32 bytes (µs) 2379.36 192.81 12.34
Sign 1 MiB (µs) 5934.10 2736.71 2.17
Verify 32 bytes sig (µs) 6235.49 846.80 7.36
Verify 1 MiB sig (µs) 9937.81 3471.24 2.86

Signing small messages blst was about 12x faster than threshold_crypto

Verifying small messages blst was about 7x faster than threshold_crypto

Signing and verifying large messages blst was about 2-3x faster than threshold_crypto

There are two modes for blst, one for minimal-pubkey-size operations and one for minimal-signature-size operations. This test was done with minimal-signature-size operations, although I did test with minimal-pubkey-size operations and it was a little slower but still much faster than threshold_crypto.

Be aware this is a very basic test and group multisig operations haven’t been tested, but on the first impression it seems pretty fast.

12 Likes

ed25519-dalek has support for avx2 backend which speeds up signature verification about 1.5x

See https://github.com/dalek-cryptography/ed25519-dalek#benchmarks

Results from my own tests:

Veryify 32B with / without avx2: 36,452 ns / 57,232 ns

Veryify 1 MB with / without avx2: 2,359,297 ns / 2,818,746 ns

Using avx2 is up to 1.57x faster signature verification on my laptop.

It was a bit messy for me to get working, I couldn’t get the --features=avx2_backend flag to work since it’s a curve25519-dalek feature not an ed25519-dalek feature.

Potentially an easy win to implement at some stage since the default is not to use avx2.

11 Likes

I have not looked, but is this an smid type thing where we just need architecture feature gates?

Seriously @mav thanks for all you do here. It’s like having our own research department. I wish we could pay you somehow!

11 Likes

Yes it’s simd, via packed_simd. However:

WARNING: this crate [packed_simd_2] only supports the most recent nightly Rust toolchain and will be superceded by stdsimd.

There’s some crates.io quirks meaning packed_simd has become packed_simd_2, see packed_simd/issues/303

We cannot [push to crates.io packed_simd]. We lack the crates.io permissions. The original maintainer is out of contact.

If you adjust your dependency entry like this it should pick up the crate we published under an alternative name.

packed_simd = { version = "0.3.4", package = "packed_simd_2" }

Also see curve25519-dalek/3.1.0/Cargo.toml L49-51:

# The original packed_simd package was orphaned, see
# https://github.com/rust-lang/packed_simd/issues/303#issuecomment-701361161
packed_simd = { version = "0.3.4", package = "packed_simd_2", features = ["into_bits"], optional = true }

To get avx2 working on my machine I changed my local copy of ed25519-dalek Cargo.toml like this:

-curve25519-dalek = { version = "3", default-features = false }
+curve25519-dalek = { version = "3.1.0", default-features = false, features = [ "avx2_backend" ] }

Also to build with avx2 I had to use rust nightly which is not really ideal. Stable gives this error:

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /home/ian/.cargo/registry/src/github.com-1ecc6299db9ec823/subtle-2.2.3/src/lib.rs:12:34
   |
12 | #![cfg_attr(feature = "nightly", feature(external_doc))]
   |                                  ^^^^^^^^^^^^^^^^^^^^^

Good to keep a record of this messing about while it’s fresh.

15 Likes

Great to see your expertise in this area being shared, @mav ! I always enjoy seeing your posts and the positive impact it has on the project.

3 Likes

Is it possible to get stats for how many sigs vs how many pubkeys are sent over the wire?

I’d imagine it would be many more sigs than pubkeys.

The two modes give:

mode pk bytes sig bytes
minpk 48 96
minsig 96 48

blsttc uses minpk because that’s the same as what threshold_crypto implements for their pubkeys and sigs (threshold_crypto doesn’t have a minsig mode; all public keys are 48 bytes and all sigs are 96 bytes in threshold_crypto)

So if we send many more sigs than pubkeys it might be worth using minsig. Or if we send many more pubkeys than sigs it might be worth using minpk. Or if they’re about even then it makes no difference.

Curious to know if there’s room for saving bandwidth by using one mode over the other? Probably only a tiny improvement but anyway figured I’d point it out. Might be interesting to measure this on a testnet.

Need to also consider dkg has been developed and tested using minpk (48 byte pubkeys) and may not necessarily work the same way for minsig (96 byte pubkeys).

An inconsequential point to note is the secret key stays the same for both modes.

5 Likes

They are almost entirely even as we include the pub key with the sig in all cases. We need to check it’s 100% but I think it is and has to be as the id of the node etc. is a pub key and for data we sign including pub key, but we perhaps don’t need to as the data itself defines the key(s) that must be used to sign, paging @bochaco and @chris.connelly

8 Likes