Ed25519 vs bls performance

mav · September 8, 2020, 9:53am

SAFE network uses two types of asymmetric keys, ED25519 keys and BLS keys.

Sometimes the choice may be due to functionality, eg multisig is built in to bls whereas ed25519 requires custom code to do multisig.

Sometimes the choice may be due to performance (otherwise why not just use bls everywhere?).

Out of curiosity I had a look at how much difference there is between the two for signing and verifying.

All values are in microseconds per signature/verification, and is the average of 1000 samples. BLS is using a single keypair, ie not requiring any combining of signature shares, as a standard safecoin wallet would be doing.

	BLS	ED25519	BLS/ED25519
Sign 32 bytes	2220.78	19.36	114.72
Sign 1 MiB	5822.96	5826.58	1.00
Verify 32 bytes sig	5983.90	50.52	118.45
Verify 1 MiB sig	9478.69	2624.92	3.61

Signing small messages, BLS is about 100 times slower.

Signing large messages, they are identical, which was a big surprise to me.

Verifying signatures of small messages BLS is about 100 times slower.

Verify signatures of large messages BLS is about 3 times slower.

The info is I suppose not produced with any intention in mind, but I thought it was interesting to see the difference so I’ll put the figures here.

dirvine · September 8, 2020, 10:17am

Me too, I had thought bls always slower. BTW @mav were these done using the libs we are using? I know some BLS impl’s are quite a bit slower again.

mav · September 8, 2020, 10:30am

Yes.

Code is here: GitHub - iancoleman/ed25519_vs_bls_key_benchmarks: Compare performance of ed25519_dalek and threshold_crypto bls keys.

ed25519-dalek = "1.0.0"
threshold_crypto = "0.4.0"

Not sure if there’s any significant difference between ed25519-dalek 1.0.0 vs 1.0.0-pre.3 as used in maidsafe/routing, but I assume not.

Would be interested in others results.

tfa · September 8, 2020, 9:25pm

Here are my results (similar to yours):

	BLS	ED25519	BLS/ED25519
Sign 32B (µs)	3428.03	27.91	122.84
Sign 1MiB (µs)	8354.38	7677.05	1.09
Verify 32B (µs)	8492.67	73.78	115.11
Verify 1MiB (µs)	13616.37	3924.91	3.47

dask · September 9, 2020, 9:31pm

As a random question, what does this mean for the ultimate performance of SAFE? How many verify operations are actually carried out? With a naiive reading, I see 8ms/verify32 = ~ 120 verifies/second per core? How many is a vault actually performing?

mav · September 10, 2020, 1:02am

Would be good to know. For any individual vault it depends on

is elder or not
how many other nodes it’s connected to, ie network size and section size
the current rate of GET / PUT from clients
the amount of churn happening

So it’s a complex equation.

I’d be interested to see this info logged at some point so we can get some idea of it in testnets.

Toivo · September 10, 2020, 7:18am

Hmm… If this is going to be some kind of a bottleneck, is it possible to make ASICs for these signature processes?

dirvine · September 10, 2020, 7:26am

IMO This shows we should consider BLS a lot more. For multisig it’s great, however when not multisig then we should be using ed keys. Right now clients are all BLS, but that may need to change. Not a huge deal (it’s just another algo) so Signature for us may need to become

pub enum Signature {
  ED25519(EDSig), 
  BLS(BlsSig),
}

So not a showstopper in any sense. It’s things like these checks @mav and others do that allow us to be significantly more efficient. We knew bls is much slower, but seeing it laid out is a compelling picture and stops us from ignoring that fact. Many client actions will be small messages so it makes sense, if a data item is individually owned then we use ed keys, otherwise, we use BLS keys. So all BLS use will be multisig and never individual key use, which is actually nice.

urrtag · September 13, 2020, 7:23am

I thought that its typically the hash of a message youre signing, not the message itself, because of the slowness of pubkey crypto? But BLS seem not to be that bad for large msgs.

mav · September 13, 2020, 10:27am

Yes, good point.

bls threshold_crypto uses sha3-256 for signatures

github.com

poanetwork/threshold_crypto/blob/master/Cargo.toml#L32


      
          byteorder = "1.3.4"
          ff = "0.6.0"
          group = "0.6.0"
          hex_fmt = "0.3.0"
          log = "0.4.8"
          pairing = "0.16.0"
          rand = "0.7.3"
          rand_chacha = "0.2.2"
          serde = { version = "1.0.104", features = ["derive"] }
          thiserror = "1.0.22"
          tiny-keccak = { version = "2.0.1", features = ["sha3"] }
          zeroize = "1.1.0"
          
          # optional
          bincode = { version = "1.2.1", optional = true }
          codec = { package = "parity-scale-codec", version = "1.2.0", default-features = false, features = ["derive"], optional = true }
          
          [dev-dependencies]
          bincode = "1.2.1"
          criterion = "0.3.1"
          rand_xorshift = "0.2.0"

ed25519-dalek uses sha2-512 for signatures

https://github.com/dalek-cryptography/ed25519-dalek/blob/master/src/secret.rs#L406

I added these two hash functions to the test, these are the results

	SHA2-512	SHA3-256	SHA2 / ED25519 Sign	SHA3 / BLS Sign
32B (µs)	0.358	0.494	0.0194	0.0002
1 MiB (µs)	2607.408	3477.901	0.5081	0.6141

For small messages signature hashing is a tiny portion of the total time, less than 2%.

For large messages signature hashing is around 50-60% of the total time.

Added to the benchmarking tool in commit 4d64af4.

jlpell · September 13, 2020, 1:07pm

Should you be comparing sha2-512 to sha3-512?

mav · September 13, 2020, 11:04pm

The idea is to see what portion of the signing time is spent doing the hash, so it must be using the underlying hash in the implementations.

For bls (GitHub - poanetwork/threshold_crypto: A pairing-based threshold cryptosystem for collaborative decryption and signatures used in HoneybadgerBFT implementation) this is sha3_256

For ed25519 (GitHub - dalek-cryptography/ed25519-dalek: Fast and efficient ed25519 signing and verification in Rust.) this is sha2-512

So the comparison, despite being different lengths, is valid since it allows us to see the portion of time spent during each signature on the hashing.

This made me wonder though if we could remove the dependency on sha2-512 and use only sha3-256, since ed25519-dalek has a function sign_prehashed. But then I saw within that functin sha2-512 is called anyhow so there’s no avoiding it. Nice little comment to go with it:

github.com

dalek-cryptography/ed25519-dalek/blob/75a199e5a0a9313c4d31b977569f2c793dc242a0/src/secret.rs#L477-L495


      
          // This is the dumbest, ten-years-late, non-admission of fucking up the
          // domain separation I have ever seen.  Why am I still required to put
          // the upper half "prefix" of the hashed "secret key" in here?  Why
          // can't the user just supply their own nonce and decide for themselves
          // whether or not they want a deterministic signature scheme?  Why does
          // the message go into what's ostensibly the signature domain separation
          // hash?  Why wasn't there always a way to provide a context string?
          //
          // ...
          //
          // This is a really fucking stupid bandaid, and the damned scheme is
          // still bleeding from malleability, for fuck's sake.
          h = Sha512::new()
              .chain(b"SigEd25519 no Ed25519 collisions")
              .chain(&[1]) // Ed25519ph
              .chain(&[ctx_len])
              .chain(ctx)
              .chain(&self.nonce)
              .chain(&prehash[..]);

mav · October 7, 2020, 4:30am

There is this FPGA implementation for BLS12-381 by Ben Devlin

In the architecture docs they describe performance characteristics:

performance was benchmarked vs the Rust bls12_381 crate on a 32GB, 3.7GHz i5-9600K CPU. FPGA is running at 200MHz. … see a 2.9x speedup in the final ate pairing.

I would have thought more than 2.9x speedup. They list 4 points for Future Optimizations, but anyway, this was an interesting result to me, I would have expected greater speed improvements.

Video presentation on the library on youtube: Ben Devlin | Blockchain Acceleration Using FPGAs.

mav · October 7, 2020, 5:53am

supranational have a bls implementation with rust wrappers, it’s “a BLS12-381 signature library focused on performance and security. It is written in C and assembly.”

https://crates.io/crates/blst

I added this to the benchmarks (commit d71b596), it’s fast.

(TC is threshold_crypto currently used by sn_node etc, BLST is supranational blst crate)

Test	TC	BLST	TC/BLST
Sign 32 bytes (µs)	2379.36	192.81	12.34
Sign 1 MiB (µs)	5934.10	2736.71	2.17
Verify 32 bytes sig (µs)	6235.49	846.80	7.36
Verify 1 MiB sig (µs)	9937.81	3471.24	2.86

Signing small messages blst was about 12x faster than threshold_crypto

Verifying small messages blst was about 7x faster than threshold_crypto

Signing and verifying large messages blst was about 2-3x faster than threshold_crypto

There are two modes for blst, one for minimal-pubkey-size operations and one for minimal-signature-size operations. This test was done with minimal-signature-size operations, although I did test with minimal-pubkey-size operations and it was a little slower but still much faster than threshold_crypto.

Be aware this is a very basic test and group multisig operations haven’t been tested, but on the first impression it seems pretty fast.

mav · April 14, 2021, 10:27pm

ed25519-dalek has support for avx2 backend which speeds up signature verification about 1.5x

See https://github.com/dalek-cryptography/ed25519-dalek#benchmarks

Results from my own tests:

Veryify 32B with / without avx2: 36,452 ns / 57,232 ns

Veryify 1 MB with / without avx2: 2,359,297 ns / 2,818,746 ns

Using avx2 is up to 1.57x faster signature verification on my laptop.

It was a bit messy for me to get working, I couldn’t get the --features=avx2_backend flag to work since it’s a curve25519-dalek feature not an ed25519-dalek feature.

Potentially an easy win to implement at some stage since the default is not to use avx2.

dirvine · April 14, 2021, 10:35pm

I have not looked, but is this an smid type thing where we just need architecture feature gates?

Seriously @mav thanks for all you do here. It’s like having our own research department. I wish we could pay you somehow!

mav · April 14, 2021, 11:34pm

Yes it’s simd, via packed_simd. However:

WARNING: this crate [packed_simd_2] only supports the most recent nightly Rust toolchain and will be superceded by stdsimd.

There’s some crates.io quirks meaning packed_simd has become packed_simd_2, see packed_simd/issues/303

We cannot [push to crates.io packed_simd]. We lack the crates.io permissions. The original maintainer is out of contact.

If you adjust your dependency entry like this it should pick up the crate we published under an alternative name.

packed_simd = { version = "0.3.4", package = "packed_simd_2" }

Also see curve25519-dalek/3.1.0/Cargo.toml L49-51:

# The original packed_simd package was orphaned, see
# https://github.com/rust-lang/packed_simd/issues/303#issuecomment-701361161
packed_simd = { version = "0.3.4", package = "packed_simd_2", features = ["into_bits"], optional = true }

To get avx2 working on my machine I changed my local copy of ed25519-dalek Cargo.toml like this:

-curve25519-dalek = { version = "3", default-features = false }
+curve25519-dalek = { version = "3.1.0", default-features = false, features = [ "avx2_backend" ] }

Also to build with avx2 I had to use rust nightly which is not really ideal. Stable gives this error:

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /home/ian/.cargo/registry/src/github.com-1ecc6299db9ec823/subtle-2.2.3/src/lib.rs:12:34
   |
12 | #![cfg_attr(feature = "nightly", feature(external_doc))]
   |                                  ^^^^^^^^^^^^^^^^^^^^^

Good to keep a record of this messing about while it’s fresh.

Traktion · April 15, 2021, 7:25am

Great to see your expertise in this area being shared, @mav ! I always enjoy seeing your posts and the positive impact it has on the project.

mav · July 24, 2021, 9:47am

Is it possible to get stats for how many sigs vs how many pubkeys are sent over the wire?

I’d imagine it would be many more sigs than pubkeys.

The two modes give:

mode	pk bytes	sig bytes
minpk	48	96
minsig	96	48

blsttc uses minpk because that’s the same as what threshold_crypto implements for their pubkeys and sigs (threshold_crypto doesn’t have a minsig mode; all public keys are 48 bytes and all sigs are 96 bytes in threshold_crypto)

So if we send many more sigs than pubkeys it might be worth using minsig. Or if we send many more pubkeys than sigs it might be worth using minpk. Or if they’re about even then it makes no difference.

Curious to know if there’s room for saving bandwidth by using one mode over the other? Probably only a tiny improvement but anyway figured I’d point it out. Might be interesting to measure this on a testnet.

Need to also consider dkg has been developed and tested using minpk (48 byte pubkeys) and may not necessarily work the same way for minsig (96 byte pubkeys).

An inconsequential point to note is the secret key stays the same for both modes.

dirvine · July 24, 2021, 9:59am

They are almost entirely even as we include the pub key with the sig in all cases. We need to check it’s 100% but I think it is and has to be as the id of the node etc. is a pub key and for data we sign including pub key, but we perhaps don’t need to as the data itself defines the key(s) that must be used to sign, paging @bochaco and @chris.connelly

Topic		Replies	Views
BLS signature aggregation performance Features	5	922	June 3, 2021
What kind of keys does SAFE use for group-signing? Features	4	564	June 15, 2020
RFC 59: Boneh-Lynn-Shacham scheme in Routing RFCs	7	1904	July 3, 2019
Feedback and Questions on the SystemDocs Community	55	2995	May 10, 2015
Is this true? Features	2	617	January 5, 2023

Ed25519 vs bls performance

Related Topics