Group Size security calculator


#1

Efficient security of the SAFE network is one of the aspects that tends to get people fairly excited, especially in contrast to blockchain.

So hopefully this interactive Group Size Calculator I’ve be working on will be an interesting read.

https://iancoleman.github.io/groupsize/

It’s still taking shape but I think it’s ready for some initial feedback and conversation, especially about the way security is actually measured and modelled.

There’s still progress to be made but am looking for feedback on the existing work. Please do provide criticism, this is just a curiosity I wanted to explore and would be glad to know where it goes astray.

A lot has already been said about this topic (see the More Reading section), but I hope the calculator serves as both a layman explanation as well as a modelling and exploratory tool.


#2

Nice work Ian, looks really cool!

Where you calculate the quorum, you should use floor(quorum_fraction * vaults per group) + 1, so that when you have quorum_fraction=0.5 and vaults per group equal to an even number like 8, you get: quorum = floor(0.5 * 8) + 1 = 5, rather than quorum = 4 (which is the result with ceiling(0.5 * 8)).


#3

Very cool! You are a credit to this community!


#4

I think this attack is reasonably easy to carry and I will reformulate it with other words and with slightly different figures but the overall result is the same.

This attack is done in 2 steps:

  • the first part uses the birthday paradox to get 2 vaults in the same section. If the number of sections is T then the formula to have a 50% chance to get such a pair is 1.177 * √T, for T = 6667 sections that gives 97 vaults

  • the second part keeps those 2 vaults running and reuse the others to get 14 more vaults in this section to get the quorum over 15 existing vaults on average. The section might be split during the process but this isn’t a problem because the quorum will be reached in one of the halves. We need T/2 trials to have a 50% chance to get one vault in this section and 14 times more to get 14 vaults in it (7 * T). For 6667 sections that gives 46667 trials.

You indicate one trial per second. A more precise approach is to consider that starting and relocating a vault takes 5 minutes (I don’t remember the exact figures since we had vaults at home). To simplify computations, a total of 116 vaults is used, to have 100 free vaults in the second part. As 5 mn = 300 s, we can get a trial every 300 / 100 = 3s.

This means the second part lasts 46667 * 3 = 140000 s = 39 h.

For completeness, the first part lasts 5 mn as we can start the 102 vaults all at once. Maybe this part can be relaunched several times to try to get a smaller section whose quorum is lower.

In conclusion, only 116 vaults and 39 hours are needed to have a 50% chance to control one section in a 100000 nodes network in current conditions.

Clearly these conditions must be improved (increase delay to relocate a vault, increase min group size, restrict quorum, …)


#5

So how does node aging and “voting” rights affect this. I am thinking in terms that sections will be splitting before you can get quorum in a lot of cases. It may or may not increase your 39 hours?

Has the 28 out of 32 agree changed to 50/50?


#6

Quite a bit actually, when you join a section you are moved almost immediately to the next section. Then after 2 churn events, moved again and then after 4 etc. etc… The weight you have then influences the vote you have (older nodes are respected more) but the Quorum still requires >50% of nodes in the decision. Targeting then becomes much more difficult. I have a constant nagging, but have not made strong enough arguments or detailed the design of this part, that nodes under a certain age have zero vote (infants). There is also a possibility extra nodes per section do different things such as client requests, hop messages etc. Not all required for launch but long term these may bear a bit of fruit.

At the moment its >50% of a section where min size is 8 and dynamically goes to around 20ish. It’s a bit more involved, but you can consider Quorum to be >50% for now.


#7

Good probing @tfa, keep going!


#8

Exist a crate with routing simulation with several probabilities.


I think that calculate attacks without node ageing (or other possible relocation algorithm) is useless. Without that the network would be totally defenseless.


#9

Now I think this is a problem for the attacker because the newly created sections will have a longer prefix than their neighbors. The network has a rebalance mechanism that changes a relocation address targeting these sections to a neighboring one with a shorter prefix. This means that all neighbors will have to be filled and split before the attacker can fill again the section he is really targeting. I estimate that there are around a dozen neighbors for a network of this size, which renders this attack impracticable in this case.

There are two ways to improve the attack to overcome this difficulty:

  • The first one is to target one half of a section so that the malicious nodes automatically get the quorum majority when the section is split. This doesn’t impact the number of needed vaults and only doubles the attack time (because the second phase duration is proportional to the number of sections).

  • The second way is to repurpose the first phase to find a section having only 10 nodes or less (so that it doesn’t split when 11 nodes are added to it). A vault knows the size of its own section but also the size of its neighboring sections, about a dozen sections in this configuration. I estimate you just need to create and then delete less than 1000 vaults to have a complete mapping of such a network. This can be done in less than one hour (with a trial every 3s as indicated in my previous post). In a network of this size I can’t imagine that all sections are above 10 vaults (unless of course min group size is raised to 11, but I am talking of the network under current conditions).

Then the attacker chooses the shortest section as the target of the second phase and runs it until he gets n +1 nodes in this section (11 nodes in the worst case). Duration of second phase is shortened compared to my previous post:

  • Number of trials: 11 * T/2 = 11 * 6667/2 = 36 667

  • Durations: 36 667 * 3 = 110000 s = 31 h

First phase is one hour long at most (we can stop it as soon as a 10 nodes section is reached). So, with this new procedure only 32 hours and 111 vaults are needed.

That should do the job to counter the attack.

The attacker has to wait until a vault reaches the target section with a sufficient age. He definitively needs more vaults and has to slow down the trial rate. But I am not able to compute the corresponding values.


#10

I would love to contribute meaningfully to this conversation, but I don’t feel I understand enough and so will ask some simple questions. I may edit this post as I keep reading.

  • What would an attacker be able to do with a “controlled” group of nodes?
    Any examples?
  • Wouldn’t the network find the same source(s) making thousands of vault join requests to be suspicious? Is there an in-built method to restrict this?

Final thought: I truly appreciate this community’s collection of polite, intelligent, cross-disciplinary experts. I sense great creativity coupled with the focus on non-abrasive security. :v: (ಠ‿ಠ)


#11

If there were network created things like safecoin, they could do that. It could also drop messages from passing through it and more. Detectable for sure, but the intent is to make it infeasible for this to happen anyway.

Not so simple, this source would join different groups (or try to) and won’t be noticed so much cross network. There’s a few things to make this harder and then as useless as possible like resource proof and then node age respectively.

These are the scenarios we dream up all the time, but much worse sometimes and much deeper. The folk working 100% on it, forget what’s there at times, but it’s constant in our meetings. It has to be. Sometimes there are folk we can show a problem to and they get excited at the prospect of fixing it, however sometimes people crumble and think it can never be fixed :wink: It’s really interesting though, I love the challenges and even more when we find the correct solutions, especially when its simpler, and more so when we do it without timers and magic numbers :smiley:


#12

Nodes in a controlled section could delete or alter the data they are responsible for (meaning whose address is within the range of the xor space of the section) and that includes safecoin transactions (though we don’t know yet how they will be implemented).

They can delete the nodes they don’t control in the section and only accept relocated nodes they control. By doing this they can maintain absolute power in the section.

I wonder if they could also destroy the whole network by acting like a black hole: if they never relocate themselves and delete all the nodes that are relocated to this section, then other sections will shrink including neighboring ones which will merge with them when they have only 8 nodes left (which will then be deleted because they will be in minority in the merged section).

As the nodes can all have a different IP address and IP addresses are removed at first hop, there isn’t really a common “source”. Node ageing seems a better way to protect the network from a controlled section.


#13

Great post tfa. Glad to see other approaches are in line with each other.

I think it’s harder than this. According to the relocation algorithm it requires two nodes directly next to each other

the target is computed from the two closest nodes and the joining node’s current name

So the difficulty is based on the number of vaults rather than the number of groups (ie about 15 times harder).

One of the main lessons I took from fiddling around with the calculator is that increasing group size is very effective increasing the difficulty of attack. So is increasing the amount of churn.

Things like increasing the chance of success or even the rate of attack are not as effective.

I think improvements to security from node ageing will make attacks very difficult.

I also think offline attacks are difficult (ie precomputing a key that hashes to a specific relocation point) because they’re prone to being interrupted. The larger the network the higher the chance of an interruption. But offline attacks do seem quite feasible… but also unavoidable since the relocation mechanism must be non-predictable but still verifiable.

I think you will appreciate this code comment about quorum: github.com/maidsafe/routing/src/lib.rs

// Quorum is defined as having strictly greater than `QUORUM_NUMERATOR / QUORUM_DENOMINATOR`
// agreement; using only integer arithmetic a quorum can be checked with
// `votes * QUORUM_DENOMINATOR > voters * QUORUM_NUMERATOR`.
pub const QUORUM_NUMERATOR: usize = 1;
pub const QUORUM_DENOMINATOR: usize = 2;

I love the way the network doesn’t ‘know about time’. Using churn events as a form of timing is brilliant. It’s a really underappreciated aspect of the network imo.

I couldn’t find any discussion about this simulation, can anyone help with this? I’d like to add more info to the ‘More Reading’ section of the page than just the repo.

I also wonder this but I think if datachains are used effectively it should pick up on this ‘incorrect’ behaviour of never relocating.


#14

That was a very old repo, we had some post-docs working on kademlia simulations back in the day. The more recent crates would be simulations we currently do, @michaelsproul has a good github repo on data chain simulation amongst others. Well worth digging in his github. He tends to quickly pull together sims. A lot of off line simulations and mini tests are done to, we cannot capture them all though as we throw away an awful lot of ideas. Here’s a couple of Michael’s playgrounds :wink:
https://github.com/michaelsproul/ewok
https://github.com/michaelsproul/gossip

We will get better at documenting things though, we have a few examples progs in some crates we use to run up testnets to try out some plans.


#15

I really like this topic. Node_age is indeed a very protective thing as you don’t have a clue to which group you’re moved when you are getting a higher age. So in a 100.000 group network with unlimited resources I would try the following:

  • Start 1000 Vaults and keep them running until they all aged to 2 or 3.

To keep things simple I will use easy numbers:

  • Kill all your Vaults that are in group 1 to 50.000 Keep the others (in groups 50.001 to 100.000) running.

Now you have 500 Vaults in 50.000 groups which is 0.01 Vault per group. (I know that’s impossible just like the average family having 1,6 child, but it is just for the calculation).

  • Do this trick a hundred times and you might have 1 Vault per group.
  • Do it 500 times and you might take over a group if you’re lucky to find a small one with only 8 nodes.

But there’s a catch; your Vaults won’t stay in the same group as that changes with age. And they can’t target that new group.

So an attack could only be done with nodes that have a high age (as they don’t switch groups that much) and with a lot of resources.