## Aim

As part of our focus on evaluating the design of the SAFE Network for Fleming, we took a good hard look at Sybil resilience.

The main objective of this work was to clearly map out the extent to which the Network is resilient to Sybil attacks.

This knowledge would then provide us with certain constraints that we’d need to take account of in developing other aspects of the Network design.

## Assumptions

We’ll start by describing the Network according to a set of assumptions. This creates a model that lets us run simulations and analyse the system’s response to perturbations in the parameters.

### Uniform work

We define the concept of *iteration* as being the point at which all nodes in the Network have completed one unit of work (`w`).

In a real network, the pace at which useful work is carried out by different nodes may vary. However, this simplification allows for a uniform description of the Network. We’ve also assumed that minor variations between different nodes will cancel each other out, because this allows us to gain good insight into the trends of the system at a macro scale.

### Ageing

Since node ageing is the main weapon against Sybil attacks on SAFE, we now have to simulate ageing in order to measure its effectiveness.

In the fully featured SAFE Network, node ageing will use proxies for the amount of work done by a node to estimate how much useful work each node has contributed to the Network. It will then age nodes logarithmically: in other words, every time a node has done *twice* the amount of work it has previously completed, it will gain `1` age.

For this model, we assume that these proxies are perfect. This means that the age of each node is equal to `log_2(w)`: the base-2 logarithm of the number of work units they’ve done. Because of the uniform work assumption, this is equivalent to saying that their age is `log_2(n)` where `n` is the number of iterations for which they’ve been a member of the Network.
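As a minimal sketch of this ageing rule (not the Network’s actual implementation), the snippet below maps work units to an age. The function name `age_from_work` and the choice to round down to a whole number are assumptions made for illustration; the text only specifies `log_2(w)`.

```python
def age_from_work(work_units: int) -> int:
    """Age of a node under the perfect-proxy assumption.

    Assumption: ages are whole numbers, so we take floor(log_2(w)).
    """
    if work_units < 1:
        raise ValueError("a node needs at least one unit of work to have an age")
    # For a positive integer w, bit_length() - 1 is exactly floor(log_2(w)).
    return work_units.bit_length() - 1


if __name__ == "__main__":
    # Each doubling of the work done adds one to the age.
    for w in (1, 2, 4, 16, 4096):
        print(f"w = {w:>4} -> age {age_from_work(w)}")
```

Under the uniform work assumption, the same function applies with the number of iterations `n` in place of `w`.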

### Nodes leaving

To simulate the probability of nodes leaving the Network at any point in time, we use an exponential decay approximation.

This approximation appears to be sensible, as these trends have been measured in existing peer-to-peer networks such as Gnutella.

We use the simplest expression of this exponential decay available to us by assuming:

`Prob(node leaving) = 1 / w`

where `w` is the amount of work done by this node.

We don’t try to tweak any constants to fit a hypothetical reality as we are mainly interested in the trends. We’re not attempting to provide precise definitions of `w` or what actually constitutes a single `work unit` here. This means that we can simply make deductions based on the macro-impact of certain decisions according to the trends observed in the system.

Later, we’ll be able to build a more precise picture by estimating values for these constants based on the *actual* behaviour of live test networks.
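The leaving rule can be sketched in a few lines. This is an illustrative reading of `Prob(node leaving) = 1 / w`, assuming the probability is evaluated once per iteration; the helper names `leave_probability` and `survives_iteration` are hypothetical.

```python
import random


def leave_probability(work_units: int) -> float:
    """Probability that a node leaves, taken as 1 / w from the text."""
    if work_units < 1:
        raise ValueError("work_units must be at least 1")
    return 1.0 / work_units


def survives_iteration(work_units: int, rng: random.Random) -> bool:
    """Draw once per iteration (assumption) and report whether the node stays."""
    # rng.random() is in [0, 1), so a node with w = 1 always leaves.
    return rng.random() >= leave_probability(work_units)


if __name__ == "__main__":
    rng = random.Random(42)
    for w in (1, 16, 4096):
        stayed = sum(survives_iteration(w, rng) for _ in range(10_000))
        print(f"w = {w:>4}: P(leave) = {leave_probability(w):.5f}, "
              f"stayed in {stayed} of 10000 draws")
```

The key property is that the more work a node has done, the less likely it is to leave, which is the trend the model is trying to capture.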

### Nodes joining

We’re also considering one idea for reducing the possible impact of an attempted Sybil attack: the Network should only accept joining nodes on its own terms (i.e. if it needs the extra capacity that these nodes could offer).

Such a measure would prevent any entity from immediately offering nodes that would cover a high percentage of the Network because they have a malicious intent to take over.

In our simulations, we assume that such a measure is in place, so that nodes are only accepted as the Network needs more capacity.

We should be clear that this exercise does not include any simulation of how data will perform. Instead, the rate of growth of the Network is simply defined as the rate at which the data stored in the Network grows.

In some simulations, we consider the steady-state (where the number of nodes in the Network is not changing). In others, we look into the impact of varying growth rates on the system.

For the steady-state case, any node leaving the Network is immediately replaced with a node of *age 4* joining the Network. The reasoning behind this is that if a joining node starts at age 4, it will avoid immediate churn in the Network.
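To make the steady-state rule concrete, here is a rough sketch of a single iteration. It is not the simulation code used for this post: the `Node` class, the `attacker_share` parameter, the reading of "age 4" as `2**4` units of work, and the order of the leave check and the work increment are all assumptions for illustration.

```python
import random
from dataclasses import dataclass
from typing import List

JOIN_AGE = 4  # from the text: replacement nodes join at age 4


@dataclass
class Node:
    work: int        # units of work done so far
    malicious: bool  # True if the node belongs to the attacker


def steady_state_step(nodes: List[Node], attacker_share: float,
                      rng: random.Random) -> List[Node]:
    """Advance one iteration while keeping the number of nodes constant."""
    next_nodes = []
    for node in nodes:
        if rng.random() < 1.0 / node.work:
            # The node leaves and is immediately replaced by a newcomer.
            # Assumption: age 4 corresponds to 2**4 units of work.
            newcomer = Node(work=2 ** JOIN_AGE,
                            malicious=rng.random() < attacker_share)
            next_nodes.append(newcomer)
        else:
            # The node stays and completes one more unit of work.
            next_nodes.append(Node(work=node.work + 1, malicious=node.malicious))
    return next_nodes


if __name__ == "__main__":
    rng = random.Random(1)
    network = [Node(work=2 ** 12, malicious=False) for _ in range(1_000)]
    for _ in range(100):
        network = steady_state_step(network, attacker_share=0.2, rng=rng)
    print(len(network), sum(n.malicious for n in network))
```

Starting newcomers at `2**4` units of work gives them a leave probability of `1/16` rather than `1`, which is one way to read the remark about avoiding immediate churn.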

## Variables

The main point of modelling the Network’s Sybil resilience is to inform the design work. Here are some levers we can pull to influence the design:

### Node ageing

To a large extent, node ageing was designed as a Sybil resilience mechanism, so:

- How does it accomplish its task?
- What is its impact?

### Section size

- How many nodes should there be in any given section of the network?

This particular design decision has ramifications on many aspects of the design of the Network. For instance, a larger section will lead to fewer hops through the Network when communicating messages across the Network. On the other hand, a larger section may mean that the number of nodes that can be communicated with directly by any given node grows beyond what’s technically feasible. For this reason, other aspects of our design work interplay with this decision.

- Being mindful of this, what is the impact of the minimum section size on Sybil resilience alone?

### Number of Elders per section

In the SAFE Network, Elders are the oldest nodes in any given section. They are trusted with important tasks such as reaching consensus on events happening in their section. No individual Elder is ever trusted, but a quorum of Elders can make decisions together on behalf of their entire section.

Varying how many Elders constitute a quorum may impact Sybil resilience. In a way, reducing the number of Elders may centralize the decision-making power within a section.

So we have to answer this unintuitive question: what is the right balance of centralization that will make our decentralized system the most secure?

Note that it may seem that considering some level of centralization as a measure against Sybil attacks goes against the principles of the Network. It isn’t actually the case: rather than a binary decision, centralization vs decentralization can be seen as more of a slider. Our aim is to pick the setting on that slider that makes the Network the most fit to deliver on its fundamentals overall.

Like the section size, the number of Elders per section has wide reaching implications on the design of the Network overall.

Two implications stand out. First, having a bigger group of Elders means that the consensus algorithm (PARSEC) needs to scale to a larger number of participants.

Second, beneath a certain number, having fewer Elders may negatively impact the reliable delivery of messages through the Network if connectivity between nodes is limited.

## Threat models

The main attack vector we are considering in this document is Sybil attacks: in other words, where an attacker creates many pseudonymous identities in an attempt to gain more influence on the Network.

We considered two different scenarios:

### Botnet attack

In one scenario, the attacker gets hold of a botnet. In such a case, we can assume that the financial cost of maintaining an attack for a long time is rather low to the attacker. In practice, the main cost is the opportunity cost of not using the botnet for other nefarious purposes that may be more lucrative.

For this reason, we allowed a botnet attacker to provide a larger proportion of nodes to the Network (20% of all joining nodes in our simulations).

We are also interested in observing the behaviour of such an attack over a relatively long time period.

The attacker will also likely face the same challenges with the reliability of their nodes as anyone else: therefore, our simulation assumes that the attacker’s nodes will leave the Network at the same rate as any other node.

### Datacenter attack

Another possible situation consists of using a large amount of reliable compute power in an attempt to target the Network. For instance, an attacker could run their own datacenter or simply rent a server farm for a certain amount of time.

The financial cost of such an attack is higher to the attacker, particularly when maintaining that compute power over time. Therefore, we’re more interested in observing the impact of such an attack over a relatively short time period, but with the optimal amount of compute power needed to maximise Network disruption. We saw that providing more than 5% of all nodes joining the Network didn’t significantly improve an attacker’s prospects, so this is the number that we chose for our simulations.

## Simulations

### Picking a network size and number of iterations

In most of our simulations, we picked a time horizon of 20,000 iterations with 100,000 nodes. These values were picked so that the simulations could run in a reasonable time (i.e. ~1 hour).

By running one simulation with 100,000 nodes and another one with 200,000 nodes, we could see that the impact of the number of nodes was quite limited. This meant that we could use the results we obtained with 100,000 nodes in various conditions and extrapolate the trends for larger networks.

All of the curves in this blog post represent the same thing, so we will take this opportunity to give a short explanation of them:

On the `x` axis, we have the number of iterations. With a network of age 12 (the Network has been running for 2^12 iterations until now, so a minority of older nodes have age 12 and most nodes are younger), 20K iterations represent about 5 times the original amount of work done by the Network. In this case, the Network’s starting age is 20, so 20K iterations represent about 2% of the entire lifetime of the Network.

On the `y` axis, we have information about the proportion of nodes in any section. The distribution varies over the entire Network, so the black curve represents the average, while the blue and pink curves represent half a standard deviation and a full standard deviation away from the average (to give an idea of the spread). The green curve represents the percentage of malicious Elders in the section that has the most malicious Elders in the entire Network.

Because malicious Elders can stall progress in any Section where they make up 33% of the Elders, a very simple way to interpret this curve is that the earlier the green curve crosses the 33% line, the worse this set of parameters is at fending off a Sybil attack.
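This reading of the green curve can be expressed as a small helper: given the worst-section ratio of malicious Elders at each iteration, find the first iteration at which it reaches the stall threshold. This is only a sketch; the function name is hypothetical, the "33%" from the text is taken as exactly one third, and the sample curves are made up for the example rather than taken from our results.

```python
from fractions import Fraction
from typing import Iterable, Optional

# Assumption: the 33% line in the text is interpreted as one third.
STALL_THRESHOLD = Fraction(1, 3)


def first_stall_iteration(worst_section_ratios: Iterable[Fraction]) -> Optional[int]:
    """Index of the first iteration where the worst section can be stalled.

    Returns None if the threshold is never reached within the horizon.
    """
    for iteration, ratio in enumerate(worst_section_ratios):
        if ratio >= STALL_THRESHOLD:
            return iteration
    return None


if __name__ == "__main__":
    # Made-up curves for illustration only (10 Elders per section).
    weak_params = [Fraction(k, 10) for k in (0, 1, 2, 4, 5)]
    strong_params = [Fraction(k, 10) for k in (0, 0, 1, 1, 2)]
    print(first_stall_iteration(weak_params))
    print(first_stall_iteration(strong_params))
```

Comparing this index across parameter sets gives a single number per simulation: the later (or never) the crossing, the better the parameters.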

By this measure, the particular conditions we picked here are terrible.

Of course, this is to be expected as this choice of parameters was totally arbitrary and simply used as a starting point. In the rest of this post, we will show how a more careful choice of parameters leads to much better results.

#### Conditions

Variable | Value |
---|---|
Number of iterations | `20K` |
Initial number of nodes | `{100K, 200K}` |
Initial network age | `20` |
Network growth | `static` |
Minimum section size | `10` |
Ratio of elders to min section size | `100%` |
Attack vector | `20% botnet` |

#### Figures with varying total number of nodes in the network

Figure panels: N: 100K, N: 200K.

### Picking an initial network age

We played with a few different values for the initial network age. Let’s focus here on two different ages: 12 and 16. *Note that these represent the logarithm of the amount of work done, so a network of age 16 has done 16 times more work (i.e. 2^(16-12)) than a network of age 12*.

The general trend is quite predictable:

For a network of age 12, the amount of work done by the oldest existing members is *2^12 == 4096 work units*. This means that a time horizon of 20,000 represents ~5 times more work than has been done by the oldest Elders.

For a network of age 16, the entire duration we’re looking at represents 1/3 of the work done by the oldest Elders. This means that we learn less about how the Network would behave in the long run.

Since we are looking for high-level trends, it is better for us to mostly use a younger network here (age 12). This allows us to extrapolate the results to older networks by keeping the amount of work an adversary puts in proportional to the amount of work the Network did before that adversary joined.

In other words, using a young network means we can easily (with fewer computations) create situations where an adversary is able to take over the Network. This helps us to see what can be done to mitigate such attacks.
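The arithmetic behind these comparisons is short enough to check directly. The helper below (a hypothetical name, written for this post’s numbers only) divides the simulated horizon by the `2^age` work units done by the oldest nodes.

```python
def horizon_to_history_ratio(iterations: int, network_age: int) -> float:
    """Simulated horizon relative to the work done by the oldest nodes (2**age)."""
    return iterations / 2 ** network_age


if __name__ == "__main__":
    for age in (12, 16, 20):
        ratio = horizon_to_history_ratio(20_000, age)
        print(f"age {age}: 2^{age} = {2 ** age} work units, ratio = {ratio:.3f}")
    # age 12: 2^12 = 4096 work units, ratio = 4.883
    # age 16: 2^16 = 65536 work units, ratio = 0.305
    # age 20: 2^20 = 1048576 work units, ratio = 0.019
```

These match the figures quoted in the text: roughly 5 times the prior work at age 12, about a third at age 16, and about 2% at age 20.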

#### Conditions

Variable | Value |
---|---|
Number of iterations | `20K` |
Initial number of nodes | `100K` |
Initial network age | `{12, 16}` |
Network growth | `static` |
Minimum section size | `50` |
Ratio of elders to min section size | `100%` |
Attack vector | `20% botnet` |

#### Figures with varying starting age

Figure panels: age: 12, age: 16.

### Impact of network growth

We simulated network growth so that we could measure its impact on the Network.

Everything else being equal, we compared a static network with two networks: one that grew 7 times and another that grew 50 times over the course of the simulation.

Note that since we simulated an attacker that keeps providing 20% of all new nodes, we assumed that the rate of resources provided by the attacker was able to match the network growth.

Given the dramatic difference in network growth, we observed that the impact on Network behaviour wasn’t as large as we might have predicted.

Roughly, a network that is growing very fast behaves similarly to a network with a lower initial age as the ratio of existing nodes to the nodes the attacker provides decreases with the growth rate.

Because simulating network growth makes the simulations slower and we saw that the trends weren’t massively different, we decided to use a network with static size in any further simulation.
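A back-of-the-envelope calculation illustrates why faster growth dilutes the existing nodes. The sketch below deliberately ignores churn and ageing, so it is not a substitute for the simulation; it only counts nodes added by growth, and the function name and parameters are hypothetical.

```python
import math


def existing_to_attacker_ratio(growth_factor: float, attacker_share: float) -> float:
    """Original nodes per attacker node added through growth alone.

    Assumptions: no churn, and the attacker supplies a fixed share of the
    nodes that join as the network grows from N to growth_factor * N.
    """
    if growth_factor < 1 or not 0 < attacker_share <= 1:
        raise ValueError("need growth_factor >= 1 and 0 < attacker_share <= 1")
    joined_per_existing = growth_factor - 1  # new nodes per original node
    if joined_per_existing == 0:
        return math.inf  # static network: growth adds no attacker nodes
    return 1.0 / (attacker_share * joined_per_existing)


if __name__ == "__main__":
    for growth in (1, 7, 50):
        ratio = existing_to_attacker_ratio(growth, attacker_share=0.2)
        print(f"{growth}X growth: {ratio:.3f} original nodes per attacker node")
```

With a 20% attacker share, the ratio drops below one original node per attacker node at 7X growth and to roughly a tenth at 50X, which is the dilution effect described in the text.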

#### Conditions

Variable | Value |
---|---|
Number of iterations | `20K` |
Initial number of nodes | `100K` |
Initial network age | `12` |
Network growth | `{static, 7X, 50X}` |
Minimum section size | `50` |
Ratio of elders to min section size | `100%` |
Attack vector | `20% botnet` |

#### Figures with varying network growth

Figure panels: static, 7X, 50X.

### Impact of the section size

With the number of nodes and the initial network age decided, it was time to start looking into the impact of variables that we have control over when designing the SAFE Network.

We did a few simulations with varying values for the minimum section size.

The result is pretty significant! With this set of parameters, a network with sections of size `10` will not be able to withstand a 20% Sybil attack for any stretch of time. But increasing the section size to `100` drastically reduces the probability of a section being owned by the attacker.

The spread of the ratio of elders per section becomes much tighter across the Network, which means that it becomes considerably harder for an attacker to control 1/3 of any given section.

At this stage, we didn’t test a minimum section size larger than 100 as we thought that there may be scaling limitations by going beyond this order of magnitude. Now it’s time to explore other ways to reduce the impact of a Sybil attack.
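The tightening of the spread can be illustrated with a simple binomial model. This is a simplification rather than the simulation itself: it assumes each member of a section is independently malicious with a fixed probability and ignores ageing and churn entirely. The function name is hypothetical.

```python
from math import comb


def prob_one_third_malicious(section_size: int, malicious_share: float) -> float:
    """P(at least a third of a section is malicious) under a binomial model.

    Assumption: each of the section's nodes is malicious independently
    with probability malicious_share.
    """
    n, p = section_size, malicious_share
    threshold = -(-n // 3)  # smallest integer >= n / 3
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(threshold, n + 1))


if __name__ == "__main__":
    for size in (10, 50, 100):
        print(f"section size {size:>3}: "
              f"{prob_one_third_malicious(size, 0.2):.5f}")
```

Even in this crude model, the chance that a given section reaches the one-third mark falls sharply as the section size goes from 10 to 100, in line with the trend seen in the simulations.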

#### Conditions

Variable | Value |
---|---|
Number of iterations | `20K` |
Initial number of nodes | `100K` |
Initial network age | `12` |
Network growth | `static` |
Minimum section size | `{10, 50, 100}` |
Ratio of elders to min section size | `100%` |
Attack vector | `20% botnet` |

#### Figures with varying minimum section size

Figure panels: 10, 50, 100.

### Reducing the number of Elders per section

One idea that we explored was to reduce the number of Elders in each section. This is closer to what is specified in the current Network specifications.

We ran various simulations. Here is a sample of 3 simulations with varying numbers of Elders per section under two different attack vectors: the 20% botnet and the 5% datacenter attack.

The most important feature of these graphs is the delay of around 5000 iterations (around twice the age of the Network) that appears when the number of Elders is reduced.

It’s important to remember that however powerful any attacker is, their nodes must first catch up with the age of the existing Elders before they can pose a threat. Because there are now only 10% Elders, the attacker must wait for their nodes to become one of the 10% oldest nodes.

With this information, it’s quite obvious that if we were to assume a technical limitation of ~100 nodes per section, having only ~10 Elders per section is much better than having mostly Elders in the network.

It is also better for the scalability of the Network to have fewer Elders per section, as Elders are much more connected than simple Adult nodes.
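The catch-up effect can be seen in a toy example where Elders are simply the oldest nodes in a section. The representation of a node as a `(work, is_malicious)` tuple, the function name, and the sample numbers are all assumptions for illustration; the real Elder selection logic may differ.

```python
from typing import List, Tuple

Node = Tuple[int, bool]  # (work units done, is_malicious); illustrative only


def malicious_elder_share(section: List[Node], elder_count: int) -> float:
    """Fraction of malicious nodes among the elder_count oldest nodes."""
    if not 1 <= elder_count <= len(section):
        raise ValueError("elder_count must be between 1 and the section size")
    # Elders are taken to be the nodes with the most work done (the oldest).
    elders = sorted(section, key=lambda node: node[0], reverse=True)[:elder_count]
    return sum(1 for _, malicious in elders if malicious) / len(elders)


if __name__ == "__main__":
    # 80 long-standing honest nodes and 20 younger attacker nodes.
    section = [(4096, False)] * 80 + [(1000, True)] * 20
    print(malicious_elder_share(section, elder_count=10))   # 0.0
    print(malicious_elder_share(section, elder_count=100))  # 0.2
```

With only 10 Elders, the attacker’s younger nodes hold no Elder seats at all until they out-age the honest nodes, whereas with 100 Elders they are represented immediately.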

#### Conditions

Variable | Value |
---|---|
Number of iterations | `20K` |
Initial number of nodes | `100K` |
Initial network age | `12` |
Network growth | `static` |
Minimum section size | `100` |
Ratio of elders to min section size | `{10%, 17%, 17%, 30%, 50%, 100%}` |
Attack vector | `{20% botnet, 5% datacenter}` |

#### Figures with number of elders per section

##### 20% botnet

Figure panels: 10, 17, 100.

##### 5% datacenter

Figure panels: 10, 30, 50.

### Scaling up

So far, we saw that a low ratio of Elders (~10%) is good as it imposes a delay before any kind of attacker is able to do any damage to the Network.

We don’t have hard numbers that say that 100 nodes per section represents the largest number that is technically possible. But we’ll know what is technically acceptable when we have more data from the testnets.

So - assuming this limit of 100 isn’t a real limitation in any way, how can we improve Sybil resilience of the Network?

To investigate further, we ran some simulations with larger sections, but this time we kept the ratio of Elders to Adults at 10%. (A lower ratio means more concentration of power among Elders, which may have other undesirable consequences even without considering the impact on Sybil attacks).

These simulations showed very clearly that given 10% of Elders, the larger the section, the more resilient our network would be to Sybil attacks.

#### Conditions

Variable | Value |
---|---|
Number of iterations | `20K` |
Initial number of nodes | `100K` |
Initial network age | `12` |
Network growth | `static` |
Minimum section size | `{100, 300, 500}` |
Ratio of elders to min section size | `10%` |
Attack vector | `20% botnet` |

#### Figures with increasing section size and fixed elders ratio

Figure panels: 100, 200, 300.

## Conclusions

To summarise, these simulations have been invaluable to the design of the SAFE Network. While we can’t come up with a strict number for any specific variable without considering the scope of the wider Network, there are nonetheless some conclusions that we can draw:

- Node ageing mitigates Sybil attacks by forcing an attacker to maintain an attack for longer before they can participate in the decision-making process.
- A larger Section size reduces the probability of owning any Section of the Network with a relatively low proportion of nodes.
- A smaller fraction of Elders adds a delay between the time an attacker starts providing nodes to the Network and the time before they can theoretically control a Section.

**A ratio of 10% of elders per Section seems to be striking a good balance between the scalability cost of having large Sections, the scalability cost of having many elders and the impact on Sybil resilience.**

As an order of magnitude, a minimum section size of ~100 seems to be required to significantly reduce an attacker’s ability to perform a Sybil attack. A larger Section (as far as connectivity permits) seems to only improve the Sybil resilience properties of the system.

With this knowledge, we can use information we will gain in future testnets (for instance: what’s the relationship between the number of Elders and the scalability of the Network as a whole) to decide on exact numbers while being aware of how they impact the Sybil resilience of the system.

Of course, we are standing on the shoulders of giants when performing simulations of the network in this community. For instance, this post and this one already contained a treasure trove of information on how the Network would behave under a variety of circumstances. What we wanted to investigate here was specifically the impact of how exactly we define node ageing on Sybil resilience of the systems, which is why we went ahead and performed these new simulations.

These kinds of topics are a place where the power of the community can definitely contribute a lot, so if you fancy running your own simulations under various conditions and sharing the results with your own analysis, feel free to have a look at the code we used to produce this data.

In the next post in this series, we aim to shift the focus to another one of the “big questions” we’ve been discussing on the road to Fleming: “Network restarts”.