Analysing the google attack

Google attack stats to control one section

The google attack test should report how much work it takes to control a single section. This post uses a test which:

  • establishes an honest network of a certain size and average age
  • then repeatedly
    • adds attacking vaults
    • adds 1 normal vault for every 10 attacking vaults
    • removes 1 normal vault for every 10 attacking vaults
  • continues this attack pattern until the attacker controls a single section
  • reports how many attacking vaults were required to control a single section

This test better represents what the attacker is actually trying to achieve (ie control of consensus). The test includes ageing, elders etc.
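To make the shape of that loop concrete, here is a toy sketch. It is not the simulation from the commit referenced in this post: it has no ageing, elders, relocation, splits or merges, and the fixed section size, the random placement of vaults and the "simple majority means control" rule are all assumptions made purely for illustration. The function name and parameters are hypothetical, and the numbers it produces should not be compared with the results table.

```python
import random

def toy_google_attack(netsize, section_size=50, seed=0):
    """Count attacking vaults added before one section is controlled.

    Toy model only: no ageing, elders, relocation, splits or merges.
    """
    rng = random.Random(seed)  # fixed seed makes each run repeatable
    n_sections = max(1, netsize // section_size)  # assumed fixed section count
    honest = [0] * n_sections
    attacking = [0] * n_sections

    # Establish the honest network: each vault lands in a random section.
    for _ in range(netsize):
        honest[rng.randrange(n_sections)] += 1

    def controlled(s):
        # Assumption: a simple majority of a section's vaults means control.
        return attacking[s] > honest[s]

    added = 0
    while True:
        s = rng.randrange(n_sections)
        attacking[s] += 1
        added += 1
        if controlled(s):
            return added
        if added % 10 == 0:
            # Churn: one normal vault joins and one leaves per 10 attackers.
            honest[rng.randrange(n_sections)] += 1
            r = rng.randrange(n_sections)
            while honest[r] == 0:
                r = rng.randrange(n_sections)
            honest[r] -= 1
            if controlled(r):
                return added

print(toy_google_attack(1000, seed=2))
```

Because the random generator is seeded, calling it twice with the same netsize and seed gives the same count, which is the same property that makes the real tests repeatable.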

Results

Given an initial network size (col 1), how many vaults must an attacker add before they control their first section (cols 2-6, one per test)? And what percentage of the network after the attack does this represent (col 7: attackers / (netsize + attackers) for each test, averaged across the tests)?

Unexpectedly, the larger the network the smaller the proportion needed for success. Of course it’s still more total vaults to attack a larger network, but the proportion decreases as the network gets larger.

Netsize   Test 1       2       3       4       5 | Avg Percent
     1K     2599    1573    1964    2043    2340 | 67.4
    10K    11449   19945   15004   19601   11616 | 60.0
   100K   125776   98073  137955  103118   97975 | 52.7
     1M   893224  983631  864215  974953  724229 | 46.9
    10M        - 7026273 5017128 6996670 7921890 | 40.0
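The Avg Percent column can be rechecked from the raw counts. The sketch below assumes the column is the mean, across tests, of attackers / (netsize + attackers); the variable and function names are mine, and the numbers are copied from the table.

```python
# Attacking vaults needed per test, copied from the table above.
# The 10M row has no result for test 1, so it has four entries.
results = {
    1_000: [2599, 1573, 1964, 2043, 2340],
    10_000: [11449, 19945, 15004, 19601, 11616],
    100_000: [125776, 98073, 137955, 103118, 97975],
    1_000_000: [893224, 983631, 864215, 974953, 724229],
    10_000_000: [7026273, 5017128, 6996670, 7921890],
}

def avg_percent(netsize, attackers_per_test):
    """Mean share of the post-attack network held by the attacker, in percent."""
    shares = [a / (netsize + a) for a in attackers_per_test]
    return 100 * sum(shares) / len(shares)

for netsize, tests in results.items():
    print(f"{netsize:>10} {avg_percent(netsize, tests):.1f}")
```

Under that assumption the printed values match the last column (67.4, 60.0, 52.7, 46.9, 40.0).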

The simulation is deterministic so these tests are repeatable (see commit edf7aff). The simulations take optional flags for seed and netsize, eg $ ./google_attack -netsize=1000 -seed=2 should give 1573 attacking vaults (row 1 test 2).

One caveat is that the ‘disallow rule’, which prevents a section from having multiple vaults aged 1, had to be disabled because it prevents small networks from growing.

What does it mean?

These results should be taken with a large amount of skepticism since there are so many unknown factors, and the ageing mechanism is still being fine-tuned by the maidsafe team.

The answer to ‘how many vaults are required to control consensus’ cannot be any more precise than ‘it depends how big the network is’. Even when the network size is known, it constantly changes throughout the attack so the amount of resources required is very hard to know in advance. The table above at least gives some idea of the magnitudes at play as the network grows.

I’m surprised how difficult it is to make correct assumptions about this attack; most of my intuitions about it have been shown to be incorrect by the simulations.

It always needs to be remembered that controlling a single section doesn’t necessarily give the attacker much benefit (as happybeing pointed out above).

Also from a marketing perspective, it may be desirable to use percentage figures from before the attack. So for row 1 test 1 in the table above, 2599/3599 = 72% is the proportion of attacking nodes on the network after the attack. But it’s more impressive and marketable to use the proportion of attacking nodes required before the attack, ie 2599/1000 = 260% of the network required to perform a google attack. That’s almost triple the current network size! 260% is much more impressive than 72%, even though it’s actually the same thing.
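The two framings are linked by simple arithmetic, sketched here with the row 1 test 1 numbers (the function names are just for illustration):

```python
def share_after(netsize, attackers):
    """Attackers as a fraction of the network once the attack has succeeded."""
    return attackers / (netsize + attackers)

def share_before(netsize, attackers):
    """Attackers as a fraction of the network size before the attack."""
    return attackers / netsize

after = share_after(1000, 2599)
before = share_before(1000, 2599)
print(f"after: {after:.0%}  before: {before:.0%}")  # after: 72%  before: 260%

# Converting one framing into the other: before = after / (1 - after)
print(abs(after / (1 - after) - before) < 1e-9)  # True
```

So any ‘after’ percentage above 50% turns into a ‘before’ percentage above 100%.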

Impact of Ageing

A typical attacked section has an age distribution like the table below (in this case taken from netsize=100K seed=4), with detail for the elders (oldest 8 vaults).

Attacked Section Age Distribution

Age Attacker
7   false
5   true
5   true
5   false
5   true
5   false
5   true
5   true
5   false
5   false
5   false

Age Attackers NonAttackers
4   4         6
3   15        5
2   5         2
1   1         0
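As a rough tally of the table above (this is just arithmetic on the listed rows, not extra simulation output, and the variable names are mine):

```python
# Individually listed vaults as (age, is_attacker), in table order.
oldest = [
    (7, False),
    (5, True), (5, True), (5, False), (5, True), (5, False),
    (5, True), (5, True), (5, False), (5, False), (5, False),
]
# Younger vaults, grouped by age: age -> (attackers, non_attackers)
younger = {4: (4, 6), 3: (15, 5), 2: (5, 2), 1: (1, 0)}

old_attackers = sum(1 for _, is_attacker in oldest if is_attacker)
young_attackers = sum(a for a, _ in younger.values())
total = len(oldest) + sum(a + n for a, n in younger.values())
attackers = old_attackers + young_attackers

print(f"attackers aged 5+: {old_attackers} of {len(oldest)}")
print(f"attackers overall: {attackers} of {total} ({attackers / total:.0%})")
```

That gives 5 of the 11 vaults aged 5 or more, and 30 of 49 vaults overall (about 61%), with the attacker's vaults concentrated at age 3.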

Resource bottlenecks

Yes this is an interesting question.

The limits on how many vaults can be thrown at a network... trying to wander through the variables. They must store data, but the more vaults there are the less data each vault has to store. They must supply bandwidth for chunks, so at some point that becomes a bottleneck depending on the size of their pipe. They must be timely in their responses, so latency and bandwidth both matter there, probably also CPU for signature generation and verification. There's additional labour and skill needed to modify the vault code for coordination between the attacker vaults. All these things cost money, so ultimately budget will limit the total number of vaults any one entity can run and how long they can run them. Maybe I missed some aspects?

Amended thoughts on google attack

I don’t think it’s possible to fully model a google attack because it depends on human behaviours of non-attacking participants (which are difficult to predict and model). Despite the imperfections there can be some attempt by big farmers to categorise themselves as a potential attacker vs merely a large-scale participant. Bystanders will probably only know of a google attack after it happens.

A successful google attack seems pretty far-fetched to me… but bitcoin mining ended up more centralized than people first imagined so I’m not too keen on making predictions!

Where to from here?

I’d like to look into the difficulty of an attacker controlling all copies of a chunk, ie not just controlling a single section but also controlling the sections holding redundant copies of that chunk. See RFC-0023 Naming Immutable Data Types, although routing as currently coded only keeps chunks in a single section so I’m not sure what the plan here is. However, if there was some sort of ‘cache scavenging’ mechanism to recover chunks from temporary caches anywhere on the network, this would negate the attack.

Data loss seems to be Peter Todd’s main concern about a google attack. For that to happen all copies of the chunk must be controlled by the attacker. The simulation in this post models this as currently implemented but not as currently designed, pending implementation of backup and possibly also sacrificial chunks.
