Analysing the google attack

Starting with this comment, please feel part of this process, you are and its really great to have people outside the office looking at this. It is very refreshing. Now, thank you for testing the initial age thing here, nice to see this.

Interesting thought.

Just on this, as you say churn is a human affected thing in the main. What we can use though is the fact that that as age increases then that peer will be 50% less likely to churn. Just a note.

Absolutely

Yes this does need more thought. Separation of concerns is always good, we are possibly coupling here and should not be.

It may be worth sharing some thoughts with you here. Here are some points, in no order really.

  1. Age is a measure of increasing work done by a peer.
  2. Work done should be something that has taken place over time and not able to be done faster by a larger peer etc. (no faster cpu/hashing etc.).
  3. Older peers are more trusted than younger ones
  4. A peer that has been doing work for X is very likely to be able to do 2X (this is about the only thing that came out of a study into gnutella IIRC). The point was on line time probability increases the longer a node is on line, so a node on for 1 hour likely to be on for another 1 hour period and so on
  5. Age uses a mod division to work out effort instead of counting events. Agreeing a count is to hard in SAFE type networks as its just extra state, so instead we use mod.
  6. Initially age was going to use the hash of hop messages through the section, but we considered that could be gamed by a bad actor sending messages that would give a hash to relocate their own nodes. Not as simple as it sounds to do that, but it was considered a risk none the less.
  7. Having infants not count as churn events, but using node lost or similar slows down ageing, but also prevents targeting to a great extent.

Looking at 6. again is probably something we should do, but ensure there are no targeting capabilities introduced.
Anyway I am dumping some thoughts here, no particular reason, just to help you see some thinking so far. I will try and do more of this in this thread where possible.

I don’t feel that making relocation orthogonal to churn events is a hard problem, just some subtitles around it. I will try and give this a bit more thought tonight. I have a load of paper here on another subject I will clear up first though, but I will get back to this later or tomorrow for sure.

Thanks again Ian, don’t worry about chatting publicly about all of these points, its the very best way to go most of the time. We do keep some stuff until we have good results, but that is because Engineering discussions can appear negative to non engineers, its just the logic sounds tough at times, but I think this community are strong enough for all of that. Its great for everyone to see the small things that matter a lot like orthogonality of code modules and algorithms/ I hate coupling so this is indeed a good discussion and very valid points about age/churn being coupled and trying to do too much. We can unweave that part quite easily with just a little more poking.

Cheers Ian, nice one again.

23 Likes

I tried the four possible combinations with:

  • youngest/oldest priority for relocation

  • allow/disallow more than one node aged 1 per section

In all cases the initial section never splits. Here are the results with 3000 iterations:

Oldest first + disallow more than one node aged 1:

Age: 1 2 3 4 5 6 7 8
Count: 1 15 3 4 3 2 1 2

Oldest first + allow more than one node aged 1:

Age: 1 2 3 4 5 6 7 8
Count: 2446 35 14 10 5 1 2 1

Youngest first:

Age: 1 2 3
Count: 5 2428 105

(last result is independent on whether or not more than one node aged 1 is allowed)

I also tried many other things:

  • enlarge the age range when choosing the relocated node

  • allow a larger number of aged 1 nodes in a section

  • instead of choosing one of the youngest or oldest nodes, choose a node in the most common age (to balance the age distribution in a section). It didn’t work but this an idea to keep in mind.

  • modify the drop probability function (I know this is an external factor, outside of the vaults control, but this was my last desperate attempt).

But in all cases the simulation never takes off and I am beginning to suspect a bug in Maidsafe simulation. As he has rewritten the simulation from scratch with another programming language (Go), @mav doesn’t have the problem.

@mav, what you do is a very important because it will allow to confirm results generated from the same set of rules but programmed by 2 independent people/team, using different algorithms and different languages. When Maidsafe makes its simulation work and if the results are similar then we can be confident on the future network behavior.

17 Likes

Great work @mav

Sounds like a great use case for adding a VDP file listing your preferred project dependencies :wink:

The current PoC implementation (python) won’t be useful yet (bitcoin fees too high) but I’m planning to release another one (written in Rust) later this year that will process payments in LN/BTC, and maybe BCH or LTC (not decided yet).

2 Likes

I am not sure why, in this case, but perhaps you are applying the rules for a complete group when there is none. So if there is no group with all elders > 4 then it is not complete, we allow any number of infants etc at that time. It could be your code si being to aggressive and applying the disallow etc. before a group is formed?

Just a thought, I am a bit flu ridden right now so accept errors :wink:

Not sure, @bart is off this week (mostly) but back tomorrow, he can confirm, but I can tell you the simulation is not released by us at all yet and is undergoing big changes with the design meetings as we move along. SO I would not be surprised if there is unpublished parts of the code there. It will all be fine soon, but mid test/impl we will make a ton of tweaks and it will inconvenience you a bit (sorry). I am sure there is a branch or off line code you need anyway.

4 Likes

My experience was with currently available @bart’s repository.

Thanks, I am looking forward to playing with this code.

3 Likes

This post investigates a targeted google attack, ie offline key generation to initialize a vault into a specific section.

Thanks to @digipl for provoking this line of thinking in a prior post.

Background Info

The relocation mechanism requires targeting to be not only possible but to also be viable. Vaults are relocated to a neighbouring section, which means the vault must be able to generate a key matching that section prefix. This is exactly the same thing an attacker would do to target a specific section.

“When a churn even does force a node to relocate to a destination group (D)… The relocating node then generates a key that will fit in D and attempts to join D.”

from RFC-0045 Node Ageing, more info in this post in the Data Chains topic.

The reason this targeting is damaging is due to relocation to neighbouring sections only. This means the possible sections to be relocated to is quite small, as outlined below.

Sections: total sections in the network
Neighbs: how many neighbours an average section would have (log2(Sections))
Reachable%: the percentage of sections available as a relocation target (Neighbs/Sections)

Sections Neighbs Reachable%
      1K      10 0.997
     10K      13 0.133
    100K      17 0.0166
      1M      20 0.00199
     10M      23 0.000233

This is not great! Vaults can only ever be relocated to a tiny subset of the network (typically less than 1%).

Targeted vs NonTargeted

The previous simulation was a nontargeted attack, akin to an intern at google deciding to start a bunch of vanilla vaults which are randomly located through the network.

A targeted attack means the attacker is always joining the same part of the network, which means their vaults will always reside in a small subset of the network as indicated above, even after relocation.

This concentrates their influence much more strongly than using random starting locations.

The ability to target a section is not only possible but is an essential feature for the current design, since vaults must be able to relocate to a specific neighbour.

Test Conditions

100K vaults (500K join interleaved with 400K depart). Then 10 attacking vaults all targeting the same section + 1 normal vault join + 1 normal vault depart. Repeat until a section is controlled by the attacker.

No disallow rule.

Full details in google_attack_targeted.go.

Result

Targeted NetSize  Test 1       2       3       4       5 | Avg Pct
      No   100K   125776   98073  137955  103118   97975 | 52.7
     Yes   100K     1852    4893    2598    2366    3694 |  3.0

Disallow Rule

This is probably a great example of where the disallow rule might be considered useful since it should prevent any single section being flooded by an attacker.

But it doesn’t work because it leads to a denial of ALL vaults to that section until the youngest vault moves away. An attacker spamming the section with joins makes the chances of a normal vault joining much lower, so all new joins are very likely to be from an attacker.

I think there are better alternatives to the disallow rule and it should be scrapped completely. I’m confident that punishment for misbehaviour is good enough. There’s no need to be strict at the point where users are simply trying to participate.

Also, at a more ‘social’ level, turning enthusiastic new recruits away is really poor. This is a legitimate marketing problem (and by marketing I don’t mean ‘impressing people’ I mean ‘increasing the size of the market for safenetwork users’).

Eldership

It’s worth considering, do vaults actually want to be elders? Intuitively I’d say ‘yes’ since it’s the end result of a long process, so surely it must be good! But in reality, being an elder means a lot more work and demand on resources, but for what benefit to the vault operator?

If vault operators only want to earn safecoin, they’re better off not being elders so they can put all their resources into serving user requests and getting maximum chances to farm new safecoins. Any work done ‘managing the network’ (ie being an elder) is ‘wasted’ from a safecoin perspective.

So I think eldership must be viewed as a chore not a privilege. To me this is a very significant change in perspective with significant consequences through the entire incentive structure and the design of participation, rewards, punishment, security… the whole lot.

At the very least it’s worth considering that maybe being an elder isn’t desirable to all operators and it’s possible that many elders will immediately leave the network upon being promoted to avoid doing the work so they can get back to earning safecoin with minimum effort.

Proposals

My suggestions for potential ideas:

  • no disallow rule

  • relocate to anywhere on the network, not just neighbours (a very large and technical discussion needed here)

  • farm rate varies depending on the amount of work done (so elders have economic incentive to take on the extra work they must do. A complex discussion to be had here)

  • elders are assigned on a ‘rotating’ basis, so the ‘chore’ is distributed rather than retained, many possible mechanisms for this, but maybe turns out not to be a desirable feature…?

  • more discussion of punishment as a way to replace the disallow rule

  • there are so many potential paths, these are just some of many, but they are the ones I currently think are the leading contenders. I’m totally open and excited to explore other potential paths.

It’s obviously not great to have major rethinks of critical security algorithms. I would love it if the existing design could be made to work. But so far I’ve not been clever enough to smooth the rought bits into a habitable design. Hopefully further thinking and discussion can get us there.

15 Likes

Ah just a quick point here Ian, this relocate is recursive, so if you see a neighbour that has a greater requirement than you then you relocate again. This requirement is up for debate (i.e. lower elder age etc. are all possible). Currently in master where we don’t relocate for age, but do relocate for security this is what happens. A node creates a key this key will be for initial group (lets call it X). Group X then decide on the actual joining group (lets call it Y). The nodes proxy is then informed to join Y and Y all send their connection info to the node. The nodes group Y is also recursive, so that node is also recursively relocated until it fits the best group (i,e, the group with less peers, only for now where we have no age).

It may not be so clear in the docs, but the relocate is not contained to neighbours, it is recursive, sorry for the confusion there. This recursion helps balance the network as we try to grow the network sections all at a similar pace.

[EDIT, above when I say you relocate, its not the node who gets to choose, but the groups being asked to relocate to, they will decide where the peer goes. The recursion goes along the network until the best location is found, best in the case is a group where none of its neighbours is better to accept the peer.]

12 Likes

Ah, good clarification, thanks for pointing this out.

I’ve added this recursion to the relocation mechanism to the simulation.

The simulation defines ‘greater requirement’ as “prioritise sections with shorter prefixes and having less nodes to balance the network”. This may change in the future.

Tracking the number of neighbourhoods before the ‘best’ section is found gave these results (see neighbour_relocation_hops.go):

100K vaults (500K join interleaved with 400K depart)

hops occurances
   0 288
   1 220779
   2 39293
   3 41590

301950 total relocations

Most relocations are within the current neighbourhood (1 hop away, about 73% of the time). But there are about 27% of cases where the vault is relocated further than just the nearest neighbourhood (2 or 3 hops away).

Redoing Targeted Google Attack

The targeted google attack was retested with recursive-neighbour relocation (which should spread attacking vaults further afield and reduce their impact).

Targeted Recursive NetSize  Test 1       2       3       4       5 | Avg Pct
      No        No    100K  125776   98073  137955  103118   97975 | 52.7
     Yes        No    100K    1852    4893    2598    2366    3694 |  3.0
     Yes       Yes    100K    5631    4031    5514    3810    4494 |  4.5

An improvement for sure, but still the effect of concentrated targeted attack is quite significant.

Thanks again for clarifying the details of the relocation mechanism.

14 Likes

I haven’t looked at the specifics of key-to-section allocation but isn’t it possible to make section targeting more expensive by forcing new vaults to incorporate some kind of proof-of-work in the middle of the key generation process?

1 Like

How would such targeting be carried out practically? I was under the impression that the network allocated a section to a new node at random?

1 Like

Is Avg Pct the percentage of the final network nodes which an attacker must have created to control one section?

1 Like

Yes I believe that it is. The relocation algo could suffer actually as @mav points out. Relocate to shortest prefix all of the time will help very young networks but would mean mostly local relocates when the network gets larger (due to much more balance). The trick here though is initial join, what we had/have is

  • a node creates a keypair,
  • a message goes to the group close to the key (X)
  • That key then is used along with agreed group data to create another (random) address.
  • The group close to the new address (Y) is then informed of a node joining
  • Group Y then accept the node if it had generated a key to fit Y.

So this method does something quite important. Rather than relocate moving nodes around the network randomly (which still may be a good thing to do) it means the node cannot target its first group. This is really important as otherwise an attack can be constructed fairly easily by a large player, even with node age.

@mav this may help your sim, but the results are interesting though to see relocate is pretty local and will be more so as the network grows. Perhaps there should be more randomness there. We will need to simulate that though as another attack is actually being to random, where an attacker waits on some of his nodes grouping together (randomly) if that makes sense. So diluting membership location is important.

Another important aspect is the key generated. If it is any key that fits a section then it can be done in a way to generate keys that will always split, say into the 1 section. That allows an attack that is more subtle, but can basically double the attackers chance of getting a group. That attack is weakened by asking the joining node to create an address within a range in the Y group. That range is not final but should be along the lines of the largest gap in the section members list (I mean we need to fill the section with nodes equidistant if possible, the relocate should force a joining peer to “fit” the range at the right height (of the binary tree)).

Thanks again @mav you are showing the importance of node age and relocate, that is neat from our perspective as it is a discussion we have internally, but so often some engineers feel/felt that it is an add on to worry about later and this shows how important this is. Nice one, you make my job much simpler with this stuff. :+1:

13 Likes

Just to clarify, key generation isnt just a key generation already. It involves resource proof as well which involves some calculations(can get replaced to data based checks from dst section) and sending data to prove some X resource capability. All of which is required to be vetted by the target section before a node gets accepted to that section.

10 Likes

Can give some context for this debate too. So with recursive relocation, its nice that a distribution is aimed and hopefully achieved but sometimes what can also be observed is multiple sections relocate a peer to the same target section and the target if just forced to accept can end up with a few peers that maybe dont help it Split and reach a higher prefix thereby creating a cluster of nodes which isnt too desirable either.

So recursive relocation itself is nice to disperse nodes, just hope to get it to a point where we dont accidentally swamp a section with a bunch of peers that dont help that section Split.

So in an ideal setup, you almost want to relocate peers to the section with highest req(recursively), only if that helps that section split(these nodes going into the adult/elder category of the target section), with nodes starting at age 4, the next relocation already makes them an adult(thereby being eligible to facilitate a split of the target section) which kind of bypasses this issue, however a window still exists where a section can be in the process of resource proof for multiple peers and is bombarded with new peers from other network sections that think it needs help.

9 Likes

@mav: First, would like to say your work is really great.

Just would like to clarify that with the infant created with age 1, according to the data chain proposal (section of Dead block’s Trigger for voting), which means that once a section reached complete group status (i.e. the network grows to certain size), the churn of infant nodes won’t trigger any relocation (affects aging).
Will this affect the numbers you got in your original analysis?

7 Likes

OK, please correct me if I’m wrong, but it seems to me like that resource proof happens after the vault has generated the key/name and tries to join the target section. If that’s the case then it’s still relatively inexpensive for a malicious vault to brute force a ‘vanity’ address that matches its indented target section and only perform the resource proof once.

A way to add proof-of-work into the key generation stage would be to consider the first n bits of the public address as a required pow, i.e. first 6 bits must be 0s for it to be a valid address, and only look at subsequent bits for section matching.

Again, I haven’t yet looked at the code, so apologies if this makes no sense or if it already works that way.

1 Like

This is similar actually. The resource prof sets the leading zeros and data size. The key the node generates is not the one it is allowed to use. That key gets to a group that then finds the new group the peer should generate a key for.

1 Like

It’s possible. My main concern is joining should be easier not harder. ie preventing a google attack should not come at the cost of preventing the participation of enthusiastic beginners.

See RFC-0030 Nodes Key As Name. Specifically step 2 - “Group (X) then take the 2 closest nodes to this [random key] and hash the three values (2 closest plus node).”

The targeting can be done by generating keys offline and testing the resulting section per step2 until a key is found that puts the vault in the desired section.

So it’s a bit more involved than just ‘generate a key for a section’ since it needs to also hash from that section to the desired new section based on the members of that section. Harder than finding a key for relocation, but still possible.

Yes. So targeting takes the required portion of the network an attacker needs from about 50% to about 5%.

This work very nicely if the random address is indeed random. But it’s proposed that it’s not random. It’s a hash of the attackers chosen pubkey + some section data. Since the attacker can choose their pubkey they can manipulate the outcome (assuming they can also know the section data). Importantly, the attacker can do the targeting entirely offline so it’s potentially very fast.

It mostly affects the viability (ie cost) rather than the possibility. I have assumed that targeting is possible, even if it’s a bit more fiddly than simple ‘vanity-gen’ style algorithm. Maybe that assumption turns out to be incorrect.

Yes I think this idea of relocating to ‘help create a split’ contributes well to the definition of ‘best’ section to relocate to. I like this idea.

I guess ‘helping prevent a merge’ is also a very high priority, especially since merges are often more complex and time-consuming than split so are less desirable.

My sim treats any departing node as a potential for relocation, so I need to update this to consider the difference of complete vs incomplete and elders vs all nodes. Thanks for the clarification. This will almost certainly affect the numbers, but probably for the worse since it results in less relocation.

I would love to see a proof-of-signature for this (I wrote about this a while ago on the dev forum) so the proof of work is related to real work to be done on the network. ie the supplied proof is to find a message that when signed with the keypair results in a signature with a certain desired property (eg X leading zeros).


The tests are starting to become incomparable so my focus for the short term is to build a standard set of metrics that can be used for meaningful comparisons between changes. Ideas for metrics are most welcome.

10 Likes

Absolutely, the key here is the random data. There are a few ways to achieve that or at least sources of random data such as last valid block hash (in the data chain). This wont be agreed by the whole group, but should be agreed by the quorum (that is all that is needed) most of the time. This means that there can be errors when the group is out of sync (normal) and the joining peers key is then rejected (from group X). So it has to create another key and try again. It will not be 100% sure to get a key accepted, but should not be many retries, if any, in normal operation.

Previously we did use the closest 2 peers to the chosen address and hashed al of that with the key to get to the Y group. Its OK but not as random as it could be.

3 Likes

I think this has to be a fundamental principle. Once you start down the path of making PoW in order to solve attacks then where does it stop.

Yes, any attacker can have a smaller number of very powerful number crunchers to solve things for the attacking NODES and since the NODES for the most part do not need the number crushing the ratio of number crunchers to NODES is small indeed. Another reason why PoW will not solve the security issue.

EDIT In times past the system was that a node joining a group (even after relocation) generated their key then the group it contacts then gives it another key thus making it harder to target a group. That seems to have changed now and is a loss in my mind. Even using a Fn of the node’s key + section data is making it easier to attack.

4 Likes