Analysing the google attack


#62

My experience was with currently available @bart’s repository.

Thanks, I am looking forward to playing with this code.


#63

This post investigates a targeted google attack, ie offline key generation to initialize a vault into a specific section.

Thanks to @digipl for provoking this line of thinking in a prior post.

Background Info

The relocation mechanism requires targeting to be not only possible but to also be viable. Vaults are relocated to a neighbouring section, which means the vault must be able to generate a key matching that section prefix. This is exactly the same thing an attacker would do to target a specific section.

“When a churn even does force a node to relocate to a destination group (D)… The relocating node then generates a key that will fit in D and attempts to join D.”

from RFC-0045 Node Ageing, more info in this post in the Data Chains topic.

The reason this targeting is damaging is due to relocation to neighbouring sections only. This means the possible sections to be relocated to is quite small, as outlined below.

Sections: total sections in the network
Neighbs: how many neighbours an average section would have (log2(Sections))
Reachable%: the percentage of sections available as a relocation target (Neighbs/Sections)

Sections Neighbs Reachable%
      1K      10 0.997
     10K      13 0.133
    100K      17 0.0166
      1M      20 0.00199
     10M      23 0.000233

This is not great! Vaults can only ever be relocated to a tiny subset of the network (typically less than 1%).

Targeted vs NonTargeted

The previous simulation was a nontargeted attack, akin to an intern at google deciding to start a bunch of vanilla vaults which are randomly located through the network.

A targeted attack means the attacker is always joining the same part of the network, which means their vaults will always reside in a small subset of the network as indicated above, even after relocation.

This concentrates their influence much more strongly than using random starting locations.

The ability to target a section is not only possible but is an essential feature for the current design, since vaults must be able to relocate to a specific neighbour.

Test Conditions

100K vaults (500K join interleaved with 400K depart). Then 10 attacking vaults all targeting the same section + 1 normal vault join + 1 normal vault depart. Repeat until a section is controlled by the attacker.

No disallow rule.

Full details in google_attack_targeted.go.

Result

Targeted NetSize  Test 1       2       3       4       5 | Avg Pct
      No   100K   125776   98073  137955  103118   97975 | 52.7
     Yes   100K     1852    4893    2598    2366    3694 |  3.0

Disallow Rule

This is probably a great example of where the disallow rule might be considered useful since it should prevent any single section being flooded by an attacker.

But it doesn’t work because it leads to a denial of ALL vaults to that section until the youngest vault moves away. An attacker spamming the section with joins makes the chances of a normal vault joining much lower, so all new joins are very likely to be from an attacker.

I think there are better alternatives to the disallow rule and it should be scrapped completely. I’m confident that punishment for misbehaviour is good enough. There’s no need to be strict at the point where users are simply trying to participate.

Also, at a more ‘social’ level, turning enthusiastic new recruits away is really poor. This is a legitimate marketing problem (and by marketing I don’t mean ‘impressing people’ I mean ‘increasing the size of the market for safenetwork users’).

Eldership

It’s worth considering, do vaults actually want to be elders? Intuitively I’d say ‘yes’ since it’s the end result of a long process, so surely it must be good! But in reality, being an elder means a lot more work and demand on resources, but for what benefit to the vault operator?

If vault operators only want to earn safecoin, they’re better off not being elders so they can put all their resources into serving user requests and getting maximum chances to farm new safecoins. Any work done ‘managing the network’ (ie being an elder) is ‘wasted’ from a safecoin perspective.

So I think eldership must be viewed as a chore not a privilege. To me this is a very significant change in perspective with significant consequences through the entire incentive structure and the design of participation, rewards, punishment, security… the whole lot.

At the very least it’s worth considering that maybe being an elder isn’t desirable to all operators and it’s possible that many elders will immediately leave the network upon being promoted to avoid doing the work so they can get back to earning safecoin with minimum effort.

Proposals

My suggestions for potential ideas:

  • no disallow rule

  • relocate to anywhere on the network, not just neighbours (a very large and technical discussion needed here)

  • farm rate varies depending on the amount of work done (so elders have economic incentive to take on the extra work they must do. A complex discussion to be had here)

  • elders are assigned on a ‘rotating’ basis, so the ‘chore’ is distributed rather than retained, many possible mechanisms for this, but maybe turns out not to be a desirable feature…?

  • more discussion of punishment as a way to replace the disallow rule

  • there are so many potential paths, these are just some of many, but they are the ones I currently think are the leading contenders. I’m totally open and excited to explore other potential paths.

It’s obviously not great to have major rethinks of critical security algorithms. I would love it if the existing design could be made to work. But so far I’ve not been clever enough to smooth the rought bits into a habitable design. Hopefully further thinking and discussion can get us there.


#64

Ah just a quick point here Ian, this relocate is recursive, so if you see a neighbour that has a greater requirement than you then you relocate again. This requirement is up for debate (i.e. lower elder age etc. are all possible). Currently in master where we don’t relocate for age, but do relocate for security this is what happens. A node creates a key this key will be for initial group (lets call it X). Group X then decide on the actual joining group (lets call it Y). The nodes proxy is then informed to join Y and Y all send their connection info to the node. The nodes group Y is also recursive, so that node is also recursively relocated until it fits the best group (i,e, the group with less peers, only for now where we have no age).

It may not be so clear in the docs, but the relocate is not contained to neighbours, it is recursive, sorry for the confusion there. This recursion helps balance the network as we try to grow the network sections all at a similar pace.

[EDIT, above when I say you relocate, its not the node who gets to choose, but the groups being asked to relocate to, they will decide where the peer goes. The recursion goes along the network until the best location is found, best in the case is a group where none of its neighbours is better to accept the peer.]


#65

Ah, good clarification, thanks for pointing this out.

I’ve added this recursion to the relocation mechanism to the simulation.

The simulation defines ‘greater requirement’ as “prioritise sections with shorter prefixes and having less nodes to balance the network”. This may change in the future.

Tracking the number of neighbourhoods before the ‘best’ section is found gave these results (see neighbour_relocation_hops.go):

100K vaults (500K join interleaved with 400K depart)

hops occurances
   0 288
   1 220779
   2 39293
   3 41590

301950 total relocations

Most relocations are within the current neighbourhood (1 hop away, about 73% of the time). But there are about 27% of cases where the vault is relocated further than just the nearest neighbourhood (2 or 3 hops away).

Redoing Targeted Google Attack

The targeted google attack was retested with recursive-neighbour relocation (which should spread attacking vaults further afield and reduce their impact).

Targeted Recursive NetSize  Test 1       2       3       4       5 | Avg Pct
      No        No    100K  125776   98073  137955  103118   97975 | 52.7
     Yes        No    100K    1852    4893    2598    2366    3694 |  3.0
     Yes       Yes    100K    5631    4031    5514    3810    4494 |  4.5

An improvement for sure, but still the effect of concentrated targeted attack is quite significant.

Thanks again for clarifying the details of the relocation mechanism.


#66

I haven’t looked at the specifics of key-to-section allocation but isn’t it possible to make section targeting more expensive by forcing new vaults to incorporate some kind of proof-of-work in the middle of the key generation process?


#67

How would such targeting be carried out practically? I was under the impression that the network allocated a section to a new node at random?


#68

Is Avg Pct the percentage of the final network nodes which an attacker must have created to control one section?


#69

Yes I believe that it is. The relocation algo could suffer actually as @mav points out. Relocate to shortest prefix all of the time will help very young networks but would mean mostly local relocates when the network gets larger (due to much more balance). The trick here though is initial join, what we had/have is

  • a node creates a keypair,
  • a message goes to the group close to the key (X)
  • That key then is used along with agreed group data to create another (random) address.
  • The group close to the new address (Y) is then informed of a node joining
  • Group Y then accept the node if it had generated a key to fit Y.

So this method does something quite important. Rather than relocate moving nodes around the network randomly (which still may be a good thing to do) it means the node cannot target its first group. This is really important as otherwise an attack can be constructed fairly easily by a large player, even with node age.

@mav this may help your sim, but the results are interesting though to see relocate is pretty local and will be more so as the network grows. Perhaps there should be more randomness there. We will need to simulate that though as another attack is actually being to random, where an attacker waits on some of his nodes grouping together (randomly) if that makes sense. So diluting membership location is important.

Another important aspect is the key generated. If it is any key that fits a section then it can be done in a way to generate keys that will always split, say into the 1 section. That allows an attack that is more subtle, but can basically double the attackers chance of getting a group. That attack is weakened by asking the joining node to create an address within a range in the Y group. That range is not final but should be along the lines of the largest gap in the section members list (I mean we need to fill the section with nodes equidistant if possible, the relocate should force a joining peer to “fit” the range at the right height (of the binary tree)).

Thanks again @mav you are showing the importance of node age and relocate, that is neat from our perspective as it is a discussion we have internally, but so often some engineers feel/felt that it is an add on to worry about later and this shows how important this is. Nice one, you make my job much simpler with this stuff. :+1:


#70

Just to clarify, key generation isnt just a key generation already. It involves resource proof as well which involves some calculations(can get replaced to data based checks from dst section) and sending data to prove some X resource capability. All of which is required to be vetted by the target section before a node gets accepted to that section.


#71

Can give some context for this debate too. So with recursive relocation, its nice that a distribution is aimed and hopefully achieved but sometimes what can also be observed is multiple sections relocate a peer to the same target section and the target if just forced to accept can end up with a few peers that maybe dont help it Split and reach a higher prefix thereby creating a cluster of nodes which isnt too desirable either.

So recursive relocation itself is nice to disperse nodes, just hope to get it to a point where we dont accidentally swamp a section with a bunch of peers that dont help that section Split.

So in an ideal setup, you almost want to relocate peers to the section with highest req(recursively), only if that helps that section split(these nodes going into the adult/elder category of the target section), with nodes starting at age 4, the next relocation already makes them an adult(thereby being eligible to facilitate a split of the target section) which kind of bypasses this issue, however a window still exists where a section can be in the process of resource proof for multiple peers and is bombarded with new peers from other network sections that think it needs help.


#72

@mav: First, would like to say your work is really great.

Just would like to clarify that with the infant created with age 1, according to the data chain proposal (section of Dead block’s Trigger for voting), which means that once a section reached complete group status (i.e. the network grows to certain size), the churn of infant nodes won’t trigger any relocation (affects aging).
Will this affect the numbers you got in your original analysis?


#73

OK, please correct me if I’m wrong, but it seems to me like that resource proof happens after the vault has generated the key/name and tries to join the target section. If that’s the case then it’s still relatively inexpensive for a malicious vault to brute force a ‘vanity’ address that matches its indented target section and only perform the resource proof once.

A way to add proof-of-work into the key generation stage would be to consider the first n bits of the public address as a required pow, i.e. first 6 bits must be 0s for it to be a valid address, and only look at subsequent bits for section matching.

Again, I haven’t yet looked at the code, so apologies if this makes no sense or if it already works that way.


#74

This is similar actually. The resource prof sets the leading zeros and data size. The key the node generates is not the one it is allowed to use. That key gets to a group that then finds the new group the peer should generate a key for.


#75

It’s possible. My main concern is joining should be easier not harder. ie preventing a google attack should not come at the cost of preventing the participation of enthusiastic beginners.

See RFC-0030 Nodes Key As Name. Specifically step 2 - “Group (X) then take the 2 closest nodes to this [random key] and hash the three values (2 closest plus node).”

The targeting can be done by generating keys offline and testing the resulting section per step2 until a key is found that puts the vault in the desired section.

So it’s a bit more involved than just ‘generate a key for a section’ since it needs to also hash from that section to the desired new section based on the members of that section. Harder than finding a key for relocation, but still possible.

Yes. So targeting takes the required portion of the network an attacker needs from about 50% to about 5%.

This work very nicely if the random address is indeed random. But it’s proposed that it’s not random. It’s a hash of the attackers chosen pubkey + some section data. Since the attacker can choose their pubkey they can manipulate the outcome (assuming they can also know the section data). Importantly, the attacker can do the targeting entirely offline so it’s potentially very fast.

It mostly affects the viability (ie cost) rather than the possibility. I have assumed that targeting is possible, even if it’s a bit more fiddly than simple ‘vanity-gen’ style algorithm. Maybe that assumption turns out to be incorrect.

Yes I think this idea of relocating to ‘help create a split’ contributes well to the definition of ‘best’ section to relocate to. I like this idea.

I guess ‘helping prevent a merge’ is also a very high priority, especially since merges are often more complex and time-consuming than split so are less desirable.

My sim treats any departing node as a potential for relocation, so I need to update this to consider the difference of complete vs incomplete and elders vs all nodes. Thanks for the clarification. This will almost certainly affect the numbers, but probably for the worse since it results in less relocation.

I would love to see a proof-of-signature for this (I wrote about this a while ago on the dev forum) so the proof of work is related to real work to be done on the network. ie the supplied proof is to find a message that when signed with the keypair results in a signature with a certain desired property (eg X leading zeros).


The tests are starting to become incomparable so my focus for the short term is to build a standard set of metrics that can be used for meaningful comparisons between changes. Ideas for metrics are most welcome.


#76

Absolutely, the key here is the random data. There are a few ways to achieve that or at least sources of random data such as last valid block hash (in the data chain). This wont be agreed by the whole group, but should be agreed by the quorum (that is all that is needed) most of the time. This means that there can be errors when the group is out of sync (normal) and the joining peers key is then rejected (from group X). So it has to create another key and try again. It will not be 100% sure to get a key accepted, but should not be many retries, if any, in normal operation.

Previously we did use the closest 2 peers to the chosen address and hashed al of that with the key to get to the Y group. Its OK but not as random as it could be.


#77

I think this has to be a fundamental principle. Once you start down the path of making PoW in order to solve attacks then where does it stop.

Yes, any attacker can have a smaller number of very powerful number crunchers to solve things for the attacking NODES and since the NODES for the most part do not need the number crushing the ratio of number crunchers to NODES is small indeed. Another reason why PoW will not solve the security issue.

EDIT In times past the system was that a node joining a group (even after relocation) generated their key then the group it contacts then gives it another key thus making it harder to target a group. That seems to have changed now and is a loss in my mind. Even using a Fn of the node’s key + section data is making it easier to attack.


#78

Certainly but just not at the cost of the network having a negative effect facilitating the same. A balance here like everything else is key imo. Note I’m not referring to PoW or any arbitrary complication but just in general, its easier almost asking the question of how any X helps the network than what sounds nicer. Allowing everyone to join with ease while it sounds great might also not be the best for the network if these nodes didn’t help much but increase the hops for general operation. Thereby also helping prevent spam attacks and sorts…

I’m slightly confused with the targeting scenarios, so if we consider an example of a node being able to calculate where it’d be nice for it to join the network and generates the initial key accordingly to then get relocated say to a pre-calculated target Y section(if recursive relocation dst is computed too), not sure I get how this is affecting the network btw if this node still isnt an elder of the section and in this case is an infant(new node) and is going to then get relocated later on based on the churn events or so on … Lack of quorum and potential node age loss(thereby loosing elder privileges and so on if trying to restart) should still hold right? or are we discussing something else here in terms of not wanting even a single node to choose even its infant section if it so went about trying to do so?

EDIT:

Just had a chat with @dirvine there about ^^ and think I got what the discussion was about in terms of a balanced network and relocation knowledge being local based and thereby allowing a group of nodes to get relocated roughly grouped and allow for potential attacks while the nodes are ageing themselves.

While discussing that what we were thinking of is ofc see how likely such a thing would be via simulations, but for a moment if we consider it is likely and we want to avoid it even if just being safe, couple options ofcs exist wherein like the thread so far has explored use randomness to achieve the same.

So simply not have relocation based on just the section that needs a new node and range to relocate to, but also structure it such that when a relocation trigger say occurs(assuming for now its a churn event), use that Hash(churn-node-id-xor-name) as the dst section where the relocated node is being sent to, however when the rpc gets to this dst section, they can then do the balanced relocate themselves which if still happens local to them is still not hopefully deterministic from the src.

So in essence have balanced node relocation and a relocation range, however make the first hop of the relocation be based on the randomness(hash of churning node name) and then let the balance relocation kick in recursively which should hopefully spoil being able to compute rough dst locations after the initial join process.


#79

Thanks for your interest in all of this!

As I mentioned in another thread on the dev forum, your observation that the simulation doesn’t take off right now is entirely correct. This is because it is outdated, and doesn’t actually use the same rules as @mav’s simulation. We were aware that it needed more work, but other issues with higher priority appeared and updating the simulation has been postponed.

Since we are now ready to resume work on it, let me outline a rough plan for the nearest future here.

First, to address the problem that the initial section doesn’t split, we will modify the rule for splitting a bit. We were requiring enough elders and adults in the section, so that the new sections will each have a complete group - this will be enforced only if we already have a complete group in the current section, otherwise we will just require that the section has enough peers for both of the resulting section to have GROUP_SIZE+BUFFER peers (as opposed to peers with age > 4).

Second, and this will be a quick test - we will run our simulation to see what happens if peers join at age 4 instead of 1. I see that @mav already conducted such tests, but getting more independent results will always be beneficial :wink:

Then, when we have those results, we’ll see if we are satisfied with the way ageing works, or if we want some more changes to the rules in order to make the network operations more smooth.


#80

us satisfied? you’re a changed man in the new year @bart :wink: I’d suspect we certainly need additional relocation triggers even something like mutation operations being used accordingly for the same(not saying GET here cos its a free action that can get spammed too ofc) just so we dont end up with stalling sections and facilitate network growth.


#81

Thanks for your clarification and glad to hear what you had to say

To be clear this is what I was referring to.

There will always be a certain amount of work to do anything and the balance will mean we are not looking to PoW to solve security issues which should have a cleaner simpler solution.

One would expect of course there are minimum requirements. We just don’t want to see people joining the network but their NODE is at 50+% CPU (or GPU) just to prove they are a good guy and not an attacker. The easier is in terms of being able to use cheap hardware and SAFE using logic rather than PoW to “prove” genuine (or reduce effects of bad actors). Also that the Node is able to be farming in a reasonable time.