Node name proximity

This post explores what happens if nodes decide to relocate extremely close to other nodes, and how close they may be able to get. This may have implications for the security and economic design of the network.

Vaults are relocated to any name of their choosing within a specified target section (see routing::RelocateDetails).

From the keypair generation topic, the expected rate of key generation is about 30K keys per second.

After one day a vault will have generated about 2.6 billion possible new names (30,000 keys/s × 86,400 s). For a network with a million sections, this vault would have roughly 2,600 valid names for every section of the network.
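To make the brute force concrete, here's a minimal sketch of the search loop. A random u64 stands in for the 256-bit name a real vault would derive from a fresh keypair; the rand crate and the 4-bit prefix are illustrative assumptions, not the actual routing code:

```rust
// Sketch: brute-force search for a name under a target section prefix.
// A real vault derives the name from a freshly generated keypair; a
// random u64 models the (uniformly distributed) result.
use rand::random;

/// True if the top `bits` bits of `name` equal `prefix`.
fn matches_prefix(name: u64, prefix: u64, bits: u32) -> bool {
    name >> (64 - bits) == prefix
}

fn main() {
    let target_prefix = 0b0011; // 4-bit section prefix, for illustration
    let bits = 4;
    let mut attempts: u64 = 0;
    loop {
        attempts += 1;
        let name: u64 = random();
        if matches_prefix(name, target_prefix, bits) {
            println!("found {:#018x} after {} attempts", name, attempts);
            break; // a real operator keeps going and collects thousands
        }
    }
}
```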

So the vault can choose which name to use. How close does it want to be to existing vaults? Maybe as far away from the other vaults as possible, or maybe as close to another vault as possible.

How does the operator choose one name from their thousands of possible names?

The chart below gives an indication of the situation. A set of 10 random nodes (blue vertical lines) is in a section. The 20 red triangles indicate possible names my node has generated and could relocate to.

On the right-hand side is a large gap in names, so I might pick a new name near the middle of that gap. This would have certain effects on which chunks I'd be given, whether splits happen, etc.

On the left-hand side are two existing nodes very close together, and I could pick a name also very close to those two nodes. This would add a third node right on top of two already very close nodes, with a slightly different effect on the section's chunk distribution, splits, etc.

This chart shows the effect of having only twice as many candidate names as nodes, and already the relocating node can choose from quite a wide range of strategic options. In reality there will be thousands of names to pick from, so I could relocate my node virtually anywhere on the chart to suit my best benefit relative to the other node locations.
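To illustrate the two extremes, here's a rough sketch of both strategies. It uses u64 names and plain numeric distance to match the one-dimensional charts; the real network uses 256-bit xornames and XOR distance:

```rust
// Sketch: two opposing strategies for picking one relocation name from
// many candidates, given the existing node names in the section.

/// Candidate that lands furthest from any existing node (gap-filling).
fn furthest_from_nodes(candidates: &[u64], nodes: &[u64]) -> u64 {
    *candidates
        .iter()
        .max_by_key(|&&c| nodes.iter().map(|&n| c.abs_diff(n)).min().unwrap())
        .unwrap()
}

/// Candidate that lands closest to an existing node (clumping).
fn closest_to_nodes(candidates: &[u64], nodes: &[u64]) -> u64 {
    *candidates
        .iter()
        .min_by_key(|&&c| nodes.iter().map(|&n| c.abs_diff(n)).min().unwrap())
        .unwrap()
}
```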

Let’s look at an extreme example to give an idea of where this can get funky.

The green lines are chunks. Perhaps a little too evenly spaced for reality, but this illustrates the point most clearly. The few blue nodes at the far right end of the chart are responsible for the majority of green chunks. The nodes in the middle of the dense blue clump are responsible for only a few green chunks.

If I pick a relocation name in the middle of the clump, I’d end up responsible for very few chunks.

If I picked a relocation name in the big gap, I’d end up responsible for a lot of chunks.

So the choice of relocation can (maybe) affect how many chunks I'd be storing (at least until a node near mine moves).
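A sketch of how that skew could be measured, assuming (simplistically) that each chunk is stored by the single node closest to it in XOR distance; in reality a group of the closest nodes holds each chunk, but the shape of the distribution is the same:

```rust
// Sketch: count how many chunks each node would hold if every chunk is
// stored by the node closest to it in XOR distance.
fn chunks_per_node(nodes: &[u64], chunks: &[u64]) -> Vec<usize> {
    let mut counts = vec![0usize; nodes.len()];
    for &chunk in chunks {
        // Index of the node with the smallest XOR distance to this chunk.
        let (closest, _) = nodes
            .iter()
            .enumerate()
            .min_by_key(|(_, &n)| n ^ chunk)
            .unwrap();
        counts[closest] += 1;
    }
    counts
}
```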

This is interesting since this side effect of relocation may give the operator a way to decide the vault size they would prefer, perhaps not precisely but to a degree.

The effects of various name-choosing strategies are not obviously good or obviously bad, but maybe you can infer and extrapolate better than I can on this topic. I'd be keen to hear your thoughts.

Size

There’s some degree of control over how many chunks a vault would have due to the density of other vaults around the relocated name.

Earning

Depending on how rewards work, it may be that vaults try to clump around particularly popular chunks, competing for them.

Network Health

A node is intended to be relocated to a specific section (or maybe a specific part of a section) depending on which sections are the least healthy and most need the node.

However, if the vault chooses to join the ‘most healthy’ part of the section rather than the ‘least healthy’ part, this may reduce the effectiveness of relocation as a network-health mechanism.

It may be possible to fix this by relocating to a specific xorname interval rather than a specific section.
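One way this could look, as a hypothetical extension of RelocateDetails (the interval type and field names here are made up for illustration): the relocating section hands out an explicit interval rather than just a section prefix, and the new name must fall inside it:

```rust
// Sketch: relocation bounded to an explicit xorname interval instead of
// "anywhere in section 0011". Names are u64 for illustration.
struct RelocateInterval {
    lo: u64, // inclusive lower bound
    hi: u64, // inclusive upper bound
}

impl RelocateInterval {
    /// True if the candidate name falls inside the mandated interval.
    fn contains(&self, name: u64) -> bool {
        self.lo <= name && name <= self.hi
    }
}
```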

Decentralisation

If clumping of nodes happens, it may lead to large variation in vault size and capabilities, which may impact vault viability, especially for smaller operators.

It may seem pointless to try to clump, but clumping may be used by large operators to aid their other (non-clumped) nodes. I'm not sure of the possible magnitude of the effect of clumping or the potential for compounding the effect over time.

Accidents

It may be that a poorly coded vault sorts all new names and chooses the first available one, which would accidentally cause clumping at the very start of the section namespace.
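For illustration, the hypothetical bug is as small as this: every vault coded this way picks its numerically smallest candidate, so they all pile up at the low end of the section namespace:

```rust
// Hypothetical buggy picker: sorting and taking the first entry means
// always choosing the numerically smallest candidate name, so every
// vault running this code clusters at the start of the section.
fn pick_name_buggy(mut candidates: Vec<u64>) -> u64 {
    candidates.sort();
    candidates[0]
}
```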

Splitting

If clumping happens in only one part of the section, it affects the intended split mechanic, which is supposed to retain a fairly even balance of nodes and data on the network.

Optimisation

If a node can generate 1 billion keys per day (for example), which covers about 30 bits of prefix, then relocating within a network of 20-bit prefixes is extremely trivial (possibly dangerous) and relocating within a network of 40-bit prefixes is extremely difficult. It seems odd that such an essential mechanic (ie the network structure) is dictated by such an arbitrary aspect (the rate of key generation). This need for a ‘goldilocks’ network size based on the average rate of key generation seems a bit dubious to me.
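Putting rough numbers on that (a sketch, taking the ~2^30 keys/day figure at face value): the expected number of valid names a node finds per day for a single target section with a p-bit prefix is 2^(30 − p):

```rust
// Expected valid names generated per day for one target section,
// assuming ~2^30 keys/day and p-bit section prefixes.
fn names_per_day(prefix_bits: i32) -> f64 {
    2f64.powi(30 - prefix_bits)
}

fn main() {
    println!("{}", names_per_day(20)); // 1024/day: choice is trivial
    println!("{}", names_per_day(30)); // 1/day: barely workable
    println!("{}", names_per_day(40)); // ~0.001/day: one every ~3 years
}
```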


I think it’s dangerous to allow nodes to pick their own name within a section. Perhaps it’s better to have a narrower target for relocation than the whole section?

I would also revisit using hierarchical deterministic names (as suggested a while ago in this dev forum topic Secure Random Relocation) as another option. Maybe there’s some flexibility allowed when choosing the next derived key, but not ‘total flexibility’ like the current brute force keypair/name solution.

It’s not that I can say the naming thing is dangerous because specifically X or Y or Z will happen, but it seems potentially dangerous because it allow various emergent behaviours that are not based on some underlying intended fairness but rather based on a side effect.

There doesn’t seem to be any immediate blazing red flags but since it touches on so many aspects of the security and economics of the network it seemed at least worth exploring a bit and seeing if anyone has other thoughts about this.

17 Likes

I thought, at least early on in the project, that the name was chosen for the node.

Allowing the node to choose the name is, as you demonstrate, fraught with danger, whether from malicious action or by accident.

The other potential problem is that if you run many nodes, then by choosing positions near where a split will occur you may increase your chance of becoming an Elder.

7 Likes

What came to mind for me while reading your post was a similarity to blockchain mining. In Bitcoin, the problem was how to choose who can post the next block, and also how to control the rate at which this happens. Proof of work provides the randomness, and difficulty adjustment controls the rate. In SAFE, we have multiple parties (hopefully!) wanting to join the network/section, so it would seem similar randomizing and throttling are needed. Maybe by the elders dictating a target address for newcomers, along with an acceptable range from the target (akin to difficulty) these goals could be accomplished. This combined with a fixed vault size should be able to even out the load and data distribution across vaults.
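A sketch of the acceptance check under that scheme: the elders' target plus an allowed XOR distance playing the role of difficulty. Names as u64 and the parameter names are just illustrative:

```rust
// Sketch of the "target plus difficulty" idea: elders publish a target
// address and a maximum XOR distance; a newcomer's name is accepted
// only if it lands within that distance of the target.
fn name_acceptable(name: u64, target: u64, max_distance: u64) -> bool {
    (name ^ target) <= max_distance
}
```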

3 Likes

This code exists and is a bit more tricky, but should be done. We are pushing to get a working network so we cut some of this out; however, it’s easy to put in when all tests are passing. It is also linked with promote/demote, where right now all that changes is state (Elder->Adult->Elder etc.). What we should do is, on promote/demote, force the node to take a new name, also interval bounded.

So it’s in the pipeline. Elders need to be slightly different and treated a a whole section, so on split 50% go each way. It also makes the elder/adult containers more CRDT types and that is a valuable thing in terms of concurrent changes (mass churn).

14 Likes

Could a node not move in a useful way after some time? It could keep its kudos and make itself useful in the now-better position. The above reads like a static prediction of future need. :thinking:

Edit: and a calculation for that could be, at a reasonable interval, to consider whether a move is worth the trouble; so, not necessarily moving at every interval that it might.

1 Like

What about making the net name completely assignable by the network? I’m thinking of a separation of the crypto keys and the actual net name. I assume you’ve thought about that already, but what are the disadvantages or even impossibilities of it?

1 Like

Maybe the choice a node makes when they relocate can be used to judge that node:

  • relocating into a large gap (ie in a network-friendly way) means the node is a) high performance, because it can generate a lot of keypairs, and b) willing to use that performance in the best interest of the network

  • relocating into a small gap (ie not in the best interests of the network) means the node is either a) low performance, because it is not able to generate many keypairs, or b) acting in a way that isn’t in the best interest of the network and may be malicious, or c) unlucky in that it didn’t generate any suitable keypairs (unlikely!!).

I’m not sure if this judgement can lead to anything actionable but it’s an idea to consider.

It also highlights an interesting ambiguity of a more general nature - is a node incompetent or malicious? It’s not always going to be easy to differentiate.

5 Likes

I am wondering if the section causing the node to relocate could specify a portion of the address space (a number of desired addresses), so that the node has to be within that range. How much work this would cause may be an issue.

1 Like

Even that smells of being open to abuse. A “large” gap is relative, so a crafty attacker may still be able to run the borderline, just passing any judgment checks while still relocating their nodes to “larger” small gaps closer to their malicious target, on average.

1 Like

I think I’ve mentioned this before, but if the network enforces a Poisson Disk sampling rule then vaults will be more or less evenly distributed.

6 Likes

I initially thought the relocation would need to change from a section prefix to a range of addresses, but realised today that simply using a longer prefix achieves the same effect.

eg instead of relocating to the section with prefix 0011, the node must relocate to prefix 0011010, which would put the node in a specific part of that section.

So the code change would be really minimal. The hardest part is deciding which extended prefix to use.
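A sketch of why the change is small: checking a name against a longer prefix is the same shift-and-compare as checking it against the section prefix (u64 names for illustration):

```rust
// Sketch: relocation target extended from a section prefix to a longer
// prefix. The check is identical, just with more bits.
fn under_prefix(name: u64, prefix: u64, bits: u32) -> bool {
    name >> (64 - bits) == prefix
}

fn main() {
    let name: u64 = 0b001101011 << 55; // a name starting 001101011...
    assert!(under_prefix(name, 0b0011, 4));    // old: anywhere in section
    assert!(under_prefix(name, 0b0011010, 7)); // new: specific part of it
}
```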

Indeed. I think you touch on a broader point here of ‘what counts as section health’. No matter how it works, it will need some sort of measurement. Which raises the question: is the overhead of measuring health worth it (compared to a lighter solution such as random relocation)? I feel that yes, the overhead of measuring health is probably worth it, but it’s a good engineering challenge to try and quantify this.

6 Likes

Or evaluate if it is indeed possible looking forward? We know, looking backwards, that a section is healthy enough if it’s still working. We may be able to work out that it’s resource starved, etc.

3 Likes