Open questions on the security properties offered by the XOR Space closeness relationship

I am writing a paper on ways to securely group nodes in XOR Space to ensure reliability of the system in the presence of malicious nodes. Since the grouping of nodes is such a fundamental operation of the SAFE network, that should help the analysis of the security properties of the entire system.

My main focus has been on formalizing abstractions that help building such a system and proving a variation of the algorithms MaidSafe is using to implement them (variant which are chosen because they have a concise implementation and are relatively easy to reason about but might not be the most performant or be easiest to implement in a particular language since the goal is first to show that properties of XOR Space can indeed be leveraged for building secure decentralized systems). As part of that analysis, I am now getting to the point in which I need to show it is indeed “hard” for an attacker to obtain a close group of identifiers. I want to obtain security properties in terms of minimal bounds on computing resources (computing power, number of IP addresses, etc.) required to disrupt a close group as a function of the size of the network. That should end up being analogous to the proof-of-work security property of the Blockchain that supports Bitcoin.

I believe some of you might have the required math background and inclination to do that probably faster than myself so I am looking for your contribution if that is the case. Should you end up with good answers to the following questions, please publish them such that they can be reviewed and built upon by others. I can totally help with reviewing one you might want to publish yourself. I would also be willing to add you as coauthor on my own paper (or another one).

Here is my attempt at formulating the problem and a first set of related open questions. I will add more as I think things through:

Problem

Suppose you have an identifier space I made of z-bit identifiers (ex: z=512 for the current implementation) from 0 to (2**z)-1. Suppose you compute the distance between them using the bitwise exclusive-or (XOR) operation. Suppose you have an arbitrarily sized set of nodes N each with unique identifiers taken from I. Suppose nodes are grouped together in different subsets of k nodes that are each XOR-closest to a different identifier i of I.

For every identifier i in I , you can compute the k-closest group of nodes by computing the XOR distance between every node n in N , sort them according to distance from shortest to longest and pick the k first.

Now suppose the identifiers are obtained from a random oracle R that guarantees identifiers are uniformly distributed through I and unique. Generating an identifier takes a time t that can be chosen a priori between 0 (instantaneous) and Infinity (impossible to generate).

Once generated, an identifier i can be associated to a node n using an oracle M by doing M.add(n,i). A node can only have a single identifier i and an identifier can belong only to one node. M.node(i), gives you the node n that corresponds to an identifier i. M.id(n) gives you the identifier i that corresponds to n.

Questions

In every case, the answer should be a function of the size of N, #N .

1 Let’s pick a random node n from N and get its identifier i = M.id(n).

1a What is the probability of drawing a new identifier i’ that is closest to i ?

Note, the probability of picking an identifier i’ that is closest to any other node’s identifier is trivially 1 because there is always one closest so that is not really interesting.

1b What is the probability of drawing a new identifier i’ that is within the k-closest nodes to i ?

1c What is the probability of drawing u different identifiers that are within the k-closest to i ?

1d What is the probability of drawing u different identifiers that are within the k-closest to any other node identifier?

2a-d What is the average number of draws necessary to obtain similar outcomes as for 1a-d. What is the variance around that?

3 Suppose that to happen at a particular identifier i, an operation needs the approval of at least l/k nodes to succeed (in the current implementation l=28 and k=32).

3a How many new identifiers would need to be generated close to i?

3b What is the probability of obtaining that many identifiers?

4 What is the minimal size of the network, #N, that makes the attacks described in 1a-d impractical,

4a given a time t of 10 minutes? (the amount of time it takes to generate a block on the block chain)
4b given a time t of 1/10**12 seconds? (approximately the latency of computing a hash using a modern miner ASIC chip)
4c given a time t sufficiently high to tolerate foreseeable improvements in hashing hardware technologies?
4d given a time t well beyond any foreseeable hashing improvements?

5 Would allowing a node to have multiple identifiers simultaneously change any of the results obtained previously?

2 Likes