From Metaquestions by David Irvine
This post is probably very brief and disingenuous for any mathematician out there, it is aimed at as many people as possible and I hope it comes across in an easy to understand way. We will see I suppose
I find myself explaining MaidSafe many times to many people and try to do so in a language and counting system they will understand. I ‘get it’ that people want this, so do I, but to understand MaidSafe it’s really not possible to think in our comfortable world of linear number lines and counting. The term used to describe what I am talking about is ‘non euclidean maths’. I dislike that a lot as its like calculus or similar, a term to scare people off. This is not necessary as it is way simpler than a confusing term. It is also way more simple if you immerse yourself in this type of maths, many people will not and never understand you, but it leads to many hours of quiet solitude and relaxation as a whole new world of possibilities opens up in front of you. I find that you can lose yourself in such counting systems, a bit like the Gauss clock counting base number systems, it’s very unusual and interesting. Not at all clever, but very different.
Down to business
MaidSafe is based on an XOR network at is root. This is not really the crux of what makes it different, but it seems to be what people focus on (“what if my computer goes off?”, “I can Sybil attack this easily!” etc.). I figured it was time to try and explain this as understanding issues like mutating data such as safecoin etc. are impossible without understanding this way of thinking.
So here is how we think.
0–1--2–3--4–5--6–7 (real number line)
And here is XOR
Immediately we will look and run our eyes along the bottom and think it’s our comfortable world again. It is far from it, we will see why soon.
XOR opens up huge possibilities and many unusual attributes (and I hope I do not sound like I am using a weird maths unintelligible language that makes people think this is clever, it is very easy, honestly). Many smarty pants people will tell you this is all irrelevant, but just think for a minute and realise you have moved on a level, understood a little more than the real number system. So lets looks at some different properties of this system:
1: Distance is not based on the normal number system, i.e. the distance from 4->3 is 7, but the distance from 2->3 is 1 (get a scientific calculator with xor in it and try).
2: Each node has a unique distance from every other number bar 1. (i.e. the distance from you to another is symmetric, but you share no distance from you to any other specific node with that partner, plus every other node is such a partner).
3: There is no straight line distance from any node to any other node (like points on a sphere, but no calculus, don’t worry)
4: Distance from any node to any other is almost the same as the height you need to climb the tree, you can verify by actually XORing the numbers if you wish (I do to check). Be aware though the last bit of the address is important, you are closer to the address ending in your last bit than the opposite of the last bit. I check the side of the tree in the last position, so if you are on the left you are closer to the node also on the left (if you look closely at the diagram you will see this) than the one on the right.
5: The other (maths) way to measure distance is to evaluate the most common leading bits of any number. (again you need to consider the last bit as if your address ends in a 0 you are closer to it than the address that ends in a 1)
6: Every number on this line sees a different network from any other number, so to see the network from a numbers point of view you need to become that number (after all its distances are unique to it, remember!).
A bit more interesting
Another thing we do as humans is revert immediately to straight line thinking, but additionally we also assume a fully populated network or tree as we picture above. If you stay in dream world a little longer and think what a sparse tree would look like then its eye opening.
1 - - - - 6 - - – -- 9-10-11-12-13------21----
So in the above tree we can see that its sparse, i.e. full of holes where nodes could be. This makes it more clear that if we know each node has a unique distance (to it) from every other node then it gets interesting. To make this clear, from any node all distances to every other node are unique, but the distance to any node is symmetrical, i.e. dist(A->B) == dist(B->A) but this distance will be unique to A and B, neither of them will share that distance with any other node on the network. There is a strange (sometimes called triangle property) property as well so [N:B ⊕ means XOR] ((A⊕B)⟺C)⟺(A⟺(B⊕C)) which basically means that if you XOR something with something you will get an answer. If you XOR the answer with any of the two inputs you get the other input as a result.
so 4⊕5 == 1 so 1 is the answer. If we then do 5⊕1 we get 4, or if we do 4⊕1 we would get 5. So this is a handy thing we use to obfuscate information at times. If the 5 is your info and 4 is a completely random number only you know and use once then you have a very powerful secrecy mechanism (if the 4 is the same length or larger than the 5 you have ‘perfect secrecy‘). On that note readers should take a look at self encryption for fun. Anyway I digress.
So XOR has some pretty strange peculiarities when you first come across it. We have listed only a few here, to whet your appetite to know more, but its this tricky and somewhat hidden set of attributes that has people looking at an XOR network and saying, “oh that’s obvious” etc. I thought so too and especially when I first came across Kademlia. That paper is very tricky, I read it and thought, simple enough let’s do this, then it became clear there was much more to this than met the eye. After digging it was also clear there were deficiencies in kadmelia networks that would be very hard to overcome. So on we go with our excursion into non linear counting methods and some really cool side effects. There is more info on some of these kademlia deficiencies (well from our perspective, for public only data its way good enough) here.
OK What about closeness
This concept takes the XOR thinking and expands on that a bit more, again with surprising side effects that do not seem to be noticed easily. In MaidSafe the notion of closeness is used to evaluate a nodes capability to make a decision on an action depending on the action type, data ID and some other parts (too much for this post, you can read about it here though). In other words closeness is important. So if we can assume a secured mechanism in place for address creation and maintenance (there is and its called PKI) then we can look further, normally you need to argue for days to get to this point if you are discussing MaidSafe so this is great progress.
If we imagine a close group is size 4 for now then look at this network
1 2 - 4 - - - 8 9 10 11
We can calculate the close groups, these are
for node 1 – 2, 4 & 8
for node 8 – 9,10,11
If we ask node 1 then node 8 is close to it, but if we ask node 8 then 1 is nowhere to be seen! We know that dist(1->8) == 9 and conversely the dist(8->1) is 9.
Wait! now look at what happens when a node appears.
1 2 - 4 - - 7 8 9 10 11
Now look again at the groups (and remember to stay away from linear thinking)
node 1 -2.4,7
node 8 -9,10,11 (not 7?)
now look at node 7 (4,2,1) its miles away from 8! (15 away actually), stay away from the dark side of linear counting at all times with this stuff, there be dragons
So there is a simple example and description of closeness, it is uni-directional or asymmetric and distance is bi-directional or symmetric.
A network of networks
First thing in MaidSafe to realise is that the address space is huge (2^512) in fact its way larger than all the atoms in the visible universe. The distances between nodes will be numbers so large they have no names. So these distances are vast, it makes little difference though. This ensures we do not run out of addresses and data locations to store (although in MaidSafe these are typed, i.e. we can have a node at position 001 and a data element of type 1 say at position 001, so not happy with this huge number we have gone a bit mad (or so it seems)) This basically means that we have multiple types in the same space, so message type1 has a 2^512 network, message type 2 has another 2^512 network etc. This can be thought of as each message / data type has its own network.
At this point it is easy to spot that every node sees a different distance from other nodes. We have kinda proved that above. Now this starts to hurt a wee bit, its like everyone sees their own rainbow, no two people ever see that same one (think, think). So from any node the network looks different. We also know that to measure distance from a node to another (especially closeness) we need to become that node. It’s not possible to do otherwise, in MaidSafe this means getting as many nodes around that point as we can (16) and then sort that list of nodes from the address we are interested in. Then we can tell which nodes are the closest. So we do not become that node per se, but we do look at the network from that nodes perspective, told you it was gonna hurt.
Right, so now we see we have a network of networks. Some still at this point will say “so what”, ignore them and continue.
The large interconnected Venn diagram appears
I am not good at maths, or in a traditional sense anyway. I see shapes and sort of dream myself into the problem, in this case the network and poke around it in my head, even while sleeping. It’s hard to do (for me anyway) and you almost get into a trance to really focus your mind on this. So I think Kademlia is like a semi organised collection of random nodes, this is indicated by the fact emule used to have 30% duplicate addresses and you cannot be sure you are close to data or addresses, you just search until you find stuff. It’s not good enough, we need more accuracy.
So we take our network of networks, our large Venn diagram and make a big change. We connect all nodes to the 16 closest nodes they know off, add in some more for uni directional interest and make the rest random across the address range (should be equally spaced in a perfect system, but it will not be). Now these nodes have great knowledge of each other and can spot when one changes, very quickly. The random part actually takes the shape of a kademlia routing table and each node monitors addresses to try to improve this table. This allows logarithmic search or in layman’s terms massively reduces the number of hops to a target.
When a node goes off or comes on the network it will affect its circle of closeness to it. It will also appear at a point in the network that means it should be doing something and looking after some small state (state is distributed evenly across all nodes). As no node has the same distances from any other node, then the state each node holds will be different from its peers. It will share some info, but its list will be unique. Each item will be on at least the number we have chosen as close groups (4 in our case).
So a node going off will affect the close part to it. The whole network is reconfigured without any other node far away knowing it is until they look. No node can look everywhere so the equilibrium of control establishes extremely fast. No need to transmit messages all over the place to let others know what’s happening, they do not need to know and can do nothing about it anyway.
I hope this very brief foray will be helpful, it’s a bit of a head dump, so may contain some info that could be better, it will be too simple for some. My intent is that anyone can read this and begin to understand the journey I hope to take this blog series