Step-by-step: the road to Fleming, 6: Disjoint sections

Time has now come to talk a little more about sharding in Fleming. It’s an interesting topic as it impacts many aspects.

Now the notion of Disjoint Sections is not a new one for Fleming, the original RFC is available at rfcs/0037-disjoint-groups.md at master · maidsafe/rfcs · GitHub and describes the fundamental concept pretty well, even though some parts are slightly out of date. Here we will briefly go over the similarities to Kademlia DHT.

Partition and structure the network

To bring some sort of structure to the nodes connecting to the Network, the name space is partitioned into disjoint sections. By that we mean that each address in the Network belongs to exactly one section where each section is defined by its bit-prefix. An example would be:

S(00), S(01), S(10), S(11)

It is a valid non-overlapping partition of the Network address space where all nodes in a section share the first set of bits.

In each Disjoint Section, nodes use PARSEC to agree on the order of Network events, making them in a sense a fundamental building block and a level of abstraction where you can start to effectively reason on what’s happening in the Network. Furthermore, communication patterns are highly influenced by this structure.

Kademlia

To understand messaging with Disjoint Sections it’s useful to consider some other well-established communication patterns, in particular Kademlia which is commonly used in other peer-to-peer systems.

Nodes that know about other nodes by having them in their routing table are referred to as connected. In a sense Kademlia defines how to connect nodes so that you traverse an XOR space in O(log(n)) steps while needing to maintain only O(log(n)) connections. This means that traversing the entire space takes about as many steps as the tree’s height, which is much less than the total number of nodes. For instance, a Network with millions of nodes can be traversed in well under 30 steps. It can be seen as a compromise between all nodes being connected to all other nodes, and all nodes being connected along a long line. In the first case the routing table scales linearly with network size but any message can traverse the Network in a constant number of steps, while for the line the routing table is constant but messages must visit a linear number of nodes on their way to their destination.


Example: Nodes connected along a line, and each node connected to all others.


Example: Kademlia centred around node 000.

From the perspective of node 000 the network is divided into 3 buckets, i=0,1,2, depending on the length of the common prefix. In each bucket up to k nodes are kept. In general:

  • Each node is connected to at least k nodes with i bits of common prefix. This means the routing table is O(k*log(n)) and knowledge of the network becomes “fuzzier” further away.
  • To reach a node far away jump to the node closest to the destination. Optionally use a route parameter to choose the route = 0,1,... closest node. This means it takes O(log(n)) hops to reach the destination.

Precisely which other nodes in a specific bucket a node is connected to is usually determined by past communication history, that is, the most recently encountered nodes are usually kept in the routing table.

To see how messages can be relayed let’s consider a concrete example. If node 000 is connected to 101 in bucket i=0, which in turn has the buckets given below


Figure: The three buckets of node 101.

Further assume that 101 is connected to 110, but not 111, in its bucket i=1. Then a message sent from 000 to 111 would take the path shown below.


Figure: Message relay across the network.

Disjoint Sections as a Kademlia DHT

For Disjoint Sections the structure bears many similarities to plain Kademlia, but adds further structure. As mentioned in the introduction, nodes are all grouped according to their prefix.

We require that for a node with section prefix pfx0, it is connected to all nodes in all sections pfx where pfx differs in at most one bit from pfx0. This means the Disjoint Sections satisfies the rules of Kademlia but further specifies exactly which nodes in the buckets to stay connected to.


Example: Disjoint Sections

In the example above the nodes in prefix 000 are connected to all nodes in prefixes 001, 010, and 100. These three sections are subsets of i=2,1,0 in the Kademlia example.

This for example affects how we can view message delivery. The arrows in the figure indicate how section 000 would send a message to 111. Using a route parameter we can always pass messages to the k-th closest (to the target) member in the next section and routes will not intersect in the absence of churn.

In Fleming, message delivery is a part that is undergoing heavy development and is likely to be further refined as it needs to be as reliable as possible, especially as sections constantly churn.

42 Likes

will there be any new post in this series or is this all for fleming?

We’re not 100% closing the door to writing new posts in this series if we’ve got something that jumps out and that we would particularly like to share, but this was the last topic we had planned to discuss for now, so it may be the last post in a while :wink:

14 Likes

Next is release of Fleming :star_struck:

4 Likes

How about resource checking and the actual process from the join node through to the nodes checking the ability of the joining node. There seems to be a invalid bottleneck and I’d like to know the procedure as its meant to be.

3 Likes

Just to be sure we’re on the same wave-length: are you asking about what we refer to as “resource proof”?

If that’s what you’re interested in, their could be some useful info in the routing_model diagrams: https://raw.githack.com/maidsafe/routing_model/master/index.html

If you start from the very last diagram, which is a sequence diagram, you can get an idea of the communication sequence we have in mind.

From then on, I would look for “Rpc::ResourceProof*” in the graphs to get more details on the context.

Arguably, that’s a bit of an advanced way to get the information, so feel free to give a shout if you think a nice cleaned up write-up that focuses on that and highlights the important bits is something you’d like to see :slight_smile:

3 Likes

Yes

Yes a write up would be nice if there is time to do it.

The issue is to do with the time taken to send the block of data to each node checking the joining node’s upload speed/capability. When it take over 60 seconds and less than 90 seconds to send 13 lots of nearly 20MB up a giga bit (120MByte/sec) network to essentially local within the data centre then I estimate there is a bottle neck somewhere. We are talking of it taking like 20 to 30 times as long as the transmission time to account for the overheads in the layers in the nodes. 5 times would be slow, and 20 to 30 times is too much for just delays in the datacentre routing etc. Especially when I can transfer larger amounts of data quicker across the internet from one data centre location to another datacentre location.

5 Likes

release at end of 2019 is realistic. Anything before that would be surprising.

Then I predict you will have a nice surprise! :grinning:

6 Likes

I’ve been here since the beginning, i know the difference between marketing and reality as i see in the code.

In a true Maidsafe fashion, I won’t give you a timeline; but noted :wink:

Uhm. Yes, I know why you’re bringing this up. It’s related to the issues you had on the community testnet, isn’t it?
We can definitely layout how we think it should work in theory and give some thoughts to what it would mean quantitatively in terms of resources for the user.

4 Likes

Yes, and if not tracked down then home users will not have a good time trying to join as a node.

3 Likes

Nice write up! Thanks!

6 Likes

Maybe it doesn’t fit in “the road to Fleming” category but I’d be keen to see some expansion on how parsec might be used for things other than network membership. Parsec is a really powerful tool and it’d be cool to see other examples of how it might be used.

5 Likes