RFC: Disjoint Groups

Mitosis! Nifty, need to read rfc yet

4 Likes

Great job @AndreasF seems very in depth (out of mine a bit unfortunately :smile:) but it sounds like a great place to start for messages and paving the way for data chains! Sounds like the routing and data loss are the issues correct? But the latter can be supplemented by data chains itself

1 Like

Looks good… though looking at it reminds me of donkeys equidistant between two equal bales of hay.

1 Like

Thanks!
The main motivation is simply that data chains and other planned features are way easier to implement if we change the group definition.

BTW, the current version of the RFC is here: rfcs/0037-disjoint-groups.md at master · maidsafe/rfcs · GitHub

7 Likes

Nice work! So if I understand well, there’s no need for every node to calculate the routing table from the perspective of the other nodes in the group? In the current case a node needs to calculate 8 X 8 different routing tables and decide who’s close to what isn’t it? In this new way it’s just 8 (or more) close nodes. So when I want to PUT some data, I get quorum size of signs, attach them to my chunk and pick a random node from the group (probably a faster one) to route my chunk forward?? And the next group won’t sign anything at all, they just pass my chunk with the signs to the next group?

no need for every node to calculate the routing table from the perspective of the other nodes in the group?

With the current approach that’s not even possible: If you’re in my routing table, I don’t know every node that’s in your routing table. Only for the groups I’m a member of can I calculate who the other members are. So if I get a Put request and I am in the close group of the data chunk, then I know who else needs to handle it.

And the next group won’t sign anything at all, they just pass my chunk with the signs to the next group?

That’s what the current implementation does and it’s not secure.

The RFC suggests that the next group sign it, too: If a group A wants to send something to a group C that it’s not connected to, so it has to send it via group B, then B would check A’s signatures, then sign it itself, and C would check B’s signatures. That’s because C doesn’t even know the members of A.

Possibly with data chains, in some cases we could omit the signing in between. Instead, A could include an old message signed by an old group D (which doesn’t even necessarily exist anymore) saying that A is legitimate and what its members are. If C knew D, too, that would suffice.

5 Likes

Some questions…

A group must have at least GROUP_SIZE members (that are fully connected i.e. satisfy the routing table invariant).

So this can be set by the devs I guess? When it’s decided to be 8 it’s gonna be 8? Is there any testing on this? Like the most ideal number to form a GROUP_SIZE?

The quorum cannot be a constant anymore, due to varying group sizes. It needs to be a percentage strictly greater than 50% instead, and in a group of size n, a number x of nodes will constitute a quorum if x / n >= QUORUM.

Let’s assume the GROUP_SIZE is 8. This means that with 62,5% QUORUM needed exactly 5 nodes need to sign a message. They do this by sending their signs to the closest GROUP (to a target) in the next GROUP. So I’m in GROUP A and want to reach GROUP B with a message. This means GROUP B will accumulate the signs until it has received 5 valid signs for an action? So a GROUP is only as fast as the 62,5% fastest nodes of that GROUP. Is that right?

And the maximum size before a group splits in this case (GROUP_SIZE 8) could be 18 before the split occurs?

And what happens when I PUT data to the network. let’s say the destination group has 5 signs allowing me to PUT a chunk to the network. Can I pick a faster node out of the routing table to PUT the data to the network? Even while it might not be the shortest route according to XOR? Or am I forced to used the shortest XOR route?

Does it make it easier to partition the network if each node is only a member of one close group? Couldn’t you perform a denial of service attack on all the members of the group pretty easily?

To perform a denial of service you need to find out what their IP-addresses are. That’s as good as impossible. There are probably several groups between you and the group you want to target. So the only group you might attack is your own group and/or group that you are close to. At the same time you don’t know which chunks from which files they store. And these chunks might be in cache all over the network. So especially when you request one of their chunks, caching jumps in. So quite hard to target a group I think.

Ah, I don’t have a sense of The Safe Network topology yet.

My conception of The Safe Network in comparison to Tor, is that TSN doesn’t work to obscure where requests originate or terminate, but only the data that is sent.

TOR is connecting you to regular websites on the WWW and hidden TOR sites. SAFE only connects to SAFE websites. No central servers (like with TOR) needed. It’s fully P2P without trackers, servers or anything else. So I want to upload data, it’s routed over nodes from several other users and it ends up encrypted and obfuscated on someone’s harddrive. That person doesn’t have a clue what’s in there. For data integrity it’s also stored on several other nodes in case some connection goes offline.

1 Like

Is there any testing on this? Like the most ideal number to form a GROUP_SIZE?

There will be! But that will only make sense once Disjoint Groups and Data Chains are implemented. Probably the criterion won’t be just a number in the end. It will be more like: If the group has three very good nodes (have all the data, have been reliable for a long time, etc.) in each of the two halves, then it can be split.

So a GROUP is only as fast as the 62,5% fastest nodes of that GROUP. Is that right?

Yes, but that affects only the part of the message that needs to be sent in a secure way, i.e. usually just a tiny hash value. The message itself will still take only a single route, and that gives us the opportunity to optimise for speed later.

And it won’t affect Get requests for immutable data at all, for example, which are sent from a client directly to a group. For those, it’s enough if one group member responds, because the data can be verified with its hash alone.

Can I pick a faster node out of the routing table to PUT the data to the network?

The client can’t pick a route or an individual node for storage, and Put requests will need to be cryptographically verified. However, that verification will only be the hash, and the exact route the full message takes is something the network could potentially optimise by itself: The new group definition leaves a lot of freedom when choosing the route, as it determines only which group each step belongs to. We could then choose the fastest (least busy, most reliable, etc.) member of that group.

Couldn’t you perform a denial of service attack on all the members of the group pretty easily?

I don’t think the change in group definition makes a difference for denial of service attacks. If anything, it will make it easier to identify misbehaving (e.g. spamming) nodes and disconnect from them. Also keep in mind that nodes won’t just accept a connection from anyone, so an attacker can’t just - even if they know their IP addresses - connect to all members of a group.

5 Likes

The problem is not the misbehaving nodes (dos) but the possible external assault (ddos) that is possible with the knowledge of the complete group. If every node is just one small part of a larger group and does not know all nodes in said group, then such an attack becomes impossible.

Yes, that is a separate problem, but it is an issue in both cases: With the current approach (no Disjoint Groups), every node also knows the IPs of several complete groups.

It will likely need to be addressed in several places (crust, routing, safe_vault …). E.g. with Data Chains, some of the affected nodes will be able to restart when they are ddossed, possibly with new IP addresses but the same node names. The network will accept only the most heavily invested nodes (those that have already done a lot of legitimate work on the network/are storing a lot of data) back in the same place.

6 Likes

I was curious to know the new and more efficient message routing with Disjoint Groups.

Here we have it …

Message routing

When learning about a change in any neighbouring group B (new member, group split or merge), each node in a group A signs the sorted new list of members of B and sends it to each member of A, so every node in A has a signature from each member of A, for the list of current members of B.

A new message variant HopMessageSignature is added, containing a hash and a signature. The HopMessage::signature field is replaced by signatures: Vec, and a new field hop_signatures is added, containing a list of signed group lists. The message is considered valid if each list in hop_signatures has valid signatures from at least a quroum of the entries of the next list, and the signatures itself has signatures (signing the message itself) by
a quorum of the entries of hop_signatures[0], if the source authority is a group, or
one entry of hop_signatures[0], corresponding to the source, if the source authority is a single node.

If a group sends a message, only the route-th node sends the full HopMessage, and the others only a HopMessageSignature. The recipient keeps the full message in cache until it has collected a quorum of signatures. Then it continues relaying the message as detailed below. If an individual node sends a message, it sends the full HopMessage with only its own signature.

To relay a message on a given route in a node n in group G(p) for a destination d:
Push the signed list of the previous hop’s group members on hop_signatures and drop the message if it is not valid. Then:
If n is a recipient, verify the signatures and handle the message.
If d is a single node in our routing table, relay it directly to d.
If d is a group in our routing table, relay it everyone in d.
Otherwise relay the message to the route-th closest entry to d in our routing table.

For any groups A and B, either everyone in A is closer to d than everyone in B, or vice versa. Therefore, the group that we relay the message to doesn’t depend on the route. Since everyone in our group knows everyone in the next hop’s group, this means that in the next attempt route + 1, a different member of B will receive the message. Unless there is churn in between the attempts, this guarantees that all routes are disjoint.

4 Likes

I’m guessing there’s a few sceptics watching this one…but I’m reminded of a @dirvine quote “Nice problem, but we will solve it”

4 Likes

Yes an Engineer’s quote. :grin:

Up there with “an Engineer can do for 10 cents what any fool can do for 1 dollar”.

1 Like

Nice, didn’t realize that, David does talk of himself as an engineer and so makes sense.

If every kid came out of school with the tools to be an engineer, we’d be in good hands I reckon.

2 Likes

Unfortunately the term engineer is devalued in our culture, and poorly understood except by engineers.

If Engineer was correctly positioned it would be aligned with Doctor/Consultant, Lawyer/Judge, Academic/Professor etc, but we are told a telephone engineer is someone who comes to our house and plugs a modern into the wall etc.

:frowning:

11 Likes

That’s probably a modulator-demodulator, still plugged in? But this interests me, the Yanks did this to you? Engineering this perfectly usable job title? Or the British Empire could not help itself and did end up one word short?

Comes as a surprise as for instance the Dutch language classifies a telephone engineer as “telefoonmonteur” and reserves “ingenieur” for the coveted bridge building professionals.

David was recounting how his engineering career came about a while ago and to me it did sound like a very natural progression into what he is pursuing now, even if it did extend to working as a mechanical engineer in a glass factory and partaking in rescue missions at sea.

1 Like