SAFE Network concerns from an old employee

The way I see it, the testing, the alpha release, and the upcoming beta will show us whether we have any issues or not. It seems to be working very well up to now, and the code will need to be fine-tuned as time goes by. I don’t see any major issues in the concept that would cause a huge rethink or delay; however, theoretical arguments are always good to make sure mistakes are avoided. As long as the testing continues to work as expected, these theories will remain unproven.

Wait a second, you made the following statement some time ago:

So you said that people can target an address when they have “the most computing power”, and when I pointed out that this isn’t the case, because one can’t choose one’s own address in the network, you asked me for an algorithm? So you didn’t have a clue in the first place, then? Just firing some shots around without knowing how this stuff really works? Here’s a diagram, by the way.

No they can’t. The network operates at the XOR level, as you know; it doesn’t really care about IPs. So when a datacentre goes down with 1000 nodes, chances are extremely high that these nodes are in 1000 different groups. So if each group has 12 nodes active, they all go from 12 to 11 and there is no problem at all. The 1000 nodes can’t do a thing on their own, as they’re removed from the groups they were in.
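
A rough back-of-the-envelope check supports this. Assuming, purely for illustration, one million groups and uniformly random node placement (these numbers are not measurements of the real network), a minimal sketch:

```rust
// Back-of-the-envelope check (illustrative numbers, not a network measurement):
// if 1000 failing nodes are spread uniformly at random over 1,000,000 groups,
// how often does any single group lose more than one member?
fn main() {
    let groups: f64 = 1_000_000.0; // assumed number of groups
    let failed: f64 = 1_000.0;     // nodes lost in the outage

    // Expected number of pairs of failed nodes that share a group,
    // by linearity of expectation: n(n-1) / (2m).
    let expected_collisions = failed * (failed - 1.0) / (2.0 * groups);

    // Birthday-style bound on the chance that every failed node sits in a
    // distinct group: roughly exp(-n(n-1)/(2m)).
    let p_all_distinct = (-expected_collisions).exp();

    println!("expected failed-node pairs sharing a group: ~{:.2}", expected_collisions);
    println!("P(all failed nodes in distinct groups):     ~{:.2}", p_all_distinct);
    // With these numbers: ~0.50 expected shared pairs and ~0.61 chance of no
    // overlap at all, so even in the unlucky runs a group typically drops from
    // 12 to 11 or 10 members -- nowhere near losing quorum.
}
```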

Group Consensus is one of the basics of the SAFE network.

9 Likes

100% agreed here. Even if they are not correct, they can turn up snippets for consideration, and that is good. We try (very hard) to keep the team focussed on launch right now though, so when we get time these are valuable, but probably in the dev forum where they can get deeper into the issues, I think.

Thought experiments are always good, but they need balance, looking at “what if” scenarios such as partitioning and how CAP (or any other acronym) relates, etc. When these last features and RFCs are all in place we will be really desperate for such interventions from everyone. It may require that we move the docs on, though, to give folk a chance to speak with authority in these areas in a way that helps us all. It’s the confusion over how things work that causes huge extra effort. That confusion wastes time and also reduces security. As we add RFCs this clears up (they are very good for seeing how things work), and this helps security. It would be great if folk commented there, but sometimes things come up later. Then hopefully at least forum members chat here first, but we can only hope that happens.

10 Likes

1. Sending multiple POST requests returns a MutationError.

2. How do you know that the quorum in the group is lowered? How do you know which IP you must DDoS? To do that you must belong to, and have the resource in, the same group. What a coincidence!!!

3. You presuppose that certain nodes are inaccessible and lose the quorum messages needed to make the POST and change the data, but then, suddenly, they become accessible and sufficient in number to have fundamental influence, and all that without there being any churn. Another coincidence!!!

4. And the data are not equally valid. Even in this almost impossible case you always have the version data and, between two choices, the nodes only need to choose the last version (see the sketch below).
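
As a rough illustration of point 4, a minimal sketch; the types and the "later version wins" rule here are hypothetical stand-ins, not MaidSafe code:

```rust
// Minimal sketch of point 4 (hypothetical types, not MaidSafe code): between
// two candidate copies of the same mutable item, keep the later version.
#[derive(Debug)]
struct Candidate {
    version: u64,
    payload: Vec<u8>,
}

fn resolve<'a>(a: &'a Candidate, b: &'a Candidate) -> &'a Candidate {
    // Real handling would also verify signatures and ownership; this only
    // illustrates choosing the last version between two copies.
    if a.version >= b.version { a } else { b }
}

fn main() {
    let stale = Candidate { version: 6, payload: b"old".to_vec() };
    let fresh = Candidate { version: 7, payload: b"new".to_vec() };
    println!("chosen copy: {:?}", resolve(&stale, &fresh));
}
```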

2 Likes

This all sounds very reassuring and logical from my understanding of the architecture. Unless there is a net split of global proportions, where the internet literally splits into two huge networks, there should be no issue: the smaller splinters just drop out of the group otherwise.

Hypothetically, if there was a split of global magnitude, would the network still recover as things currently stand, I wonder? Both sides would indeed think they are the source of truth, and when network connectivity recovers, presumably there would be address clashes. Does the largest group take precedence in this case? Could data chains be used to identify the most active group and choose it?

Of course, blockchains suffer the same issue at a global level, with the biggest chain prevailing. Bitcoin would also lose data in this scenario, so there is nothing new there.

Moreover, on a blockchain all transactions during the split would be affected, whereas SAFE improves on this by choosing the best source of truth on a group-by-group basis. This would result in the best data being taken from both sides of the split, which would be merged to represent the best combined network again.

With this in mind, surely the primary issue is how data is retained when there are too few nodes in a group to maintain quorum. I see that data chains will help to resolve this though, allowing the data to come online again after previously disappearing (as may be the case on test 8).

6 Likes

Sure, but this is nothing new. @vtnerd is talking about the merge of groups. That’s what my reply was about.

[newer document][3] mentions group merging, but does not describe how
groups with different states will be resolved.

It links to the disjoint groups RFC. So he wasn’t talking about a split of the network; he was talking about a group split due to churn. And he states that the churned nodes and the nodes left on the network have a different “state” in the network. That would mean that the churned nodes could do whatever they like. But as I explained in my earlier reply, this isn’t the case. All nodes in Group A1 are connected to Group A2, for example, and/or to other close groups. There can’t be a split of quorum where Group A1 accepts quorum messages from a split version of Group A2 (call them A2churn and A2real). I explained it here.

And in his last reply he addressed these issues by saying:

So while the discussion was about a split of groups, now we’ve switched to talking about a split of the network?? After a datacentre went offline?? How is that even possible? A split of the network is far different from the merge and split of a group. And stating at the same time that you “strongly dislike the current group design that never tries to achieve consensus with other members of the group” doesn’t make the case any stronger, as the whole network is based on group consensus.

So we could have conversations like this with technical details all over the place, but it doesn’t really make things clearer. It actually makes stuff more complicated. That doesn’t mean I’m not open to new or different ideas (I’m still learning about this stuff as well), but I think it would be a better idea to take an RFC (the disjoint groups one, for example) and dive really deep into it, and then make a reply explaining which parts will or will not work, and for what reason. Otherwise we’ll have discussions that are a bit about a network split and a bit about a group split and more.

EDIT: The churned nodes actually have a different state: they’re offline

5 Likes

edit: botched grammar again

If the network splits in any meaningful way, it’s possible for groups to split as well. There are scenarios where there are multiple groups for the same original resource. Adding an adversary to the mix complicates things, because it can be difficult to establish which is the “real” SAFE network in these scenarios.

I cannot recall seeing any code where the members of a group agree on an order of messages received. If I missed it, then just provide a link or name the algorithm being used. As best I can tell, you are going to use the “chain” as a means of determining order from the client. This does not solve the two-writer case (i.e. a desktop and a phone client writing to the same directory), although it was a clever solution to the back-to-back single-writer-plus-churn problem.
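
A toy sketch of the two-writer problem being described, assuming only that replicas apply writes in whatever order they arrive; the names and data model are illustrative, not SAFE APIs:

```rust
// Toy illustration of the two-writer problem (not SAFE code): two replicas of
// the same directory receive the same two client writes in a different order
// and, with nothing forcing a shared order, end up disagreeing.
use std::collections::BTreeMap;

#[derive(Clone)]
struct Write {
    key: &'static str,
    value: &'static str,
}

fn apply_in_arrival_order(writes: &[Write]) -> BTreeMap<&'static str, &'static str> {
    let mut state = BTreeMap::new();
    for w in writes {
        state.insert(w.key, w.value); // last write to a key wins locally
    }
    state
}

fn main() {
    let from_desktop = Write { key: "file.txt", value: "desktop edit" };
    let from_phone = Write { key: "file.txt", value: "phone edit" };

    // Replica A hears the desktop first; replica B hears the phone first.
    let replica_a = apply_in_arrival_order(&[from_desktop.clone(), from_phone.clone()]);
    let replica_b = apply_in_arrival_order(&[from_phone, from_desktop]);

    // Without group agreement on ordering, the replicas now disagree.
    assert_ne!(replica_a, replica_b);
    println!("replica A: {:?}\nreplica B: {:?}", replica_a, replica_b);
}
```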

An attacker joins the network in a legitimate fashion. Then, on a GET request it receives to relay, it replies with its own fake group. If the client/proxy sent the GET to multiple peers, the network will return responses from both the legitimate group and the fake group. Which is the right one? It cannot be the “closest”, because an attacker with enough CPU power will generate the longest matching prefix. There are likely some obfuscation techniques, but they become difficult to defend at the network edges closest to the victim.
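
For readers unfamiliar with the “closest” argument, here is a toy illustration of XOR distance and why raw closeness alone cannot authenticate a responder. It uses 64-bit IDs from std’s non-cryptographic hasher purely for illustration; real network IDs are 256-bit and node joining is constrained in ways this sketch ignores:

```rust
// Toy illustration of the "closest" argument (not SAFE code): 64-bit IDs from
// std's non-cryptographic hasher stand in for real 256-bit node IDs. With pure
// CPU work an attacker can mint an ID whose XOR distance to a target address
// beats an honest node's, so closeness alone cannot prove legitimacy.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn toy_id(seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    h.finish()
}

fn main() {
    let target = toy_id(42); // the address the victim is asking about
    let honest = toy_id(7);  // some legitimate group member's ID
    let honest_dist = honest ^ target;

    // Grind through candidate IDs and keep the one closest to the target in
    // XOR space; more attempts buy a longer matching prefix.
    let forged = (0..1_000_000u64)
        .map(|i| toy_id(0xdead_beef + i))
        .min_by_key(|&c| c ^ target)
        .unwrap();

    println!("honest XOR distance: {:#018x}", honest_dist);
    println!("forged XOR distance: {:#018x}", forged ^ target);
    // The forged distance is almost always far smaller than the honest one.
}
```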

The lab environment will be easier unless someone is truly trying some adversarial cases.

In the naive implementation both sides think they are the SAFE network.

This error message does not provide state information for every member of the group.

A read call to the group returns the state of each node, indicating the ratio. If a node communicates through a proxy, that does not necessarily help, because then you would DDoS the proxy specifically. The path to the proxy or node is found by following the lowest-latency path at each step (i.e. send a read over multiple connections, determine the latency, ask for a list of nodes connected to the lowest-latency connection, and repeat; multiple paths could be tried to improve the heuristic). It’s difficult to entirely obfuscate a connection if the network is trying to achieve the lowest latency.

I never assumed anyone would lose quorum messages. I assumed that the members of the group were not communicating with each other to coordinate writes. Two messages sent from two different clients could be received in a different order by each member of the group.

I am suggesting a fork in the history. Some nodes have one fork, and another group has a different fork. It is not obvious which fork is the valid one once quorum drops.

It depends on the implementation details. In many cases the largest group should have a higher probability of taking precedence, since it is more likely to contain the closest quorum-sized set of nodes agreeing on the same value.

This does not make any sense. It would imply that you can merge two possible owners of a coin. This is why I keep talking about write inconsistencies; attackers want to use this anti-feature to their advantage. I mentioned the CAP theorem because, unless you manage to invalidate it, the choice between availability and consistency is tough. Permanently locking a resource is scary, and so are write inconsistencies.

Bitcoin chose write inconsistencies, and found a (fairly) simple model to defend against them: an attacker needs more CPU time than the remainder of the network combined. The attacker cannot easily “merge” a fake network on top, because it will be rejected. An attacker can try to “surround” a person connecting to the network in an attempt to provide false information. However, the block height and hash provide a quick verification method (call a friend: what block number and hash?, etc.). It can also act as a trip-wire: if the block interval drops significantly, then an investigation should be done to ensure you are still on the correct “network”. Validating peers is a whole other long discussion.
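
For context, a minimal sketch of the fork-choice rule being referred to, with the heaviest accumulated work winning; the per-block numbers are illustrative and block validation and difficulty handling are omitted:

```rust
// Simplified sketch of Bitcoin-style fork choice: between two competing
// histories, the one with more accumulated work wins, so rewriting history
// requires out-working the rest of the network. The "work" per block here is
// just an illustrative number; real nodes derive it from the difficulty target
// and also validate every block.
struct Block {
    work: u128,
}

fn total_work(chain: &[Block]) -> u128 {
    chain.iter().map(|b| b.work).sum()
}

fn main() {
    let honest = vec![Block { work: 100 }, Block { work: 100 }, Block { work: 100 }];
    let attacker = vec![Block { work: 100 }, Block { work: 120 }];

    let winner = if total_work(&attacker) > total_work(&honest) {
        "attacker fork"
    } else {
        "honest chain"
    };
    println!("selected history: {winner}"); // honest chain: 300 vs 220 units of work
}
```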

This is not about churn. Two writers.

The discussion is not all over the place. The situations are closely related: both result in legitimate nodes having a fork of the history. With databases this is typically called partitioning, and it is more difficult to do than the attack I just described.

But where did you get this idea about the network??

I’m a bit surprised to read this to be honest. Feel free to clear it up.

2 Likes

That’s how the Kademlia DHT is used in BitTorrent, but as far as I know SAFE uses it as a routing layer, not for node lookup.

2 Likes

Please forgive me if I am just adding to the noise, but I assume that at a global scale they become the same thing.

Splitting a group down the middle could lead to two groups which are unaware of one another, while both having sufficient nodes to form a valid group.

As I understand it, this could only happen with a huge split, as otherwise the splinters would be too small to maintain their own group. That is, there would be insufficient nodes to form a quorum.

The network topology will differ for each client. If two clients write to the same group at the same time, each machine in the group can receive the writes in a different order. This cannot be resolved on the client side, because both requests were issued at the same time. The remote side has to agree, or “come to a consensus”, on what happened first. I cannot recall an implementation that handles this situation, or a discussion of an algorithm that would be used to do it. Solving this problem adds lots of complexity. Asking people to never make two writes from different clients seems inappropriate given the security implications outlined above.
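
For what it’s worth, one generic way to provide the agreement this paragraph asks for, shown only as a sketch and not as a claim about SAFE’s design, is to apply writes in an agreed sequence rather than in arrival order; in a real system the sequence numbers themselves would have to come from a replicated consensus step, which is exactly the hard part:

```rust
// Generic total-ordering sketch (not a claim about SAFE's design): writes are
// applied in an agreed sequence rather than in network arrival order, so every
// replica converges on the same state.
use std::collections::BTreeMap;

#[derive(Clone)]
struct Write {
    seq: u64, // position agreed by the group, not arrival order
    key: &'static str,
    value: &'static str,
}

fn apply_in_agreed_order(mut writes: Vec<Write>) -> BTreeMap<&'static str, &'static str> {
    writes.sort_by_key(|w| w.seq);
    let mut state = BTreeMap::new();
    for w in writes {
        state.insert(w.key, w.value);
    }
    state
}

fn main() {
    // The group has assigned sequence numbers 0 and 1 to the two client writes.
    let desktop = Write { seq: 0, key: "file.txt", value: "desktop edit" };
    let phone = Write { seq: 1, key: "file.txt", value: "phone edit" };

    // Replicas receive them in opposite orders yet end in the same state.
    let replica_a = apply_in_agreed_order(vec![desktop.clone(), phone.clone()]);
    let replica_b = apply_in_agreed_order(vec![phone, desktop]);
    assert_eq!(replica_a, replica_b);
    println!("converged state: {:?}", replica_a);
}
```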

That was an algorithm for finding a node on the network. The latency differences in the responses forwarded by your peers will indicate the quickest route, and applied recursively this should get you close to the origin of the message.

What you are “truly” doing is finding the shortest physical path between you and the desired node with this technique.
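
A simulated sketch of that heuristic, with an entirely made-up peer graph and latencies and no real network calls:

```rust
// Simulated sketch of the latency heuristic (all topology and timings are made
// up; no real network calls): keep asking the lowest-latency peer we know about
// for its connections, treating lower round-trip time as "physically closer".
use std::collections::{HashMap, HashSet};

fn main() {
    // peer -> peers it would report when asked
    let neighbours: HashMap<&str, Vec<&str>> = HashMap::from([
        ("proxy", vec!["a", "b"]),
        ("a", vec!["c", "d"]),
        ("b", vec!["e"]),
        ("c", vec!["target"]),
        ("d", vec![]),
        ("e", vec![]),
        ("target", vec![]),
    ]);
    // peer -> simulated round-trip time in ms from the probing client
    let rtt: HashMap<&str, u32> = HashMap::from([
        ("proxy", 80), ("a", 60), ("b", 95), ("c", 35),
        ("d", 70), ("e", 110), ("target", 20),
    ]);

    let mut known: HashSet<&str> = HashSet::from(["proxy"]);
    let mut best = "proxy";
    loop {
        // Ask the current best peer for its connections.
        for &n in &neighbours[best] {
            known.insert(n);
        }
        // Re-evaluate: hop to the lowest-latency peer discovered so far.
        let next = known.iter().copied().min_by_key(|&n| rtt[n]).unwrap();
        if next == best {
            break; // nothing faster found; assume we are as close as we can get
        }
        best = next;
    }
    println!("lowest-latency peer reached: {best} ({} ms)", rtt[best]);
}
```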

Correct, and it could be interesting to think about resource “locking”. The partition problem might be rare, but node churn is more likely, and locking whenever quorum drops means ordinary churn could trigger the lock too.

I don’t mean merging the forked groups; I mean that in each case of conflict, the best group can be chosen as the source of truth.

To put it another way, we don’t need to take everything from one side of the fork or the other (like the longest chain on a blockchain). Instead, the best group can be chosen in each case, no matter which side of the fork it resided on. Therefore, conflicts are likely to be localised to specific data items, held by groups which have applied conflicting changes.
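
A minimal sketch of that per-item view of a merge, using a hypothetical key/value model rather than actual data chains:

```rust
// Minimal sketch of per-item merging after a fork (a hypothetical key/value
// model, not actual data chains): take the union of both sides; only items
// changed differently on both sides need a winner chosen.
use std::collections::BTreeMap;

type Items = BTreeMap<&'static str, &'static str>;

fn merge(side_a: &Items, side_b: &Items) -> (Items, Vec<&'static str>) {
    let mut merged = side_a.clone();
    let mut conflicts = Vec::new();
    for (&key, &value) in side_b {
        match side_a.get(key) {
            None => { merged.insert(key, value); }     // only side B touched it
            Some(&existing) if existing == value => {} // identical on both sides
            Some(_) => {
                // Conflicting edits: hold the item back for per-item resolution.
                merged.remove(key);
                conflicts.push(key);
            }
        }
    }
    (merged, conflicts)
}

fn main() {
    let side_a = Items::from([("report.txt", "edited on side A"), ("notes.txt", "same")]);
    let side_b = Items::from([("report.txt", "edited on side B"), ("photo.png", "new")]);

    let (merged, conflicts) = merge(&side_a, &side_b);
    println!("merged cleanly:          {:?}", merged);
    println!("needs a per-item winner: {:?}", conflicts); // only "report.txt"
}
```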

Unless you can fake a group or force many nodes into a group (a bigger problem regardless of splits), I don’t see how an attacker can influence the decision over which group is chosen as the source of truth.

No, please add to the noise, it makes me think :thinking:. Let’s say we have 1 million groups with 15 nodes on average. Now we have several big fiber cables attacked by sharks and the whole network splits. This means the groups will split as well, but this is where the data chain comes in. As all groups have these data chains (even parts beyond their own group) they could rebuild the network from scratch. But this is different from simple churn where a computer, or even a network with 1000 nodes, falls away. These nodes are all in different groups (1000 nodes spread over 1 million groups) and when they lose connection they’re no longer part of the quorum of a group. They become just single nodes that were kicked out because they didn’t reply to requests from the other nodes in the group. An Archive Node might be able to rejoin that same old group, but it takes time to get back in.

3 Likes

I think that this…

… is/should not usually be accepted. Being able to stroll through the DHT like that seems a security risk to me. Other nodes should only share IP addresses of nodes that are close to you, which they can check. This should only be necessary due to churn or node relocation on joining.

2 Likes

The problem is, we will have two data chains sharing a common long-term history but a different recent past. Which is the better source of truth when the two data chains conflict?

I think in these situations, there may be data loss from one side of the fork or the other, just as is the case with blockchains. We have little choice but to pick a winner and disregard the changes made by the loser.

Perhaps there would also be a possibility of merging the two data chains (i.e. the transactions made by the forked groups) where there are no conflicts. Data chains should definitely give us a sequence of transactions on both sides of the fork, many of which may not conflict.

Either way, I would like to reiterate that blockchains have this same problem, but a side of the fork has to be picked (the longest blockchain).

Perhaps an interesting question is whether CPU-hashing a longer fake chain is easier than faking or overwhelming a SAFE group to ‘simulate’ a split and change history. My understanding is that this cannot be done without enormous resources on SAFE (you need many more nodes than the entire network), but it would be interesting to dwell on.

I think @vtnerd is spending enough time here to warrant a proposal on how to fix the issues he so clearly sees. He seems to understand this at a very deep level and is willing to defend his position that something is broken, but what are his thoughts on a solution? @vtnerd, tell me, with all of your brilliance, how would you fix the issue you have raised?

1 Like

@vtnerd

Lee, I said this

And you replied with this, which looks very much unrelated?

However,

In MaidSafe, all messages and events are always agreed and are designed for out-of-order delivery; this has always been the case.
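
As a generic illustration of order-independent agreement (a sketch of the idea only, with hypothetical names and a made-up quorum size, not MaidSafe's routing code):

```rust
// Generic sketch of order-independent agreement: each member's vote for an
// event is accumulated keyed by the event itself, and the event only "fires"
// once a quorum of distinct voters is seen -- whatever order the votes arrive
// in. Illustration only, not MaidSafe's actual implementation.
use std::collections::{HashMap, HashSet};

const QUORUM: usize = 3; // assumed quorum size for the sketch

#[derive(Default)]
struct Accumulator {
    votes: HashMap<String, HashSet<String>>, // event -> distinct voters seen
}

impl Accumulator {
    /// Returns true the first time `event` reaches quorum.
    fn add_vote(&mut self, event: &str, voter: &str) -> bool {
        let voters = self.votes.entry(event.to_string()).or_default();
        let before = voters.len();
        voters.insert(voter.to_string());
        before < QUORUM && voters.len() >= QUORUM
    }
}

fn main() {
    let mut acc = Accumulator::default();
    // Votes arrive in an arbitrary interleaving from different peers.
    for (event, voter) in [
        ("put chunk abc", "node-2"),
        ("put chunk xyz", "node-1"),
        ("put chunk abc", "node-3"),
        ("put chunk abc", "node-1"),
        ("put chunk xyz", "node-3"),
    ] {
        if acc.add_vote(event, voter) {
            println!("quorum reached for: {event}");
        }
    }
    // Only "put chunk abc" reaches the 3-vote quorum here, regardless of order.
}
```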

If you can answer the questions I did ask, it will help an awful lot. I asked them to get specific and find any errors or flaws. It is an engineering approach and should be specific and based on the design and implementation.

  1. If you can outline how group consensus works and how churn is handled, then show which part you believe is incorrect, that would be really helpful.

  2. Separately, then do the same for node joining (which is currently unrestricted but does force distribution).

15 Likes

Yep. That’s a fact.

“seems” is the operative word.

Suggesting brilliance is not in the Maid camp?

I think the invitation is well intended but short sighted.

1 Like

@BIGbtc Perhaps I should have added /sarcasm? I thought it was implied :smile:, but at the same time it is a challenge. I would love to hear a solution.

7 Likes

If this guy asks for code one more time I’m going to hang myself. :tired_face: He has been directed to the code several times already. What does he expect during such a frantic development cycle? For someone to hold his hand and take him to the exact code segment for each of his concerns?

@dirvine has invested enough time explaining the concept at a higher level. The least you @vtnerd could do is review the code for yourself before returning with more baseless criticisms. Anything else is lazy and suspect.

How many more times are you going to return with these assumptions? It’s almost as if you hope we will get tired of responding to your empty claims, so that this supposed architectural blemish remains on this forum for everyone (VCs etc.) to see.

If you can’t be bothered to review the code, I see little other reason for you to be doing what you have been doing these last few weeks with regard to this project. Read the code, point out the flaws, and suggest solutions if you can. If you keep dodging this productive procedure, we’ll all know your true intentions. Please don’t play games with us.

6 Likes