XOR address vulnerability

When Google, for example, secures data, it replicates copies across several separate data centers in vastly different geographical locations.

As I understand it, in the Safe network data is instead distributed according to XOR addressing, with no relation to geographical location. This means there is a high probability that all copies of some chunks in the Safe network end up stored in the same geographical area, and therefore a risk of data getting lost when enough farmers in that area go down.

I’m interested in how you came to this conclusion. Could you explain?


I noticed a difference: big cloud storage providers deliberately spread redundant copies across data centers in different geographical locations. This ensures that the copies of the data are always spread across different geographical areas.

With XOR addressing, on the other hand, there is a significant probability that copies will be stored in the same geographical location. As I see it, this creates an attack vector as well as a risk of losing data. For example, if the U.S. or China wanted to stop the Safe network, they could set up a large data center running lots of farming nodes. As the data center fills up with billions of chunks, some chunks will end up with all of their copies stored in that same data center. Then, to harm the Safe network, the data center is shut down along with all the data chunks stored in it.

But when you say the probability is high, how did you come to that conclusion? What’s it based on?

Because if the addressing gives a random distribution, and there are, say, 8 duplicates of each chunk, then you can work out exactly how much of the network an attacker would have to control for this to be a possibility.
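To make that concrete, here is a minimal sketch, assuming placement is uniform and independent (which is roughly what random XOR addressing gives); the replica count of 8 is just the number used in this post:

```python
# Probability that all k replicas of one chunk land on nodes the
# attacker controls, assuming uniform, independent placement.
def capture_probability(p: float, k: int) -> float:
    return p ** k

# With 8 replicas, even a large attacker captures almost nothing per chunk:
for p in (0.01, 0.05, 0.10, 0.25):
    print(f"attacker fraction {p:.0%}: per-chunk capture odds {capture_probability(p, 8):.3e}")
```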

I think four duplicates are used in the Safe network. And let’s say a large data center manages to gather 1% of all data chunks in the entire Safe network. Then there will be lots of chunks that have all four duplicates stored in that one data center.

In contrast, a cloud service can make sure that the duplicates are never all stored at the same location.
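To put a number on “lots of chunks”, here is a quick sketch, assuming 4 replicas, uniform placement, a data centre holding 1% of the network, and some illustrative guesses at total network size:

```python
# Expected number of chunks whose 4 replicas ALL land inside a data
# centre holding 1% of the network, for various total network sizes.
REPLICAS = 4
DC_FRACTION = 0.01

for total_chunks in (10**6, 10**9, 10**12):
    expected = total_chunks * DC_FRACTION ** REPLICAS
    print(f"{total_chunks:>16,} chunks stored -> ~{expected:,.2f} fully captured")
```

So whether that is “lots” depends entirely on how big the network is: roughly one fully captured chunk per 100 million chunks stored.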

I just read this topic, and while the probability for any individual chunk is low, with replication (currently) at 4 it is almost a certainty that some chunks will end up with all their copies stored in the nodes of a single region.

Imagine China closing its internet off to only a whitelist, or maybe shutting it off completely for a period. China, with approximately 18% of the world’s population but where usage of SAFE is frowned upon, could still have 4 or 5% of the nodes. The mathematics for that treat the world as having, say, 25 regions, with one region being shut off.

The maths would be:
One in 25^3 chunks will have all their copies in some one of the 25 regions (each copy lands in a given region with probability 1/25, so a specific region captures a chunk with probability (1/25)^4, and 25 × (1/25)^4 = (1/25)^3). And
One in 25^4 chunks will have all their copies in one specific region, like China.

I verified that this is consistent by simulating it over various numbers of regions.
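A minimal version of such a simulation, assuming each of the 4 copies is placed in a region uniformly and independently:

```python
import random

REGIONS = 25
COPIES = 4
TRIALS = 10_000_000

same_any = 0       # all copies landed in one region, whichever it was
same_specific = 0  # all copies landed in region 0 specifically

for _ in range(TRIALS):
    placements = [random.randrange(REGIONS) for _ in range(COPIES)]
    if len(set(placements)) == 1:
        same_any += 1
        if placements[0] == 0:
            same_specific += 1

print(f"any one region:  1 in {TRIALS / same_any:,.0f}  (theory: 1 in {REGIONS**3:,})")
print(f"specific region: 1 in {TRIALS / same_specific:,.0f}  (theory: 1 in {REGIONS**4:,})")
```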

Now if you look at individual nodes, the maths is still the same, except the numbers used are way higher. For a world with, say, 1 million nodes: one in (10^6)^4 == one in 10^24 chunks will end up entirely on one specific node. Or, for any one node, it’s one in (10^6)^3 == one in 10^18 chunks stored.
EDIT: even then this could only become significant if a single node held an astronomical number of chunks; for scale, even 10^9 one-megabyte chunks is already 1PB.
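The same closed form covers both the regions and the individual nodes; only n changes (a sketch, using 4 copies as above):

```python
# 1 in n^k chunks has all k copies on one SPECIFIC holder out of n;
# 1 in n^(k-1) chunks has all k copies on SOME single holder,
# because n * (1/n)^k = (1/n)^(k-1).
def one_in_specific(n: int, k: int = 4) -> float:
    return float(n) ** k

def one_in_any(n: int, k: int = 4) -> float:
    return float(n) ** (k - 1)

for n, label in ((25, "regions"), (10**6, "nodes")):
    print(f"{label:>7}: 1 in {one_in_specific(n):.0e} (specific), "
          f"1 in {one_in_any(n):.0e} (any)")
```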

Anders’ 1% example works out to one in 100^4 (100,000,000) chunks stored having all four copies in that data centre.

But that is asking a lot: a data centre in a single location (a single building, even) holding 1% of the nodes in a decent-sized network of, say, 1 million nodes, and it gets even less plausible at 10 million nodes.


Is there the possibility for elders, when admitting new nodes, to use IP addresses to gain the greatest geo-diversity in a section? Of course, use of proxies would stymie that, but it might be a way to mitigate the risk and reduce the odds of lost data even further.


Maybe, if you used the IP address when assigning the XOR address of the node (so that the node receives certain chunks), the elders could then make sure the 4 nodes used for a chunk are geographically diverse. Otherwise the complexity in general operation increases too much to be worth it.

But of course this potentially opens attack vectors or privacy holes.
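Purely to illustrate the idea, here is a hypothetical sketch; the `Node` record, the region labels, and the greedy selection are all invented for this example and are not how the actual network works:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    region: str  # hypothetically derived from the node's IP address

def pick_diverse(candidates: list[Node], copies: int = 4) -> list[Node]:
    """Greedy pick: prefer nodes from regions not yet used."""
    chosen: list[Node] = []
    used_regions: set[str] = set()
    # First pass: one node per distinct region.
    for node in candidates:
        if len(chosen) == copies:
            break
        if node.region not in used_regions:
            chosen.append(node)
            used_regions.add(node.region)
    # Second pass: fill remaining slots if regions ran out.
    for node in candidates:
        if len(chosen) == copies:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen

nodes = [Node("a", "EU"), Node("b", "EU"), Node("c", "US"),
         Node("d", "ASIA"), Node("e", "US"), Node("f", "SA")]
print([n.name for n in pick_diverse(nodes)])  # -> ['a', 'c', 'd', 'f']
```

As noted above, proxies defeat IP-based geolocation, and exposing node locations is exactly the kind of privacy hole being worried about here.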

Now if the replication is increased to just 5, then Anders’ (perhaps unrealistic at 1%) example increases to one in 100^5 (10 billion) chunks having all copies in a specific centre holding 1% of the nodes.

I seem to remember that 4 is only the minimum number of replications held. I still have a gut feeling that the minimum should be higher, maybe 6. And at 6 this problem is more theory than practice, with maybe a handful of chunks ever having all their copies stored in one large region like China or India.
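For comparison, a sketch of how the odds change with the replication count, reusing the 1%-data-centre and the 1-in-25-region scenarios from this thread:

```python
# Odds that ALL copies of a chunk land inside one adversarial holder,
# for the two scenarios discussed above, at replication 4, 5 and 6.
SCENARIOS = {
    "data centre holding 1% of nodes": 0.01,
    "one specific region out of 25": 1 / 25,
}

for label, p in SCENARIOS.items():
    for copies in (4, 5, 6):
        print(f"{label}, {copies} copies: 1 in {1 / p**copies:,.0f} chunks")
```

At 6 copies even the region scenario needs hundreds of millions of chunks stored before a single fully confined chunk is expected, which supports the “more theory than practice” reading.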