Geographic, demand-based dynamic incentives should be implemented to promote a geographically decentralised distribution of data!

I understand your worry here, but this is what the network has to do, so the number of replicas must be able to handle huge partitions. The issue, though, is more pronounced the other way around. Say you are in Luxembourg and it gets cut off with only 2% of the data; the issue then is what happens to those in that country. The rest of the world will be OK, but you had basically a 98% partition.

50% partitions and the like are fine, you will still get the data, but mutable data is another story: it is like two blockchains running in parallel that then need to merge.

[Edit: I should make it clear though, you won't have 2% of the world's data, but 2%/8 (at least) of copies of the data, if that makes sense]

5 Likes

Yeah, I guess if I want to store private yet important files on the network, maybe it's a good idea to store multiple copies even if I pay more safecoins. The network can help more with information sharing and opposing internet censorship than with completely safe storage of private data (such as bitcoin private keys, for instance).

If you do store multiple copies, make sure you change the length of the document by adding something at the start of it. Otherwise deduplication will cause only one copy to exist on the network if the file is exactly the same.

Adding to the end would only mean the last 1 to 3 chunks are different, so it has to be at the start.

But yes, I would be doing the same with important documents. More likely I would rar up the files using a different password for every “copy” I uploaded. That way each of the “copies” would be different and thus actually stored, avoiding deduplication. The password could be as simple as “a”, then “b”, then “c”, since the rar files are private data and encrypted by the network anyhow.
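To make the dedup point concrete, here is a minimal sketch. It assumes a simplified fixed-size chunker with SHA-256 hashes as a stand-in for the network's self-encryption (which, as I understand it, ties each chunk's encryption to the hashes of neighbouring chunks, so the real behaviour is at least as sensitive to changes as this). Adding a byte at the start shifts every chunk boundary, so every chunk hash changes; appending only touches the tail:

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 1024) -> list[str]:
    """Hash fixed-size chunks of `data` (a toy stand-in for self-encryption)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()[:8]
            for i in range(0, len(data), chunk_size)]

original = bytes(range(256)) * 39 + b"end-of-file"   # ~10 KB of varied content
prefixed = b"a" + original                           # one byte added at the start
appended = original + b"a"                           # one byte added at the end

base = chunk_hashes(original)
changed_prefix = sum(a != b for a, b in zip(base, chunk_hashes(prefixed)))
changed_suffix = sum(a != b for a, b in zip(base, chunk_hashes(appended)))
print(f"prefix change alters {changed_prefix} of {len(base)} chunks")   # all of them
print(f"suffix change alters {changed_suffix} of {len(base)} chunks")   # only the last
```

The rar-with-a-different-password trick above achieves the same thing even more thoroughly, since the whole archive differs byte for byte.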

2 Likes

Agree. The Hawaiians want to be involved, gorgeous and untouched by the Haole for hundreds of years.

Getting a homogeneous geographic distribution may sound great, unless you want performance, in which case, I’d like mine spread across low-latency nodes, based on my network routing to them, rather than wanting x% in China and y% in Australia - where the time to first byte could be measured with calendars or sundials at the very least.

Actually, I’d pay extra to get a high-performance tier rather than the budget-beaters tier of service. I’m sure others might like the choice as well.

3 Likes

Statistically, there will be copies all around the world, plus local caching nodes. You would have to be unlucky for the fastest node to be geographically far away.

I suppose increasing the number of times data is stored will increase the chances of a local hit and increase redundancy though. Perhaps having an option to pay more for more copies could help.
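As a rough back-of-envelope on that, assuming replicas land on vaults uniformly at random (XOR addressing ignores geography) and a made-up 5% of vaults are on low-latency paths from you:

```python
# Chance that at least one of k replicas lands on a "nearby" vault, if a
# fraction p of all vaults are nearby and placement ignores geography.
p = 0.05                      # assumed share of low-latency vaults (made-up figure)
for k in (4, 8, 16, 32):
    print(f"{k:2d} copies -> {1 - (1 - p) ** k:5.1%} chance of a nearby replica")
```

Caching of popular chunks, mentioned above, stacks on top of this.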

It will be interesting to see how everything performs once vaults are distributed.

But of course, after numerous hops scattered around the globe, the first hop or last hop will not be the major determining factor in the time to get the first byte.

I was thinking about this topic recently and an interesting thought came to mind. Perhaps it's already been considered and abandoned, but what about having a dynamic allocation of redundancy on the network, between some minimum (8) and some maximum like 12, 16, 32, 64, etc.? I read that even though there might be a lot of extra storage capacity, the network will only farm and consume as much space as required to store the 8 redundant chunks. But why not use a larger portion of the available space for added safety?

If new files were rapidly added and the percentage of available resources dwindled too quickly, some of those extra redundant copies would be overwritten until a minimum of 8 exist. Then, as more farmers come online, the number of copies might go up again until you hit 16 or 32. I'm sure the devs considered something like this before and decided against it, but I'm curious as to why. Too much network traffic? Unstable network dynamics? Just curious.

I guess this type of system may or may not help improve geographic distribution, but it wouldn't hurt, right? At 8 copies one would have about 1 per continent. At 24 to 32 copies, you would have about 4 per continent, in such an easy-to-use, automated fashion that it would make using Amazon S3 seem reckless/foolish.
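Here is a toy sketch of that dynamic-allocation idea, purely my own illustration (the names and numbers are made up, and this is not how the network or sacrificial copies actually work): the target replica count floats between a hard floor of 8 and a cap, driven by how much spare capacity is reported.

```python
# Toy sketch of the idea above (not the actual MaidSafe design): pick a target
# replica count between a hard floor and a cap according to spare capacity.
MIN_COPIES = 8          # never drop below the guaranteed redundancy
MAX_COPIES = 32         # cap on opportunistic extra copies

def target_copies(free_space: int, total_space: int) -> int:
    """More spare capacity -> more opportunistic replicas, back to 8 under pressure."""
    spare_fraction = free_space / total_space
    extra = int(spare_fraction * (MAX_COPIES - MIN_COPIES))
    return MIN_COPIES + extra

# As farmers join (free space grows) the target creeps up; as uploads eat the
# spare capacity it falls back toward the guaranteed 8.
for free in (10, 30, 50, 80):
    print(f"{free}% free -> store {target_copies(free, 100)} copies of each chunk")
```

The questions raised above (extra traffic, unstable dynamics) are exactly what this kind of feedback loop would have to cope with, since every swing in spare capacity triggers copying or deletion work.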

Edit, to clarify:
With the current XOR-distance-based distribution, we have the following:

Geographically, the data density would tend towards that of the population density, with some deviation for datacenter locations.
I.e. something like a world population density map (image not reproduced here).

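A crude simulation of that claim, with made-up regional weights standing in for each region's share of vaults (not real figures): chunk names and vault IDs are random, replicas go to the XOR-closest vaults, and the share of chunks each region ends up holding tracks its share of vaults.

```python
import hashlib, random
from collections import Counter

# Made-up share of the world's vaults per region (illustrative only).
region_weights = {"Asia": 0.55, "Europe": 0.15, "Africa": 0.15,
                  "Americas": 0.12, "Oceania": 0.03}

random.seed(0)
NODES, REPLICAS, CHUNKS = 500, 8, 1000

regions = random.choices(list(region_weights),
                         weights=list(region_weights.values()), k=NODES)
node_ids = [random.getrandbits(256) for _ in range(NODES)]  # addresses carry no geography

def closest(chunk_name: int, k: int = REPLICAS) -> list[int]:
    """Indices of the k vaults whose IDs are XOR-closest to the chunk name."""
    return sorted(range(NODES), key=lambda i: node_ids[i] ^ chunk_name)[:k]

held = Counter()
for c in range(CHUNKS):
    name = int.from_bytes(hashlib.sha256(str(c).encode()).digest(), "big")
    for i in closest(name):
        held[regions[i]] += 1

total = sum(held.values())
for region, weight in region_weights.items():
    print(f"{region:9s} share of vaults {weight:4.0%}   "
          f"share of chunks held {held[region] / total:4.0%}")
```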
Population density changes over time. The most drastic change we'll probably see is a doubling in the central Africa region by 2050.

Economically wealthier areas would have a higher weight in the above example though, both as a result of a higher rate of high-bandwidth internet connections and due to a higher level of spare resources per capita.
Economic wealth changes too; the high-density areas are increasing their wealth.

These basic facts unravel many things.
One, for example: exposure to threats to the data that have a geographic dimension (solar flares etc.) is as high as it is to the population in general. The XOR-distance-based distribution provides little or no change to that.

Another is that in order to change this distribution (to spread it more evenly over the earth's surface) you'd have to compromise the security in more than one way (where the first way is decreasing the level of anonymity): the amount of data handled per node would have to increase a great deal over sparsely populated areas. Antarctica, for example: a great deal of data on just a few lonely researchers' computers.
And when they travel back home, shall we track the movements of these computers so as to relocate the data? Seems like opening an abyss of problems.
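To put rough, entirely made-up numbers on the Antarctica example, spreading data evenly by land area rather than by vault would look something like this:

```python
# Entirely illustrative figures: what an "even over the earth's surface"
# distribution would demand of a sparsely populated region.
network_data_tb = 1_000_000        # hypothetical 1 EB of stored chunks, copies included
antarctica_land_share = 0.09       # Antarctica is roughly 9% of Earth's land area
antarctica_vaults = 100            # a handful of research-station machines (a guess)

per_vault_tb = network_data_tb * antarctica_land_share / antarctica_vaults
print(f"each Antarctic vault would need to hold ~{per_vault_tb:,.0f} TB")  # ~900 TB each
```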

1 Like

I had similar concerns when I first started looking at the SAFE network. After thinking it over, I would agree that the devs have gone about it in the most efficient way possible. Distance in XOR space is efficient and adequately randomizes the chunks among all the vaults. It's a probabilistic approach. Like I mentioned in my previous post, in order to get a better geographic spread they might be able to just dynamically modify the number of redundant copies… which I have just found out, by reading old forum posts, they have already considered by way of “sacrificial copies”. More copies boost your chances of a higher geographic spread.

I had once thought that maybe one might want to incentivise the use of very coarse geolocation data, such as knowing where a node was to within +/- 1000 km. However, this is just added complication while increasing your attack surface. The only other thing I could think of that might be simple would be to factor the latency between nodes sharing redundant chunks into the distance metric. I figure that you wouldn't want the latency between the nodes to be too high or too low. Very high latency would hurt performance, while latency that is too low might indicate that two redundant copies of a chunk are stored within the same 200PB datacenter. The devs are very clever and they probably have already considered something like this, or done something better.
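Sketching that latency idea purely hypothetically (nothing in the SAFE network does this): candidate vaults for a replica set could be scored mostly by XOR distance, with penalties when measured round-trip times to the existing holders are either very high or suspiciously low.

```python
# Hypothetical replica-set scoring: XOR distance plus latency penalties.
# Too-low latency suggests co-location (same datacenter as an existing copy),
# too-high latency would hurt retrieval performance. Lower score is better.
def replica_score(xor_distance: int, rtts_ms: list[float],
                  low_ms: float = 2.0, high_ms: float = 400.0) -> float:
    penalty = 0.0
    for rtt in rtts_ms:
        if rtt < low_ms:
            penalty += 1.0      # probably sitting next to an existing copy
        elif rtt > high_ms:
            penalty += 0.5      # would drag down time-to-first-byte
    return xor_distance / 2 ** 256 + penalty  # normalise 256-bit distance into [0, 1)

print(replica_score(2 ** 200, [35.0, 80.0]))  # healthy spread, low score
print(replica_score(2 ** 200, [0.4, 0.6]))    # both existing copies suspiciously close
```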

Another issue that isn't considered here, because we are looking from history to the present, is that bandwidth everywhere is increasing and the penetration of computers into the whole world is increasing year by year at a great rate. So the geo distribution will be improving all the time.

Now, when archive nodes are developed and people implement them, we will see a goodly amount of distribution of these nodes in more developed areas. These will greatly assist in the preservation of data, and hopefully they will hold 6 to 8 copies of each chunk in the network. I really have a gut feeling (& nigh on 50 years' experience) telling me that 4 copies of each chunk is insufficient for significant events. Not even global events: just local events affecting, say, one country will see data lost, in my opinion.
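A back-of-envelope check on that gut feeling, under the (favourable) assumptions that replica placement is independent of geography and the event is over before the network can re-replicate: if an event knocks out a fraction f of all vaults at once, each chunk is lost with probability f to the power of k.

```python
# Expected chunks lost if an event wipes a fraction f of vaults before the
# network can re-replicate, with k geography-blind copies per chunk.
chunks = 10 ** 9               # hypothetical number of chunks on the network
for f in (0.05, 0.20):         # made-up events hitting 5% and 20% of vaults
    for k in (4, 8):
        print(f"event hits {f:.0%} of vaults, {k} copies: "
              f"~{chunks * f ** k:,.0f} chunks lost of {chunks:,}")
```

If copies are geographically correlated (several in one country or one datacenter), the picture is much worse, which is exactly the concern here.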

I could be wrong, but I think Amazon automatically makes 6 or 8 copies within each S3 region. It is often recommended to pick a couple of regions in case a single region has a crisis. This makes a total of 12-36 copies spread around the globe. For some reason I can't find the reference to where I thought I read that, so I have no proof. Maybe someone else does.