What about a catastrophic event that wipes out millions of nodes?

Random placement means there is still a small chance that all copies end up in one place.

1 Like

Yes, but a vanishingly small one once there are hundreds of thousands or millions of nodes

It’s not about the number of nodes, it’s about the geographical locations.

1 Like

Sure, but let’s say users are spread across North and South America, Europe, India, China, Australia - that’s pretty representative of this forum, and there will be other countries too - then those nodes will be geographically widely spread, and the more users there are the lower the chance that data is replicated across the same nodes.

1 Like

Seeing as humans tend to concentrate in very small areas called cities, we’re always going to have unhealthy concentration geographically.

Tokyo,
New York,
Sao Paulo,
Seoul,
Mexico City,
Osaka/Kobe/Kyoto,
Manila,
Mumbai
etc…

and add to that the fact that Southeast Asia is overrepresented.

However, it is said that the largest population growth in the coming 50 years will not be in that area of the world but in Africa, going from 1 billion to 3 billion (or so…). And Africa is quite large. So, at least some balancing continent-wise.
(Not that such a (human) population increase is very positive in general, though.)

1 Like

It is the contrary: the replication factor is going from 8 to 4.

Here are some links about the current value:

and some links about its future decrease:

I completely agree with you when you say that 4 copies are insufficient for a secure system.

My other issue is that the replication factor should be an independent parameter. If tests or simulations show that this parameter should be increased, this can be done without impacting anything else (like the number of elders in a section).

1 Like

In case this is of interest to anyone, I made a spreadsheet to calculate the file size at which file loss would become more likely than not:

Anyone can edit it, so please don’t mess around with it too much, otherwise other people won’t be able to use it…

It only works for files consisting of quite a few chunks, and assumes a completely synchronised node outage disaster.

It shows that 6.93 GB is the size at which a file stands a 0.5 chance of survival. This file size is increased by a factor of roughly 10 for every additional chunk copy, but chunk loss is still inevitable for the biggest files. The ‘prognosis per chunk’ however is always good.
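For anyone who wants to reproduce that without the spreadsheet, here is a rough Python version of the same calculation. The outage fraction, copy count and 1 MB chunk size are illustrative assumptions picked to give the spreadsheet's 1-in-10 000 chunk loss rate, not values taken from the network design:

```python
import math

# Illustrative assumptions (not from the network design): a fully
# synchronised disaster takes out a fraction `d` of nodes, and each chunk
# has `c` independent copies, so a chunk is lost with probability d**c.
d = 0.1          # assumed fraction of nodes wiped out at once
c = 4            # assumed number of chunk copies
chunk_mb = 1     # assumed chunk size in MB

chunk_loss = d ** c                      # 1e-4, i.e. 1 in every 10 000 chunks
# A file of N chunks survives only if every chunk survives:
#   (1 - chunk_loss) ** N = 0.5  =>  N = ln(0.5) / ln(1 - chunk_loss)
n_50 = math.log(0.5) / math.log(1 - chunk_loss)
print(f"50% survival at ~{n_50:.0f} chunks (~{n_50 * chunk_mb / 1000:.2f} GB)")

# Each extra copy multiplies chunk loss by d (here 0.1), so the 50% file
# size grows by roughly a factor of 1/d = 10 per additional copy.
for copies in range(c, c + 3):
    loss = d ** copies
    n = math.log(0.5) / math.log(1 - loss)
    print(f"{copies} copies: ~{n * chunk_mb / 1000:.1f} GB at 50% survival")
```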

It seems unavoidable that upload services will pre-chunk files regardless of the number of chunk copies.

3 Likes

There is a lot of confusion here and a bit of speculation being seen as hard fact. So here are a few points.

  1. We calculate and test everything. So if 4 is enough then that will be the case; right now it's over 8 actually (it is whole sections, so it can be 20 or more copies).
  2. It may decrease to 4, and I think it will, but, and this is very important, only if we can prove that it is secure. (tl;dr I will always guess forward, but the team forces testing, and certainty where possible, on these guesses.) That's innovation: what seems mental, like going to Mars, may in fact not be mental given enough thought. 4 copies may be less mental than terraforming another planet :wink:

Then the actual issue of catastrophic events. Here is my take on this without too much thinking about the edge cases.

  • The network should contain the knowledge of all humanity and with luck it will.
  • The addition of knowledge is extremely important (theories → laws etc.)
  • The manipulation of data is very important (mutation of knowledge, transactions etc.)

So we have immutable and mutable data, as we all know. Immutable data can be stored anywhere and secured; here archive nodes can help. So for these archive nodes, what would they hold and how much would they hold? We go back to my guesses: here I claim that storage is becoming cheaper and more capable very quickly, and I expect to be able to store the world's data on a single device sometime not far away. Until then I expect significant increases in small devices, particularly IoT types.

Then mutable data; now that is more difficult, but we have data chains where we can show a version was secured on the network at some stage, though we cannot be sure it is the latest version. For that we need more info, if it becomes available. So silos can hold this as well; it may or may not be the latest data.

However, as the network restarts, even with new hardware (if it was a huge EMP-type catastrophe), these silos can connect, compare their data and data chains, and find the latest known version of any data. Even a single peer coming online can show a later version (this is powerful) that the restarting network can accept.

Anyway, this is a snippet of how such autonomous networks can restart after catastrophe. It is not about losing a chunk; it's about keeping them all so that they can be re-inserted into the network. So that is a data chain plus some data that fits the chain. It goes a lot deeper, but I think it is pretty clear we can be confident this is a solved problem here, and post launch it is not an impossible thing to solve; it just needs some thought as to exactly what the fundamentals of the data types and proofs of valid data are. Where they were stored or held is then a simpler issue.

tl;dr archive nodes will not be difficult, may never be needed but may be the norm depending on advances in storage tech.
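As a purely illustrative sketch of the "compare silos and keep the latest provable version" idea (all names hypothetical, and the data-chain check is just a stub standing in for real cryptographic verification):

```python
from dataclasses import dataclass

@dataclass
class Record:
    version: int
    proof: bytes   # placeholder for a data-chain / signature proof

def proof_valid(data_id: str, record: Record) -> bool:
    # Stub: a real network would verify the record against the data chain.
    return True

def reconcile(silos: list[dict[str, Record]]) -> dict[str, Record]:
    """Merge silos, keeping the latest provable version of each item."""
    latest: dict[str, Record] = {}
    for silo in silos:
        for data_id, record in silo.items():
            if not proof_valid(data_id, record):
                continue
            current = latest.get(data_id)
            if current is None or record.version > current.version:
                # Even a single peer holding a newer provable version wins.
                latest[data_id] = record
    return latest
```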

12 Likes

Interesting - can you explain your workings?

Everyone is free to use the public net. If companies want a private net, it is a different proposition.

I was actually just highlighting that SAFENetwork is a good way of adding data resilience and performance relative to their current private systems. Take it as you wish though.

2 Likes

It’s the reverse equation for finding the prognosis (chance of survival) for a file, given its size.

The green cells are the input ones (edit these values)
The teal cells are the output ones (don’t edit these)

6.93 GB = 6 930 chunks
1 in every 10 000 chunks is destroyed
each chunk has a 0.9999 chance of survival
the chance of all 6 930 chunks surviving is 0.5
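A quick sanity check of those numbers (1 MB per chunk assumed, as implied by 6.93 GB = 6 930 chunks):

```python
chunks = 6_930
p_chunk = 1 - 1 / 10_000      # 0.9999 chance each chunk survives
p_file = p_chunk ** chunks    # chance that every chunk survives
print(round(p_file, 3))       # ~0.5
```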

2 Likes

All good to know, and I trust that the network is being tested properly and thoroughly. I just think that pre-chunking will be included in upload clients if it reduces the tiny risk even a tiny amount.

I don't get why pre-chunking is any better. You will still need all (pre-chunked & normal-chunked) chunks to rebuild a file.
Or are you referring to some kind of additional parity by "pre-chunking"?
The parity approach is actually pretty nice, as it can be implemented completely on the client side without adding a potentially buggy feature to SAFEnet (as opposed to a configurable number of chunk copies). Cons: it would require a client "fsck" to repair the missing chunks.
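As a rough illustration of what client-side parity could look like, here is a minimal sketch with a single XOR parity piece, which can repair one missing piece; a real tool would more likely use Reed-Solomon to tolerate several losses. Nothing here is SAFEnet functionality, it is purely an uploader-side idea:

```python
# Uploader-side sketch only: nothing here is part of SAFEnet itself.
PIECE = 1024 * 1024  # assumed piece size (1 MB)

def split_with_parity(data: bytes) -> list[bytes]:
    """Split into fixed-size pieces plus one XOR parity piece."""
    pieces = [data[i:i + PIECE].ljust(PIECE, b"\0")
              for i in range(0, len(data), PIECE)]
    parity = bytearray(PIECE)
    for piece in pieces:
        for i, b in enumerate(piece):
            parity[i] ^= b
    return pieces + [bytes(parity)]

def recover_one_missing(pieces: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing piece (None) by XOR-ing the rest."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only repair one missing piece")
    if missing:
        rebuilt = bytearray(PIECE)
        for p in pieces:
            if p is not None:
                for i, b in enumerate(p):
                    rebuilt[i] ^= b
        pieces[missing[0]] = bytes(rebuilt)
    return pieces[:-1]  # drop the parity piece; caller strips the padding
```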

If a file is pre-chunked (split by the upload service into multiple unencrypted files), then a missing chunk means that some of the file is lost. In a large video file, this might mean a few seconds goes blank, which could be irritating for the viewer.

If the network’s chunking system is used without any pre-chunking, then any missing chunk would mean the whole file becomes completely unreadable (unless I’ve got this wrong?).
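Assuming that's right, the difference is easy to picture: pre-chunking just means the uploader splits the file into independent segments before upload. A rough sketch, with hypothetical upload/download functions standing in for whatever client API is used:

```python
SEGMENT = 8 * 1024 * 1024   # assumed segment size (8 MB)

def upload_segmented(data: bytes, upload) -> list[str]:
    """Upload each segment as an independent file; returns their addresses."""
    return [upload(data[i:i + SEGMENT])
            for i in range(0, len(data), SEGMENT)]

def download_segmented(addresses: list[str], download) -> bytes:
    """A missing segment becomes a gap (e.g. blank video) rather than
    making the whole file unreadable."""
    out = bytearray()
    for addr in addresses:
        try:
            out += download(addr)
        except Exception:               # segment unavailable
            out += b"\0" * SEGMENT      # placeholder gap
    return bytes(out)
```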

1 Like

Ahh, yeah you're right. I assumed a lossless approach, but it would work for videos.

I would love it if SAFEnet used rolling hashing like e.g. IPFS or rsync for chunk splitting, but I guess that wouldn't work with the self-encryption for some reason.

For large files, I’d assume that a ‘master chunk’ to refer to all other chunks would be the best bet. A ‘sea anemone’ layout instead of an ‘eel’ layout. But of course I am not one of the experts, nor have I spent years theorising about and building this :^)

Perhaps you might elaborate on what you mean by this? I’m not picturing it too well.

1 Like

Try to calculate with 8 copies of each chunk, which is what they're actually implementing. I did this before; it was something like 2% of the network's data being lost with a 50% network destruction, which seemed OK!

Also keep in mind popular data gets replicated a lot more. And I also agree that people may have to split their data, because the network can't read a file unless all chunks are gathered. If we split a video into 1 000 pieces of 30-second clips, say, we can still make do without seeing one 30-second period of the video, whereas for the network, as long as one chunk isn't available the whole file is unreadable.

Alternatively, store multiple copies of the same data if you have really important data, like your BTC private keys.

I think the idea is you store the data in your LAN SAFE network, then also store it in the public SAFE network. You can read the data quickly off of your local network, but rely on the resilience of the public SAFE network if your office suffers a natural disaster or something. It’s effectively like you just cache all of your organization’s data on your LAN.
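In other words, a read-through cache. A very rough sketch of that fetch/store pattern, with hypothetical local/public client functions (not a real SAFE API):

```python
# Hypothetical client functions: nothing here is a real SAFE API.
def store(address: str, data: bytes, local_put, public_put) -> None:
    public_put(address, data)   # durable copy on the public network
    local_put(address, data)    # cached copy for fast reads on the LAN

def fetch(address: str, local_get, public_get) -> bytes:
    try:
        return local_get(address)    # fast path: office LAN copy
    except Exception:
        return public_get(address)   # fallback survives a local disaster
```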

2 Likes

As I understand it, it will be more complex than 8, and more of a “there’s these main chunks, but also these backup chunks, and these other kinds of chunks” kind of thing which makes the calculations a lot more complex. But assuming 8 for simplicity…

Under the simplified system, 50% network destruction raised to the power of 8 copies gives 0.390625% chunk destruction, or 99.609375% chunk survival, which means a file consisting of 6 chunks has a 2.3% chance of being wiped. Pretty good deal, but bigger files get a worse deal.
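For anyone who wants to check the arithmetic, the same simplified numbers in Python:

```python
destroyed = 0.5                 # fraction of nodes lost
copies = 8
chunk_loss = destroyed ** copies             # 0.00390625 -> 0.390625 %
chunk_survival = 1 - chunk_loss              # 0.99609375 -> 99.609375 %
file_chunks = 6
file_loss = 1 - chunk_survival ** file_chunks
print(f"{file_loss:.1%}")                    # ~2.3 % chance a 6-chunk file is wiped
```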

Storing multiple copies of a video file would cost twice as much as pre-chunking it, and still leave a much higher risk of the file becoming completely unreadable :confused:

I think people won't split their data because that's a hassle. It will be every program that talks to the network that automatically splits the data. I must be missing something vital, because I can't imagine why any program wouldn't pre-chunk the data by default, thus making the network's ability to store large files obsolete.

1 Like