This is very similar to the sacrificial data model that was in the original RFC for safecoin.
This and other ideas will be tested in the upcoming testnets and alphas/betas. It has even been suggested that 4+ copies is all that is needed. At this time I think it's too low, but testing will tell.
The issue with your extreme example is the ability for a section to know where the chunks it is responsible for are stored. Think of the tables needed to keep track of x chunks stored in y locations, where x may be 10,000 and y may be 100,000. It's not always the data that is the issue when a region loses connectivity, but the elders being cut off.
I’m gonna go full concept here so if it’s crazy whatever…
‘Extra’ chunks could be stored where the location is hash(previous name). This could keep going until the entire network was full. This means that lookup tables aren’t needed.
Then to recover the original chunk you could check a lot of possible ‘future’ locations until it seems unlikely that it would have been replicated that far. Even if replication location 1 doesn’t exist, maybe location 2 survived, or location 3, etc., with ever-decreasing chance of survival.
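The hash-chain idea above can be sketched in a few lines. This is only an illustration: SHA-256, the probe depth of 8, and the `fetch` callable are my assumptions, not actual network parameters or APIs.

```python
import hashlib


def replica_locations(chunk_name: bytes, max_replicas: int = 8) -> list[bytes]:
    """Derive the chain of 'extra copy' locations for a chunk.

    Each extra copy lives at hash(previous name), so no lookup table is
    needed: anyone who knows the original name can recompute the whole
    chain. (Sketch only; SHA-256 and depth 8 are assumed parameters.)
    """
    locations = []
    name = chunk_name
    for _ in range(max_replicas):
        name = hashlib.sha256(name).digest()
        locations.append(name)
    return locations


def recover(chunk_name: bytes, fetch, max_replicas: int = 8):
    """Try the original location, then each derived location in turn.

    `fetch` is a hypothetical callable returning the chunk's data, or
    None if that location no longer holds a copy.
    """
    data = fetch(chunk_name)
    if data is not None:
        return data
    for loc in replica_locations(chunk_name, max_replicas):
        data = fetch(loc)
        if data is not None:
            return data
    return None
```

Because the chain is deterministic, a reader can probe locations in order and stop once enough consecutive misses make further survival unlikely.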
It also gives an interesting way to measure spare space by checking the number of extra copies available.
It would be possible to re-assess the spare space periodically (e.g. every time elders changed) by getting a random chunk (e.g. take a hash of the periodic event, then find the closest chunk in the section by XOR distance) and seeing how many copies exist. This would give a fair measure of spare space when averaged through time.
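The sampling step above could look something like this. It is a sketch under stated assumptions: SHA-256 for hashing the event, and `section_chunks` / `copies_of` as hypothetical stand-ins for the section's chunk list and a copy-counting query.

```python
import hashlib


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two equal-length names, as an integer."""
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")


def sample_spare_space(event_id: bytes, section_chunks, copies_of) -> int:
    """Estimate spare space from one periodic event (e.g. an elder change).

    Hash the event, find the chunk in this section closest to that hash
    by XOR distance, then count how many copies of it exist. Averaging
    this over many events gives a fair measure of spare space.
    """
    target = hashlib.sha256(event_id).digest()
    probe = min(section_chunks, key=lambda name: xor_distance(name, target))
    return copies_of(probe)
```

Since the probe chunk is derived from a shared event, every node in the section can agree on which chunk to sample without coordination.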
The idea of sacrificial data is not new but the idea that the network would always be full is new and I think it’s pretty interesting. There are probably lots of ways to make it work but the question is would it be efficient enough or better than alternative ideas. Maybe it’s so secure that it becomes wasteful?
There are some blockchains that do this. You fill up your “mining space” with predefined “junk” and your mining is based off of that. As someone who plans to offer several TB of space, this is a royal pain. Writing that much to disk at once to be able to farm is a big turn-off.
This type of idea seems like it would tie in with caching too. As requests are hopping to the section holding the chunk, vaults could see if they held an extra copy of it. Maybe that would be a valid farming attempt if they are able to do so? Or would it be considered a cache hit and therefore not rewarded?
One other possible implication of this would be that when you are reassigned to a different section you may already have some of the data your new section is responsible for, so that process might go a bit smoother (depending on how much redundancy is happening).
I’m curious which ones…? I know burstcoin does and chia plans to. Any others?
One of the interesting and appealing aspects of this proposal is that none of the data is junk, it’s all (possibly) useful data for someone to fetch in the future. But the difference between junk and ‘useful but never fetched’ is a mere technicality, so the benefit of this proposal depends on the real-life matter of whether data is actually fetched or not.
Hmmm, not sure why writing so much is a big turn-off. Writing to disk should be fairly fast. Maybe the proof-of-space algorithms are quite computationally intensive, so they take a long time? I know chia proof-of-space depends on verifiable delay functions, so they’re meant to take a consistently long amount of time. Proof of space almost always distils down to proof of time.
My feeling is the bigger pain for SAFE will not be the storage requirements but the bandwidth requirements since it will naturally and unavoidably be slower than the disk read/write requirements. Syncing chunks when joining/relocating will be fairly stressful to most home internet connections if it’s TB worth of data.
How can you write to your own disk if it is on the SAFE network? Surely the network decides, and just because you happen to be in the same room as your vault, it doesn’t mean that you are storing all your stuff there. And assuming the network is live, it will cost you lots of PUTs.
Also I thought that the network was supposed to de-duplicate by default.
When someone runs a vault they will dedicate some amount of space on their hard drive to the SAFE network. The vault will be assigned to a section of the network. Each section is responsible for a certain subset of the data held by the entire network, so the vault must store that data (or is it a subset of that data, with the members of the section as a whole covering the section’s data responsibility?).

If the data that the vault is responsible for storing doesn’t take up the entire allotted space of the vault (as we’d hope it doesn’t), then what should the extra space be used for? This thread proposes filling up that extra space with data from other sections that the vault normally wouldn’t be responsible for.

This provides greater redundancy of data, which means it is less likely to be lost during some sort of adverse network event, like a continent being severed from the internet. It also may improve the network’s ability to deliver data due to fewer hops, so lower latency. As new data is written to the vault’s section, the vault would discard some of its ‘extra data’ and store the new data that it is responsible for.
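The store-and-discard behaviour described above can be modelled as a tiny toy vault. Everything here is illustrative: the class, its eviction policy (oldest extra chunk first), and the fixed byte capacity are my assumptions, not the actual vault design.

```python
from collections import OrderedDict


class VaultSketch:
    """Toy model of a vault that fills spare space with 'extra' chunks.

    Chunks the section makes us responsible for are always kept; extra
    chunks from other sections fill the remaining capacity and are
    discarded (oldest first here, though any policy would do) whenever
    responsible data needs the room.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.responsible = {}       # name -> data, never evicted
        self.extra = OrderedDict()  # name -> data, evictable

    def _used(self) -> int:
        return (sum(len(d) for d in self.responsible.values())
                + sum(len(d) for d in self.extra.values()))

    def _make_room(self, needed: int) -> None:
        # Drop oldest extra chunks until the new responsible data fits.
        while self._used() + needed > self.capacity and self.extra:
            self.extra.popitem(last=False)

    def store_responsible(self, name, data) -> None:
        self._make_room(len(data))
        self.responsible[name] = data

    def store_extra(self, name, data) -> None:
        # Extra chunks are only accepted while spare space remains.
        if self._used() + len(data) <= self.capacity:
            self.extra[name] = data
```

The key property is that responsible data always wins: extra copies exist only in whatever space is left over at any given moment, which is exactly why counting them doubles as a measure of spare space.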
I think we need to come back to how to validate the data a vault is storing from another section.
The section normally validates the chunks held by the vaults within that group in the section.
Now if it is supposed to protect from massive disruptions, say a country/region being cut off, then how could you validate the data a vault is holding for another section? What if both sections no longer have enough elders?
The real issue is not the data but the elders. If you do not protect the quorum of elders so that there are enough of them, then what good is holding all that data? It can no longer be validated and supplied to the network.
It is not just a question of storing enough copies but also validating the data and also having a quorum of elders in each of those sections so that consensus can occur so that data can be supplied.
Once the cut-off nodes return, a recovery process can occur, and only then will we see whether the extra stored data has helped at all.
As for a method to judge spare space, that is another story, and akin to the sacrificial chunks concept in the original safecoin RFC.