This is very similar to the sacrificial data model that was in the original RFC for safecoin.
This and other ideas will be tested in the upcoming testnets and alphas/betas. It has even been suggested that 4+ copies is all that is needed. At this time I think it's too low, but testing will tell.
The issue with your extreme example is the ability for a section to know where the chunks it is responsible for are stored. Think of the tables needed to keep track of x chunks stored in y locations, where x may be 10,000 and y may be 100,000. It's not always the data that is the issue when a region loses connectivity, but the elders being cut off.
I’m gonna go full concept here so if it’s crazy whatever…
‘Extra’ chunks could be stored where the location is hash(previous name). This could keep going until the entire network was full. This means that lookup tables aren’t needed.
Then to recover the original chunk you could check a lot of possible ‘future’ locations until it seems unlikely that it would have been replicated that far. Even if replication location 1 doesn’t exist, maybe location 2 survived, or location 3, etc., with ever-decreasing chance of survival.
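The hash-chain idea above can be sketched in a few lines. This is only an illustration: SHA-256, the probe depth of 8, and the `fetch` callable are my assumptions, not actual network parameters or APIs.

```python
import hashlib


def replica_locations(chunk_name: bytes, max_replicas: int = 8) -> list[bytes]:
    """Derive the chain of 'extra copy' locations for a chunk.

    Each extra copy lives at hash(previous name), so no lookup table is
    needed: anyone who knows the original name can recompute the whole
    chain. (Sketch only; SHA-256 and depth 8 are assumed parameters.)
    """
    locations = []
    name = chunk_name
    for _ in range(max_replicas):
        name = hashlib.sha256(name).digest()
        locations.append(name)
    return locations


def recover(chunk_name: bytes, fetch, max_replicas: int = 8):
    """Try the original location, then each derived location in turn.

    `fetch` is a hypothetical callable returning the chunk's data, or
    None if that location no longer holds a copy.
    """
    data = fetch(chunk_name)
    if data is not None:
        return data
    for loc in replica_locations(chunk_name, max_replicas):
        data = fetch(loc)
        if data is not None:
            return data
    return None
```

Because the chain is deterministic, a reader can probe locations in order and stop once enough consecutive misses make further survival unlikely.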
It also gives an interesting way to measure spare space by checking the number of extra copies available.
It would be possible to re-assess the spare space periodically (e.g. every time elders changed) by getting a random chunk (e.g. take a hash of the periodic event, then find the closest chunk in the section by XOR distance) and seeing how many copies exist. This would give a fair measure of spare space when averaged through time.
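The sampling step above could look something like this. It is a sketch under stated assumptions: SHA-256 for hashing the event, and `section_chunks` / `copies_of` as hypothetical stand-ins for the section's chunk list and a copy-counting query.

```python
import hashlib


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two equal-length names, as an integer."""
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")


def sample_spare_space(event_id: bytes, section_chunks, copies_of) -> int:
    """Estimate spare space from one periodic event (e.g. an elder change).

    Hash the event, find the chunk in this section closest to that hash
    by XOR distance, then count how many copies of it exist. Averaging
    this over many events gives a fair measure of spare space.
    """
    target = hashlib.sha256(event_id).digest()
    probe = min(section_chunks, key=lambda name: xor_distance(name, target))
    return copies_of(probe)
```

Since the probe chunk is derived from a shared event, every node in the section can agree on which chunk to sample without coordination.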
The idea of sacrificial data is not new but the idea that the network would always be full is new and I think it’s pretty interesting. There are probably lots of ways to make it work but the question is would it be efficient enough or better than alternative ideas. Maybe it’s so secure that it becomes wasteful?
There are some blockchains that do this. You fill up your “mining space” with predefined “junk” and your mining is based off of that. As someone who plans to offer several TB of space, this is a royal pain. Writing that much to disk at once to be able to farm is a big turn-off.
This type of idea seems like it would tie in with caching too. As requests are hopping to the section holding the chunk, vaults could see if they held an extra copy of it. Maybe that would be a valid farming attempt if they are able to do so? Or would it be considered a cache hit and therefore not rewarded?
One other possible implication of this would be that when you are reassigned to a different section you may already have some of the data your new section is responsible for, so that process might go a bit smoother (depending on how much redundancy is happening).
I’m curious which ones…? I know burstcoin does and chia plans to. Any others?
One of the interesting and appealing aspects of this proposal is that none of the data is junk, it’s all (possibly) useful data for someone to fetch in the future. But the difference between junk and ‘useful but never fetched’ is a mere technicality, so the benefit of this proposal depends on the real-life matter of whether data is actually fetched or not.
Hmmm, not sure why writing so much is a big turn-off. Writing to disk should be fairly fast. Maybe the proof-of-space algorithms are quite computationally intensive, so they take a long time? I know chia proof-of-space depends on verifiable delay functions, so they’re meant to take a consistently long amount of time. Proof of space almost always distils down to proof of time.
My feeling is the bigger pain for SAFE will not be the storage requirements but the bandwidth requirements since it will naturally and unavoidably be slower than the disk read/write requirements. Syncing chunks when joining/relocating will be fairly stressful to most home internet connections if it’s TB worth of data.
How can you write to your own disk if it is on the SAFE network? Surely the network decides, and just because you happen to be in the same room as your vault, it doesn’t mean that you are storing all your stuff there. And assuming the network is live, it will cost you lots of PUTs.
Also I thought that the network was supposed to de-duplicate by default.
When someone runs a vault they will dedicate some amount of space on their hard drive to the SAFE network. The vault will be assigned to a section of the network. Each section is responsible for a certain subset of the data held by the entire network, so the vault must store that data (or is it a subset of that data, with the members of the section as a whole covering the section’s data responsibility?).

If the data that the vault is responsible for storing doesn’t take up the entire allotted space of the vault (as we’d hope it doesn’t), then what should the extra space be used for? This thread proposes filling up that extra space with data from other sections that the vault normally wouldn’t be responsible for.

This provides greater redundancy of data, which means it is less likely to be lost during some sort of adverse network event, like a continent being severed from the internet. It also may improve the network’s ability to deliver data due to fewer hops, so lower latency. As new data is written to the vault’s section, the vault would discard some of its ‘extra data’ and store the new data that it is responsible for.
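The store-and-discard behaviour described above can be modelled as a tiny toy vault. Everything here is illustrative: the class, its eviction policy (oldest extra chunk first), and the fixed byte capacity are my assumptions, not the actual vault design.

```python
from collections import OrderedDict


class VaultSketch:
    """Toy model of a vault that fills spare space with 'extra' chunks.

    Chunks the section makes us responsible for are always kept; extra
    chunks from other sections fill the remaining capacity and are
    discarded (oldest first here, though any policy would do) whenever
    responsible data needs the room.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.responsible = {}       # name -> data, never evicted
        self.extra = OrderedDict()  # name -> data, evictable

    def _used(self) -> int:
        return (sum(len(d) for d in self.responsible.values())
                + sum(len(d) for d in self.extra.values()))

    def _make_room(self, needed: int) -> None:
        # Drop oldest extra chunks until the new responsible data fits.
        while self._used() + needed > self.capacity and self.extra:
            self.extra.popitem(last=False)

    def store_responsible(self, name, data) -> None:
        self._make_room(len(data))
        self.responsible[name] = data

    def store_extra(self, name, data) -> None:
        # Extra chunks are only accepted while spare space remains.
        if self._used() + len(data) <= self.capacity:
            self.extra[name] = data
```

The key property is that responsible data always wins: extra copies exist only in whatever space is left over at any given moment, which is exactly why counting them doubles as a measure of spare space.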
I think we need to come back to how to validate the data a vault is storing from another section.
The section normally validates the chunks held by the vaults within that group in the section.
Now if it is supposed to protect from massive disruptions, say a country/region being cut off, then how could you validate the data a vault is holding for another section? What if both sections no longer have enough elders?
The real issue is not the data but the elders. If you do not protect the quorum of elders so that there are enough of them, then what good is holding all that data? It can no longer be validated and supplied to the network.
It is not just a question of storing enough copies but also validating the data and also having a quorum of elders in each of those sections so that consensus can occur so that data can be supplied.
Once the cut-off nodes return, a recovery process can occur, and only then will we see whether the extra stored data has helped at all.
As for a method to judge spare space, that is another story, and akin to the sacrificial chunks concept in the original safecoin RFC.