Optimal vault size

Chris A. @caguettaz corrected me by noting that the DataManagers are close in XOR space to the chunk name, not to the PMIDs! So this argument needs revision! For now, don’t take what is written here under my name at face value!

Problem: a single big vault (say 20 TB) would have a single PMID in XOR space, surrounded by several smaller vaults (e.g. many 500 GB vaults). If this big vault goes offline (either through an attack or an innocent shutdown), it could create immense churn and flood all the neighbouring smaller vaults as the network seeks to restore the redundancy of every chunk stored in that vault, potentially disrupting the network.

Solution: all clients should automatically divide a big storage space provided by a farmer into appropriately ‘optimally sized’ vaults of similar size. Just as file chunks are all 1 MB and spread across XOR space, a big storage space should be divided into similarly sized small vaults - magic number alert - of e.g. 500 GB (or less). If a single big storage goes offline, then rather than one massive vault going dark, many small vaults go dark across XOR space, each of them neighboured by other vaults of roughly equal size that can easily share among them the burden of taking up the new chunks.
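
A minimal sketch of what such client-side splitting could look like, assuming a hard cap of 500 GB per vault (the cap, the function name and the byte counts are all illustrative, not anything from the actual code):

```python
# Hypothetical sketch: split a farmer's total storage into vaults no larger
# than MAX_VAULT_BYTES. The 500 GB cap is the "magic number" from above and
# purely illustrative.
MAX_VAULT_BYTES = 500 * 10**9  # 500 GB

def split_storage(total_bytes: int) -> list[int]:
    """Return the sizes of the vaults that together cover total_bytes."""
    full_vaults, remainder = divmod(total_bytes, MAX_VAULT_BYTES)
    sizes = [MAX_VAULT_BYTES] * full_vaults
    if remainder:
        sizes.append(remainder)  # the last vault holds whatever is left over
    return sizes

# A 20 TB farmer would end up with 40 vaults of 500 GB each, which then land
# at 40 different PMIDs scattered across XOR space.
print(len(split_storage(20 * 10**12)))  # -> 40
```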

This optimal size should be investigated and tested further. Running a vault has a computational cost on the machine, so pushing the size too low comes at the expense of responsiveness. What is important is that all vaults are roughly the same size, or at least the same order of magnitude. So maybe the network should be able to gauge the average disk space provided per machine. Of course, to avoid network overhead, it’s pretty clear that this number will be somewhere on the order of 100 GB, and it can increase over time as average disk sizes grow with new computers.
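
If the network really were to derive the cap from what machines typically offer, it might look something like this (made-up node reports, and the choice of median over mean is just my assumption):

```python
# Hypothetical: derive the vault-size cap from what nodes report they offer,
# so the cap can grow over time as average disks grow.
from statistics import median

reported_storage_gb = [250, 500, 1000, 2000, 750]  # made-up example reports
vault_cap_gb = median(reported_storage_gb)          # cap at the median offer
print(vault_cap_gb)  # -> 750
```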

I am not arguing to exclude arbitrarily small vaults, though! They can contribute, but vaults that deviate towards a relatively small size should remain a minority. So a maximum size, not a minimum size.

Would this mean data distribution is not completely random?

I’m not sure whether I should answer ‘yes’ or ‘no’.

No, the proposed optimal vault size does not (in any fundamental way) change the way data is distributed over the network.

Yes, data was never ‘truly randomly’ distributed over the network. It is distributed through a chain of mathematically non-invertible functions and depends on the actual availability of vaults on the network. It is by all means theoretically and practically untraceably stored on an ever-changing network. Strictly speaking, though, the hash function is not a random function at all. It is chaotic and irreversible, yes, so the data is untraceable, but it is also repeatable, so that vault managers can check whether a vault is properly storing its chunks rather than cheating.
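
To illustrate “chaotic and irreversible, yes, but repeatable”: a chunk’s name is simply a hash of its content, so a manager can always recompute it to check the vault, while the name itself reveals nothing about the content. (SHA-256 stands in here for whatever hash the network actually uses.)

```python
import hashlib

def chunk_name(chunk: bytes) -> str:
    # Repeatable: the same chunk always yields the same name.
    # Irreversible: the name tells you nothing about the chunk.
    return hashlib.sha256(chunk).hexdigest()

chunk = b"one megabyte of encrypted, obfuscated data..."
expected_name = chunk_name(chunk)

# A manager auditing a vault just hashes what the vault hands back and
# compares it with the expected name.
assert chunk_name(chunk) == expected_name
```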

Hope that helped more than it confused people :slight_smile:

I get most of what you’re saying.

As long as it cannot be traced back to its origin by a 3rd party. That is what people want to know.

Even if you had some way of recollecting all the chunks of a given file, it would still be useless, because the datamap holds the keys needed to decrypt them. If you tried to decrypt them by brute force, you would never succeed, because the chunks were obfuscated before being encrypted. This means that even if, by magical chance, you had guessed the correct key to brute-force one chunk, you wouldn’t know you had the right guess.

Rather, it requires you to combine the correct guesses for each chunk separately!
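
A toy sketch of why that is (this is not the real self-encryption algorithm, just an illustration of the principle that each chunk is obfuscated with material derived from the other chunks before encryption):

```python
import hashlib

def keystream(seed: bytes, length: int) -> bytes:
    """Expand a seed into `length` pseudo-random bytes (hash in counter mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def obfuscate(chunk: bytes, neighbour_hashes: bytes) -> bytes:
    # XOR the chunk with a pad derived from the hashes of the other chunks.
    pad = keystream(neighbour_hashes, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, pad))

chunks = [b"first megabyte...", b"second megabyte...", b"third megabyte..."]
hashes = [hashlib.sha256(c).digest() for c in chunks]

# Chunk 0 is obfuscated with material that depends on chunks 1 and 2, so even
# a correctly decrypted chunk 0 stays unreadable (and unverifiable) without them.
obfuscated_0 = obfuscate(chunks[0], hashes[1] + hashes[2])
```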

So if you had to guess the correct key for one chunk (with AES-256), you’d have to guess the correct 256-bit key. You have a chance of one in 115792089237316195423570985008687907853269984665640564039457584007913129639936 (115 quattuorvigintillion 792 trevigintillion 89 duovigintillion 237 unvigintillion 316 vigintillion 195 novemdecillion 423 octodecillion 570 septendecillion 985 sexdecillion 8 quindecillion 687 quattuordecillion 907 tredecillion 853 duodecillion 269 undecillion 984 decillion 665 nonillion 640 octillion 564 septillion 39 sextillion 457 quintillion 584 quadrillion 7 trillion 913 billion 129 million 639 thousand 936 - clearly someone once came up with these names) of making that correct guess.

Say, however, your file is 10 MB, so you have 10 chunks to decrypt separately: you’d have to win that ridiculous lottery ten times in a row to be able to read the file. Even stronger, if you missed one chunk, or had a wrong chunk, you are theoretically unable to determine whether you had won the lottery for any of the other chunks (because of the obfuscation)!
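
You can sanity-check those numbers yourself; Python’s integers have no size limit (the 1 MB chunk size giving 10 chunks for a 10 MB file is the assumption from above):

```python
single_key_space = 2**256          # possible AES-256 keys for one chunk
print(single_key_space)
# 115792089237316195423570985008687907853269984665640564039457584007913129639936

chunks = 10                        # a 10 MB file split into 1 MB chunks
combined_key_space = single_key_space ** chunks   # guesses multiply, not add
print(combined_key_space.bit_length())            # 2561 bits: roughly 10^770
```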

Before you can even start to decrypt these chunks (you can’t), you’d first have to find them, in a sea of unrevealing chunks, all encrypted, all nameless except for a simple hash as a name tag. That’s not even finding a needle in a haystack; that is finding the correct straw in a mountain of identical straws.

Without the datamap you don’t even know how many straws you are looking for or in what order they have to go.

Self-encryption, in combination with the other mind-blowing innovations, simply nukes the board on reading your data ever again without the proper login credentials. As soon as SAFE is online, I’m putting all my work on it. It is beyond measure the SAFEst place to put it.

Vaults also have to be distinguished by proximate location (IP), since one person can have many machines running SAFE; so even if vault size is limited, a power outage could take down many machines at once.