Not related to your general point about offering premium performance for a price, but I don’t think it works like that. I think you are assuming a whole file is stored in a Vault and replicated to three other Vaults, each containing the whole file. In fact it is a chunk (1MB) of a file that is stored on a Vault and replicated to three other Vaults. So requesting one chunk from Vault1, another from Vault2, etc. is not an optional extra; it is what always happens.
So SAFE always delivers the “get each chunk from a different Vault” performance feature, and there is no benefit from upping redundancy beyond 4 Vaults.
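To make the chunk-versus-file point concrete, here is a rough sketch of what I mean. The round-robin placement and the function names are my own assumptions for illustration only; the real network decides which Vaults hold a chunk in its own way.

```python
from typing import List

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as described in the thread
COPIES = 4                # one Vault plus three replicas


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut a file into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunk(chunk_index: int, num_vaults: int, copies: int = COPIES) -> List[int]:
    """Hypothetical round-robin placement: pick `copies` distinct Vaults for a chunk."""
    if num_vaults < copies:
        raise ValueError("need at least as many Vaults as copies")
    return [(chunk_index + r) % num_vaults for r in range(copies)]


def download_plan(num_chunks: int, num_vaults: int) -> List[int]:
    """For each chunk, choose one of its holders, rotating so chunks spread across Vaults."""
    return [place_chunk(c, num_vaults)[c % COPIES] for c in range(num_chunks)]
```

With enough Vaults, each chunk of a file already comes from a different Vault, which is why adding whole-file replicas would not speed anything up in this picture.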
I was calling the 1MB parts slices and a chunk a part of that slice, perhaps I should’ve used the terms differently.
There’s no technical reason why a client wouldn’t be able to request only a part of that 1 MB part, though it may not currently be implemented. If it isn’t, I think it should be: it would give a huge boost to download speeds, and the idea in my OP would become possible.
My understanding (I think David Irvine said this recently) is that yes, the API can request a small piece of a 1MB chunk, but it is always the whole chunk that is returned (and presumably vice versa). I can see how it might seem inefficient, but efficiencies in systems like this are often counter-intuitive, so it may not be so.
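As a sketch of how a client could serve a small byte range when only whole chunks come back: `fetch_chunk` below is a hypothetical callback standing in for whatever the network call really is, and the slicing happens on the client side. This is my guess at the shape of it, not the actual API.

```python
from typing import Callable, List

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as described in the thread


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the indices of the chunks that a byte range touches."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


def read_range(fetch_chunk: Callable[[int], bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Fetch every whole chunk the range touches, then keep only the requested bytes."""
    out = bytearray()
    for idx in chunks_for_range(offset, length, chunk_size):
        chunk = fetch_chunk(idx)  # hypothetical: always returns the whole chunk
        start = max(offset - idx * chunk_size, 0)
        end = min(offset + length - idx * chunk_size, len(chunk))
        out += chunk[start:end]
    return bytes(out)
```

So a 100-byte read that straddles a chunk boundary still costs two full chunk transfers in this model.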
I thought there are at least four copies of each chunk.
Let’s say Vaults A to D contain the same replicated chunk. When Vault B goes offline, a new vault, Vault E, will replicate and start serving the chunk. So there are always at least four replicated copies of the chunk on the Network. As I understand it, when Vault B comes back online there will be five copies of the chunk.
Over time, it is likely that there will be more than four replicated copies of each chunk; but the network will always try to keep four copies available at all times.
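Here is a toy model of the Vault A to E example, just to check the counting. The class and its spare-Vault list are made up for illustration; how the real network picks the replacement Vault is not something I know in detail.

```python
TARGET_COPIES = 4  # the figure used in this thread, not a fixed network constant


class ChunkHolders:
    """Tracks which Vaults hold one chunk (illustrative model only)."""

    def __init__(self, holders, spares):
        self.online = set(holders)   # Vaults currently serving the chunk
        self.offline = set()         # Vaults that hold a copy but are offline
        self.spares = list(spares)   # hypothetical pool of Vaults that can take a copy

    def go_offline(self, vault):
        if vault in self.online:
            self.online.remove(vault)
            self.offline.add(vault)
        # Re-replicate until the target number of online copies is restored.
        while len(self.online) < TARGET_COPIES and self.spares:
            self.online.add(self.spares.pop(0))

    def come_online(self, vault):
        # A returning Vault still has its copy, so the count can exceed the target.
        if vault in self.offline:
            self.offline.remove(vault)
            self.online.add(vault)
```

Starting with A, B, C, D and spares E, F: taking B offline brings in E (four copies online), and B returning makes five.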
The network will calculate the min number to never lose any data! This is important, we are guessing 4, but it will be dynamic later. There should be no data lost of any kind whatsoever, but more importantly a human should not try and calculate this.
This is a common problem in data redundancy on servers. Since a chunk is 1 MB, it is easy to replicate on demand. And since the nodes storing the data are so far apart from each other, an event that causes an outage on one node is unlikely to be correlated with outages on the others.
Plus, a data chunk that requires a strong node to serve it goes to the strong nodes; a data chunk that is not accessed so frequently goes to a node with probably less bandwidth, and so that node is relied upon less often. It is also important to note that data chunks get stored first on the strongest nodes, and only lastly on weak nodes, if at all.
That being said, a data redundancy of four is sufficient on this dynamic network. One must drop the server mindset, in which only one node is available to store a given data chunk. Here there is a whole planet of nodes at its disposal to replicate the data chunk immediately. It is because small pieces of data replicate quickly that this is possible.
Storing more replicas of each data chunk will simply cause more space to be used. That would make sense for adding fault tolerance in huge servers, but even there they can hardly afford to replicate beyond three times. Four is such a gift, not to mention that it is done in a distributed manner, which further adds to evolutionary reliability in terms of data storage.
Not smarter ;-), by far. It is a guess to be tested, based on older Kademlia networks like Gnutella/eMule where 8/20 replicas were enough, but where all connections were very light, i.e. not checked for many hours/weeks/months between churn events. As we are milliseconds between churn events, the chance of 4 nodes going down in the average churn event seems unrealistic. This is good, but potentially too good; we may not need 4 copies (Kademlia republish is 24 hours, refresh == 60 mins). 4 copies may be way too much IMHO.
The bottom line for us is that we lose no data; beyond that it is just more caching really, and not necessary.
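A back-of-the-envelope way to see why the check interval matters, under a big assumption: each holder goes offline independently with the same probability before the network notices and re-replicates. The numbers you feed in are placeholders to play with, not measurements from any network.

```python
def loss_probability(p_offline: float, copies: int) -> float:
    """Chance that every copy disappears within one check interval,
    assuming independent failures with probability p_offline each."""
    if not 0.0 <= p_offline <= 1.0:
        raise ValueError("p_offline must be a probability")
    return p_offline ** copies


def copies_needed(p_offline: float, max_loss: float) -> int:
    """Smallest number of copies whose loss probability is at most max_loss."""
    if not 0.0 < p_offline < 1.0:
        raise ValueError("p_offline must be strictly between 0 and 1")
    copies = 1
    while loss_probability(p_offline, copies) > max_loss:
        copies += 1
    return copies
```

The shorter the interval between checks, the smaller `p_offline` gets, and the fewer copies are needed for the same loss target, which is the intuition behind "4 may be too many".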
I think it’s part of the deduplication process: if the file content is exactly the same, then only one file will exist in the mist… the name doesn’t alter that.
However, if you slightly changed the file itself, it might get past as unique.
There are discussions on here that debate this topic from the perspective of video files, different compression schemes, throwing in an extra frame, etc. It’s going to be interesting to experiment with this, and it should provide some good threads on here.
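For anyone who wants to play with the deduplication idea, here is a minimal sketch of content-addressed storage. Using SHA-256 of the raw content as the key is my assumption for illustration; the network’s actual naming scheme may differ.

```python
import hashlib


def content_name(content: bytes) -> str:
    """Name a piece of data by a hash of its content (SHA-256 assumed for illustration)."""
    return hashlib.sha256(content).hexdigest()


class DedupStore:
    """Toy store that keeps one copy per distinct content, whatever the file is called."""

    def __init__(self):
        self.blobs = {}  # content hash -> content
        self.names = {}  # user-facing file name -> content hash

    def put(self, file_name: str, content: bytes) -> str:
        key = content_name(content)
        self.blobs.setdefault(key, content)  # identical content is stored only once
        self.names[file_name] = key
        return key
```

Storing the same bytes under two different names leaves one blob; changing a single byte produces a different hash and therefore a second blob.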