SafeCoin payment for extra redundancy/download speeds

Perhaps a good way to increase the SafeCoin recycling rate is to offer extra redundancy at a price. I’ve been thinking a lot about SafeCoin and the MaidSafe cloud storage service lately, and I’m getting more and more convinced that bandwidth is going to become a very significant factor, perhaps even more significant than storage space itself.

I’m not sure if the network already works this way, but since by default every slice of data is stored in 4 different locations on the network, when downloading a slice the client could connect to all 4 vaults at the same time and request a different part of that slice from each vault. The client could thus get a download speed equal to the upload speeds of the 4 vaults combined. Following from this, if data were stored in 8 different locations instead of 4, the client could theoretically get double the download speed.
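
A rough sketch of what that could look like in client code (the vault names and the fetch_range call are made up for illustration; this is not the actual SAFE client API):

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MB slice

# Stand-in for the network: every replica holds the same slice of data.
FAKE_SLICE = bytes(CHUNK_SIZE)

def fetch_range(vault, start, end):
    """Hypothetical call: ask one vault for bytes [start, end) of the slice."""
    return FAKE_SLICE[start:end]

def parallel_get(vaults):
    """Request a different sub-range of the same slice from each replica at once."""
    part = CHUNK_SIZE // len(vaults)
    ranges = [(i * part, CHUNK_SIZE if i == len(vaults) - 1 else (i + 1) * part)
              for i in range(len(vaults))]
    with ThreadPoolExecutor(max_workers=len(vaults)) as pool:
        pieces = pool.map(lambda vr: fetch_range(vr[0], *vr[1]), zip(vaults, ranges))
    return b"".join(pieces)

data = parallel_get(["vault_1", "vault_2", "vault_3", "vault_4"])
assert len(data) == CHUNK_SIZE
# Download rate ~= sum of the replicas' upload rates, so 8 replicas
# could in theory double the speed of 4.
```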

In addition to higher download speeds, the data would be even safer through the extra redundancy. Even if significant parts of the network suddenly failed for whatever reason, the odds that at least one copy of every slice survives are significantly higher when 8 copies exist than when there are only 4.
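
To put rough numbers on that (purely illustrative, assuming each copy is lost independently with the same probability):

```python
# If each stored copy is independently lost with probability p during some
# large failure event, every copy must be lost for the data to disappear.
p = 0.3  # illustrative per-copy loss probability

for copies in (4, 8):
    print(f"{copies} copies -> chance all are lost: {p ** copies:.7f}")

# 4 copies -> 0.0081000  (about 1 in 120)
# 8 copies -> 0.0000656  (about 1 in 15,000)
```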

For the average user and smaller files the default of 4 copies with a relatively low download speed is enough, but I can imagine rich/corporate users would be willing to pay for more redundancy and better download speeds.

Perhaps this source of SafeCoin recycling could be enough, allowing the network to give virtually unlimited storage to everyone, but charging extra for better speeds and extra redundancy. More recycling → bigger farming rewards → more farmers/vaults → more storage space.

1 Like

Whoah… hang on a minute, is this not contrary to net neutrality ideas… exactly what we don’t want? I’m not a techie, and thanks for simplifying how this technical aspect works, I get it now. No problem with doubling the doodah as long as it’s available to all users. Ahhh… I think it just clicked that it would technically have to be universal… in which case, if it’s technically easy/quick to do now or to implement later, it would be a brilliant idea… just ignore me.

1 Like

Not related to your general point about offering premium performance for a price, but I don’t think it works like that. I think you are assuming a whole file is stored in a Vault and replicated to three other Vaults, each containing the whole file. Whereas it is a chunk (1 MB) of a file that is stored on a Vault and replicated to three other Vaults. So it isn’t that you could additionally request one chunk from Vault1, another from Vault2 and so on; rather, that is what always happens.

So SAFE always delivers the ‘get each chunk from a different Vault’ performance feature :slight_smile: and there is no benefit from upping redundancy beyond 4 Vaults.

At least that is how I understand it.
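
If that is right, a simplified picture of the model would be something like the sketch below (the hashing and the way vaults are chosen here are made up for illustration; the real network uses self-encryption and XOR-closeness to pick Vaults):

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MB
REPLICAS = 4

def split_into_chunks(data):
    """A file is stored as independent 1 MB chunks, not as one blob on one Vault."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def chunk_address(chunk):
    """Each chunk gets its own network address derived from its content."""
    return hashlib.sha256(chunk).hexdigest()

file_data = os.urandom(3 * CHUNK_SIZE + 1234)  # a ~3 MB example file

for chunk in split_into_chunks(file_data):
    # Each address maps to its own group of REPLICAS Vaults, so fetching a
    # file already pulls different chunks from different Vaults in parallel.
    print(chunk_address(chunk)[:16], "-> stored on its own group of", REPLICAS, "Vaults")
```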

1 Like

I was calling the 1 MB parts ‘slices’ and a ‘chunk’ a part of that slice; perhaps I should’ve used the terms differently.

There’s no technical reason why a client wouldn’t be able to request only a part of that 1 MB chunk, though it may not currently be implemented. If it isn’t, I think it should be: it would give a huge boost to download speeds, and the idea in my OP would become possible.

1 Like

My understanding (I think David Irvine said this recently) is that yes, the API can request a small piece of a 1 MB chunk, but it is always the whole chunk that is returned (and presumably vice versa). I can see how that might seem inefficient, but efficiencies in systems like this are often counter-intuitive, so it may not be.

1 Like

I don’t particularly care about extra download speed, but what if I do want greater redundancy? Is there a way to designate a particular file for double or triple redundancy?

Should there be?

I thought there are at least four copies of each chunk.

Let’s say Vaults A to D contain the same replicated chunk. When Vault B goes offline, a new vault, Vault E, will replicate and start serving the chunk. So there are always at least four replicated copies of the chunk on the Network. As I understand it, when Vault B comes back online there will be five copies of the chunk.

Over time, it is likely that there will be more than four replicated copies of each chunk, but the network will always try to keep at least four copies available.
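
In rough code terms, the maintenance behaviour I mean would look something like this (a toy model, not the real Vault logic):

```python
MIN_COPIES = 4

def on_vault_lost(holders, lost_vault, spare_vaults):
    """When a Vault holding a chunk drops off, top the copies back up to the minimum."""
    holders = [v for v in holders if v != lost_vault]
    while len(holders) < MIN_COPIES and spare_vaults:
        holders.append(spare_vaults.pop(0))  # replicate the chunk to a new Vault
    return holders

holders = ["A", "B", "C", "D"]
holders = on_vault_lost(holders, "B", spare_vaults=["E", "F"])
print(holders)  # ['A', 'C', 'D', 'E'] -- still four copies on the network
# If Vault B later comes back online with its copy intact, there are briefly five.
```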

Did I get this totally wrong?

No :slight_smile: totally correct.

I understand that, and that’s fine for most of my files, but say I have a particularly important file, a file that I want more copies of.

Is there that capacity? Is that a bad idea?

The network will calculate the minimum number needed to never lose any data! This is important: we are guessing 4, but it will be dynamic later. There should be no data loss of any kind whatsoever, but more importantly a human should not try to calculate this.
[edit] 5->4

1 Like

I can understand why the network would not allow someone to go below a minimum set by the network, but why would it be bad to allow someone to go above that, assuming they are charged more?

It is a waste of resources to go beyond that :wink:

2 Likes

Ah indeed, but if it’s a waste of my resources and the network is properly compensated, why should the network care about that?

Unless this is an image decision, a guarantee that everyone’s data is equally safe? Or equally valued?

Except that I don’t value all of my data equally. There are certain files that I like but could recreate from other sources, and certain files that I value far more.

This is a common problem in data redundancy on servers. Since a chunk is 1 MB, it is easy to replicate on demand. And since the nodes storing the data are so far apart from each other, an event that causes an outage on one node is unlikely to be correlated with outages on the others.

Plus, a data chunk that needs a strong node to serve it goes to the strong nodes, while a chunk that is not accessed so frequently goes to a node with probably less bandwidth, and that node is thus relied upon less frequently. It is also important to note that data chunks get stored first on the nodes that are strongest, and last on weak nodes, if at all.

That being said, a data redundancy of four is sufficient on this dynamic network. One must remove the server mindset from the equation: there isn’t just one node available to store a data chunk; the whole planet of nodes is at the network’s disposal to immediately replicate it. The fact that small pieces of data replicate quickly is what allows this to take place.

Storing more copies of each chunk would simply cause more space to be utilised. That might make sense for extra fault tolerance on huge servers, but even there they can hardly afford to replicate beyond three times. Four is such a gift, not to mention that it is done in a distributed manner, which further adds to the reliability of the data storage.
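
As a back-of-the-envelope illustration of why such small chunks replicate so quickly (the bandwidth figures are made up, just to show the scale):

```python
CHUNK_MB = 1.0  # chunk size in megabytes

# Time for one node to push a replacement copy at various upload speeds.
for upload_mbit_per_s in (1, 10, 100):
    seconds = CHUNK_MB * 8 / upload_mbit_per_s
    print(f"{upload_mbit_per_s:>3} Mbit/s upload -> chunk re-replicated in ~{seconds:.2f} s")

# Even on a 1 Mbit/s home connection a lost copy is replaced in about 8 seconds,
# so four copies plus fast repair goes a long way.
```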

1 Like

Hmmm, I guess at an emotional level, I just don’t get how reliable a 4X copy really is. I’m game to be convinced.

Though at a conceptual level, if I try to explain why 4X is enough copies, I don’t have an answer other than that people smarter than me (such as @dirvine) tell me so.

1 Like

Not smarter ;-), by far. It is a guess to be tested, based on older Kademlia networks like Gnutella/eMule where 8/20 replicas were enough, but there all connections were very light, i.e. not checked for many hours/weeks/months between churn events. As we are at milliseconds between churn events, the chance of 4 nodes going down in the average churn event seems unrealistic. This is good, but potentially too good; we may not need 4 copies (Kademlia republish is 24 hours, refresh == 60 mins). 4 copies may be way too much IMHO.
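
To put that intuition in a toy model (all numbers are illustrative guesses; it assumes independent failures and a simple linear failure rate):

```python
# A chunk is lost only if every one of its copies drops within one repair
# window, i.e. before the network notices the churn and re-replicates.
lam = 0.01  # illustrative: 1% chance per hour that a given node drops offline

def loss_probability(copies, repair_window_hours):
    per_copy = lam * repair_window_hours  # approx. chance one copy dies in the window
    return per_copy ** copies             # all copies must die in the same window

# Kademlia-era networks: republish every 24 h, hence 8/20 replicas.
print(loss_probability(8, 24.0))        # ~1e-05
# SAFE-style churn handling: repair within about a second of a node dropping.
print(loss_probability(4, 1.0 / 3600))  # ~6e-23
```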

The bottom line for us is that we lose no data; beyond that it is really just more caching, and not necessary.

1 Like

This could be done simply by the user and/or app saving twice with different file names. (Unless I’m missing something)

I think it’s part of the deduplication process: if the file content is exactly the same, then only one copy will exist in the mist… the name doesn’t alter that.

However, if you slightly changed the file itself, it might pass as unique.

There are discussions on here that debate this topic from the perspective of video files: different compression schemes, throwing in an extra frame, etc. It’s going to be interesting to experiment with this; it should provide some good threads on here.
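
That behaviour falls out of content addressing; here is a minimal illustration (real chunks go through self-encryption first, this just uses a plain hash to show the principle):

```python
import hashlib

def storage_address(content: bytes) -> str:
    """The network address is derived from the content, not from the file name."""
    return hashlib.sha256(content).hexdigest()

original = b"the same video bytes"
renamed  = b"the same video bytes"   # saved again under a different file name
tweaked  = b"the same video bytes!"  # the content itself changed slightly

print(storage_address(original) == storage_address(renamed))  # True: deduplicated
print(storage_address(original) == storage_address(tweaked))  # False: stored as new data
```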

Would a local backup option make sense? Possibly run several vaults on your own machine, and run your own mini virtual Safe Network?

It would be an interesting experiment to use, say, ZeroVM or Docker to fire up as many vaults as the hardware permits.

I could also see this being done commercially in data centres with something like OpenStack to offer massive redundancy and security for internet 1 data.

i.e. using the SAFE open source code on servers via containers/VMs, effectively competing against the global SAFE network… unless the patents guard against this outcome?

There’s a project called TripleO (OpenStack On OpenStack) whose mission is to build a seeding machine that could spawn and commission a whole data centre in an hour or less, automatically.

I was thinking out loud about the local SAFE situation in a business context here: