Storage proceeding

I don’t see the point in a paranoid option.

The network is designed to not lose data. If the design needs tweaking to ensure this is robust, that will happen.

Obviously no system can ever be 100% though, even with a paranoid option. We have “the thing we never thought of”, and “the thing we thought would never happen” to thank for that.

So as is standard practice, if you want to dramatically improve your data security, you need backups that are independent, held on separate systems, in separate locations, regularly tested etc.

@janitor has also said this, and I think it is the only sensible alternative really. Bumping up chunk redundancy is a very poor substitute IMO.

Maybe David will one day convince us that decentralisation is just as good, but it still has single points of failure at this point (protocols, code, compiler etc). One day that may change, but until then, I think I’ll be keeping backups of data I don’t ever want to lose.

2 Likes

That would inflict tremendous punishment on all responsible farmers. Total socialization of risk (with no payback to those who would be paying for it).

At least in the less terrible version the data owner would have to GET the chunks (and pay for that).

What is the connection between the question (loss of all chunk replicas) and your “solution” (prevention of non-targeted data corruption)?

Spoken like a true uploader! Farmers already look likely to get screwed and you’re proposing yet another non-paying workload for them.

Well, those who read the post below may already be convinced that the level of protection is good enough.
Hilariously, that topic was started by Seneca, who was a capitalist back then (quote: “I can imagine rich/corporate users would be willing to pay for more redundancy”), whereas now he argues that the cost of added protection for “the rich” should be paid by the farmers :slight_smile: And David already addressed those questions by posting multiple comments.

What? No, it’s just a few extra data chunks that farmers get paid for hosting. And farmers only look likely to get screwed in your own head due to the assumptions you make.

Yes, the parity files don’t have a big overhead, but verification has a very high cost in terms of compute and network resources.

Can you explain, since the network already checks the status of all chunks, what the benefit of this extra layer is?

Is it “if all 4 replicas of the same chunk get deleted or corrupted at the same time, they can be recreated”? What is the likelihood of that happening in a 10,000-vault network?
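For a rough sense of scale, here is a back-of-envelope sketch (my own assumptions, not anything from the network design): treat the failure of each replica-holding vault within the repair window as independent with some probability `p`, so all 4 replicas vanish together with probability `p^4`.

```python
# Back-of-envelope estimate (illustrative assumptions only):
# each of the 4 vaults holding a replica fails independently with
# probability p before the network can re-replicate the chunk.
def p_chunk_lost(p_vault_fail: float, replicas: int = 4) -> float:
    """Probability that every replica of one chunk is lost at once."""
    return p_vault_fail ** replicas

# Even with a pessimistic 1% chance of a vault dying inside the
# repair window, losing all 4 replicas of one chunk is ~1e-8,
# i.e. about 0.01 expected losses among a million chunks.
print(p_chunk_lost(0.01))        # ~1e-08
print(p_chunk_lost(0.01) * 1e6)  # ~0.01
```

The real figure depends on how correlated vault failures are and how fast re-replication happens after churn, which is exactly what the thread is debating.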

It is not uncommon for large geographic areas to have internet outages that go on for hours…

I think a better solution would be to create persistent vaults that serve as backups for the rest of the network…

Farmers will not “get screwed”

The market will do what it does. I suspect that farmers are going to be abundant, and thus they will get paid poorly. This doesn’t mean they are “getting screwed”. The value is in the network, not the $$ return… If the network isn’t enough payment for you, then don’t farm. If enough farmers do this, the abundance will go away and prices will increase. It will nearly always be a break-even endeavor, however; it’s a commodity market with little barrier to entry…

But that is irrelevant to the issue at hand.

You know very well that replicas are randomly scattered around the world, so not only would 4 “regions” (continents) need to go down, they would need to go down at about the same time and never come back.
I don’t know why it’s so often necessary to re-explain things within one and the same topic. Just two comments ago I asked Seneca to tell me his estimate of the likelihood of such a catastrophic loss. You probably saw that, but you still made a comment about isolated regional downtime, which doesn’t even involve the loss of a single chunk replica (let alone four); it’s just temporary downtime of one chunk, which the user is almost never even able to detect.

It’s not market behavior when irresponsible farmers create a non-paying workload for responsible farmers.
It’s like Obamacare: you don’t want to participate in mandatory insurance, but you have to, and the more irresponsible you are, the more system resources are allocated to fixing that.
In an interconnected system such as this one, some “socialization” must be present, because it’s not always possible to precisely calculate and allocate costs on the fly without causing outages. But why add more unnecessary overhead without rational justification (per David’s comment I linked above)?

What would you tell people (whether farmers, uploaders or downloaders) who “exploit” the network?
Is there any use of the network that you would consider economically unethical (say, some users taking advantage of the bad economics at the expense of other users)?

Interesting reference. So I guess what you’re saying is that Obamacare is flawed, and you assume the system will not work out because it naturally gets exploited. As a citizen of a state where insurance has been obligatory for a long time, I can tell you the system works far more efficiently than the US handling prior to Obamacare, precisely because people see some value in not being ill. That’s also a good metaphor for what I suppose @jreighley is saying here: the value is in the system. Let’s say it costs you 10 EUR to store 100 MB on SAFE. That doesn’t sound like a good deal to you, I guess? True, it’s not competitive with Google’s free 15 GB plan, but we all know that plan is NOT free, and hosting your own server comes with drawbacks as well. At some point people may be interested in buying this (potentially!) expensive storage, knowing that it’s securely encrypted and that people cannot take it down (easily).

There is a huge difference in that I don’t have to farm…

I get fined if I don’t buy insurance.

The users will pay for PUTs, and the network will pay the farmers whatever it needs to pay to have enough farmers to do whatever needs to be done. If you don’t like it, don’t farm. If you do, do. There is no coercion in the deal, thus you are not “getting screwed” if you don’t get paid what you think you ought to get paid.

Those who “exploit” the network will have to pay the price that the network decides to charge them. If they pay what they are asked to pay, then it isn’t “exploiting”.

Folks will use SAFE because they don’t want their data to be hacked. That is plenty of incentive to hand over as much hard disk space and bandwidth as is required to sustain the amount of data you care to store… You don’t need to be paid in fiat of any flavor for this to be a worthwhile transaction, because you would be paying for hard drives and bandwidth either way. Using SAFE just gets you the benefit of being rather hack-proof and redundantly backed up…

Okay, that explanation makes sense - a matter of positioning (not necessarily for the lowest cost of service).
David claimed that 4 replicas is plenty (he even said on that link it could perhaps be lowered to 3), and I agree with that, because the likelihood of 4 copies of the same chunk being destroyed within an hour or so is ridiculously small. Note that even in this extremely unlikely scenario the loss would be 1 file (not some user’s entire SAFE data, etc.).
Related to Obamacare: adding more insurance would be like bundling mandatory coverage for some rare jungle disease into all plans, I suppose.

Yes, but you’re effectively transferring the cost of a premium service to users who don’t need that service (who think 4x is fine).
Or maybe they do. I personally would be happy with 3x. Not many enterprises have more than 3 copies of their data (not willingly, anyway).
At some point (4 replicas, 8 replicas, etc.) the cost starts impacting adoption, and you might start wondering what the priority of the project is: an Iron Mountain-like service for enterprise customers, or a secure, safe, low-cost, anti-censorship content distribution network for (nearly) everyone.
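To make the adoption-cost point concrete, here is a trivial sketch with made-up placeholder prices (not actual SAFE economics) showing how the replication factor multiplies the raw storage that farmers must supply, and hence the cost the network must recover:

```python
# Illustrative only: the replication factor is a direct multiplier on
# raw storage. Prices are arbitrary placeholders, not SAFE pricing.
def storage_cost(payload_gb: float, replicas: int, price_per_gb: float) -> float:
    """Total cost of storing payload_gb with the given replication factor."""
    return payload_gb * replicas * price_per_gb

# 100 GB of user data at a hypothetical $0.02 per stored GB:
for r in (3, 4, 8):
    print(f"{r}x replication -> ${storage_cost(100, r, 0.02):.2f}")
# 3x -> $6.00, 4x -> $8.00, 8x -> $16.00
```

Going from 4x to 8x doubles the raw-storage bill for every user, which is the adoption trade-off being argued here.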

I discuss this above:

X is the fraction of vaults which disappear permanently and simultaneously. N is the file size in megabytes. Possible scenarios where X is large:

  • A country doing deep packet inspection suddenly cuts off access to SAFE.
  • A group of farmers, feeling they are being “screwed”, collectively agree to run a shutdown -h <time> command.
  • An investor/speculator wants to drive down the price of Safecoin compared to fiat, so they invest in a huge amount of vault capacity, wait years, then shut it off. Buy up Safecoin on the cheap. Then submit a PR to github which addresses the problems discussed in this thread. Safecoin price jumps once confidence is restored. Speculator gets rich.

edit: the formula above has the number 4 as a constant, which is too pessimistic, on average
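The formula itself isn’t quoted in this excerpt, but a plausible form of it, assuming replicas are placed independently at random and a file of N one-megabyte chunks is lost if any chunk has all 4 of its replicas inside the vanished fraction X, would be P(loss) = 1 - (1 - X^4)^N:

```python
# Hypothetical reconstruction of the kind of formula under discussion
# (my assumption, not the author's exact formula): a chunk is lost when
# all `replicas` of its copies land inside the vanished fraction x of
# vaults, and a file of n_chunks chunks is lost if any chunk is lost.
def p_file_lost(x: float, n_chunks: int, replicas: int = 4) -> float:
    p_chunk = x ** replicas
    return 1 - (1 - p_chunk) ** n_chunks

# A 1 GB file (~1000 chunks of 1 MB) when 10% of vaults vanish at once:
print(p_file_lost(0.10, 1000))  # ~0.095, i.e. roughly a 1-in-10 chance
```

This matches the edit note above: the exponent 4 is a constant here, which is pessimistic if the effective replica count is higher on average.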

  • That doesn’t destroy your data
  • According to @Seneca’s solution, all data integrity checks would pass just fine (since the data hasn’t been destroyed, and he said the checking is done network-side, i.e. close to vaults) :slight_smile:
  • They need to coordinate world-wide (next to impossible - you can’t even tell if someone actually is a farmer!)
  • Assuming the co-conspirators aren’t lying and are indeed SAFE farmers, they need to be in this for some sort of gain. Maybe they’re short Safecoin, but it is equally likely that some of them may go long to create a short squeeze on those who really want to go ahead with the plan
  • It’s a long bet, and too risky. I did a quick calculation: you’d need about US$1 million to pull this off, and then there’s the question of what happens if it works better than expected and everyone just leaves and doesn’t come back (you’d lose your earned Safecoin, your newly acquired Safecoin would lose all value, etc.).

Yes, there are always risks, but like I’ve said many times here: it’s long-tail stuff that we can easily tell is extremely unlikely, yet people like to discuss it (it’s exciting, etc.). On the other hand, discussions about critical issues (like the daily economics of the network) appear once a month (boring stuff).

1 Like

Ok, some figures from the horse’s mouth (sorry David!):

"So when we say 4 copies it can be 2-6 and 16 off line. It’s just easier to say 4"
ref

and

…based on older kademlia networks like guntilla/emule where 8/20 replicas was enough, but when all connections were very light, i.e. not checked for many hours/weeks/months between churn events. As we are milliseconds between churn events then the chance of 4 nodes going down in the average churn event seems unrealistic. This is good, but potentially too good, we may not need 4 copies (kademlia republish is 24 hours, refresh == 60 mins). 4 copies may be way too much IMHO.

The bottom line for us is that we lose no data; beyond that it is just more caching really, and not necessary.
ref

2 Likes

Were those quotes from before or after the decision to go with non-persistent vaults?

Don’t non-persistent vaults eliminate “offline copies”?

I’m not sure but you can check the dates of the posts.

No.

2 Likes

Both quotes were from before the decision to go with non-persistent vaults. There are no offline copies anymore, and there are 4-6 online copies, if I understand the following document correctly:
SafeCoin Farming Rate:

A DataManager is a specialisation of a NaeManager. It has the responsibility of storing data and ensuring its integrity. Each DM group will monitor 2 copies of each ImmutableData type. There is a primary DM group, a backup DM group and a sacrificial DM group for the three types created for every ImmutableData packet.

That makes 3 groups monitoring 2 copies each, but we can’t be sure that the sacrificial copies are always stored:

The third data type ImmutableDataSacrificial which is the network measuring stick. These types are only attempted to be stored, whereas other types MUST be stored. In the case where other types cannot be stored, copies of Sacrificial data will be deleted from the PMID nodes

1 Like

I don’t pretend to know this, but I think you are reading this incorrectly. I don’t think this means there are - or might be - no offline copies. To resolve it we need someone from MaidSafe though.

Backup copies are not yet implemented, so it is possible that these copies will be saved offline. This needs confirmation from MaidSafe.

As I understand it, there will be offline copies (whenever a vault goes offline), but under normal circumstances those aren’t acknowledged and accepted by the network again when the vault comes back online. David has been talking about network recovery after a huge outage, which can be detected by the average density of addresses being much lower than before. Under such conditions vaults reconnecting to the network shouldn’t be wiped immediately, but first checked for any “lost” data.

1 Like

Used that quote to revive the discussion on storing error-correction data.

If the reference (see below) is still valid, then when a chunk is transmitted to a vault there is a measure of this in place, but nothing is stored; it just ensures the chunk reaches the vault quickly and intact. This is not storing extra error-correcting info in the vault, just error detection/correction on each chunk transmission.
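As an illustration of how a chunk can be verified on arrival without storing any extra parity data (my own sketch, not the actual SAFE implementation), a content-addressed chunk can simply be checked against its own hash:

```python
import hashlib

# Sketch of in-transit integrity checking (illustrative, not the real
# SAFE code): if a chunk is addressed by the hash of its content, the
# receiving vault can detect corruption with no stored parity at all.
def verify_chunk(expected_name: bytes, received: bytes) -> bool:
    """Return True if the received bytes hash to the chunk's address."""
    return hashlib.sha256(received).digest() == expected_name

chunk = b"some immutable data"
name = hashlib.sha256(chunk).digest()

assert verify_chunk(name, chunk)             # intact transfer passes
assert not verify_chunk(name, chunk + b"!")  # corruption is detected
```

Detection like this only triggers a re-fetch of the chunk; it cannot reconstruct data the way stored parity (e.g. an information dispersal scheme) could.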

Rabin’s Information Dispersal Algorithm is not implemented and possibly never will be.

1 Like