Vulnerability to data deletion

I guess I am saying that for some, the costs are not important. See the “More than anonymity” thread for an example of the effort one person is putting in to try to convince people to build this very feature into SAFE.

EDIT: I have slightly overstated “building in this very feature”. It’s more that building in such a feature would also allow this attack to happen automatically: when a chunk is stored only on nodes doing this, the attack is effectively activated on that chunk/file.

Two scripts (sketched in code below):

Creator:

  • Makes x copies of your content
  • Slightly changes each copy (say, by inserting a pixel in a random location in each image/video)
  • Uploads x copies to x sites
  • Deletes all the copies locally, leaving the original for backup/repost purposes

Checker:

  • Checks if any of the files have been destroyed
  • Repeats the create & upload process x times if some files are missing
  • Posts new file links to Twitter ridiculing the activists trying to take the content down
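
Roughly, in Python - a minimal, self-contained sketch where `put_file`/`fetch_file` are invented stand-ins for whatever upload/download API the network actually exposes:

```python
import hashlib
import random

# Pretend network storage - a stand-in for the real upload/download API.
STORE = {}

def put_file(data: bytes) -> str:
    """Pretend PUT: content-addressed, returns the data's address."""
    addr = hashlib.sha256(data).hexdigest()
    STORE[addr] = data
    return addr

def fetch_file(addr: str):
    """Pretend GET: None means the copy was destroyed."""
    return STORE.get(addr)

def create_and_upload(original: bytes, x: int) -> list:
    """Creator: upload x slightly-mutated copies, so each one
    self-encrypts to a different set of chunks."""
    addresses = []
    for _ in range(x):
        copy = bytearray(original)
        copy[random.randrange(len(copy))] ^= 0x01  # the "pixel" tweak
        addresses.append(put_file(bytes(copy)))
    return addresses

def check_and_repost(original: bytes, addresses: list) -> list:
    """Checker: re-fetch every copy and recreate any that are gone."""
    live = [a for a in addresses if fetch_file(a) is not None]
    gone = len(addresses) - len(live)
    if gone:
        live += create_and_upload(original, gone)
        # ...then post the fresh links publicly...
    return live
```

Keeping the original locally means the checker can always mint fresh variants, which is what makes the take-down effort futile.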

Seems like an awful lot of work to use a program that is supposed to secure your data in the first place.

I don’t contend it is hard, but that it is a cost.

Any cost is a barrier, and based on what you’ve presented I am not convinced this is an attack worth worrying about. Each time I think more about it, I am less convinced.

For example, looking at the other side of this: the payoff to the attackers.

What is the impact? How does it become known and significant? To have an impact, they would need to take down something pretty devastating. Taking down a specific file is, we’ve agreed (I think), very costly, so you suggested random files - maybe a few per year, or even per month. Is that significant enough for people not only to set this up, but to be drawn into helping it and to keep it going for, say, a year or more?

Consider this. Most data is uploaded and never accessed again. Another big hit on the potential impact.

I’m not saying this is not an attack, just that you have not elaborated it enough to convince me that it is. You believe it is a valid attack - that’s fine - but belief alone is not a reason to mitigate. We need a more solid case if it is to be worth expending effort, because there are many potential attacks and we have limited resources to mitigate them.

IMO, if you are to help, your task is to do the groundwork. That means working out in more detail how this would work, and providing analysis - some maths - that shows the full impact of this and explains why it is going to harm the image or actual usefulness of the network to a significant degree.

I think if the network were to lose one random file per day (whether due to a bug or an attack) nobody would notice. So you have to explain why even this would be significant, not just how many files (random or specific) could be taken down. You need to become the attacker - at least in your mindset - in order to do this, and to really understand whether they would choose this attack or something else.

You can’t just say, fanatics will do what it takes, so we must defend against this attack. Even if they would do what it takes, they will choose the most effective attack, and I’m really not convinced this can have a significant impact, even if it is feasible. And for that reason I don’t think attackers would bother with this.

As I say, that is based on what you have argued. If you can present a reasoned case that is more thorough, not just handwaving, then I may well be convinced. But so far I’m not.

I’m not trying to discourage you from pursuing this. If you think it is real I hope you will demonstrate it better.

Placeholder for an explanation that I’ll be building throughout the day… Work gets in the way of my projects. I don’t want to lose a draft.

I’m not talking about deleting random files. While that would be significantly easier, there are not enough people willing to make chaos for chaos’s sake to pull it off.

I’ll try to look up some statistics on the amount of polarizing data on the internet (ISIS, KP, spam/ads, etc.). I’ll also have to do all analyses using percentages, because we don’t have any hard figures, and absolute numbers would surely be more arbitrary than percentages anyway.

Well, like @happybeing said, you and those who like the argument are more than welcome to give it a try.

We’ll see if I’ll get tired before you guys go bankrupt.
If you try hard enough, I may eventually have enough funding to turn my script into a full-fledged Content Management Platform for MaidSafe.

Not really sure what you’re getting at, as there is no monetary cost to this…

I don’t know why I’m spending so much time arguing this point. It sounds like the consensus is that the attack is valid in theory, not profitable, and potentially feasible. I’ve already proposed a fix: regardless of how hard the attack is now, we can make it nearly impossible by adding even a slight bit of entropy to the storage process - varying the number of chunk copies stored instead of using a fixed magic number. That way an attacker can’t know how many copies they’re looking for. I’ll start looking in the code for where this is done and work on a fix. It seems silly to argue about whether it can be pulled off as-is when we have an open source program…
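
Here is the kind of thing I mean - a minimal sketch, with an invented function name and an invented replication range, not the actual vault code:

```python
import secrets

MIN_COPIES = 4  # assumed baseline replication factor
EXTRA_MAX = 4   # assumed cap on extra copies

def replication_count() -> int:
    """Pick a per-chunk copy count instead of a fixed magic number.
    If totals vary unpredictably, an attacker holding n copies of a
    chunk can never be sure they hold *all* of them."""
    # secrets.randbelow gives an unpredictable draw; a real design
    # would need the storing group to agree on the count without
    # revealing the total to individual vaults.
    return MIN_COPIES + secrets.randbelow(EXTRA_MAX + 1)
```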

I would bet money that an attack similar to this will happen sooner rather than later. I doubt the first wave of attacks will involve people making active decisions on individual files but rather will be an automated attack.

Who knows what the motivations of the attacker will be, but if I put my tinfoil hat on, I guess the likes of the NSA would be motivated to discredit the network early on, meaning there will be slow adoption and it will therefore be easier for them to keep an eye on a small network in the long term.

The fact that the network is purely P2P puts it in a dangerous position. There will be no way to confirm with 100% certainty that you are talking to a non-compromised node. Companies have spent millions on security in the past and it’s been cracked in days - I’m not talking about encryption, I’m talking about cracking software security measures. Who knows how nodes will be compromised: viruses, a hacked auto-update process, etc.

You have a certain level of comfort with centralised systems because you can gain reasonable confidence that they have not been compromised, and if they are, then dealing with the threat is much easier than in P2P because you don’t have to fix the problem in literally millions of places. Look at the problems MS have had in the past due to users not keeping Windows updated and the security issues this causes. They’ve been battling this for years, and it is a big reason why they’re rolling out Windows 10 (with ongoing updates) for free to people who are using cracked versions of Windows 8.

If you manage to compromise a high number of nodes then you own the network. This is going to be easiest when the network first goes live as there will be fewer nodes. If enough damage can be done then adoption will be low, the network will stay small and you can keep ownership of the network.

If one of the random files was important to you, I think you would class this as significant. If it was me, I would stop using the network and recommend other people do too. If it happened to someone with a name, it would be very significant, because they’d tell the world not to use it.

Really? And who said this?

An app could be designed to keep track of specific file names to known “bad files”.

And this (emphasis mine):

It seems that one could compile a list of all known data that they hold (by scraping public data, converting to chunks, comparing to what they hold in their vaults) and requesting that data from different places all around the world.

Let’s see:

  • Write an app
  • Track “bad files”
  • Rent enough VPSes and buy a huge amount of bandwidth to download millions of files from multiple locations
  • Destroy your own vaults
  • (Profit?)

That isn’t what I said at all… Not even close. I’m not talking about a lone attacker mounting a Google-scale scraping attack.

I’m also not talking about scraping all public data. I’m talking about targeting specific data: KP, confidential documents, ad files, anything a large group of people don’t want to support. Search that out and get the chunk names. This doesn’t require massive bandwidth, just time.

That list is looked at by everyone running the app. If they have a chunk, they ask who else has it and count responses. (Your emphasised quote isn’t even from my post on this attack, but from the safecoin generation attack - two separate attacks with vastly different end goals. One is distributed, one would be done solo.)
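
To make that mechanism concrete: because self-encryption is deterministic, anyone can recompute the chunk names of known public content and compare them against what their vault holds. A toy version (real self-encryption splits, compresses and encrypts; this only shows the lookup principle):

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # assumed 1 MiB chunk size

def chunk_names(data: bytes) -> set:
    """Deterministically derive a name for each chunk of a known file,
    as a stand-in for real self-encryption chunk addressing."""
    return {
        hashlib.sha512(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    }

def vault_overlap(vault_chunks: set, target_file: bytes) -> set:
    """Which chunks of the target does this vault hold? Each app user
    reports this overlap, then asks who else holds the same chunks."""
    return vault_chunks & chunk_names(target_file)
```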

Deleting would not destroy your own vaults, but would take a small bite out of the vault of everyone participating, distributing the “pain” a little here and a little there rather than taking a big chunk out of one person’s holdings.

Basically, you took both my attacks and combined them to argue it won’t work, but they have different goals, and are not meant to be combined. One idea just led to the other.

This I doubt. When you had a disk error and lost data, did you stop using computers, hard drives…? Also, you wouldn’t even know why the data was lost.

I think the polarisation stats will be of little value. I was referring to the details of the attack: how many nodes would be needed, what they would be doing, what they would break and how often.

As you’ve now returned to the “specific” file case, we know the stats on that (80% of the network to target one chunk), though you could attempt to modify this by arguing “any chunk of the file will do” and “we know the chunks because…”.
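
For what it’s worth, here is the toy model behind those stats - it assumes purely random chunk placement and ignores close-group consensus (which is where the 80% figure comes from), so treat the numbers as illustrative only:

```python
def p_all_copies(p: float, k: int = 4) -> float:
    """Toy model: an attacker holding fraction p of vault space holds
    all k randomly placed copies of one specific chunk with prob p**k."""
    return p ** k

def p_file_corrupted(p: float, n_chunks: int, k: int = 4) -> float:
    """Losing any single chunk corrupts the whole self-encrypted file,
    which is why 'any chunk of the file' is a much easier target."""
    return 1 - (1 - p ** k) ** n_chunks

print(p_all_copies(0.8))           # ~0.41 for a single named chunk
print(p_file_corrupted(0.8, 100))  # ~1.0 for a 100-chunk file
```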

To say there is no monetary cost is not really true, but it’s not relevant anyway. There is a cost. As you have just demonstrated, even thinking about this and writing a response is something you are finding hard to do! :slight_smile:

And suggesting we mitigate this because some people think it is valid and would be easy to do ignores the cost of adding code (complexity, bugs) when it is not necessary. There is a danger in coming up with lots of plausible-sounding attacks and patching for them when not really needed, and reducing the security of the system by doing so.

It sounds like you have already tired of trying to define this attack well enough for it to be evaluated. It is hard, isn’t it?

I know disks can fail, so I back up important data. Cloud services do this for you as a service, and SAFE tries to do it as well by maintaining 4 distributed copies of every chunk. It (implicitly?) promises your data won’t be lost. I wouldn’t trust a commercial cloud service if it was known to sometimes randomly lose files - especially if the reason for the data loss was unknown! So if SAFE starts losing data randomly, it would be a significant hit to its reputation.


Anyway, it just occurred to me that this attack would be pretty easy to sabotage by joining the effort and reporting false positives. The legit attackers will delete their chunks, but because the saboteur lied, the network still has copies, so the attack fails and the legit attackers are punished with (at least some) rank loss. It wouldn’t be hard to cause a lot of chaos like this.
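
A back-of-the-envelope simulation of the saboteur idea, with all parameters invented: if the attacking group only deletes once every holder of a chunk has reported in, a single liar among the holders is enough to leave a real copy alive.

```python
import random

def attack_round(attackers: int, saboteurs: int, copies: int = 4) -> bool:
    """One coordinated deletion attempt. A chunk's copies land on random
    participants; if any holder is a saboteur who falsely reports
    deletion, a real copy survives and the attempt fails."""
    pool = ["attacker"] * attackers + ["saboteur"] * saboteurs
    holders = random.sample(pool, copies)
    return "saboteur" not in holders

trials = 10_000
wins = sum(attack_round(90, 10) for _ in range(trials))
print(f"success rate with 10% saboteurs: {wins / trials:.0%}")  # ~65%
```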

Since the idea is that this attack would be attempted by a relatively large community, keeping the attack a secret won’t really be an option.

Hard drives aren’t marketed as a “redundant, self healing, secure” storage either.

I never left that idea; I’m not sure where you thought I did. I also don’t need to “modify” anything, because I stated my premise from the start: deleting any chunk of a target file from the network will corrupt the whole file, because of its self-encrypted nature.

Now you’re being insulting. I’m not tired, and I don’t find it hard. I work. I take my break, my lunch, and any spare moments I have during the day to try and explain. I don’t have the time during work to do in-depth research.

And not mitigating an attack because some people don’t think it’s an issue is also a problem. It’s a two-way street.

You’ve taken my comments personally rather than in the way they were meant.

You haven’t established the parameters of this attack. You’ve started a discussion, and some people think it’s a valid attack and some don’t. I don’t see it as something worth mitigating against in its current state, because from what you have said I can see neither how feasible it is (analysis), nor how impactful it would be if some (unquantified) amount of (specific or random - I’m not clear here either) data were to be lost.

It is easy to say things like “if the network were to start losing data I’d…” etc, but it doesn’t make it true. If statements like this are going to determine the design of the network, they need to be set out in detail, quantified where possible, and reasoned clearly all the way through.

I made the point about hard disks not because they are the same as SAFE, but to illustrate the need for sound analysis, along with complete and clear reasoning. That hasn’t been presented, to my mind. You may not have time to do this (which is what I meant by “finding it hard”, not that you are stupid or anything personal about you), but it needs to be done if it is to be of value in strengthening the network. See for example the work @erick is doing: Open questions on the security properties offered by the XOR Space closeness relationship. Before anything is done to mitigate a potential threat, it has to be based on something rigorous that can be considered in order to assess its seriousness and decide what, if anything, to do about it. If you don’t do this, someone else will need to before it goes anywhere.

It seems that it’s necessary to manipulate around 80% of the nodes to achieve a specific behaviour.
Then, if 80% of the farmers use “unofficial” software that allows deletion of specific chunks of data, it means that most of the farmers don’t believe in the official SAFE Network client… and that could be considered democratic. I mean that it could be seen as an improvement, not an attack.

Figures like 80% aren’t meaningful without being very specific about the parameters and goal of an attack. Is the goal to take over a consensus group? Chained groups? Is it about deleting a specific chunk? Any chunk of a particular file (file size immediately becomes a factor)? Any chunk of any file in a particular set of files? Over how much time can the attack take place? Churn changes the data landscape over time and can be seen as an additional roll of the dice. Whether that makes an attack easier depends on the goal. For example, if you have five dice and your goal is to get three dice showing matching numbers, every additional roll increases your chance of achieving your goal. Any change in the parameters changes the odds, possibly significantly. Likewise, every different attack on the network will require different types and levels of resources.
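
The dice claim is easy to check numerically - a quick Monte Carlo, nothing network-specific about it:

```python
import random
from collections import Counter

def p_three_matching(rolls: int, trials: int = 100_000) -> float:
    """Chance of seeing at least three matching dice among five, in at
    least one of `rolls` independent throws of all five dice."""
    hits = 0
    for _ in range(trials):
        for _ in range(rolls):
            faces = Counter(random.randint(1, 6) for _ in range(5))
            if max(faces.values()) >= 3:
                hits += 1
                break
    return hits / trials

for r in (1, 2, 5):
    print(r, p_three_matching(r))  # the probability climbs with each roll
```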

It would be a good time to remind ourselves that faith in the project may turn into fanaticism or dogmatism. It happens all the time.

Reacting defensively against criticism is the first symptom.
Criticism should be welcomed, EVEN if it is non-constructive, because it can shed some light on a genuine problem.

Regards

I agree. I for one am not acting defensively, if I may add.
I allow that attacks such as those described in this topic may be possible.

What I am saying is that they’d be futile, because they would require a lot of resources for little (or no) benefit to the attacker.
Furthermore, this destruction of data would only make it more profitable not to destroy data, so it would attract more “honest” farmers and further lower the likelihood (and increase the cost) of repeated success.

It would be interesting to see it simulated in the testnet.
The experiments shall set us free.
