Data Density Attack

Just a wild idea: maybe we can discourage targeted PUTs by gradually increasing the cost of a PUT based on the amount of data already stored in the same section by the same user?
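As a rough sketch of what that could look like (the names, units, and growth factor here are made-up illustrations, not anything from the actual network):

```python
# Hypothetical density-based PUT pricing: the more a user has already
# stored in a section, the more the next PUT costs. All constants are
# illustrative assumptions.
BASE_COST = 1.0   # baseline PUT cost (assumed unit)
GROWTH = 1.5      # multiplier per MB already stored in this section (assumed)

def put_cost(stored_in_section_mb: float) -> float:
    """Cost of this user's next PUT into a given section."""
    return BASE_COST * (GROWTH ** stored_in_section_mb)
```

An attacker concentrating data in one section pays exponentially more per PUT, while a user whose chunks hash evenly across sections stays near the base cost.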

8 Likes

I tend to agree. If there’s to be indirection, I’d think it should be handled network-side rather than client-side. For example, if a given section is struggling to cope with the volume of stored data (or client accounts for example) these could be offloaded to a quieter section with pointers held by the original section. The pointers would map individual chunks/accounts, or address ranges.

If the original section is struggling for bandwidth, the client could be redirected to the new section and deal directly there. If the issue is purely disk space and not bandwidth, the original section could just act as a proxy and not bother the client.

There are a few other ways to address this too (e.g. archiving data, storing data with an expiry time and paying more for long-living data, detecting and punishing malicious behaviour like storing many ImmutableData chunks with very similar names), but as far as I know, none of these are agreed or fleshed out.

1 Like

Good call, although it’s maybe not necessary to even specify the same user. An attacker would probably just try to use multiple clients. Any significant imbalance in data distribution across the address space probably indicates an attack, so charging everyone more seems fair, since honest users won’t be hitting the affected range very often.

12 Likes

I thought about this with regard to data imbalance, but what I had in mind was relocation, which isn’t necessarily a good solution.
Increasing PUT cost when a section is nearing an upper threshold is one possible thing.
There could be other things too, more efficient alone or together with other measures.

Unless you meant only those storing to this section, I wonder if this isn’t risky too. Increasing the PUT price (or causing any effect) globally can itself be an attacker’s goal, as it causes disturbances. Yes, it increases cost for the attacker, and maybe in this case that makes the attack unfeasible anyway, but generally, allowing global effects as a response to malicious behaviour can backfire. If possible, effects should be contained.

2 Likes

Yes - I did mean just for that affected section. So an attacker targeting a single section faces ever-increasing PUT costs since he’s storing all his useless data there. But an honest user’s data would be spread evenly across sections, and he’d be charged much less than the attacker since most of his data will go to sections which don’t have elevated charges.

7 Likes

I think this indirection should always exist, not just when a section is overloaded.

The hash of an ImmutableData chunk or an MD name would no longer be the address of the data; it would be the address of an indirection element that gives the real address of the data. This final address would be computed from the hash of the initial address concatenated with an invariant of the section, agreed between data managers at creation time (possibly rehashed several times until the destination section is far enough away).
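A minimal sketch of that derivation, assuming SHA-256 and a caller-supplied “far enough” test (both assumptions; the post doesn’t fix either):

```python
import hashlib

def derive_final_address(initial: bytes, section_invariant: bytes,
                         far_enough, max_rounds: int = 16) -> bytes:
    """Hash the initial address with the section's agreed invariant,
    rehashing until the candidate is far enough from the original
    section (or a round limit is hit)."""
    candidate = hashlib.sha256(initial + section_invariant).digest()
    rounds = 0
    while not far_enough(initial, candidate) and rounds < max_rounds:
        candidate = hashlib.sha256(candidate).digest()
        rounds += 1
    return candidate
```

The round limit and the shape of `far_enough` are placeholders; the key property is just that the result is deterministic for the data managers but depends on a value the client doesn’t choose.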

This indirection would also solve the problem of data managers storing data whose id is the same one that is needed to get the data. An attack to get free safecoins is currently easy: launch a vault and a client program that issues GET requests over the ids stored in the vault.

Three years ago there was a topic about it: Gaming the farming reward. AFAIK this hasn’t been solved yet.

3 Likes

Your suggestion should certainly help make a data-density attack harder. However, if the protocol used by the vaults to decide on the final name of a given chunk is fairly deterministic, an attacker can just accommodate that when creating chunks and still be able to target a single section.
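To make that concrete, here’s roughly what the offline attack looks like when chunk names are predictable (toy code; a one-byte section prefix stands in for a real section’s address range):

```python
import hashlib
import itertools

def forge_chunk_for_section(prefix: bytes, payload: bytes) -> bytes:
    """Brute-force a nonce until the chunk's SHA-256 name starts with
    the target section's prefix -- the core of the density attack."""
    for nonce in itertools.count():
        chunk = payload + nonce.to_bytes(8, "big")
        if hashlib.sha256(chunk).digest().startswith(prefix):
            return chunk
```

Each extra prefix byte multiplies the work by 256, so the cost is tunable but entirely offline; any deterministic renaming the attacker can predict simply becomes part of this loop.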

I’m also not quite clear on how this suggestion would thwart the farming reward attack. Unless we try to hide the real chunk name (and so also the contents for ImmutableData, since we can easily deduce the name from the contents) from the vaults storing chunks, wouldn’t it still be easy to generate GETs for all the data your vault stores?

2 Likes

That’s not easy, because the condition is dynamic and only known by the data managers (invariant wasn’t an appropriate name). It’s certainly much harder than offline generation.

My proposal is the following:

  • client asks for ID1

  • vault managing ID1 stores the corresponding ID2 and asks for it

  • vault managing ID2 returns the data (possibly directly to client)

The second vault gets the reward on ID2 but doesn’t know the ID1 it would have to issue a GET on to fetch the data.

The two vaults should be far enough apart that, in the case of a merge, the two IDs don’t risk being stored in the same vault.
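The three steps above can be modelled as a toy two-hop lookup (plain dicts standing in for vaults; all names are illustrative):

```python
# vault1 holds only the indirection record ID1 -> ID2; vault2 holds the
# chunk under ID2 and never learns ID1, so it can't GET its own data
# to farm rewards.
vault1 = {"ID1": "ID2"}            # indirection records
vault2 = {"ID2": b"chunk bytes"}   # actual chunk store

def get(id1: str) -> bytes:
    id2 = vault1[id1]    # hop 1: resolve the indirection
    return vault2[id2]   # hop 2: fetch from the storing vault (rewarded on ID2)
```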

2 Likes

What I understood back then is that the chunk ID stored locally would be different (a hash of…) from the real ID that represents an address in XOR space. The data manager was responsible for mapping the two together. The local farming computer had no idea which real chunk IDs it contained, so you couldn’t just spam the network with GET requests to exploit the reward system, since the IDs you would see wouldn’t be the real ones.

I don’t know if that is still the case or relevant anymore.

5 Likes

We’re definitely at risk of going a bit off topic! Anyway, to make sure I understand properly, let’s say the client starts by wanting to store the string Laphroaig as an ImmutableData chunk (yes, it’s whisky o’clock here! :smile:). The name of this chunk (ID1) is the SHA256 of Laphroaig, which is e85...

When the vaults covering e85.. receive the PUT request, they generate some pseudo-random new ID2 (let’s say 6ff..) for this chunk and they keep a record e85.. → 6ff... They forward the actual chunk over to the vaults at 6ff.. who then store Laphroaig under the key 6ff..

However, the vaults storing the data - the ones which will be rewarded via farming - can regenerate ID1 by hashing Laphroaig. If we want to avoid this, I imagine we’d need to do something like having the first vaults encrypt the data before sending it on to the second ones. But that means we have to pass the data back through these first vaults on a GET so it can be decrypted again for the client.
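The leak is easy to demonstrate: because the storing vault holds the plaintext, recomputing ID1 is one hash away (SHA-256, as in the example above):

```python
import hashlib

content = b"Laphroaig"
id1 = hashlib.sha256(content).hexdigest()  # the chunk's real name

# vault 2 only ever sees ID2 and the content -- but the content alone
# lets it recover ID1:
recovered_by_vault2 = hashlib.sha256(content).hexdigest()
```

Hence the suggestion to encrypt before forwarding, at the price of routing GETs back through the first vaults.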

As for the data-density attack, if we can find a way to make the generation of ID2 unguessable by the client, even if it has a single malicious colluding vault in that section, then I think we’ve got a fix for the attack. I’m just not sure myself that there is such a way. We can certainly make it harder for an attacker, maybe so that only a fraction of his PUTs actually end up in the targeted section, and that along with the safecoin cost might be enough to deter the attacker. Definitely worth further investigation though; maybe there’s a decent approach here I’m missing!

3 Likes

If there was delayed creation of ID2 I think it could be achieved; the chunk is stored ‘normally’ using ID1 until a new block in the data chain is found for that section. Since the content of that block cannot be known in advance it can be used as a secure random number source. The randomness is used to derive a verifiable-but-random ID2 which can be where the chunk is finally stored. Both sections for ID1 and ID2 can verify the move is valid using the latest block.
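A sketch of that derivation, assuming SHA-256 as the combiner (the post doesn’t specify one):

```python
import hashlib

def derive_id2(id1: bytes, new_block_hash: bytes) -> bytes:
    """Verifiable-but-random relocation target: once the new block
    exists, both sections can recompute and check ID2, but nobody
    can predict it beforehand."""
    return hashlib.sha256(id1 + new_block_hash).digest()
```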

7 Likes

Nice idea! Using the next block rather than the current one should make the result as good as impossible for an attacker to predict or influence. There are a couple of drawbacks though, I guess.

It’s a shame to have to build in this indirection for every chunk stored, since it comes with some overhead (latency in chunk ops, bandwidth, code complexity). Most chunks wouldn’t need it if most clients are honest. If we only reserve this mechanism for when a section is getting swamped, then it might be simpler to just take an entire tranche of chunks within a single address range and pass those to another random section.

That probably wouldn’t require the random relocation target to be hidden from the attacker since as long as the same target isn’t chosen every time, the attacker’s clumps of chunks will get dispersed evenly across all the sections. That means we wouldn’t have to delay the transfer until a new block is added to the chain, we could start the transfer as soon as it’s needed.

The other less severe drawback I can see is that it could be quite a large pile of chunks that get relocated when a churn event (i.e. new block) happens. Given that a section could already be working relatively hard to accommodate the churn event, it’d be good if we could avoid adding to that workload.

Having said all that, I still think using the next block as a source of random data is a great idea.

7 Likes

Seems like it would be impossible to launch such an attack against the data chain indirection fix you describe. However, it also makes me wonder if just using a SHA3-512 hash would provide such a significant increase in difficulty for the attacker that the double-SHA256 indirection would be unnecessary.
I guess you could argue that eventually a single SHA3-512 could be just as susceptible to the density attack though…

1 Like

An address book is an index - really

I am not talking domain names here.

But I am talking of databases, tokens, and any other sort of APP that wants to KNOW the address of the data object without referencing an index (address book):

  • database - has its own optimised indexing structure. Your “address book” then means another indexing layer on top of that, making the database 3 to 10 times slower.
  • tokens - instead of using the token number to access the token, the APP now has to index into the “address book” (primary index) to get the token’s actual address. Tokens may not be produced in order, so this “address book” has to be re-indexed or shuffled each time a new random token is generated.
  • other APPs that deterministically determine addresses - these now have to maintain an “address book” (=== INDEX) in order to access the data.

This means that all those programs (i.e. most) will now need an extra layer of indexing/shuffling in order to process MDs the way it is intuitive to do. This requires extra MDs to store the index (address book), extra processing in the network and extra cost for EVERY APP that wants to access specific (numeric/binary) addresses.

MDs are MORE than domain names

Exactly. Hide the process completely from the APPlication, and this then allows deterministically determining the MD address (numeric/binary).

ONLY if the user requests timed deletion. Of course the network does not use time, does it :slight_smile: You were seeing if we actually read your posts, weren’t you :slight_smile:

As to @tfa’s idea of indirection: the indirection is only needed if the section is being loaded down. So while it’s an extra part of the code, you could do this:

  • if the section is fine storage-wise then simply store the MD as normal.
  • if the section is getting loaded down (past a certain percentage) then set indirection and pass the MD (or chunk) off to be stored elsewhere
  • record the indirection link. When the user requests the MD (or chunk), the first section is queried; if it holds the MD/chunk then it simply returns it, otherwise it passes the request to the section that now holds it.
  • it is possible to have the indirection recursive (with a limit, obviously)
  • if this causes a recursive “loop” (hits the limit) then the MD is not storable at this time

This allows for dynamic control where the indirection only occurs in situations where the section is loaded down.
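A toy version of that lookup path, with the recursion limit included (tuples standing in for stored records; the limit value is arbitrary):

```python
MAX_HOPS = 4   # assumed indirection limit

def resolve(address, sections, hops=0):
    """Follow indirection links until the actual MD/chunk is reached,
    failing if the limit is hit."""
    if hops > MAX_HOPS:
        raise RuntimeError("too many indirections")
    record = sections[address]
    if record[0] == "chunk":                       # ("chunk", data)
        return record[1]
    return resolve(record[1], sections, hops + 1)  # ("link", next_address)
```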

REMEMBER that when a section’s spare space is low, the cost of PUTs automatically rises, so in a loaded-down situation the cost of storing in that section will rise anyhow (under the current model).

@mav I am wondering whether, given that the storing cost is controlled by spare space, this attack would end up with the attacker eventually being charged one coin for each PUT once the spare space gets quite low. And this is under the current model for charging, without any of the suggestions above.
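For illustration, a cost curve with that shape (purely assumed; this is not the network’s actual pricing algorithm):

```python
def put_cost(spare_fraction: float) -> float:
    """Cost per PUT rises toward 1 coin as a section's spare space
    approaches zero. Constants and curve shape are assumptions."""
    base = 0.001
    return min(1.0, base / max(spare_fraction, base))
```

Under any curve like this, an attacker filling one section drives his own marginal price toward the cap long before the section is exhausted.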

6 Likes

Or, the self encryptor could generate a random ID1 instead of deriving it from the content. This way vault 2 cannot guess ID1, and the data could be returned directly from vault 2 to the client, without transiting through vault 1. A checksum should probably also be generated to check that the data hasn’t been tampered with.

The self encryptor runs on your PC and is open-sourced, so it can be manipulated.

1 Like

While I’m often guilty of this myself, seems like the dynamic double indirection approach, although effective, would veer away from KISS rather significantly.

Seems like this hits the nail on the head. Safecoin to the rescue. Just for fun I’m now going to give another rally cry in the hope that someone else will jump on the SHA3-512 bandwagon, or convince me to jump off it, or push me off it… :grin:

3 Likes

Not sure it’s that significant.

The loaded section simply generates a new address based on whatever (data chain block hash?) and sends the MD/chunk off to the new section the same way as any other MD/chunk. So the code is one subroutine/function, called only if the section is loaded. Still very simple, just an extra step.

Obviously the request now has an indirection counter for detecting too many indirections.

Yes, sometimes we get caught up in trying to figure out clever ways to prevent attacks when the simpler answer is already in our back pocket.

2 Likes

Busted again! :smile: OK - I’m definitely not pursuing this here - that’d certainly drag the thread way off course!

+1

The key difference from @adam’s suggestion above is that he’s talking about the amount stored rather than the free space available. Free space isn’t as good an indicator of malicious storing, since it could equally result from a random cluster of particularly high-capacity vaults. Also, the amount stored can be measured more accurately.

2 Likes

Yes, but it doesn’t matter, because the attacker cannot control ID2. It can still control ID1, but we cannot do anything about that with offline generation, and it cannot overload a vault with indirection objects because they are very small.

1 Like