Data Density Attack


#61

The only issue I see with this is that under an attack this allows the attacker to know they can attack the neighbors concurrently (a coordinated attack). Thus the potential is still there that a problem will occur.

Perhaps a solution is to ask one (random) neighbor, which then asks one of its own (random) neighbors, and so on until a hop limit is hit.

It may take longer if the first neighbor cannot take the chunk, but still have some benefits of reduced latency.

In most cases I would expect the first neighbor asked to be able to take the chunk, so the extra latency from “hopping” the request would only arise in the rarer case.
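To make the hop chain concrete, here is a minimal sketch of the idea. The names `place_chunk`, `neighbours`, `has_room` and `max_hops` are all made up for illustration, and nothing here reflects actual routing code; it only assumes each section knows its neighbours and can say whether it has room.

```python
import random

def place_chunk(start, neighbours, has_room, max_hops=4, rng=random):
    """Walk a chain of randomly chosen neighbours until one accepts the chunk.

    `neighbours` maps a section name to a list of neighbouring section names.
    `has_room` is a callable telling whether a section can take the chunk.
    Returns (accepting_section, hops_used), or (None, hops_used) if the hop
    limit is hit or the chain runs out of unvisited neighbours.
    """
    current = start
    visited = {start}
    for hop in range(1, max_hops + 1):
        # Only consider neighbours that have not been asked yet.
        candidates = [n for n in neighbours[current] if n not in visited]
        if not candidates:
            return None, hop - 1
        current = rng.choice(candidates)
        visited.add(current)
        if has_room(current):
            return current, hop
    return None, max_hops
```

Because only one neighbour is asked at each step, an observer does not learn the full set of candidate sections up front, at the price of one extra hop each time a neighbour declines.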


#62

Yes but not to any significant degree if the sections are relatively healthy (1 chunk in 10, 1 chunk in 1E10 or 1 in 1E100, it’s up to you). When they start to become unhealthy you ease into the indirection rather than everything going off at once, analogous to slow yield and expansion vs. an explosive fracture.

The general idea was to smooth out the response in the vicinity of your conditional criteria, so I apologize if I did a poor job of explaining earlier. A simple true or false about some criteria is the same as an analytic heaviside step. Instead, use of a smooth sigmoidal/heaviside function allows you to use a single scalar parameter to modify the behavior as you see fit. With the smooth approach, one can easily test how a more constant level of indirection vs. your minimalist emergency indirection, and everything in between, actually work out in practice. I am biased by mechanical analogy, but I think a smooth response regardless of the width of the region you end up choosing about the conditional would be beneficial. Hypothetically, if a section is approaching an unhealthy state / storage level under any condition, directed attack or not, the idea is that the “pressure” will start to leak off in a non-linear fashion to other sections and may buy some time for merges or other options. The indirection probability carried by incoming chunks naturally gives an indication about neighboring section health too.

If a variety of sections are unhealthy then at least the nodes start to get a bit of an advance warning before the floodgates are opened. This may allow you to make other, better-informed control decisions. How leaky you decide to set the default, and what the critical health level would be, is up to the devs to determine. Maybe higher levels/probabilities of indirection in particular circumstances improve obfuscation, or have other benefits, with a negligible performance hit due to caching or other optimizations. I don’t know, you tell me; it’s just a brainstorming suggestion.

It also seems like indirection latency may be less of a performance hit than we might think, since no one really cares about a 2x or 4x latency hit in the case of data that is stored once and never looked at again, or once every X number of years. Caching handles the very popular data naturally minute by minute. It’s just us folks in the middle who get penalized by the bad actors. Qi_ma’s idea of reversing the indirection when sections become very healthy would require additional work, but it is a rather interesting idea to help out those of us in the middle. Seems like you could/would recover any short-term performance loss over time.

Perhaps an easy way to do it would be to have sections with health metrics above a certain level periodically look through their indirection maps and ask for a copy of the chunk to be transferred back. Maybe this could be made more intelligent by comparing their cache to their indirection map, and any hits would have the node request that the chunk be transferred back. The originating vault could then delete its copy or just mark it as sacrificial, etc… (Are sacrificial chunks still in style?) I suppose you could make this procedure, with respect to a “super healthy section” conditional, follow a smooth probabilistic transition too.
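A rough sketch of that recall check, just to pin the idea down. Everything here (`chunks_to_recall`, `indirection_map`, `cache_hits`, `healthy_level`) is a hypothetical name, and it assumes a section can see both its indirection map and the set of chunk IDs currently in its cache.

```python
def chunks_to_recall(indirection_map, cache_hits, health, healthy_level):
    """Pick redirected chunks that a healthy section could ask to get back.

    indirection_map: dict of chunk_id -> name of the section holding it.
    cache_hits: set of chunk_ids currently seen in this section's cache.
    health / healthy_level: any scalar health metric and its cut-off
    (both are assumptions for this sketch).
    """
    if health < healthy_level:
        # Not healthy enough to take chunks back yet.
        return []
    # Only recall chunks that are actually being requested (cache hits).
    return sorted(cid for cid in indirection_map if cid in cache_hits)
```

Restricting the recall to cache hits means the consensus cost of moving a chunk back is only paid for data that people are actually reading.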


[Image: fuzzy-indirection]
In the example image, I’m referring to some “absolute” measure or health criteria (free storage space, etc.) minus whatever hard constraint you decide as being the start of critical operation. The probability that a chunk stays in place (which I used in the toy pretend pseudo-code example) is one minus probability of indirection. The different colored curves are the result of changing the smoothness parameter.
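For anyone who wants to play with the curves, here is a minimal sketch of the smooth step described above, assuming a logistic sigmoid as the smooth function. The names `health_margin` and `smoothness` are mine, and treating a margin of exactly zero as "redirect" in the hard-step case is an arbitrary choice.

```python
import math
import random

def indirection_probability(health_margin, smoothness):
    """Probability that an incoming chunk is redirected.

    health_margin: health measure minus the critical threshold
                   (positive means healthy, negative means past critical).
    smoothness:    width of the transition; 0 gives the hard step
                   (emergency indirection), larger values are leakier.
    """
    if smoothness <= 0:
        return 1.0 if health_margin <= 0 else 0.0
    x = health_margin / smoothness
    # Logistic of -x, written so that exp() never overflows.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def stay_probability(health_margin, smoothness):
    """One minus the probability of indirection."""
    return 1.0 - indirection_probability(health_margin, smoothness)

def should_redirect(health_margin, smoothness, rng=random):
    """Per-chunk decision: still a yes/no for each individual chunk."""
    return rng.random() < indirection_probability(health_margin, smoothness)
```

Setting `smoothness` to 0 reproduces the true/false conditional, so the same function covers emergency indirection and everything leakier by changing one scalar.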


#63

There’s some difficulty with this because the discussion involves both a ‘fixed’ network protocol and a ‘fuzzy’ control system.

My background is mechatronic engineering and there is definitely a strong ‘controls system’ element to the incentive structure - taking fuzzy human behaviour and a network state measurement as input then iteratively adjusting a set of incentives as the output to ‘control’ the feedback loop of behaviour vs health.

The network has positive and negative feedback loops with many variables contributing to them. To me, the targets are not clear enough (hence the work on network health metrics document), and thus the control system is difficult to design or model. Oscillations, positive feedback, harmonics, distortion, delays… these are not perfect network protocol things yet are important to factor into the design.

But I’ll say again, I have not yet formed any strong preference either way to using indirection vs safecoin.


#64

I think we need BOTH. In my considered opinion it’s not a case of one vs the other; as your attack shows, we need to handle the (nearly) full scenario.

Safecoin provides the gradual control that @jlpell suggests is needed, since it (dis)incentivises the adding/removing of vault storage according to the storage requirements of the section. But safecoin can only do so much, as it is humans who add/remove the vaults and we do not often act in a sensible manner. So we also need protocol control to do indirection when needed. For instance, the attack is one case where the safecoin control system is unlikely to be effective.

The reason I am against the gradual coming into indirection is that it shows a fundamental misunderstanding of what is actually happening and what is being achieved by indirection. Each chunk is an independent item that arrives at the section. If the section has reached a critical limit in its storage then it simply asks other sections to store it. There is absolutely no need to gradually come into the situation. The control system for choosing whether to redirect a chunk works on a chunk-by-chunk basis and is basically a step function for each chunk anyhow. Adding fuzzy logic is really unnecessary and just complicates things. The section either needs to redirect that chunk or it doesn’t. The vault is filling; it doesn’t help to redirect one chunk and not the next. Just start doing it when storage has reached a limit point (it was 50% in the previous at-home vault tests). It’s not a human interaction thing. Let safecoin handle gradually bringing storage online before it gets critical.

Whereas the safecoin component does indeed want to bring new storage online in a gradual way. If there was a massive increase in price then we humans would bring large amounts of storage online, and then we get the opposite effect: the price drops, people remove storage quickly, and thus we get the massive oscillations @mav mentions. We could even see too much storage removed. So a controlled (gradual) increase/decrease is needed here rather than a step function.

So yes: a gradual increase of the safecoin price as storage is being used up, and as it approaches critical the RFC has the price increasing at a faster and faster rate until it’s around 1 coin per PUT as available storage decreases. Then if storage becomes critical (e.g. an attack), indirection kicks in.
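As a toy picture of the two pieces working together, here is a sketch with a price that rises faster and faster up to about 1 coin per PUT, plus a hard step for indirection. To be clear, this is not the RFC formula: the power-law shape, the `exponent` value and the function names are assumptions made up for illustration, and the 50% critical level is just the at-home test figure mentioned in this thread.

```python
def put_price(used_fraction, critical=0.5, max_price=1.0, exponent=3.0):
    """Illustrative PUT price: convex in storage used, capped at max_price.

    used_fraction: share of section storage in use (0.0 to 1.0).
    critical:      level at which storage counts as full for new chunks.
    exponent:      values above 1 make the price climb faster and faster
                   (hypothetical shape, not taken from the RFC).
    """
    ratio = min(max(used_fraction / critical, 0.0), 1.0)
    return max_price * ratio ** exponent

def must_redirect(used_fraction, critical=0.5):
    """Emergency indirection as a plain step function per chunk."""
    return used_fraction >= critical
```

The price is the analog control that acts on people well before the limit, while `must_redirect` is the digital backstop that only fires once the limit is reached.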

The reason indirection would kick in before 99.9% of storage is used up is to allow for vaults to still come and go, and for merges etc. I think in the last at-home vault test the vault was considered full (for new chunks) at 50%.


#65

And this is the problem. To bring it into a mechanical scenario.

I can think of a few things and each approach it differently.

  • Household water tank. You only let the water go if the tank is full. You never “gradually” release water as it reaches, say, 75% full. The issue is that the water may never get to 100% full, so adding a gradual release of the water wastes resources. To indirect early is to waste precious resources handling unneeded indirection chunks.
  • Items arriving randomly along a conveyor belt. If the packing box is full then the next box is filled up. A factory does not fill up multiple boxes on a gradual basis.

It’s not like trains, where you are concerned for the humans’ comfort and add extra services when the trains are typically 80% full. Chunks and the network just fill up disks. The disk management of keeping to (what used to be) 80% capacity for optimal OS operation is not what vaults are about. That is handled by the person not filling up their disk, not by SAFE software. The vault can utilise 100% if it wanted to. The 50% (from the last at-home test) is to do with network considerations.

There are a few things you have to realise and take into consideration

  • the chunk is sent to the section that will finally store it.
    • Indirection does not increase the traffic of chunks. It’s not like the whole chunk arrives at the section and then the section decides whether to redirect it or not.
    • The request is sent, then the section determines whether it has enough spare space for it; if not, it asks other sections to handle it.
    • Only after the actual storing section is found is the chunk sent to it to be stored, and the original section records the indirection information.
    • The network will have increased load when retrieving an indirected chunk, since two sections have to reach consensus, so it is of utmost importance to keep indirection to a minimum.
  • The traffic from the redirection requests made by the section is small, but consensus will take time.
  • Each chunk is its own entity, and the decision of whether to redirect it or not is a step function for that chunk. You do not split it up, obviously.
  • Safecoin will handle the “fuzzy” stuff to get more storage
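The flow in the list above can be sketched roughly as follows. This is only a toy model under my own assumptions: `Section`, `store_chunk` and `get_chunk` are made-up names, capacity is counted in chunks, and consensus is reduced to counting how many sections are involved.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    name: str
    capacity: int                                   # max chunks (toy unit)
    stored: dict = field(default_factory=dict)      # chunk_id -> data
    redirects: dict = field(default_factory=dict)   # chunk_id -> section name

    def has_room(self):
        return len(self.stored) < self.capacity

def store_chunk(chunk_id, data, home, others):
    """Store a chunk, sending the data only once to its final section."""
    # Only a small request reaches the home section first.
    if home.has_room():
        home.stored[chunk_id] = data
        return home.name
    # Home is at its limit: ask other sections (small messages only).
    for other in others:
        if other.has_room():
            other.stored[chunk_id] = data           # chunk goes straight here
            home.redirects[chunk_id] = other.name   # home records the pointer
            return other.name
    raise RuntimeError("no section has spare space")

def get_chunk(chunk_id, home, sections_by_name):
    """Return (data, number of sections involved in the retrieval)."""
    if chunk_id in home.stored:
        return home.stored[chunk_id], 1
    holder = sections_by_name[home.redirects[chunk_id]]
    return holder.stored[chunk_id], 2               # two sections must agree
```

The store path moves the chunk data once either way; the extra cost shows up on retrieval, where a redirected chunk involves two sections instead of one.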

Because there are no major network traffic differences in storing a redirected chunk, there is no real need for any special traffic flow for indirection. The indirection requests are the same as any other messages being passed between sections, and the difference in traffic flow is minimal. It’s not like we expect a section to get millions of chunk storage requests per second.

In an attack the fuzzy region would be passed through almost instantly (a few seconds to minutes) and every chunk redirected anyhow, so it is no use for helping traffic flow there. In every other scenario the difference in traffic flow between redirecting and not redirecting is small.

This is where safecoin comes into play. It kicks in, raising the price at a very early stage, and does the gradual control you need. It approaches the problem from an angle that is analog in nature and very suitable for control systems. The redirection is really digital, since there is no real difference in network dynamics between indirecting or not.

Remember the safecoin control system is two-pronged. It makes the cost to store higher, which is a disincentive to add more data, and it increases the farming reward rate, which incentivises people to add vaults.


#66

Agreed.

Agreed.

I think you misunderstood my intention, or I wasn’t clear enough. So far we’ve considered the following cases in this discussion:

  1. No indirection.
  2. Emergency Indirection.
  3. Constant indirection.
  4. Smooth/Fuzzy Indirection.

In all cases the lesser number of indirection hops (~1) the better, but indirection is reversible if given time. However, indirection reversibility really only makes sense in the Emergency Indirection case, since it concludes that indirection offers no other benefit than protection from the data density attack, and should be reversed to minimize latency (and the larger costs involved by requiring consensus by multiple sections) when the attack is concluded and section resources are plentiful.

If viewing indirection from the perspective that it has no other value than to protect against a singular “data density attack” then I agree with you that emergency indirection is the only way to go. The question/hypothesis I was probing was in hope that someone could identify how more liberal usage of indirection can offer any other security benefits for the extra work required (consensus/routing). And if so, what the cost/benefit ratio looks like. From your perspective the costs are just too high to warrant any other use than a dire emergency. Ok, point taken, agreed, 99% probable you are completely correct in your assessment. I didn’t say I know the answer is different, but was just willing to ask the question if it might be; hoping it might spark a brainstorm about some ideas on the benefits of general indirection, not just the performance drawbacks. Maybe @mav will be inspired to think of another attack vector for us to consider. I do recognize why the initial view is that it’s just a drag on the system more than anything else. With your experience it may be easier to see that there are zero general security benefits, but it would be nice if we could at least take the time to consider it explicitly in thought experiment even if no one ever actually ends up testing it. We don’t always know what we don’t know. From an implementation/experimentation perspective the fuzzy route is a generalization that allows you to test all approaches by modifying a single scalar parameter, the niceness of that feature for experimental/optimization purposes was mostly what I was trying to convey.

At this point I’ll ask again for clarity: can anyone come up with examples of how additional more frequent use of indirection can offer some security enhancements? Are they benefits that outweigh the consensus cost? If there are no other ideas, ok, no problem, moving on. I know I don’t really have any that immediately come to mind, but see it as an analogous operation to other network obfuscation principles which have good justification. I also don’t have @neo’s intuition with respect to just how expensive multi-section consensus is.

I would prefer to look at it as a generalization of the emergency indirection approach that allows for finely tuned experimentation in case more general forms of indirection are found to offer benefits that justify the costs. :grin:

Considered, yes. Obviously. Good list.

This is where my lack of experience/intuition with respect to vault operation derailed the thought-train and in that moment I was operating under the assumption that “the safecoin effect” would react too slowly to improve things enough, so a more local “micro effect” similar to safecoin might be helpful. Yes, rapidly rising PUT prices as a result of a density attack are going to put a damper on things rather quickly, likely way more than adequate, and simple emergency relief via indirection that is reversible at a later time is a nice contingency plan. It will be fun to confirm this with testsafecoin.


#67

If you are talking of other uses involving a fuzzy logic thing for indirection then I think @mav touched on it when mentioning the dynamics in control systems.

You would then be creating two control systems that affect each other: the safecoin control system and your fuzzy logic indirection control system. You could create oscillations that make farmers dizzy as they add vaults one day since rewards are high, then remove them the next day because no coins were rewarded. Or your fuzzy indirection control starts 100% indirection while prices drop. All the delays can cause the two control systems to be out of alignment with each other. It’s better to have set logic for when indirection is done and when it’s not, even for other uses of indirection, since redirection only acts on the protocol and is not trying to regulate other things (like humans and their behaviour in normal uploading or in attacks).

Unless you are willing to sit down and clearly define the precise functions involved, I doubt I would risk it when there is not yet a purpose for it.

But I still would be skeptical that there is a reason to introduce fuzzy control system for indirections.

But I do agree that if some significant security benefit were to be gained from always indirecting, one that exceeds the cost of always indirecting, then go for it. Or maybe indirect sometimes, where you can define the reasons for indirection and create a simple or reasonable test for doing or not doing it.

Although I am at a loss to think of one at this time.

In normal operations where a section is filling up, the safecoin effect would likely be enough in most situations. The increase in cost has an almost immediate effect as people slow down their uploads because of the rising costs. The slower effect is vaults being added (coming online).

That’s the expectation anyhow. So it’d be good to know indirection would kick in before it got bad.


#68

Me too… other than completely confusing anyone thinking about directly attacking (or using) the network, let alone trying to do so. :sweat_smile:


#69

Another usage of indirection highlighted earlier in this topic is to prevent a vault owner from knowing the IDs of the chunks it is storing. The aim is that the owner cannot launch a client issuing GET requests on them to game the reward system.

This would be a constant indirection, complemented with data encryption as emphasized by @Fraser, so that the original ID cannot be recreated from the stored data.


#70

I thought there was another method to thwart this?

Even if not then that user would have to have a number of things work in their favour.

  • They need to have the fastest response enough times to be worth it.
    • But the hops are random, so whoever is fastest one time isn’t for something like 7/8ths of the other times (it depends on network locations etc).
    • even a 40Gbps link doesn’t really help because of the location of the other nodes and hop nodes
  • Caching kicks in before you get very many rewards. Possibly even before you are rewarded one coin.
  • This only makes the attack slightly harder.
    • The user would only have to try a range of public files until they got a hit in their vault.
    • The attacker has a large vault and simply scans through the public files until they detect a corresponding hit on their vault. Check, and then do the attack.
    • Security through obscurity is not security.

Is constant indirection worth it for a minor hiccup for an attacker, especially when the network already has a defense?

tl;dr

Constant indirection protection is simply undone by

  • The attacker has a large vault and simply scans through the public files until they detect a corresponding hit on their vault. Check, and then do the attack.

The attack already has a defense and that is caching. The attacker has to do enough GETs in order to get a coin. Possibly 8 times the GETs for the number of hits on their vault. But caching kicks in fairly quickly when one chunk is hit many times.

Better to protect against this by not allowing the vault owner any ability to decode the IDs on their machine. Basically the vault owner is not a party to the encryption used to store the chunk on their particular vault. This is quicker and better than constant indirection, which doubles the consensus work for every chunk stored AND retrieved. And for what savings, when caching mostly defeats a profitable attack using this idea?

Remember that indirection, while easy, is a major drain on resources because it involves two sections coming to consensus for each chunk stored and retrieved. Basically, if a static defense can be used then it should be used over always indirecting.


#71

Yes, I thought this would be a nice feature of indirection too, but like neo…

I recall reading somewhere on either this forum or safedev that the plan was to stop this type of attack via other means using encryption and caching as neo has pointed out above. Seems like indirection might be a nice easy way to help if not too expensive, I think it’s ok to have redundant diverse mechanisms at play… that’s how living things operate. That’s why I thought it was important to try and see if we can brainstorm other attack vectors or problems that might be solved via more liberal use of indirection in order to “amortise” the cost, since it looks like emergency indirection will be needed anyway for the case of the data density attack. Have hammer -> let’s find some things that look like nails, etc.


#72

I am not sure of this. This topic is beginning to be very long and so I may have missed something, but I only found these elements:

And they are variations of the same solution (constant indirection). Encryption is not mentioned there but is necessary to avoid recreation of ID1 from the stored data.

This is true only if the attacker issues GET requests over and over the same few ids. This isn’t true anymore if he is able to issue GET requests over all the chunks in his vault, which is what this solution is trying to prevent.

With this method he can get only a few ids, which is the case where caching kicks in.

I agree with this principle but not for encryption, otherwise the same could be said about chunk encryption done by self encryptor or communication encryption done by crust, which is clearly not true.

In summary:

  • constant indirection with encryption is a solution to reward gaming

  • I don’t know if a better solution has been found elsewhere and proved correct (I am referring to myself with my failed attempts in earlier replies to avoid encryption)

  • it is also a solution to data density attack


#73

Yes there was

It was to stop people knowing what was on their vault. Not sure precisely what the method was. But it would stop this attack too, even though it was meant to prevent any investigation knowing what was on a vault.


#74

Even over 10000 IDs, you will not normally be getting a coin for a few thousand GETs; much, much more I’d say, or else all the coins would be given out in the first year under typical use. So caching would likely kick in. And if you go slow enough to see the chunks fall from cache, then what a long wait for coins in an “attack”.

There was definitely a suggestion by David to “double” encrypt, so that the vault has no idea of the chunk and cannot hash the chunks to determine the ID either.

This attack was discussed at length a long time ago and this was the suggestion to prevent both accusing a vault owner of storing illegal material and the GET attack.

The ABCs could self encrypt their library of illegal material and then run the IDs across your vault and, if any were found, accuse you. The solution was to encrypt the chunk before storing, and this also solved the targeted GET attack you described.


#75

@tfa

One thing you did not answer.

If you did do indirection on all chunks, then what is to stop a slightly longer attack that scans through all the public files looking for a hit on your vault, and then does the attack anyhow? So indirection DOES NOT solve the attack; it only makes it take a little longer before you can get results.

Even better, indirection still leaves the chunk intact, so you could do a hash and generate the original IDs for all your chunks and then just do the attack as you describe.

So indirection or double indirection or multiple indirection does not solve it at all.

If you then do (double) encryption then forget indirection and just do it on the current system


#76

This seems the right path. It would prevent several attacks, including data density and reward gaming, and add legal security for every farmer. Something very important in these times of constant loss of rights.

What I do not see clearly is how to implement it and the computational cost that it adds.


#77

I suppose that first encryption is the one done by self encryptor.

The second encryption cannot be done on final storing vault, otherwise a modified version of the code could reveal original data (original content for a MD, encrypted content by self encryptor for an ImD).

This means there must be another vault that does it. Hence, this is probably a variation on the same solution that was proposed by David (constant indirection with encryption).

I would say several orders of magnitude longer instead.

The waiting is active in my attack, meaning that while the attacker is waiting for a chunk to disappear from cache, he can send get requests for other chunks. Let us get some figures (examples that simplify mental calculation):

  • the vault has 24000 IDs

  • The attacker can get the source ID of 24 of them by scanning through all the public files he knows about

  • An item disappears from cache after 1 day when it is not requested

If constant indirection with encryption is not implemented, the user can loop over the 24000 IDs and request 1000 GETs per hour without hitting the cache.

If this solution is implemented he loops over the 24 IDs he knows and requests only 1 GET per hour. Furthermore, as the IDs he knows are only of public files, these IDs have a higher probability of remaining in cache because other users might also read these files.

Of course, real figures will be different, but there will be several orders of magnitude between the rates of gets the attacker can issue with and without constant indirection with encryption.
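The arithmetic behind those example figures, spelled out as a tiny sketch. The helper name is made up; the only assumption is that the attacker requests each known ID at most once per cache lifetime so that every GET misses the cache.

```python
def max_get_rate_per_hour(known_ids, cache_lifetime_hours):
    """Highest steady GET rate that touches each ID once per cache lifetime."""
    return known_ids / cache_lifetime_hours

# Example figures from this post: cache lifetime of 1 day = 24 hours.
rate_without = max_get_rate_per_hour(24000, 24)  # all IDs in the vault known
rate_with = max_get_rate_per_hour(24, 24)        # only IDs found in public files

print(rate_without, rate_with, rate_without / rate_with)
```

With these made-up figures the ratio is a factor of 1000, which is where "several orders of magnitude" comes from; real figures will differ.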


#78

No, because constant indirection with encryption means that data is re-encrypted by the first vault before sending it to the second one. So, rehashing this data will give ID2 and not original ID1.
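A small sketch of why rehashing gives ID2 and not ID1. It assumes a chunk ID is simply a hash of the stored bytes (SHA-256 here as a stand-in), and `toy_encrypt` is a throwaway XOR keystream used only to make the example self-contained; it is not secure and is not the network's actual scheme.

```python
import hashlib

def chunk_id(data: bytes) -> str:
    """Content-derived ID: a hash of the bytes as stored (assumption)."""
    return hashlib.sha256(data).hexdigest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Toy stand-in, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

original = b"chunk content as produced by the self encryptor"
id1 = chunk_id(original)

# Plain indirection: the second vault stores the same bytes,
# so hashing them reproduces the original ID.
assert chunk_id(original) == id1

# Indirection with re-encryption by the first vault (key unknown to the
# second vault): hashing the stored bytes gives a different ID.
first_vault_key = b"known only to the first vault"
stored = toy_encrypt(original, first_vault_key)
id2 = chunk_id(stored)
assert id2 != id1
```

The storing vault only ever sees `stored`, so without the first vault's key it has nothing from which to recompute `id1`.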


#79

Thanks tfa, here was some other related discussion from a while back:

To be specific, starting at post 8 :


#80

Dirvine is essentially talking about constant indirection from the vault manager to the vault then, is he not? Is our current discussion about the data density attack simply reinventing the wheel, or have we identified another valuable layer of indirection to stop someone from being able to “go to lengths beyond what a normal user would do”?