Unencrypted Data Question

From previous forum reading: The safecoin element will provide economic incentive for folks not to mess with chunks they store.

Delete certain stored chunks and earn less SAFE, correct?

Removing stored chunks can get your client flagged and kicked out of the “close group”?

But where is the deterministic encryption?

It's a case of:

file → self-encryption → chunks (this part is deterministic and reproducible by you or anyone)
chunk → encrypted by data managers → sent to vault

The key “agreed” by the data managers is not known to you or anyone else; it could only be deterministic if the seed value used to create the encryption key were known to you. So it is reasonable to say that the seed comes from some (semi-)random network events that keep it unknown to you/me.

Remember that there is metadata held by the data managers that can hold this generated-on-the-fly key, which the vault will never know.
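A minimal sketch of why the first step is reproducible (chunk size and hashing here are illustrative only, not the real self_encryption algorithm): anyone who holds the same file derives the same chunk names, which is exactly the linkability the data managers' extra key is meant to break.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # illustrative 1 MiB chunks, not the real self_encryption sizes

def deterministic_chunk_names(data: bytes) -> list[str]:
    """Split content into fixed-size chunks and name each by its SHA-256 hash.
    No randomness is involved, so anyone holding the same file derives the
    same list of names -- this is what makes the first step deterministic."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

# Two independent uploaders derive identical chunk names from identical content.
file_bytes = b"some public file" * 100_000
assert deterministic_chunk_names(file_bytes) == deterministic_chunk_names(file_bytes)
```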


True, but all it takes is a “ban bad files” group running otherwise healthy/behaving vaults, and only when they can see all the copies of a chunk across their controlled vaults do they delete that one chunk from the various vaults. They are not concerned with losing a safecoin, and they simply restart those vaults to start anew.

Maybe the vaults should use a DHT for WoT IDs and incorporate that as a sort of proof-of-stake system. Bad actors lose reputation and earn less as their reliability/faithfulness decreases. Earning power goes to the nodes with the best long-term behavior.

They do, but for security, when a vault restarts it starts from scratch: no chunks, a clean slate. So if they have 4-6 vaults restart in their war on bad files, that would be a drop in the bucket, and they keep driving forward. We have proof of resource and that is good. The penalty for restarting a vault is having to wait for chunks to be stored and requested, which doesn’t worry the “ban bad files” brigade and may even be an advantage, since they get a new set of chunks to test.

I have been following this and am very interested in the topic, and @neo’s explanation of the mitigation.

So if a vault owner were given a list of chunk hashes for encrypted public data, then as it stands now he could scan the chunks in his vault and see whether he holds any on the list.

This list can be generated directly from the uploader’s datamap - potentially the uploader and the farmer are even the same guy - but for the sake of technical discussion, let’s assume this is unwanted behavior.

To prohibit this association between the uploader’s datamap and the chunks in the farmer’s vault, the data manager could encrypt the chunk a second time using a randomly generated key that is held by the data manager, or its close group, etc - but is never made known to the vault.

Then when the chunk is requested, the request is - following protocol - passed to the data manager, who then looks up the second encryption key and requests that chunk from the node(s) necessary. It then decrypts the chunk and passes it along under the initial (self) encryption.
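A rough sketch of that second layer, using a toy SHA-256-based stream cipher purely as a stand-in for whatever cipher the data managers would actually use; the point is only that the at-rest bytes no longer match the names an outsider can compute from a datamap.

```python
import hashlib, secrets

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.
    Stands in for whatever real cipher the data managers would use."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Data manager side: re-encrypt the self-encrypted chunk under a random key
# that lives in the DM's metadata and is never given to the vault.
self_encrypted_chunk = b"chunk bytes after self encryption"
dm_key = secrets.token_bytes(32)
stored_in_vault = keystream_xor(self_encrypted_chunk, dm_key)

# The vault owner can no longer match the stored bytes against a published
# datamap: the at-rest hash differs from the self-encryption hash.
assert hashlib.sha256(stored_in_vault).digest() != hashlib.sha256(self_encrypted_chunk).digest()

# On a GET the data manager strips its layer before passing the chunk along.
assert keystream_xor(stored_in_vault, dm_key) == self_encrypted_chunk
```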

  1. Does this affect the routing table by messing with the chunk’s (at rest) hash?
  2. Does routing - on a GET request - currently require the requested chunk to be returned through the data manager?
    a. If not, how, where and when would the second decryption key be made known to any who would require it?
    b. Who would require the decryption key?
  3. Would this really prevent this purported “unwanted behavior”?

Indeed a very interesting topic :slight_smile:

AFAICS there are two places where public data should be protected if you want to achieve plausible deniability for chunks:

  • Protect the datamap, so that no one can read the hashes of the chunks directly

  • Protect the chunks, both in the vaults and in transit, so that no one can determine with a few requests which chunks belong to a specific public file

I am not a security professional, but I expect a solution might be possible with commutative encryption. That is, an encryption scheme where several layers can be applied and then removed in any order. To get this working, two extra datamanager groups are needed: one for encryption and distribution, the other for decryption and forwarding.

First of all the datamanagers create a self-encrypted public datamap as usual. Then they generate a keypair K1, known only to them, for a second layer of encryption to apply to each chunk.

Those encrypted chunks are sent to the second group of datamanagers, which creates a second keypair K2, encrypts those chunks again and distributes them to the vaults as usual.

The requester must include a key KR in his request. The public datamap is sent directly to the requester. Key KR and the address of the decoder group are forwarded to the second group of datamanagers. They forward both to the vaults, the vaults encrypt the chunks with KR and forward them to the third group.

The third group receives both decryption keys: K2 from the second group, and K1 plus the address of the requester from the first group. Both keys are removed, leaving the chunks encrypted only with KR, and the chunks are forwarded to the requester. The requester can now decode the chunks with his own key and apply the datamap to the decoded chunks.
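To make the commutativity point concrete, here is a toy version using plain XOR layers (illustration only; a real scheme would need a proper commutative cipher, since bare XOR pads have their own weaknesses): the K1/K2/KR layers can be stripped in any order.

```python
import secrets

def xor_layer(data: bytes, pad: bytes) -> bytes:
    """Apply (or remove -- XOR is its own inverse) one encryption layer."""
    return bytes(a ^ b for a, b in zip(data, pad))

chunk = secrets.token_bytes(64)                 # stand-in for a self-encrypted chunk
k1, k2, kr = (secrets.token_bytes(64) for _ in range(3))

# Group 1 applies K1, group 2 applies K2, a vault applies the requester's KR.
in_transit = xor_layer(xor_layer(xor_layer(chunk, k1), k2), kr)

# The decoder group removes K2 then K1 (any order would do), leaving only KR,
# which the requester strips himself before applying the datamap.
for_requester = xor_layer(xor_layer(in_transit, k2), k1)
assert xor_layer(for_requester, kr) == chunk
```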

Now I expect that to work a few times, but use it too often and statistics will beat you anyway. So this must be complemented by a way to change the second and third group after a while and a way to remove those old chunks. Altogether maybe more effort than it’s worth :worried:

As for the protection of the public datamap itself… I have no idea :cry:

If anyone can shoot holes…:microscope:, be welcome :wink:

What about caching?

If I’m given a datamap or specific chunk hashes, then I - running a vault - can check my cache to see if I’ve been passing those chunks.

Sidenote: My friends and I can collaborate to see how popular our blogs are by scanning our caches to see if any of our chunks reside there.

It follows, too, that I can change the code to reject any chunk my vault is supposed to pass if it’s identified in the graylist - leading to lost chunks.

So with the above proposition, the chunk may be unrecognizable at rest, but not in transit.
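For clarity, the cache-scanning attack described above is nothing more exotic than a set intersection over chunk names, along these lines:

```python
import hashlib

def names(chunks):
    """Name each chunk by its hash, as a published datamap effectively does."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}

graylist = names([b"banned chunk 1", b"banned chunk 2"])   # built from someone's datamap
my_cache = names([b"banned chunk 2", b"unrelated chunk"])  # chunks my vault has relayed

# Any overlap tells the relaying node it has handled a graylisted chunk,
# and the same check could be used to refuse to forward it.
print(graylist & my_cache)
```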


For the record here, a guy suggested fountain codes (or other erasure codes) to me to make the data more unique per user, so it would be less likely that chunks could be identified. Specifically he pointed to this paper (PDF), essentially about authenticated Raptor codes, which they call Falcon codes.

The paper is a bit beyond my comprehension, but thought I’d just mention it on this thread about public data plausible deniability. I don’t think it applies to the safe network nor would I like to see any pivot that may slow the project down. Just food for thought for anyone else researching this subject.

True, but you must be in the route of that data, so it opens more issues in tracking. This area for sure will get a lot of scrutiny during the Alpha releases. At the moment the chunks come to you in pieces at a time as well - a crude FEC type really - and there are lots of opportunities for further obfuscation there. All in all an interesting area for research. I have a few ideas as well, but the key is caching: obfuscate per person and caching is dead in the water, unless we do something a bit smarter with dynamic obfuscation. This means you don’t know what you hold, but when asked for it you will know. So the request key XORs with the obfuscated keys you hold, and when it XORs to 0000 it’s the data. Schemes like that work, but we need to confirm the best one, and whether it is indeed an issue to solve.
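A toy version of that XOR-matching idea, with a single per-vault pad standing in for however the obfuscation keys would really be derived (the pad and names here are made up for illustration):

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

pad = secrets.token_bytes(32)                      # hypothetical per-vault obfuscation secret
real_names = [secrets.token_bytes(32) for _ in range(3)]

# The vault only ever sees obfuscated names, so it cannot match them
# against a public datamap -- it does not know what it holds.
held = [xor(name, pad) for name in real_names]

# Whoever knows the pad turns the wanted name into a request key...
wanted = real_names[1]
request_key = xor(wanted, pad)

# ...and the vault finds the chunk by looking for the all-zero XOR result.
matches = [i for i, name in enumerate(held) if xor(name, request_key) == bytes(32)]
assert matches == [1]
```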


I see 4 problems that need to be solved:

  • When I PUT a chunk, I don’t want my close nodes to know what’s in it (rainbow-table safe).
  • We want caching to work, so all nodes need to see the same version of a chunk.
  • When I cache chunks, I don’t want to know what’s in them, because when some awful organisation breaks into my house they might read out my cache while I’m out for dinner with friends. So we want the cache to be rainbow-table safe as well.
  • When my vault is storing chunks, I don’t want to know what’s in them. Again, rainbow-table safe.

PUT the data
I find the datamanager that’s closest to the hash of a self_encrypted chunk and use the public key from that node to safely send it my chunk. I think this part is already secure, although things have changed in the architecture. Correct me if I’m wrong here.
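For reference, “closest to the hash” here means closest in XOR distance; a small sketch with made-up node IDs (not real routing-table entries):

```python
import hashlib, secrets

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style XOR distance between two 256-bit identifiers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

node_ids = [secrets.token_bytes(32) for _ in range(32)]   # a made-up close group
chunk_name = hashlib.sha256(b"self-encrypted chunk").digest()

# The data managers for a chunk are the nodes whose IDs are closest
# to the chunk's name in XOR space.
closest = min(node_ids, key=lambda nid: xor_distance(nid, chunk_name))
```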

Chunk XORred before going into a Vault
Now what should the datamanagers do? They know several Vaults, but they don’t want the Vault owners to know what’s stored. So they obfuscate the data by XORring it with a random 32-bit number known only to the DM group. This part is then secure: the Vault owner doesn’t have a clue what’s in the Vault, and it’s completely rainbow-table safe.

Caching and GET problem
This one is tricky, as we want as much caching as possible. So let’s see…
The DMs request the chunk from the Vault and un-XOR it back to the original version. So here we have the chunk as it was after self_encryption. Now we receive a GET from a node, so this chunk will probably go over 5 to 7 nodes where it gets cached.

They create another random 32-bit number, which is used to XOR the chunk (different from the one used to store it!). Again we have a completely random, unrecognizable chunk that only the DM can un-XOR. So when a node asks for chunk “ABC”, its close nodes forward the request to the right DM group and say: hey, we want the chunk with hash “ABC”. The DM group replies: okay, here’s the chunk, but the hash is different because we obfuscated it. They send the obfuscated chunk to the node that made the request, along with a message (encrypted with that node’s public key) containing the 32-bit value needed to un-XOR the chunk and get the real one back. What’s the purpose of this? If the chunk went over 7 hops, none of them has a clue what was in there, so they can’t perform a rainbow-table attack. But still, whenever another node asks for the same chunk, caching keeps working.

There still is a problem, though: the request for a certain hash went over a particular route, so people might still see that a chunk with a certain hash was requested, and they might guess that the other chunk that came by was holding that data. A way around this is to send the request for a certain hash over a route that’s a little different from the one the XORred chunk comes back over.

Now one might say: I request the latest episode of GOT and I’ll make a rainbow table of these chunks, so I will see them come by in cache and recognize them even when they’re XORred for obfuscation by the DM. But the DM can change the 32-bit XOR value every 4 to 8 hours or so, making it extremely hard to maintain rainbow tables as they change up to 6 times a day (caching still active though).
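Putting the storage pad and the transit pad together, here is a sketch under the assumption that the short random number would in practice be stretched into a full-length keystream (a repeating 32-bit pad across a 1 MB chunk would be trivially recoverable); the seed names are made up for illustration:

```python
import hashlib, secrets

def stream_xor(data: bytes, seed: bytes) -> bytes:
    """XOR data with a keystream expanded from a short DM-held seed."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

chunk = b"self-encrypted chunk" * 100

# Storage: the DM group obfuscates with a seed the vault never learns.
store_seed = secrets.token_bytes(32)
in_vault = stream_xor(chunk, store_seed)

# GET: the DM strips the storage pad, then applies a transit pad. Every hop
# along the route caches the same obfuscated bytes, so caching still works,
# while only the requester is sent the transit seed (under his public key).
transit_seed = secrets.token_bytes(32)
in_transit = stream_xor(stream_xor(in_vault, store_seed), transit_seed)

# The requester removes the transit pad and recovers the self-encrypted chunk.
assert stream_xor(in_transit, transit_seed) == chunk
```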


In brainstorming mode:
I tried to search the forum but didn’t find anything about not writing anything to disk.
Would it be possible to imagine memory-only vaults? While it drastically reduces the amount of data that could live on the network, I wonder if it could be meaningful for data that is problematic enough that one can be concerned about deniability. Maybe a memory-only version of the SAFE Network for highly exposed data?

Would it solve the problem at all? There would certainly be nothing to investigate once the vault is stopped, as the variables simply disappear (one could even rewrite them many times before the vault stops, to prevent further memory readings), but the problem remains that one could be “caught” hosting some chunks in memory while the vault is running.


It’s a proposal, so the owner of a Vault never ever has a clue what’s in there. Only the DM know. And when they request the data from the Vault they un-XOR it and do the trick again before they send it to the requester. So from the moment the DM have a chunk under their control it’s always obfuscated, even when it goes over several hops that cache the chunk.

Yeah, could be several options when it comes to time.


Yes, this has been considered a few times (and should be again). The main issue is data retention in the event of catastrophic failure. I feel this one is not over, though; I always think about solar-powered nodes, independent energy, satellite comms or better (entanglement :smiley:), etc. We should keep probing.


Yes, that’s indeed the question. I assumed XOR is faster. I also like the idea of taking workload off the DM and letting the Vault do the work, but there’s the problem that a vault can find out what was in there after all, so I would prefer not to do that.

32 bits is just an idea; a Vault could make 4.3 billion alterations and un-XOR them all. That turns a 1 MB chunk into roughly 4 PB of candidates, but at 64 bits we should be good. Or 160, why not.
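Spelling out that brute-force arithmetic (assuming roughly 1 MiB chunks):

```python
MiB = 2 ** 20  # assuming roughly 1 MiB chunks

def brute_force_bytes(key_bits: int) -> int:
    """Total data produced by un-XORing one chunk under every possible key."""
    return (2 ** key_bits) * MiB

print(brute_force_bytes(32) // 2 ** 50, "PiB")   # 4 PiB: large, but not unthinkable
print(brute_force_bytes(64) // 2 ** 50, "PiB")   # ~17 billion PiB: out of reach
```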

Here’s another option that uses this idea to PUT data as well:

PUT data
Node does self_encryption on a chunk and asks close nodes to PUT the data. Node pays some Safecoin and close nodes say: okay, give us the chunk. Before sending out the data, it does an extra XOR using a random 32-bit value. So now the original hash after self_encryption has changed, and the chunk ends up at a completely different group of DM. These DM get a message containing the 32-bit value to un-XOR, and do the PUT again to the right group of DM. So the node that did the PUT is completely safe here, as the chunk passed maybe 3 to 7 nodes before ending up at a DM group. Only on the leg from DM group 1 to DM group 2 is the real hash (after self_encryption) visible.

The second group of DM, which got the chunk, does the obfuscation before storing it on a Vault as described before. And when a chunk is requested, they use the obfuscation as described earlier, so that caching still works even though the chunks cannot be recognized.
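The point of the extra XOR on PUT is simply that it changes the chunk’s address, so the uploader’s close nodes route it towards an unrelated DM group; a minimal sketch (the pad here is a throwaway stand-in for the 32-bit value, stretched to full length):

```python
import hashlib, secrets

chunk = b"self-encrypted chunk" * 100
real_address = hashlib.sha256(chunk).hexdigest()

# The uploader masks the chunk with a one-off pad before handing it to his
# close nodes, so the name they route on is unrelated to the real one.
pad = secrets.token_bytes(len(chunk))
masked = bytes(a ^ b for a, b in zip(chunk, pad))
assert hashlib.sha256(masked).hexdigest() != real_address

# The first DM group is told the pad, strips it, and re-PUTs the chunk to the
# group actually closest to the real address.
unmasked = bytes(a ^ b for a, b in zip(masked, pad))
assert hashlib.sha256(unmasked).hexdigest() == real_address
```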

Of course this is all in theory :yum: so I don’t know whether these steps take too much CPU and calculation. But the part where the chunks get obfuscated before they go to the Vault should be implemented IMO. And as David says, they’re already thinking about that. Cache will be something like 100 or 200 MB in RAM, so that shouldn’t be the biggest problem.

I keep getting confused by the personas, to be honest. I thought there was a difference between the two, but I’m still confused :yum:. My sense says: I’m connected to a group of 32 close nodes. If I’m ABC and the group of 32 decide I’m the closest one to a piece of data, they are my DM when I store the chunk in my Vault. But that would imply my close nodes know a lot about me. So close nodes can always prevent me from asking for a certain hash, and if a few of them are evil, they might spot I’m requesting some Wikileaks data and block me from getting it.

This would be cool, but there must be a place where someone requests a hash, so the network must know where it is stored. The group of DM responsible for a certain chunk always knows the real, un-obfuscated version of the chunk. Otherwise, how would anyone know where to find the right DM to get it?