Self_encryption and compression, questions and thoughts

I’d like to have the choice to not compress. I know what I’m doing when I’m using a computer, so forcing me to compress stuff makes me feel like I’m being treated like a baby.

Sounds like I’ll be able to write a program that lets me skip compression if I so choose, but it feels weird not to simply include a flag to disable compression in a generic uploading tool.
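Something like this is all I’d hope for in the uploader - just a sketch with a made-up `--no-compress` flag, not an existing tool:

```rust
use clap::Parser; // clap = { version = "4", features = ["derive"] }

/// Hypothetical uploader CLI -- a sketch, not an existing tool.
#[derive(Parser)]
struct Args {
    /// File to upload.
    path: std::path::PathBuf,

    /// Skip the client-side compression pass before self-encryption.
    #[arg(long)]
    no_compress: bool,
}

fn main() {
    let args = Args::parse();
    let data = std::fs::read(&args.path).expect("read file");
    // Compression happens client-side, before SE, so opting out is a
    // purely local decision -- the network never needs to know.
    let payload = if args.no_compress { data } else { compress(data) };
    upload(payload);
}

// Stub: a real tool would run a deflate/zstd pass here.
fn compress(data: Vec<u8>) -> Vec<u8> {
    data
}

// Stub: the self-encrypt + PUT steps would go here.
fn upload(_payload: Vec<u8>) {}
```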

1 Like

Not sure I’m following … my proposal was to send the SE output + verify-code wrapped in an encryption layer, along with the semi-secret key … then the node decrypts the outer layer to see the global verify-code … if it can see the code correctly, then there must be a level of encryption in place. The node can dispose of the semi-secret key afterwards - only the user needs to keep it.
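Roughly like this - a toy sketch where `toy_cipher` stands in for a real outer cipher, and `VERIFY_CODE` is a made-up network-wide code:

```rust
use sha2::{Digest, Sha256}; // sha2 = "0.10"

// Toy XOR keystream cipher -- symmetric, so it both wraps and unwraps.
// Illustration only; a real outer layer would use a proper cipher.
fn toy_cipher(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().enumerate().map(|(i, b)| {
        let mut block_input = key.to_vec();
        block_input.extend_from_slice(&((i / 32) as u64).to_le_bytes());
        b ^ Sha256::digest(&block_input)[i % 32]
    }).collect()
}

// Hypothetical network-wide verify-code (made up for this sketch).
const VERIFY_CODE: &[u8; 8] = b"SE-OK-01";

// Client side: prepend the verify-code to the self-encrypted payload,
// then wrap both under the semi-secret key. `se_payload` stands in for
// the real SE output.
fn wrap(se_payload: &[u8], semi_secret_key: &[u8]) -> Vec<u8> {
    let mut inner = VERIFY_CODE.to_vec();
    inner.extend_from_slice(se_payload);
    toy_cipher(&inner, semi_secret_key)
}

// Node side: peel the outer layer with the semi-secret key sent along
// with the data; if the verify-code reads correctly, the client must
// have applied the expected encryption. The node can then discard the key.
fn node_check(wrapped: &[u8], semi_secret_key: &[u8]) -> bool {
    toy_cipher(wrapped, semi_secret_key).starts_with(VERIFY_CODE)
}

fn main() {
    let se_payload = b"pretend this is SE ciphertext";
    let key = b"semi-secret key sent with the upload";
    let wrapped = wrap(se_payload, key);
    assert!(node_check(&wrapped, key));
}
```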

If the node does SE, then it would have to give the key to the user - otherwise you’d have to go through the exact same node again, an unlikely scenario. Also, a node’s own private key can’t be given out for SE … so it would have to be a separate key anyway … meaning nodes would have to create and store key pairs for lots of data?

The global verify-code could be unique to the network as well - it could be used to create forks.

Further wondering: if the XOR-URL could be created by the user from the original data rather than from the SE output, it could then be sent along with the verification code for the node to get … that way de-duplication would be based on the actual data and wouldn’t be unique to the user (since post-SE the data is scrambled) … or am I ignorant here, and the XOR-URL is already derived from the original data and passed to the node?

Edit: maybe an XOR-URL on the original data is a bad idea, as anyone who knew what the original data was could then find it on the network and work to go after those storing it. Sort of de-anonymizes the data. :frowning_face:

Meh, as I said in my previous post, I’m probably missing some obvious things here and just pointlessly speculating.

1 Like

Ah ok, I get it. So the node would just make sure there is at least one layer of encryption; that makes sense.

I’m not a fan of adding a verification code to everything, so for that reason I like the idea of double SE.

The node wouldn’t have to give any key to the user because the user can already generate the key themselves by doing SE and then SE again on the result (which is what the node will do).
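Roughly: since SE is deterministic and needs nothing but the data itself, the node’s extra layer is reproducible by anyone holding the intermediate data. A toy sketch (toy cipher, not the real scheme):

```rust
use sha2::{Digest, Sha256}; // sha2 = "0.10"

// Deterministic toy SE: the key is a hash of the content, the cipher is
// a stand-in XOR pass. Illustration only.
fn toy_se(data: &[u8]) -> Vec<u8> {
    let key = Sha256::digest(data);
    data.iter().enumerate().map(|(i, b)| b ^ key[i % 32]).collect()
}

fn main() {
    let data = b"user's file after the first, client-side SE pass";

    // What the node would store after adding its own SE layer ...
    let node_side = toy_se(data);

    // ... and what the user can compute locally, with no key exchange,
    // because SE needs nothing but the data itself.
    let user_side = toy_se(data);

    assert_eq!(node_side, user_side);
}
```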


My understanding is that the XOR URL is simply a hash of the data, which would mean the node can generate the XOR address from the data itself, but I’m not read up on it.


It would be nice to have an up-to-date primer. Looking forward to that.

1 Like

If audio and video aren’t deduped, then that’ll lead to a lot of redundant/wasted space, will it not? Or would there be a process to check for that data first and hand the map over, rather than just uploading the same stuff?

SE always gave me the impression of providing a level of file protection for the original producer/first uploader of content, with the network only giving control to the original uploader.

2 Likes

It might well lead to wasted space if it’s not.

My understanding is that a user can even check whether the data is already uploaded by generating the XOR URL for the data locally and then asking the network if it exists. Uploading the same data twice to the network should be literally impossible, since it would end up in the same location (imagine trying to put two identical files with the same name in the same folder).
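A toy sketch of that behaviour, assuming a chunk’s XOR address is just the hash of its stored bytes (which matches my understanding above): putting the same bytes twice lands on the same address, so the second put is a no-op.

```rust
use sha2::{Digest, Sha256}; // sha2 = "0.10"
use std::collections::HashMap;

type XorAddr = [u8; 32];

// Assumption: a chunk's XOR address is the hash of its stored bytes.
fn xor_addr(chunk: &[u8]) -> XorAddr {
    let mut addr = [0u8; 32];
    addr.copy_from_slice(&Sha256::digest(chunk));
    addr
}

#[derive(Default)]
struct ToyStore {
    chunks: HashMap<XorAddr, Vec<u8>>,
}

impl ToyStore {
    /// A client can compute `xor_addr` locally and probe with this
    /// before paying to upload anything.
    fn exists(&self, addr: &XorAddr) -> bool {
        self.chunks.contains_key(addr)
    }

    /// Identical bytes always land on the same address, so a second
    /// put of the same data changes nothing.
    fn put(&mut self, chunk: Vec<u8>) -> XorAddr {
        let addr = xor_addr(&chunk);
        self.chunks.entry(addr).or_insert(chunk);
        addr
    }
}

fn main() {
    let mut store = ToyStore::default();
    let a = store.put(b"same bytes".to_vec());
    assert!(store.exists(&a));
    let b = store.put(b"same bytes".to_vec());
    assert_eq!(a, b);
    assert_eq!(store.chunks.len(), 1); // deduplicated
}
```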

SE is indeed file protection for the original producer; that’s my understanding too. But the problem here is that if someone actively chooses not to encrypt their data, it could be an issue for node operators.

EDIT:
Sorry, I misread your last paragraph. If it works the way you describe, then I’m not understanding how it works. I thought the control of the original uploader would be enforced through the permission system by the nodes on the network. This could all very well work very differently from how I think it does.

3 Likes

I’m still not following how that would work, though. You aren’t likely to go through the same node to get the data, so you would get encrypted data that you couldn’t decrypt. SE is a specific term for the user’s main private key being used to encrypt/decrypt their own data, so I think if the node is adding the encryption layer itself, it would have to be a new unique key (it cannot be a global public key, as that renders it useless). And since you aren’t likely to get the data back via the same node, the user has to have a copy of that key … meaning tons of new unique keys would need to be stored by the user … but maybe I’m just not understanding your proposal.

Yes, I’m unclear whether it’s the hash of the SE data or of the data itself (pre-SE) … I suspect the former, as the latter would be a security hole, I think. So de-duplication is relative to the user’s data, not all data globally, I guess.

Edit: thinking about it, it’s probably not much of a security difference, as tracking an XOR-URL back to an IP would be really hard, maybe? Would be nice if global de-dup were doable.

2 Likes

I thought SE referred to the data being encrypted using a hash generated from the data itself. Otherwise de-duplication wouldn’t work at all on the network, right? Are you sure that SE uses the user’s main private key to encrypt?

If it works like I think, that means all you need to encrypt the data is the data itself, and de-duplication would work across the network, since identical files are encrypted using their own content and thus end up looking identical in encrypted form.

This also means that encrypting the same data twice with SE results in the same output, so de-duplication is retained.

I’m fairly sure the XOR URL is generated from the encrypted data; otherwise users would get to decide where on the network their data gets to live.
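That convergent property is easy to sketch: derive the key from a hash of the content itself, and identical plaintexts produce identical ciphertexts. A toy example (not the real self_encryption scheme, just the dedup-preserving property):

```rust
use sha2::{Digest, Sha256}; // sha2 = "0.10"

// Convergent encryption sketch: the key is a hash of the data itself,
// so anyone holding the same plaintext derives the same key.
fn convergent_key(data: &[u8]) -> [u8; 32] {
    let mut key = [0u8; 32];
    key.copy_from_slice(&Sha256::digest(data));
    key
}

// Toy keystream cipher, a stand-in for the real chunk cipher.
fn toy_encrypt(data: &[u8], key: &[u8; 32]) -> Vec<u8> {
    data.iter().enumerate().map(|(i, b)| {
        let mut input = key.to_vec();
        input.extend_from_slice(&((i / 32) as u64).to_le_bytes());
        b ^ Sha256::digest(&input)[i % 32]
    }).collect()
}

fn main() {
    let data = b"the same file uploaded by two different users";
    let c1 = toy_encrypt(data, &convergent_key(data));
    let c2 = toy_encrypt(data, &convergent_key(data));
    // Identical plaintext -> identical ciphertext -> identical XOR
    // address, so the network only ever stores it once.
    assert_eq!(c1, c2);
}
```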

3 Likes

You may totally be right. I’m not sure at all. Edit: thinking further, I’m pretty sure you are right … I’d forgotten about that little trick. I knew there was something special about SE; it’s just been so many years since I thought about it that I totally forgot!

1 Like

Haha, and I may be totally wrong. xD

Would be great to be able to know how this all works, I like knowing how things work. :smiley:

Gotta get that primer…

2 Likes

That’s how it works. The file is chopped into chunks, and the hash of one chunk is used to encrypt the next, and so on, hence self-encryption. That leads to hashes that are unique to that exact file, which makes it easy for the network to check globally and enforce dedupe.
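A much-simplified sketch of that idea (toy cipher, tiny chunks; the real self_encryption crate derives each chunk’s key and nonce from the hashes of neighbouring chunks and is far more involved):

```rust
use sha2::{Digest, Sha256}; // sha2 = "0.10"

const CHUNK_SIZE: usize = 4; // tiny, for demonstration

fn hash(data: &[u8]) -> [u8; 32] {
    let mut h = [0u8; 32];
    h.copy_from_slice(&Sha256::digest(data));
    h
}

// Toy keystream cipher standing in for the real chunk cipher.
fn toy_encrypt(data: &[u8], key: &[u8; 32]) -> Vec<u8> {
    data.iter().enumerate()
        .map(|(i, b)| b ^ key[i % 32]) // weak! illustration only
        .collect()
}

/// Simplified self-encryption: each chunk is encrypted with a key
/// derived from the hash of its predecessor (wrapping around), so the
/// file encrypts itself. Returns (data map, encrypted chunks); the
/// data map here is just the list of pre-encryption chunk hashes.
fn self_encrypt(data: &[u8]) -> (Vec<[u8; 32]>, Vec<Vec<u8>>) {
    let chunks: Vec<&[u8]> = data.chunks(CHUNK_SIZE).collect();
    let hashes: Vec<[u8; 32]> = chunks.iter().map(|c| hash(c)).collect();
    let encrypted = chunks.iter().enumerate()
        .map(|(i, c)| {
            // The key for chunk i comes from the previous chunk's hash.
            let key = hashes[(i + hashes.len() - 1) % hashes.len()];
            toy_encrypt(c, &key)
        })
        .collect();
    (hashes, encrypted)
}

fn main() {
    let (map_a, enc_a) = self_encrypt(b"identical media file bytes");
    let (map_b, enc_b) = self_encrypt(b"identical media file bytes");
    // Deterministic: same file -> same map, same encrypted chunks,
    // so the network can dedupe globally.
    assert_eq!(map_a, map_b);
    assert_eq!(enc_a, enc_b);
}
```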

My above questions remain.

5 Likes

Okay, so I formally apologize for leading everyone astray with my stoopid. :grimacing: :frowning_face:

I suppose double SE would work in this case then?

1 Like

If all chunks are handled identically, then the XOR-URL space is conserved - if data already exists at an XOR-URL, nothing new is written. De-dup for the win.

So compression also has to be handled identically across the network - if it’s done pre-SE.
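Easy to see why with a toy address function (assuming the address is a hash of the processed bytes): any divergence in the pre-SE pipeline, like compressed vs. uncompressed, gives different bytes and hence a different address.

```rust
use sha2::{Digest, Sha256}; // sha2 = "0.10"

// Assumption: a chunk's address is the hash of its processed bytes.
fn xor_addr(bytes: &[u8]) -> [u8; 32] {
    let mut addr = [0u8; 32];
    addr.copy_from_slice(&Sha256::digest(bytes));
    addr
}

// Stand-ins for two client pipelines that differ only in the pre-SE
// compression step (the header byte fakes a format change).
fn pipeline_uncompressed(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}
fn pipeline_compressed(data: &[u8]) -> Vec<u8> {
    let mut out = vec![0x1f]; // pretend compression header
    out.extend_from_slice(data);
    out
}

fn main() {
    let data = b"same logical file";
    // Same logical content, different bytes on the wire ...
    let a = xor_addr(&pipeline_uncompressed(data));
    let b = xor_addr(&pipeline_compressed(data));
    // ... so different addresses: the network sees two unrelated chunks.
    assert_ne!(a, b);
}
```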

2 Likes

No need to apologize! You’ve helped clarify some truths about how it does work (I trust Nigel at least xD ), so I’d rather give you thanks!

3 Likes

This is the bit I’m interested in. I knew it was possible to stream the chunks (it was used a long time ago to show download progress), so I’m wondering how well suited SE is to media, when it clearly has other benefits/protections for the publisher.

That last bit about new clients - is that like a CDN-type client? Something for the future to speed up content delivery and streaming on the network?

1 Like

I kind of assume this as well but I don’t know for certain.

1 Like

I’m pretty sure this is how it works. XOR space has to be a single address space for all chunks, so writing data at a new address saves the data, and doing the same again has no effect.

4 Likes

So the perceived benefits of SE regarding file ownership and control also apply to unencrypted uploads, simply because Safe is a PK infrastructure and because of the XOR-URL stuff, then?

The main point of default SE is that all data is encrypted client-side by default before ever going into transit. Those other benefits also apply to SE but aren’t exclusive to it. That’s what I’m gathering so far. Anyone want to confirm?

1 Like

My understanding, which could be wrong, is that if a file goes through SE and is then uploaded again without SE, you get different chunks and therefore different addresses, so deduplication only happens if data is treated the same way.

7 Likes

Yes. If it’s public data, then SE is all there is, I guess. The concern is that if a particularly evil(?) client disables SE and posts illegal data, it ends up unencrypted on someone’s drive. The idea from @lukas is for the node to just add a new layer of SE (at the node). I’m not certain how well that mitigates the problem - maybe fully?

If it’s private data, then I assume no evil actor is going to out themselves! So not relevant.

3 Likes

I dunno … oh boy, here I go again, preparing to put foot in mouth…

Double SE means two sets of hashes have to be tracked. One is the XOR-URL, so no problem, but if the node does the second SE, then which hash - from the first SE or the second - is the XOR-URL, and where is the other one kept? It’s needed for decryption, right?

Wondering if my original idea wasn’t too far off track. Just add the verify-code to the data (post-compression), then SE; the node can then undo the SE and see the verify-code … and hence know that SE happened.

That seems simpler. But I’m probably missing something again. :wink:

@mav, you are too kind for liking all my crappy posts! :wink:

1 Like