RFC 55 - Unpublished ImmutableData

Discussion topic for RFC 55.

17 Likes

I think unpublished immutable data is a fantastic compromise between delete/no-delete.

The Network SHALL enforce that the GETs are only allowed by the owner(s). For this we SHALL use the special OwnerGet RPC

Where on the route does this validation happen?

Seems like it’s covered in RFC-0054. MaidManagers do the validation within this route:

Client <-> MaidManagers <-> DataManagers

Might be worth including in this RFC a reference to the RFC-0054 Unresolved Questions, since the issue relates equally to both.

4 Likes

I thought immutable data was not meant to have owner data.

ImmutableData will only have an owner if it’s Unpublished.

From the RFC

The published ImmutableData is the normal ImmutableData we have just now. There are no changes to that.

That introduces all the issues of identifying the owner, doesn’t it? What if I find the first chunk and inquire about the owner? Just search addresses for the chunk.

What about dedup, @maidsafe? If I upload privately and someone else then uploads the same data publicly, it is not the same chunk, is it? And I become discoverable, since the owner info is still there.

I guess I better read the RFC

3 Likes

In these cases the validation happens at the elder group of the holding section (DataManagers), the reason being that the request goes all the way through signed. On receipt the signature is checked; if the signer is the owner, the reply is sent to that owner ID. If we validated earlier we could have replay attacks, or weird things like the signed Get request being handed out to folk. So the Get is signed by the owner and delivered back to that owner, if that makes sense?
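To make that flow concrete, here is a minimal sketch of the check as it might run at the holding section. Everything in it is an assumption for illustration: the names `OwnerGet`, `Reply` and `handle_owner_get` are hypothetical, the request signs only the chunk address, and an HMAC with a per-identity secret stands in for a real public-key signature so that the example runs on the standard library alone.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

# ASSUMPTION: a toy keyring. HMAC is a symmetric stand-in for a real
# public-key signature scheme; it is NOT what the network uses.
TOY_KEYS = {b"alice": b"alice-secret", b"mallory": b"mallory-secret"}


def toy_sign(identity: bytes, message: bytes) -> bytes:
    """Produce the stand-in 'signature' of message for identity."""
    return hmac.new(TOY_KEYS[identity], message, hashlib.sha256).digest()


def toy_verify(identity: bytes, message: bytes, signature: bytes) -> bool:
    """Check the stand-in 'signature'; unknown identities never verify."""
    key = TOY_KEYS.get(identity)
    if key is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


@dataclass(frozen=True)
class OwnerGet:
    requester: bytes  # identity claiming to be an owner
    address: bytes    # address of the requested chunk
    signature: bytes  # requester's signature over the address


@dataclass(frozen=True)
class Reply:
    recipient: bytes  # always the verified owner, never anyone else
    data: bytes


# Hypothetical store layout: address -> (set of owners, chunk bytes)
Store = Dict[bytes, Tuple[FrozenSet[bytes], bytes]]


def handle_owner_get(store: Store, request: OwnerGet) -> Optional[Reply]:
    """Validate a signed Get where the data is held and reply to the owner."""
    entry = store.get(request.address)
    if entry is None:
        return None
    owners, data = entry
    if request.requester not in owners:
        return None  # signer is not an owner of this chunk
    if not toy_verify(request.requester, request.address, request.signature):
        return None  # signature does not match the claimed identity
    return Reply(recipient=request.requester, data=data)
```

A request signed by an owner gets a reply addressed to that owner; a request from a non-owner, or one carrying a signature made by a different identity, gets nothing back.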

An aside

Something we need to get clear is that unpublished != totally private, as vaults holding the data can read it, etc. It really means only owners can get it, so it is effectively private (you can encrypt on top of this, though). It is similar with identities (public keys): they are all created anonymously, and if folk share the ID then they are no longer anonymous. So there is confusion about public/private/anonymous/publishable/non-publishable.

You are pretty good at disambiguation, Ian, so as we go through RFCs any help there would be magnificent. I know I had issues trying to convey this originally, and we have had things like publicId and privateId in code that did lead to confusion. The other confusion is accounts versus IDs: accounts can hold secret keys etc., but IDs are public keys and you can have loads of them; whether they are public or not is the user’s decision. This part is also confusing for many, I think.

7 Likes

Unpublished data will not have de-duplication. The addresses of the chunks are calculated as sha3_hash(hash_of_data + owner), so the same data uploaded by two different owners will not have chunks in the same location. The exception is, of course, the same person uploading the same data twice as unpublished immutable data; in this case a conflict error will be returned.
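A small sketch of that address rule and the resulting conflict, under some assumptions the post does not spell out: SHA3-256 is used for both hashes, `+` means byte concatenation, and the owner is represented by the raw bytes of a key. The names `unpublished_address`, `UnpublishedStore` and `ConflictError` are made up for the example.

```python
import hashlib
from typing import Dict


class ConflictError(Exception):
    """Raised when a chunk already exists at the computed address."""


def unpublished_address(data: bytes, owner: bytes) -> bytes:
    # ASSUMPTION: SHA3-256 for both hashes, '+' as byte concatenation.
    hash_of_data = hashlib.sha3_256(data).digest()
    return hashlib.sha3_256(hash_of_data + owner).digest()


class UnpublishedStore:
    """Toy in-memory stand-in for wherever the chunks end up."""

    def __init__(self) -> None:
        self._chunks: Dict[bytes, bytes] = {}

    def put(self, data: bytes, owner: bytes) -> bytes:
        address = unpublished_address(data, owner)
        if address in self._chunks:
            # Same owner, same data: the address is already taken.
            raise ConflictError(address.hex())
        self._chunks[address] = data
        return address
```

With this, `put(b"x", owner_a)` and `put(b"x", owner_b)` land at different addresses, so there is nothing to de-duplicate, while a second `put(b"x", owner_a)` raises `ConflictError`.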

11 Likes

This sounds like some data held by vaults is not encrypted - or do you mean something else?

3 Likes

There will be an RFC for obfuscation at vaults. Also if you use self-encrypt it is all good, but people could upload owned immutable data that is not encrypted if you see what I mean?

2 Likes

Hmm, this is news to me and I’ve been saying everything is encrypted by default for a long time now. See also (emphasis mine):

Fundamental #19: The SAFE Network will only ever allow encrypted traffic and encrypted services.

Put simply, everything, including web traffic is encrypted by default. Everything. This is non-negotiable for a Network that demands privacy for every one of its users. You can of course choose to make information public — but this has to be your choice alone. So this means you can be safe in the knowledge your data will always be secure

I think it’s a fundamental [cough] problem if unencrypted information can end up on vaults unless I write code to do extra stuff to ensure that doesn’t happen, and it is not what people will expect.

Personally I don’t think unencrypted data should ever end up on a vault unless it is explicitly public/published.

My understanding is that non-public, non-published info can end up exposed on a vault unless the developer explicitly adds their own encryption layer. That’s non-trivial (except for the MD case): not that hard, just outside most devs’ experience, so a learning hurdle. So in many cases I expect it won’t happen, and if I’m representative, many devs won’t even realise it’s necessary (and will end up misleading users).

9 Likes

That is still true. It has to be uploaded without the (default) automatic self-encryption being done.

4 Likes

Ah OK, I misunderstood that.

But I’m not clear now if this is what David means, so I hope for confirmation wrt fundamental #19. Have to say I’m surprised this didn’t make it into the top five! Seems an important point to get across to those who will read the Fundamentals.

3 Likes

You will see an RFC next week or the week after on this, or rather on obfuscation. Vaults will not be able to read even clear text that is uploaded, as elders will encrypt it. We strongly advise against anyone uploading unencrypted data by bypassing the APIs, but we cannot 100% stop a bad app doing that. We can ensure Adults (vaults) hold no unencrypted data though.

7 Likes

Thanks David. If using the APIs ensures an app conforms to Fundamental #19 I’m happy.

I assume by bypassing the API you mean using some of the lower level APIs, in which case I suggest the documentation for those specific API calls each make it clear that they do not encrypt by default, and point to what needs to be done to ensure encryption (a ref link to an explainer should do).

5 Likes

Yes we will doc these and hope any app doing it would be classed bad. The issue the RFC will prevent though is a bad actor using those APIs to upload bad stuff to vaults on purpose to try and break the network.

5 Likes

I’ve read RFC 54 and 55 a couple of times and also looked over a fair amount of MaidSafe code on GitHub. I figured I’d share some thoughts / feedback.

It sounds like elders have to get involved every time a user wants to PUT deletable data (“Unpublished ImmutableData”) on the network. That seems to imply:

  • Assuming a lot of people want to use this feature (e.g. for backups), the network could require a high ratio of elder nodes to vault nodes.
  • The network could have scalability problems with elders having to receive, encrypt, and broker messages for so many chunks.

I’m hoping next week’s RFC will shed some light when it comes out. On a related note, some of what I don’t yet understand is:

  1. How come unpublished ImmutableData can be unencrypted but published ImmutableData cannot?
  2. Why should network access (GET operations) be restricted? If the chunk was encrypted by the elder, it can only be read by the owner anyway, right? My assumption is that the elder uses the owner’s public key to encrypt the chunk, e.g. using an operation such as Sodium’s crypto_box_seal().

I had thought the same thing when I read the RFC yesterday and became concerned about the concept of controlling GET access at the network level.

I saw a similar point/question made in RFC54 - adversaries could collude outside of the SAFE network to share chunks stored in their vaults. I feel that part wasn’t fully addressed by @dirvine’s response. On the other hand, the question assumes that the data stored in the vaults is potentially unencrypted and readable by the vaults (the upcoming RFC is supposed to make vaults unable to read their stored unpublished chunks).

The following comment was helpful for me to better understand the motivation behind this RFC:

If all this is largely just to guard against vaults storing risky unencrypted data (I’m guessing illegal content uploaded by adversaries), it seems to me that even a vault could simply encrypt the chunk with the owner’s public key prior to saving to disk (i.e. not require an elder’s resources). The owner would have to undo another layer of encryption once they GET their data back, but I don’t think it should be a blocker (if anything, I think it would be easier and less overhead than developing an Owner-Get messaging protocol).

This could be developed to be a deterministic process (e.g. the owner generates a second keypair and includes the generated “private” key as part of the chunk). Multiple vaults could have the identical chunk saved (and know it’s valid). The owner’s software would be able to calculate the encrypted result stored on a vault’s drive and send a signed hash. This would enable the vault to replicate data to other vaults (along with proof from the owner that the encrypted file is correct). Also, in the future, a proof of storage feature could conceivably make use of this.

5 Likes

Obfuscation is not enough. Vault owners need all data to be encrypted to avoid liability for storing (fragments of) data that any government might consider illegal.

5 Likes

Elders don’t encrypt or break chunks. These tasks are performed on the client side.

The main differences with the current Immutable data are:
  • The header contains the set of owners, and reading and deletion of this data will be restricted to them.
  • Data will not be deduplicated. The addresses of the chunks are calculated as sha3_hash(hash_of_data + owner) instead of sha3_hash(hash_of_data).

4 Likes

From what I deduce, Unpublished ImmutableData is not only non-transferable, but also cannot be shared with anyone who doesn’t own it. Nor can you remove or add an owner.

What is the benefit of using such a restrictive data type?
Wouldn’t it be more useful to handle reading and deleting differently and allow read sharing? (One could use, for example, a nonce instead of the owner to calculate the address.)
Does deletion require everyone’s signatures or only a majority?
As usual, the limit of 1 MB per chunk is maintained and the client will manage it, right?

2 Likes

Hmhmm - if the owner is an alias, it is transferable and sharable to a selected audience.

Just removing an owner is still not possible - but it’s immutable, so maybe that’s not really surprising :thinking:

3 Likes