RFC 54 - Published and Unpublished DataType

oetyng · July 1, 2019, 3:03pm

I wonder what the rationale is for locking the optimistic concurrency to a data structure by dividing them into Sequenced and Unsequenced.

To me it seems like this could be much simpler and more flexible by skipping that subdivision entirely and just providing an ExpectedVersion parameter. I would say this is standard practice when working with event-sourcing streams.

You can look at my AppendOnlyDb on GitHub, where I’ve implemented it like this.

ExpectedVersion.Any => Always writes.
ExpectedVersion.Specific(u64) => Only writes if version matched.
ExpectedVersion.None => Only writes if empty.

Then we can get rid of a bit of this cognitive load with the type count explosion, and shorten the names, and not limit use cases (I might want concurrency check sometimes and sometimes not for my data structure instance, too much assumed here).
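
To make the proposal above concrete, here is a minimal sketch of a per-write ExpectedVersion check. It is not taken from AppendOnlyDb or from the RFC: the `Stream` and `AppendError` types, the `append` signature, and the choice of zero-based versions are all assumptions made for illustration.

```rust
/// Hypothetical per-write concurrency option, mirroring the three cases in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExpectedVersion {
    /// Always writes; no concurrency check.
    Any,
    /// Only writes if the current version matches.
    Specific(u64),
    /// Only writes if the stream is empty.
    None,
}

/// Hypothetical error returned when the check fails.
#[derive(Debug, PartialEq, Eq)]
pub enum AppendError {
    VersionMismatch {
        expected: ExpectedVersion,
        actual: Option<u64>,
    },
}

/// Hypothetical in-memory append-only stream.
#[derive(Debug, Default)]
pub struct Stream {
    entries: Vec<Vec<u8>>,
}

impl Stream {
    /// Version of the latest entry, or `None` when the stream is empty.
    /// Assumption: versions are zero-based.
    pub fn version(&self) -> Option<u64> {
        self.entries.len().checked_sub(1).map(|v| v as u64)
    }

    /// Appends `data` if `expected` is satisfied and returns the new version.
    pub fn append(
        &mut self,
        data: Vec<u8>,
        expected: ExpectedVersion,
    ) -> Result<u64, AppendError> {
        let actual = self.version();
        let allowed = match expected {
            ExpectedVersion::Any => true,
            ExpectedVersion::Specific(v) => actual == Some(v),
            ExpectedVersion::None => actual.is_none(),
        };
        if !allowed {
            return Err(AppendError::VersionMismatch { expected, actual });
        }
        self.entries.push(data);
        Ok((self.entries.len() - 1) as u64)
    }
}
```

With this shape the same structure can be written with or without a concurrency check on each call, which is the flexibility argued for above.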

oetyng · July 4, 2019, 5:19pm

Instead of using Sequenced and Unsequenced as data types, I would much prefer to be able to set the concurrency check on a case-by-case basis, by having it passed in as a parameter.

That gives flexibility for the user.
Fewer data types.
Avoiding these long names.
Overall feels cleaner and simpler, yet more powerful.

Since implementation has started while the RFC is still open, I wonder:

Is there any obstacle to doing that, and what are the reasons for not doing it?

Lindsey · July 22, 2019, 2:07pm

Final Comment Period

The Published and Unpublished DataType RFC will remain open for the next 10 days to allow any final comments to be made.

Thank you for your contributions!

jlpell · July 23, 2019, 1:58am

Is there a reason why they can’t be called Public and Private data types? Published and Unpublished is a tongue twister and a difficult naming system to use in a sentence or whimsical online banter. Written down, the type names become unnecessarily long and unruly.

happybeing · July 23, 2019, 10:04am

Or Published and Private?

SmoothOperatorGR · July 23, 2019, 10:51am

What do you call unpublished data that you share? Private?

jlpell · July 23, 2019, 12:20pm

Shared data.

Private data is owned by one person/account and not published. When describing the unpublished data type, the RFC describes it as being private. So why not just call it “private data”? Much shorter and sweeter, and self explanatory.

Same thing for published data. It is described as being public data. So just label it “public data”. Again this is short and sweet and self explanatory.

Data that is unpublished but has multiple owners is shared data. This is how it is described in the RFC. So just call it “shared data”. Short, sweet and self explanatory.

jlpell · July 23, 2019, 12:31pm

Public and Published have the same root. Public is shorter and self explanatory. Published is not self explanatory. Public is the better label/name for the class of data.

dirvine · July 23, 2019, 12:36pm

Private is more pseudoPrivate though. The vaults can see it, I mean we don’t enforce encryption there. So there is a marketing/PR issue with private, perhaps?

It cannot be successfully published on SAFE though. Public is perhaps simpler, published == public to a great extent. Private/unpublished is a bit more tricky. Not simple, but we do need to make it simple, but not misleading.

nbaksalyar · July 23, 2019, 12:42pm

This might be confusing because we already use ‘private data’ for referring to encrypted data which can be published at the same time (e.g.: private/public data in Alpha 2).

oetyng · July 23, 2019, 1:09pm

Can the distinction between private and secret be useful?

Public
Private (i.e. pseudo-secret)
Secret

dirvine · July 23, 2019, 1:13pm

I suspect 4 things at play here (possibly 3).
Published
Unpublished (but some could be published later; not mutable, only append and static)

Plain/readable
Encrypted (secret)

This is the issue: we can have secret data that is either published or not, and likewise we can have publicly readable data that is published.

The vaults don’t/can’t enforce encryption, so clients are the ones who decide whether data is secret (encrypted) or not. (This ignores how vaults store it: from the user’s perspective they don’t encrypt, from the farmer’s perspective they do, but it is the user we are discussing here.)
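
The point that publication and encryption are independent axes can be sketched as two separate enums whose product gives four combinations. The type and method names below are hypothetical and are not part of the RFC; the two predicates only restate what is said in this thread.

```rust
/// Hypothetical axis 1: is the data published?
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Publication {
    Published,
    Unpublished,
}

/// Hypothetical axis 2: did the client encrypt the content?
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Content {
    Plain,
    Encrypted,
}

/// One point in the 2 x 2 grid described in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DataKind {
    pub publication: Publication,
    pub content: Content,
}

impl DataKind {
    /// All four combinations; none of them is ruled out by the other axis.
    pub fn all() -> [DataKind; 4] {
        [
            DataKind { publication: Publication::Published, content: Content::Plain },
            DataKind { publication: Publication::Published, content: Content::Encrypted },
            DataKind { publication: Publication::Unpublished, content: Content::Plain },
            DataKind { publication: Publication::Unpublished, content: Content::Encrypted },
        ]
    }

    /// Published data can be fetched by anyone, whether or not it is encrypted.
    pub fn fetchable_by_anyone(&self) -> bool {
        self.publication == Publication::Published
    }

    /// Only the client decides secrecy: unencrypted content is readable by
    /// whoever handles it, regardless of whether it is published.
    pub fn readable_by_handlers(&self) -> bool {
        self.content == Content::Plain
    }
}
```

Seen this way, "private" conflates two things: not being published and not being readable, which is why the naming is hard to settle.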

happybeing · July 23, 2019, 1:45pm

Can you elaborate? I have a spreadsheet of my finances, a diary, a file of passwords… stored in my private folder, and the vaults can see it?

dirvine · July 23, 2019, 1:52pm

If you store that as plaintext (from the low level API) then it is plaintext everywhere. The vaults storing it will have it obfuscated, but in flight the nodes getting the message will see it, unless you encrypt it yourself. This is where high level APIs should ensure all data is encrypted, but folk can bypass that in the low level API easily enough. Vaults don’t really care; it is only data to them and they look after it.

If we ignore data at rest, where vaults will take steps to encrypt on the holders etc. as a farmer protection, then the vaults only see bits and bytes of stuff. They cannot enforce encryption, but they can enforce formats (like the reserved types).

Hope that makes sense. The network can be used by a bad app to store data without any encryption, even bypass self_encryption and so on.

happybeing · July 23, 2019, 3:26pm

Thanks, I forgot about the difference between low and high level APIs.

bzee · July 23, 2019, 10:11pm

I agree that ‘published’ and ‘unpublished’ don’t seem right. What about ‘restricted data’ (as in guarded from access)?

oetyng · July 23, 2019, 10:19pm

This data type is further sub-divided into two categories, Sequenced and Unsequenced. For Sequenced MutableData the client MUST specify the next version number of a value while modifying/deleting keys. Similarly, while modifying the Mutable Data shell (permissions, ownership, etc.), the next version number MUST be passed. For Unsequenced MutableData the client does not have to pass version numbers for keys, but it still MUST pass the next version number while modifying the Mutable Data shell.

Fundamental to this concept is the version. I wonder why we need to expand the vocabulary by introducing Sequenced and Unsequenced, when the most natural and closely related naming would be Versioned and Unversioned. This is an existing nomenclature that seems suitable.
I believe there is a long-standing pattern in the code base of inventing new words for existing phenomena, or for very closely related concepts. This is not a good habit IMO. It obfuscates the technology, its concepts and principles, for new developers and others. I think stringency and minimalism in the nomenclature are important: keep it small where possible (i.e., if we are talking about versions, use Versioned, not a different word that means the same thing), and keep it close to existing developer nomenclature.
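
The rule quoted from the RFC can be sketched roughly as follows. This is an illustration only, not the RFC's or the vaults' implementation: the struct layout, method names, error type, and the assumption that new entries and the shell both start at version 0 are all hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Sequenced,
    Unsequenced,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    NoSuchKey,
    /// The caller did not pass the next version number.
    InvalidSuccessor { expected: u64 },
}

/// Hypothetical, heavily simplified Mutable Data.
pub struct MutableData {
    kind: Kind,
    shell_version: u64,
    owner: String,
    entries: HashMap<String, (Vec<u8>, u64)>,
}

impl MutableData {
    pub fn new(kind: Kind, owner: &str) -> Self {
        MutableData { kind, shell_version: 0, owner: owner.to_string(), entries: HashMap::new() }
    }

    pub fn owner(&self) -> &str {
        &self.owner
    }

    /// Assumption: a new entry starts at version 0.
    pub fn insert(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), (value, 0));
    }

    /// Sequenced: `next_version` MUST be the current entry version + 1.
    /// Unsequenced: `next_version` is ignored (the counter is still kept internally here).
    pub fn update(&mut self, key: &str, value: Vec<u8>, next_version: Option<u64>) -> Result<u64, Error> {
        let kind = self.kind;
        let entry = self.entries.get_mut(key).ok_or(Error::NoSuchKey)?;
        let expected = entry.1 + 1;
        if kind == Kind::Sequenced && next_version != Some(expected) {
            return Err(Error::InvalidSuccessor { expected });
        }
        *entry = (value, expected);
        Ok(expected)
    }

    /// Shell changes (ownership here) require the next version for BOTH kinds.
    pub fn set_owner(&mut self, owner: &str, next_version: u64) -> Result<(), Error> {
        let expected = self.shell_version + 1;
        if next_version != expected {
            return Err(Error::InvalidSuccessor { expected });
        }
        self.owner = owner.to_string();
        self.shell_version = expected;
        Ok(())
    }
}
```

Everything that differs between the two kinds is a version check, which is the basis for the Versioned/Unversioned naming suggested above.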

oetyng · July 23, 2019, 10:27pm

Current structure and nomenclature:

Published / Unpublished

Unpublished

Private
– Encrypted
– Plain text
Shared
– Encrypted
– Plain text

Published

– Encrypted
– Plain text

Alternatives (same structure)

Private / Public

Private

Secret
– Encrypted
– Plain text
Shared
– Encrypted
– Plain text

Public

– Encrypted
– Plain text

Restricted / Public (1)

Restricted

Private
– Encrypted
– Plain text
Shared
– Encrypted
– Plain text

Public

– Encrypted
– Plain text

Restricted / Public (2)

Restricted

Secret
– Encrypted
– Plain text
Shared
– Encrypted
– Plain text

Public

– Encrypted
– Plain text

tfa · July 25, 2019, 9:12pm

I am not clear on the boundary between low level and high level. I would consider self-encryption as low level, because it is automatically invoked when files are uploaded. The only problem is that small files (< 3KiB) are not encrypted.

I think users should be warned when they are about to upload private small files. This would give them an opportunity to check that these files are not confidential. For example, the dry-run option of the CLI could indicate if a file won’t be encrypted.
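
A dry-run check of the kind suggested above could look roughly like this. It is a sketch only: the function and constant names are hypothetical, it is not an existing CLI feature, and the 3 KiB threshold is simply the figure given in this post.

```rust
/// Threshold taken from the post: files smaller than 3 KiB are not self-encrypted.
/// The constant name is hypothetical.
pub const MIN_SELF_ENCRYPT_SIZE: u64 = 3 * 1024;

/// Given (file name, size in bytes) pairs, returns the names of the files
/// that would be uploaded without self-encryption and deserve a warning.
pub fn files_skipping_self_encryption<'a>(files: &[(&'a str, u64)]) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(_, len)| *len < MIN_SELF_ENCRYPT_SIZE)
        .map(|(name, _)| *name)
        .collect()
}
```

A dry run would print the returned names so the user can confirm that none of those files is confidential before uploading.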

Mindphreaker · July 25, 2019, 9:24pm

I’m also not the biggest fan of Unpublished; however, Restricted seems wrong too, because Published is actually kind of more restricted, as this data can’t be deleted.

Intuitively I would say that I like the idea of splitting Private into Private and Shared and additionally having Public the most.
