Yes, we discussed this problem internally and we’ll be making the naming consistent sometime soon. Thanks for the suggestions!
The following changes have recently been made to this RFC:
- The requester field has been removed from requests.
- BLS-PublicKey has been replaced with the PublicKey enum.
- Missing RPCs have been added for AppendOnly Data.
- The missing index field has been added for manipulating AData owners and permissions.
- A common response type will be used for mutations.
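To illustrate the last two changes, here is a minimal Rust sketch under loose assumptions: the PublicKey variants and key sizes, the Error variants, the Store type and the MutationResponse alias are hypothetical names for illustration, not the RFC’s definitions. It shows a key enum taking the place of a BLS-only key, and two different mutations sharing one response type. Since the requester field is gone, the sketch assumes the signer’s key comes from the request’s signature rather than from a request field.

```rust
/// Hypothetical key enum: variant names and sizes are illustrative assumptions.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum PublicKey {
    Ed25519([u8; 32]),
    Bls([u8; 48]),
}

/// Hypothetical error type shared by all mutations.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    AccessDenied,
    NoSuchData,
}

/// One response shape for every mutation: success or an error.
pub type MutationResponse = Result<(), Error>;

/// Hypothetical single-owner store, used only to exercise the response type.
pub struct Store {
    pub owner: PublicKey,
    pub data: Option<Vec<u8>>,
}

impl Store {
    /// `signer` is assumed to come from the request's signature, not a request field.
    pub fn put(&mut self, signer: &PublicKey, bytes: Vec<u8>) -> MutationResponse {
        if *signer != self.owner {
            return Err(Error::AccessDenied);
        }
        self.data = Some(bytes);
        Ok(())
    }

    /// A different mutation, same response type.
    pub fn delete(&mut self, signer: &PublicKey) -> MutationResponse {
        if *signer != self.owner {
            return Err(Error::AccessDenied);
        }
        // `take` empties the slot, so a second delete reports NoSuchData.
        self.data.take().map(|_| ()).ok_or(Error::NoSuchData)
    }
}
```

Callers can then handle every mutation the same way, whichever key type signed it.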
I wonder what the rationale is for tying optimistic concurrency to the data type by dividing data into Sequenced and Unsequenced variants.
To me it seems this could be much simpler and more flexible if we skipped that subdivision entirely and just provided an ExpectedVersion parameter. I would say this is standard when working with event-sourcing streams.
You can look at my AppendOnlyDb on GitHub, where I’ve implemented it like this.
ExpectedVersion.Any => Always writes.
ExpectedVersion.Specific(u64) => Only writes if version matched.
ExpectedVersion.None => Only writes if empty.
Then we can remove some of the cognitive load that comes with the explosion in the number of types, shorten the names, and avoid limiting use cases (for the same data structure instance I might sometimes want a concurrency check and sometimes not; too much is assumed here).
Instead of using Sequenced and Unsequenced as data types, I would much prefer to set the concurrency check on a case-by-case basis, by passing it in as a parameter.
- That gives flexibility for the user.
- Fewer data types.
- Avoiding these long names.
- Overall feels cleaner and simpler, yet more powerful.
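The ExpectedVersion semantics listed above can be sketched in Rust as follows. This is a minimal illustration, not the AppendOnlyDb implementation: the Stream and WrongExpectedVersion types are hypothetical, and the sketch assumes the common event-store convention that a stream’s version is the zero-based index of its last entry (no version while empty).

```rust
/// Per-call concurrency expectation, as proposed in the post (sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExpectedVersion {
    /// Always writes; no concurrency check.
    Any,
    /// Writes only if the stream's current version equals this value.
    Specific(u64),
    /// Writes only if the stream is empty.
    None,
}

/// Hypothetical error carrying the version actually found.
#[derive(Debug, PartialEq, Eq)]
pub struct WrongExpectedVersion {
    pub current: Option<u64>,
}

/// Hypothetical append-only stream used to exercise the check.
#[derive(Default)]
pub struct Stream {
    entries: Vec<Vec<u8>>,
}

impl Stream {
    /// Zero-based version of the last entry, or `None` if the stream is empty.
    pub fn version(&self) -> Option<u64> {
        (self.entries.len() as u64).checked_sub(1)
    }

    /// Appends `entry` if `expected` matches, returning the new version.
    pub fn append(
        &mut self,
        expected: ExpectedVersion,
        entry: Vec<u8>,
    ) -> Result<u64, WrongExpectedVersion> {
        let current = self.version();
        let ok = match expected {
            ExpectedVersion::Any => true,
            ExpectedVersion::Specific(v) => current == Some(v),
            ExpectedVersion::None => current.is_none(),
        };
        if !ok {
            return Err(WrongExpectedVersion { current });
        }
        self.entries.push(entry);
        Ok(self.entries.len() as u64 - 1)
    }
}
```

Because the check is a per-call argument, the same stream instance can be written with or without a concurrency check, which is the flexibility argued for in the list above.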
Since implementation has started and the RFC is still open, I wonder:
Is there anything preventing this, and what are the reasons for not doing it?
Final Comment Period
The Published and Unpublished DataType RFC will remain open for the next 10 days to allow any final comments to be made.
Thank you for your contributions!
Is there a reason why they can’t be called Public and Private data types? Published and Unpublished are tongue twisters and a difficult naming system to use in a sentence or in whimsical online banter. Written down, the type names become unnecessarily long and unruly.
Or Published and Private?
What do you call unpublished data that you share? Private?
Private data is owned by one person/account and not published. When describing the unpublished data type, the RFC calls it private. So why not just call it “private data”? It is much shorter, sweeter, and self-explanatory.
The same goes for published data. It is described as public data, so just label it “public data”. Again, this is short, sweet and self-explanatory.
Data that is unpublished but has multiple owners is shared data. This is how the RFC describes it, so just call it “shared data”. Short, sweet and self-explanatory.
Public and Published have the same root. Public is shorter and self-explanatory; Published is not. Public is the better name for this class of data.
Private is more pseudo-private, though. The vaults can see it; I mean, we don’t enforce encryption there. So perhaps there is a marketing/PR issue with “private”?
It cannot be successfully published on SAFE, though. Public is perhaps simpler, since published == public to a great extent. Private/unpublished is trickier. It is not simple, but we do need to make it simple without being misleading.
This might be confusing because we already use ‘private data’ to refer to encrypted data, which can be published at the same time (e.g. private/public data in Alpha 2).
Can the distinction between private and secret be useful?
Private (i.e. pseudo-secret)
I suspect there are four things at play here (possibly three).
Unpublished (but some could be published later: not the mutable types, only append-only and static data)
This is the issue: we can have secret data that is either published or not, and likewise we can have publicly readable data that is published.
The vaults don’t/can’t enforce encryption (ignoring how they store it etc.: from the user’s perspective they don’t, from the farmer’s perspective they do, but it is the user’s perspective we are discussing), so clients are the ones who decide whether data is secret (encrypted) or not.
Can you elaborate? I have a spreadsheet of my finances, a diary, a file of passwords… stored in my private folder, and the vaults can see it?
If you store it as plaintext (via the low-level API), then it is plaintext everywhere. The vaults storing it will have it obfuscated, but in flight the nodes receiving the message will see it, unless you encrypt it yourself. This is where the high-level APIs should ensure all data is encrypted, but folk can bypass that via the low-level API easily enough. Vaults don’t really care: it is only data to them, and they look after it.
If we ignore data at rest, where vaults will take steps to encrypt it on the holders etc. as a farmer protection, then the vaults only see bits and bytes; they cannot enforce encryption, though they can enforce formats (like the reserved types).
Hope that makes sense. A bad app can use the network to store data without any encryption, even bypassing self_encryption and so on.
Thanks, I forgot about the difference between low and high level APIs.
I agree that ‘published’ and ‘unpublished’ don’t seem right. What about ‘restricted data’ (as in guarded from access)?
This data type is further sub-divided into two categories, Sequenced and Unsequenced. For Sequenced MutableData the client MUST specify the next version number of a value while modifying/deleting keys. Similarly, while modifying the Mutable Data shell (permissions, ownership, etc.), the next version number MUST be passed. For Unsequenced MutableData the client does not have to pass version numbers for keys, but it still MUST pass the next version number while modifying the Mutable Data shell.
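The rule in the quoted RFC passage can be sketched in Rust, assuming a simplified model: the Shell, SeqMutableData and UnseqMutableData types, the single owner field standing in for permissions/ownership, and versions starting at 0 are all assumptions for illustration. Sequenced entries, and the shell in both flavours, reject any version other than current + 1, while Unsequenced entries take no version at all.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    /// The supplied version was not exactly `current + 1`.
    InvalidSuccessor { current: u64 },
    NoSuchEntry,
}

/// Shared rule: the caller must pass exactly the next version.
fn check_successor(current: u64, next: u64) -> Result<(), Error> {
    if current.checked_add(1) == Some(next) {
        Ok(())
    } else {
        Err(Error::InvalidSuccessor { current })
    }
}

/// Hypothetical shell; it is versioned in BOTH flavours.
#[derive(Default)]
pub struct Shell {
    pub owner: String,
    pub version: u64,
}

impl Shell {
    pub fn set_owner(&mut self, owner: &str, next_version: u64) -> Result<(), Error> {
        check_successor(self.version, next_version)?;
        self.owner = owner.to_string();
        self.version = next_version;
        Ok(())
    }
}

/// Sequenced: every value carries its own version.
#[derive(Default)]
pub struct SeqMutableData {
    pub shell: Shell,
    entries: BTreeMap<String, (Vec<u8>, u64)>,
}

impl SeqMutableData {
    pub fn insert(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), (value, 0));
    }

    pub fn update(&mut self, key: &str, value: Vec<u8>, next_version: u64) -> Result<(), Error> {
        let entry = self.entries.get_mut(key).ok_or(Error::NoSuchEntry)?;
        check_successor(entry.1, next_version)?;
        *entry = (value, next_version);
        Ok(())
    }
}

/// Unsequenced: values are overwritten without any version check.
#[derive(Default)]
pub struct UnseqMutableData {
    pub shell: Shell,
    entries: BTreeMap<String, Vec<u8>>,
}

impl UnseqMutableData {
    pub fn insert(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }

    pub fn update(&mut self, key: &str, value: Vec<u8>) -> Result<(), Error> {
        let entry = self.entries.get_mut(key).ok_or(Error::NoSuchEntry)?;
        *entry = value;
        Ok(())
    }
}
```

In this model, whether a key update is checked is fixed by which type was chosen, which is the coupling being questioned in this thread.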
Fundamental to this concept is the version. I wonder why we need to expand the vocabulary by introducing Unsequenced, when the most natural and closely related name would be Unversioned. This is existing nomenclature that seems suitable.
I believe there is a long-standing pattern in the codebase of inventing new words for existing phenomena or very closely related concepts. IMO this is not a good habit. It obscures the technology, its concepts and principles, from new developers and others. I think stringency and minimalism in the nomenclature are important: keep it small where possible (i.e., if we are talking about versions, use Versioned, not a different word meaning the same thing), and keep it close to existing developer nomenclature.