[RFC] Data Types Refinement

That’s good to hear, Mark!

Regarding permissions: yes, most of this was written before Labels. We’ve discussed this briefly, and I think @joshuef can fill in with his view on how it fits together. At this stage, I’d say nothing prevents us from upgrading anyway.
(In a live network, upgrading to a new authorization format would be trickier, mostly because the history of previous permissions would use the old schema, and a migration might not be perfect. Keeping the old schema would mean keeping the old API to retain access to that information. On the other hand, a migration might be absolutely fine as well.)

It could be part of Fleming, and that’s an aim :slight_smile:
Most of the code for this has already been written (a bit of a bonus of my way of working, since I do spiking / designing iteratively), and there has been some code review.

But there are still reviews and merges to be done, and from what I can see they could take a while.

Additionally, there are quite widespread updates to do throughout the libraries (safe_client_libs, safe-api and safe_vault; I’ve already done a great deal of this in safe_vault), and more tests to be (re-)written.

So, we’ll have to see in the end.


I think these changes don’t have to be considered in terms of Fleming. If we have them, then sweet :+1: .

IMO they have more impact when considering API releases etc. That said, I think we can build out the API with these in mind, using any new naming we settle on to get ahead of the curve and avoid further breaking API changes.

w/r/t labels, I don’t think there’s much of a knock-on effect here. This RFC changes some data types and adds some functionality, but doesn’t touch permissions on the data themselves. So this should work well alongside labels, I believe.


I hear music in my ears :wink: Let’s hope so, as that would be nice. It seems so at first glance, so hopefully any crossover will be minimal.


Just seeing this thread now (didn’t see it before?) and I think it is excellent! Well thought through and a big improvement!

Nothing critical to add, and having private data encrypted by default seems absolutely logical. Looking forward to seeing it integrated!


while in reality, as long as it is not encrypted, even Private data can be read by nodes when it is in transit

basic safe network question here: why is any data sent unencrypted, especially Private data?

if I understand correctly, all data in vaults is encrypted, so vault owners can’t be responsible for what is in there.

It seems to me a simpler story to tell people: all data is sent encrypted across the wire, period. Hence: Secure.


I don’t know. I’ve wondered that myself. I’m afraid someone who’s been here longer than me will have to answer it.
(Here’s a relevant post from some time back that talks about this as well: RFC 55 - Unpublished ImmutableData - #10 by happybeing)

But I’m totally with you there, everything leaving the client should be encrypted. So, that’s what I’m advocating based on what I know.

Maybe there’s some use case where it’s not important, but then it should be opt-out IMO.

Edit: I’ve been informed that this task is on the roadmap: Safe Network

I’m not sure of the scope at the moment, but at least that would cover the data-at-rest part. Then there is data in transit. Still, that feature would be a major part of addressing this issue. Let’s wait for the guys who know this; they’ll probably enlighten us on the subject.


The issue is just that: the client needs to encrypt. The network itself won’t care. So if anyone writes a client (and their own SCL etc.), the network will say, OK, store what you want.

This is where the network cannot enforce that all clients do X (the network only stores stuff and cannot encrypt/decrypt client data all the way from the client). However, moving critical things to the network is good, where possible. Encryption is tough, as the network would need to create keys and cannot really do that securely, so we delegate it to clients. They are supposed to use SCL/safe_nd etc., but they could still bypass all of that.


How did I miss this post for over two months?? Thank you @happybeing for bumping it back to the top. It’s hard to convey my thoughts on how superb this RFC is. You really have outdone yourself @oetyng! (Again, I repeat, 5 out of 5 stars and two thumbs up.) This RFC does a spectacular job of distilling the essential properties of the past data types into a readily understandable and coherent system.

A few recommendations on wording/nomenclature:

Instead of having delete and hard_delete, or update and hard_update, I would use unique terms for each operation. Based on your description, IMO you should have something like delete, update, kill, and revive. These also fit well with your “tombstone” nomenclature.
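To make the soft/hard distinction concrete, here is a minimal Rust sketch of the general tombstone idea. It assumes a soft delete appends a tombstone (so earlier values stay in the history) and a hard delete overwrites stored values with tombstones (so the data is gone but the positions remain). `Entry`, `History`, `update`, `delete` and `hard_delete` are hypothetical names for illustration, not the RFC's actual API, and the kill/revive semantics are left out since they are still being discussed.

```rust
/// Hypothetical entry: either a stored value or a tombstone marking a deletion.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

/// Hypothetical append-only history for a single key.
#[derive(Debug, Default)]
struct History {
    entries: Vec<Entry>,
}

impl History {
    /// Soft update: append a new value; earlier values remain in the history.
    fn update(&mut self, value: Vec<u8>) {
        self.entries.push(Entry::Value(value));
    }

    /// Soft delete: append a tombstone; earlier values are still readable.
    fn delete(&mut self) {
        self.entries.push(Entry::Tombstone);
    }

    /// Hard delete (assumed semantics): overwrite every stored value with a
    /// tombstone, so the content is erased but the history length is kept.
    fn hard_delete(&mut self) {
        for entry in self.entries.iter_mut() {
            *entry = Entry::Tombstone;
        }
    }

    /// The current value is the latest entry, unless that entry is a tombstone.
    fn current(&self) -> Option<&[u8]> {
        match self.entries.last() {
            Some(Entry::Value(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
}
```

With a soft delete, `current()` returns `None` while the old value is still present in `entries`; only `hard_delete` actually removes the content.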

I like your line of thinking here and see your reasoning. However, it only became clear after your explanation. Also, “SentriedSequence” is a bit of a mouthful. I would recommend using an alternative, commonly used term to convey the same meaning, i.e. “Protected”. So a “ProtectedMap” or a “ProtectedSequence” would be Maps and Sequences protected from concurrent race conditions.
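One common way to get this kind of protection is optimistic concurrency: the writer states which index it expects to append at, and the append is rejected if someone else got there first. The Rust sketch below assumes that mechanism; `Sequence`, `append`, `append_expecting` and `AppendError` are hypothetical names, not the RFC's API. Note that the type only exposes append and read, matching the append-only nature of a Sequence.

```rust
/// Hypothetical error for a sentried append that lost a race.
#[derive(Debug, PartialEq)]
enum AppendError {
    /// The caller expected `expected` entries, but the sequence has `actual`.
    IndexMismatch { expected: usize, actual: usize },
}

/// Hypothetical append-only sequence: no insert or remove operations.
#[derive(Debug, Default)]
struct Sequence {
    entries: Vec<Vec<u8>>,
}

impl Sequence {
    /// Unsentried append: always succeeds; concurrent writers just interleave.
    fn append(&mut self, value: Vec<u8>) -> usize {
        self.entries.push(value);
        self.entries.len() - 1
    }

    /// Sentried append: succeeds only if nobody appended since the caller
    /// last observed the length `expected_len`.
    fn append_expecting(&mut self, value: Vec<u8>, expected_len: usize) -> Result<usize, AppendError> {
        let actual = self.entries.len();
        if actual != expected_len {
            return Err(AppendError::IndexMismatch { expected: expected_len, actual });
        }
        Ok(self.append(value))
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    fn get(&self, index: usize) -> Option<&[u8]> {
        self.entries.get(index).map(|v| v.as_slice())
    }
}
```

Two writers that both read a length of 0 will race: the first `append_expecting(_, 0)` succeeds at index 0, and the second gets `IndexMismatch { expected: 0, actual: 1 }` and has to re-read before retrying.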

When one thinks of “ImmutableData”, the concept implied is a form of data that is permanent, rigid, carved in stone… an unchangeable construct. However, the term “blob” invokes the exact opposite connotation, i.e. an amorphous, variable, undefined, and fluid thing that changes every time you poke it. My suggestion is to use a term more intuitively in line with the intent of immutable data. For example, “Block” invokes a more rigid mental image. This term is not really ideal, though, due to its overuse in recent years, e.g. blockchain, parsec blocks etc. (EDIT: Maybe Block really is a great term to capitalize on and take control of?) For me, the best imagery that comes to mind is stone tiles such as those found in Sumerian cuneiform, or hieroglyphics carved on a tablet or slab of granite.



So instead of blob, what about something like these?

  • Tile, PublicTile, PrivateTile
  • Glyph, PublicGlyph, PrivateGlyph

Of those, I think Tile is the best… it even rhymes with file, as in “a Map of many Tiles makes a File.”

The only concern here is that if you eliminate the Private and just go with Map and PublicMap, things might get confused with the std::Map data structure if you are not careful with namespacing. Map is nice because it fits well with the terminology used by dirvine since the beginning, i.e. the “Data Map”. It also quite literally represents a map to find your data. The only other similar terms off the top of my head would be “Chart” or “Graph”.

I do think Sequence is nice, but is there a reason why you didn’t pick List instead?

I agree with you here, that NotEncrypted is a poor choice and distinct antonyms are better for improved understanding and readability. I also agree that PlainText is a poor choice since it’s not always text. How about one of the following combinations?

  • Encrypted / Deciphered
  • Encrypted / Decrypted
  • Encrypted / Decoded
  • Encrypted / Raw

I think it was posted to a limited group (not me) and only yesterday was that restriction removed.

BTW I think you make good suggestions on naming.


So, this is what I’ve been having in mind.
At least we could make sure that anything we implement for client-side execution does this?

Then, if someone else comes along and writes a separate SCL, it will stand on its own merits: if users want encryption by default and it doesn’t provide it, that implementation will probably have less luck out there.

Yep, this is correct.

Thanks @Traktion, good to hear :slight_smile:

Very happy to hear it @jlpell and thanks for the kind words, and the input! Very valuable.

I like it. Would you like to expand on those suggestions? I’m a bit unclear about the revive one, for example.

Tombstone is an old database term, so I can’t take credit for that one :slight_smile:
Tombstone (data store) - Wikipedia.

I’ll toss in a bit of background on how I work, for fun and fact :slight_smile:
My process when it comes to design is to always look for standard terms in the domain. Most often every usage of a term has its own flavor attached to it, depending on context, so there is always wiggle room for interpretation, overlap etc., and we can choose the ones we find most relevant. That’s why it’s always a bit of research to find the most suitable words to include in the vocabulary for the specific context.
Then, I usually do an extensive lookup of synonyms and their usage, both in nearby and in other contexts. I weigh words against each other and see how they fit with existing concepts and with the current set of words under consideration.

I’m fine with inventing words, carving out a space for my domain in the world by claiming that this is now the meaning of the word, and being confident that it will break ground. But that is a practice I reserve for remarkable / innovative things where it is justified. In all other cases I prefer to be as lean and comprehensible as possible and make everything fit in nicely with well-understood existing concepts.

This kind of work is often completely left out of close-to-the-metal coding, where it’s all about byte shifts and elliptic curves (so to speak :kissing_heart:), and if no one picks up that task, we often end up with quite confusing and alien code, APIs, and, in the end, product.

So, that is the basis for a coherent system: the language, all the way through. With a clear language, the concepts also become clear, and the actual system flows and logic can not only be organized in smarter and more intuitive ways, which makes them more robust, but can also solve the real-world problems rather than artificial problems that arise from accidental complexity and concept confusion.

A misconception is that low-level code doesn’t need to be nice and easy to understand, that design is only for UX and not for actual code bases, and that if only the most hardcore devs understand it, then all the better, since it proves their elevated mastermind position :slight_smile: . That is a path to failure IMO. It shows a lack of understanding that all tools need to be ergonomic. Programming languages are just tools for problem solving, and they should never stand in the way of the important work: the problem solving.

So, I specialize in problem solving, and as any engineer I am very keen to see the tools in a good shape, easy to use, sharp and fit for their purpose.


Ah yes. I think your reasoning is sound. It’s good to emphasize the immutability (while keeping it slick). Block, Chunk, Tile are good candidates. We have a slight problem with Block and existing connotations, but we can also choose to be bold and claim the word for our context.
I’d love to hear more people chip in with suggestions here.
I’ll be revisiting this specific part actually, since there are currently ongoing internal discussions about the concepts here, as well as a new proposal brewing which relates to this. But more about that another time.
Blob is used in the cloud storage world to denote a big piece of (more or less) immutable data. Nothing is truly immutable in today’s blob storage, but it often has a bit more … inertia … than other types of storage. That’s why it was chosen: basically, for closeness to existing related usage.

Yeah, I agree, and I’ve had similar thoughts. I think Protected is a good alternative, but it also overlaps a bit with other concepts, like private and encrypted, and is often used interchangeably with those. Since we are already dealing with those concepts, it seems to me that there can be confusion, and the user still has to look up what exactly it means.
The benefit of Sentried in that case is that at least there’s no overlap with other words.

I considered it. But to me personally, List is very closely associated with operations that are not available on this data structure. I’m not averse to going with List anyway, but I think it would disconnect us slightly from the notion of append-only (which could be an OK compromise).
So Sequence, while less familiar, I think intuitively conveys the append-only nature a bit better.

Nice. Good ones. Out of those I would probably pick (with slight modification)

Encrypted / RawContent

What do you think? I’m happy to go for that one.
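As an illustration of how the Encrypted / RawContent naming could read in code, together with the encrypt-by-default, opt-out behaviour discussed earlier in the thread, here is a minimal Rust sketch. `Content`, `prepare` and `is_encrypted` are hypothetical names, and the `encrypt` closure is a stand-in for whatever cipher a client library would actually use: no real cryptography happens in this snippet.

```rust
/// Hypothetical payload type using the proposed naming.
#[derive(Debug, Clone, PartialEq)]
enum Content {
    /// Bytes the client encrypted before they left the machine.
    Encrypted(Vec<u8>),
    /// Bytes stored as-is; only produced when the caller explicitly opts out.
    RawContent(Vec<u8>),
}

impl Content {
    /// Encrypt by default. `encrypt` is a placeholder for the client's real
    /// cipher; this function only decides whether it gets applied.
    fn prepare<F>(data: Vec<u8>, opt_out_of_encryption: bool, encrypt: F) -> Content
    where
        F: FnOnce(&[u8]) -> Vec<u8>,
    {
        if opt_out_of_encryption {
            Content::RawContent(data)
        } else {
            Content::Encrypted(encrypt(&data))
        }
    }

    fn is_encrypted(&self) -> bool {
        matches!(self, Content::Encrypted(_))
    }
}
```

The network side would accept either variant as opaque bytes; as noted above, only client code can guarantee that the `Encrypted` path is the one actually taken.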


Yes. Agreed.

Yes, looks good to me.

Your thoughts on “the Blob” vs. Block or Tile?


Yep, was just adding those in an edit above, when you responded :slight_smile:


I’d say be bold and unchain the blocks, unless there is a better use for the term Block within the SAFE ecosystem. The term Chunk is a nice general term that can refer to any and all data objects formed by splitting up a file.

I suspect part of the reason it was chosen is that blob is often defined in the dictionary as a large drop of liquid. Real “clouds” have large drops of liquid water in them.


unchain the blocks :grin: catchy

That’s interesting, and plausible. Etymology is always such a stimulating practice.

Today Chunk is reserved for this usage, i.e. a blob of data is split up in chunks.
But as can be read from this citation: “Blobs were originally just big amorphous chunks of data […]”, a blob and a chunk of data can be synonymous.

The chunk of data just means to say some data which has been amassed out of the environment.
So, not necessarily a piece of a file or a blob, but just a piece of data [from our world].

This distinction will become more important if we want to move towards having all data fundamentally stored as chunks in the network, since some content would then have to fit entirely in a single chunk, being too small for self-encryption. But there will be more information and discussion about this later, in another topic.


I’ll be circling back to this shortly, but I’d love to hear other people’s input on the use of Block or similar, when talking about immutable data in the network.


Yes, marketing-wise, people often just shrug when you tell them that project SAFE started before bitcoin and blockchain. A common question is, “Well, what is taking you so long then?” Instead, a different narrative might be:

First blockchain was launched and the people rejoiced. But chains are heavy, slow, constraining, they limit freedom, they limit choice. Then came MaidSafe, the breaker of chains, who created a thing with none of those pains. The blocks now roam free, in their SAFE place. You will need a Map to find them; hidden in spare space.

Yes folks, that was my first crypto poem. Bring on the memes.


Haha fantastic :laughing: that’s good, I like it!

Good points with the narrative.

@maidsafe, can we have that crypto poem somewhere on our material? :slight_smile:
(we need to have a significant use for the word Blocks first of course)


All you need now is a picture of dirvine in a black wolf/leather jacket riding a fire breathing dragon. (For those that have HBO, note that Drogon was the actual breaker of chains in that popular series.) I suspect @Zoki could come up with a good meme for this.


Seems legit to me.


Blob… Block… Blob is more recognizable / traditional for binary (large) objects, IMO.

Blocks have a lot of baggage, but ignoring that I don’t think Blocks actually fit the idea of our binary data any better.

But aye, our Blob itself isn’t actually just a binary… it’s a collection of chunks, strung together by a data map, (so I agree it may not be perfect). Chunks/maps don’t immediately inspire any other naming ideas for me though :expressionless:
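For readers less familiar with the data map idea, here is a minimal Rust sketch of a blob as chunks strung together by a map of addresses. It deliberately leaves out self-encryption entirely and just splits bytes into fixed-size pieces. `CHUNK_SIZE`, `DataMap`, `put_blob` and `get_blob` are hypothetical, the tiny chunk size is chosen only for illustration, and `DefaultHasher` is a non-cryptographic stand-in for the content addressing a real network would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical chunk size, tiny for illustration only.
const CHUNK_SIZE: usize = 4;

/// Stand-in content address. DefaultHasher is NOT cryptographic; a real
/// network would use a cryptographic hash here.
fn address(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// A data map: the ordered chunk addresses that make up one blob.
#[derive(Debug, Clone, PartialEq)]
struct DataMap {
    chunks: Vec<u64>,
}

/// Split `data` into chunks, store each under its address, return the map.
fn put_blob(data: &[u8], store: &mut HashMap<u64, Vec<u8>>) -> DataMap {
    let chunks: Vec<u64> = data
        .chunks(CHUNK_SIZE)
        .map(|chunk| {
            let addr = address(chunk);
            store.insert(addr, chunk.to_vec());
            addr
        })
        .collect();
    DataMap { chunks }
}

/// Follow the data map and concatenate the chunks back into the blob.
/// Returns None if any chunk is missing from the store.
fn get_blob(map: &DataMap, store: &HashMap<u64, Vec<u8>>) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for addr in &map.chunks {
        out.extend_from_slice(store.get(addr)?);
    }
    Some(out)
}
```

From the developer's point of view only `put_blob` / `get_blob` and the bytes matter; the chunks and the map are an internal detail, which is the point made below about abstraction depth.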


Doesn’t Block also somewhat imply, or sound like, a fixed/predefined size?


Yep, from an IT guy’s PoV it’s just a binary object. It’s not inherently mutable or immutable.


But that’s not visible (at that abstraction depth) to the dev; it’s “just” a big binary “blob”, and I think it’s not that important to the dev how the system actually stores the data.
