An Overview of the New Data Types

maidsafe · May 16, 2019, 5:16pm

While the new data types offer a wide variety of options for applications, they’re also not a world away from where we have been.

Changes to applications providing published data should (ideally) not be too massive. Though there may be something of a learning curve, we’re hoping that improvements to our APIs will make this more pleasant overall.

Publishing data

Any published data will end up being either ImmutableData, as it has been so far, or AppendOnlyData, which replaces what was until now published MutableData. Entries in a published AppendOnlyData cannot be mutated or deleted; only new entries can be added.

App developers shouldn’t have to worry too much about the nitty gritty here though, as we’re hoping to simplify the general APIs to make publishing, versioning, and handling data much simpler (while still leaving raw APIs available should anyone want to go deeper).

The general gist is that all data that is public as things are now will need to be AppendOnlyData or ImmutableData to remain public, and therefore perpetual!

Unpublished data

Unpublished data is only accessible to the owners of the data. It is not considered public.

Mutable Data

MutableData is the same data type we currently have, with a few changes:

1) The ability to opt out of sequencing the data (see below).
2) It cannot be published, i.e. other users cannot access it.
3) Mutable Data can now be deleted.

MutableData suits, for example, data that you want to change frequently. This could be a new use case (in terms of SAFE), and there are many applications that may want to take advantage of it.

AppendOnly Data

You can also create unpublished AppendOnly Data, i.e. accessible only by the owner(s).

Sequencing

The idea of ‘sequenced’ data is that order is important to you or the application. This is the same setup as we have now with MutableData, where to update a key, a version must be passed to ensure that you’re updating the correct data. This is only applicable to the AppendOnlyData and MutableData data varieties.

Unsequenced data has no such requirement or check.
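To make the difference concrete, here is a minimal sketch of a sequenced versus an unsequenced update. It is a toy in-memory model, not the real SAFE API: `ToyMutableData`, `VersionMismatch` and the `expected_version` parameter are hypothetical names, and the rule that the caller passes the version it last saw is an assumption made for illustration.

```python
class VersionMismatch(Exception):
    """Raised when a sequenced update carries a stale version."""


class ToyMutableData:
    """Hypothetical in-memory stand-in; not the real SAFE API."""

    def __init__(self, sequenced):
        self.sequenced = sequenced
        self._entries = {}  # key -> (value, version)

    def update(self, key, value, expected_version=None):
        _, current = self._entries.get(key, (None, 0))
        # Sequenced data rejects writes based on an out-of-date view.
        if self.sequenced and expected_version != current:
            raise VersionMismatch(
                f"{key}: expected {expected_version}, current {current}"
            )
        self._entries[key] = (value, current + 1)
        return current + 1

    def get(self, key):
        return self._entries[key]  # (value, version)


settings = ToyMutableData(sequenced=True)
version = settings.update("theme", "dark", expected_version=0)  # first write
settings.update("theme", "light", expected_version=version)     # up to date

scratch = ToyMutableData(sequenced=False)
scratch.update("theme", "dark")
scratch.update("theme", "light")  # no version needed, last write wins
```

In the sequenced case a writer holding an old version gets an error instead of silently overwriting newer data; the unsequenced case trades that safety for simplicity.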

Immutable Data

Immutable Data can be both published and unpublished. There won’t be any great changes here beyond specifying the publish variety of ImmutableData you want when you PUT the data. When ImmutableData is unpublished it can also be deleted from the network analogous to the new (thus unpublished) MutableData.
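As a rough illustration of choosing the variety at PUT time, the sketch below models a store where only unpublished ImmutableData can be deleted. Everything here is hypothetical (`Variety`, `ToyImmutableStore`, the integer handles standing in for network addresses); it only mirrors the rules described above and says nothing about the eventual API.

```python
from enum import Enum


class Variety(Enum):
    PUBLISHED = "published"
    UNPUBLISHED = "unpublished"


class ToyImmutableStore:
    """Hypothetical in-memory model of the publish/delete rules."""

    def __init__(self):
        self._chunks = {}  # handle -> (content, variety, owner)
        self._next_handle = 0

    def put(self, content, variety, owner):
        handle = self._next_handle  # stand-in for a real network address
        self._next_handle += 1
        self._chunks[handle] = (content, variety, owner)
        return handle

    def get(self, handle, requester=None):
        content, variety, owner = self._chunks[handle]
        if variety is Variety.UNPUBLISHED and requester != owner:
            raise PermissionError("unpublished data is owner-only")
        return content

    def delete(self, handle, requester):
        _, variety, owner = self._chunks[handle]
        if variety is Variety.PUBLISHED:
            raise PermissionError("published data is perpetual")
        if requester != owner:
            raise PermissionError("only the owner can delete")
        del self._chunks[handle]


store = ToyImmutableStore()
public = store.put(b"site logo", Variety.PUBLISHED, owner="alice")
private = store.put(b"draft notes", Variety.UNPUBLISHED, owner="alice")
store.delete(private, requester="alice")  # allowed: unpublished and owned
```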

Use Cases

App configuration (SAFE Browser)

Your browser settings for example (bookmarks/history, etc.), could take advantage of this. For example, we could opt to use Mutable Data for the browser settings (as we don’t need a version history or it to be public, and this can also be deleted).

Perpetual Websites (“Internet Archive”)

The public name system, using published data, could now be versioned from day one, as could any data pointed to (via Immutable Data). This means we’ll be able to readily browse the history of applications/websites on any published URL and point to specific versions of that data. (Versioned data would be built atop AppendOnlyData.)
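One way to picture versioned data on top of an append-only structure is the sketch below. It is an assumption-laden toy: `ToyVersionedName`, `publish` and `resolve` are invented names, and the strings stand in for ImmutableData addresses.

```python
class ToyVersionedName:
    """Hypothetical public name whose history can only grow."""

    def __init__(self):
        self._history = []  # append-only list of content addresses

    def publish(self, content_address):
        self._history.append(content_address)
        return len(self._history) - 1  # version number of the new entry

    def resolve(self, version=None):
        # No version given: return the latest published entry.
        if version is None:
            return self._history[-1]
        if not 0 <= version < len(self._history):
            raise IndexError(f"no such version: {version}")
        return self._history[version]

    def versions(self):
        return len(self._history)


site = ToyVersionedName()
site.publish("address-of-first-index-html")
site.publish("address-of-redesigned-index-html")

latest = site.resolve()      # what a browser would show by default
original = site.resolve(0)   # the same URL pinned to its first version
```

Because nothing is ever removed from the history, every earlier version stays resolvable, which is what makes the "Internet Archive" style browsing possible.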

Our applications will need to be updated to reflect this. But in terms of API changes, the end use should be very similar. The greater scope of changes will be in terms of making this versioned site history data easily available.

File Management

The Web Hosting Manager will most likely create published data by default. It may need to be expanded (or alternatives made) to more properly manage unpublished data too.

A CLI should provide a clean, simple and familiar way of publishing data from your terminal.

APIs

While there are many new data types, offering different feature sets, we’re aiming to build a public facing API that makes things clear and simple.

CLI

Our initial efforts in this regard have been focussed on thinking about CLI commands and what’s useful there. (We’ll be sharing a full RFC for this in due course.) It’s expected that these new APIs should (ideally) cover the vast majority of developer use cases, so once we have this specced out, our applications will be migrated to use them. (Including, yes, making those APIs available in the DOM of SAFE Browser.)

Language bindings

Language bindings will need to be updated to reflect at least the developer ‘friendly’ version of the APIs we’re envisioning. And we should probably still make the raw versions of these APIs available (such as we have now).

Accounts + Authentication

As we get towards safecoin, some of the APIs will be expanded to provide for management of your coins. For example, it will be possible to choose which wallet address pays for various transactions, so applications will need to be updated to facilitate this.

Feedback

This is an ongoing process, and a lot to think over. So feedback on any/all of the above is both welcome and vital to ensuring we get towards data and APIs folk will want to use.

Traktion · May 16, 2019, 6:01pm

Great to hear about private data being deletable. There are many good arguments for this and it will be a very useful feature.

Some benefits:

Delete old personal data for enhanced privacy.
Kill switch to wipe account.
Protection against compromised encryption leading to permanent access to existing private data.
Prevent storage capacity being wasted by being consumed with throw away or stale data.

dirvine · May 16, 2019, 6:09pm

It was all gonna be after launch but the community did not speak, it shouted loudly. That forced us to realise just how important this was. So there it is and already POC work happening on it.

andyypants · May 16, 2019, 6:37pm

So if I delete data, do I get to reuse those PUTs for free, or is it like no refunds but you can throw it out if you want to?

dirvine · May 16, 2019, 6:39pm

You will eventually get a refund. We have not added that to an RFC (yet), but the safecoin RFC will show how this is possible. The safecoin RFC is not final safecoin as we will test farming algorithms in the wild, but refunds will be as easy as payment. The refund would be at the cost of storing that data at time of delete.

happybeing · May 16, 2019, 7:26pm

I suggest clarifying what ‘owners’ means - ah, each owner has full access (see @dirvine’s reply). ~~in this case, for example an owner can be given restricted permissions such as read only access (if that is in fact the case).~~

Also, has any consideration been given to granularity of access controls - to make it easy to implement a permissioned file system for example, or the access controls per LDP container/resource as in Solid?

There was also a request to consider a different access control mechanism (can’t remember the name) - was this considered and if so what were the thoughts on this?

The CLI section seems to conflate a few things so is a bit confusing to me - can you clarify it a bit (ask if you don’t see what I mean!). Also, I think some of us will not know what CLI stands for, so suggest you include that.

Really great to see this coming together. I spotted the RFCs being pushed so had a quick read earlier and now this helps me get an idea of what’s coming.

I’m still not clear how AD will work though, so would appreciate a topic to explain that in more detail. Maybe a block diagram showing how it hangs together internally (implementation side) plus a summary of what happens when example operations are carried out via the API.

dirvine · May 16, 2019, 7:30pm

Granularity is a good point. Atm owner is full access, but read-only etc. could be a feature. I suppose published/owners is edit by owners, read-only by the public. Perhaps it could be more granular, but let’s see for version 1.

happybeing · May 16, 2019, 7:40pm

This seems really crude? Essentially data is either:

public (read only by everyone, updateable by any owner, full version history in perpetuity) or
private (readable, writable and deletable by any owner, no version history)

Is that correct?
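Written out as a lookup table, the two-way split described above looks roughly like this. It is only a sketch of the summary in this post: the `RULES` table and the `allowed` helper are hypothetical, and "update" on published data means appending.

```python
# Who may perform each action, per variety (as summarised in the post above).
RULES = {
    "published": {
        "read": "anyone",
        "update": "owners",
        "delete": "nobody",
        "version_history": True,
    },
    "unpublished": {
        "read": "owners",
        "update": "owners",
        "delete": "owners",
        "version_history": False,
    },
}


def allowed(variety, action, is_owner):
    """Return True if the requester may perform the action."""
    who = RULES[variety][action]
    return who == "anyone" or (who == "owners" and is_owner)


public_can_read = allowed("published", "read", is_owner=False)
owner_can_delete_published = allowed("published", "delete", is_owner=True)
owner_can_delete_private = allowed("unpublished", "delete", is_owner=True)
```

Anything finer-grained (read-only owners, sharing private data with a group) has no row in this table, which is the limitation being asked about.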

I think there are more combinations than this, but I’m confused about how limited sharing of private data seems to be. Help understanding that would be appreciated.

Maybe I’m missing some subtleties in the terminology. We used to have the concepts of public data, and private data. Where the latter had a lot of different permissions options. Have those gone away?

Toivo · May 16, 2019, 7:46pm

I just want to highlight this and tell you how much it means to me that you listen to us here - very, very much. Thank you!

dirvine · May 16, 2019, 7:53pm

Yes, that is correct. We can do a load more, but what we need for launch is where I am focussing right now. There are a few areas like this, even in routing, but right now I feel that we have so many moving parts across a disjoint set of tasks that I want a feature-complete running network first. Then I would like to focus on things exactly like this, but atm we have engineers in separate areas without full visibility of all the network components (understandable). So I would like a minimum network with data/safecoin/rdf to exist and let the engineers and community see all the moving parts interact. We have amazing resources, in house and in community, but I feel there is a disjoint, and when we put out a data/safecoin/rdf testnet/network then we will all see so much more. I hope that makes sense. This one is kinda orthogonal to a lot, but it will help to direct all resources to this minimum set for now.

Of course when we get there then RFCs will be community wide and not so much us and then I think much of this will become points of great importance.

david-beinn · May 16, 2019, 8:53pm

Great to see this moving forward towards a more finalised spec.

Initially found the layout of this post a bit confusing though. Even just adjusting the sub-headings and their sizes so it was easy to see a direct comparison between the new specs for published and unpublished Data (and then onto the other related points) would be quite helpful.

I got there in the end though, so apologies if I’m just being picky.

Mindphreaker · May 16, 2019, 9:17pm

I plead guilty here. But seriously, thanks for listening on this very important topic. I think the Published/Unpublished concept is a very fair compromise between privacy and immutable “truth”.

From what I understand AppendOnlyData and ImmutableData can be both Published/Unpublished. Will this be done via some kind of internal boolean which can be modified via a method or function? Or will there be separate data types like UnpublishedImmutableData, PublishedImmutableData?

If separate data types is the case, will the data be transferred to a new “file” with different data type or how will the type be converted when e.g. data gets published?

happybeing · May 16, 2019, 9:34pm

Some of this is covered in the RFCs linked in the OP.

Traktion · May 16, 2019, 11:41pm

I took a closer look at the roadmap on the website and I like the Perpetual Web and Perpetual Data themes. It sits really nicely with these published data types and makes it clear what it means to the world to have these features. Great naming, imo.

mav · May 17, 2019, 1:07am

There should be a cost to delete, not a refund.

Delete should not be a common operation on the network. It should be emergency use only.

One instance that bothers me with refund-for-delete is that all accounts now have a bounty: the bigger the account, the bigger the bounty. If the account keys are compromised, the best thing for the hacker to do is delete all the content so they receive the refund. At least with cost-to-delete the victim’s content probably won’t be deleted by the hacker.

I have to honestly say I can’t understand why you’d want to pay people to delete content. Can you please elaborate on the incentive there? I must have a blind-spot.

Yeah me too. I found reading the rfcs helped me a lot.

Should the OP contain links to the rfcs?

wes · May 17, 2019, 3:46am

If the refund is based on the PUT price at the time of deletion, you can basically play the stock market with the network. When the price is low, fill up your space; when the price is high, delete (sell) your data. Stress the network for profit with no benefit to anyone but the “attacker”.

That, coupled with @mav’s “bounty” issue, means I’m not a big fan of a refund for delete. If it’s a must, then at a very discounted price.
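The arithmetic behind this can be sketched as follows, assuming a refund equal to the PUT price at the time of deletion multiplied by some discount rate. The function name, the prices and the rates are made-up numbers for illustration; nothing here comes from an RFC.

```python
from fractions import Fraction


def refund_profit(chunks, price_at_put, price_at_delete, refund_rate=Fraction(1)):
    """Profit (in arbitrary coin units) from storing cheap and deleting dear.

    Assumes the refund is price_at_delete * refund_rate per chunk.
    """
    cost = chunks * price_at_put
    refund = chunks * price_at_delete * refund_rate
    return refund - cost


# Price rises 5x between PUT and delete.
full_refund = refund_profit(1000, price_at_put=1, price_at_delete=5)
discounted = refund_profit(1000, 1, 5, refund_rate=Fraction(1, 10))

# With a 1/10 refund the trade only breaks even once the price is 10x higher.
break_even = refund_profit(1000, 1, 10, refund_rate=Fraction(1, 10))
```

Under these assumptions a full refund pays out on any price rise, while a discounted refund only pays once the price has risen by more than the inverse of the rate, so the discount narrows the window rather than closing it.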

neo · May 17, 2019, 5:42am

So now imagine that I upload a pic. It is tagged with my ID

Now someone else uploads the same pic

It is deduplicated and not actually stored

If public then the public get a chance to see my ID since the pic is tagged with my owner ID?

If private then will the second person get to the pic since it was tagged with my owner ID and since dedup did not result in storing anything?

Original Immutable data was always anonymous with no ID ever stored as owner.

So now we can delete unpublished data.

What if I create ADs for indexes to data. The index ADs need updating and it would be good to delete old useless indexes to save storage space.

But since the indexes have to be accessible publicly then they can only be ever appended to and thus grow. If the indexes are changed regularly then this represents a lot of storage that will never be accessed again as it is only indexes and no longer pointing to data.

What if it’s a private database but needs to be accessed by the employees of the company? Is it private or is it public?

Yes, agreed, but maybe just a zero sum here, because there is a benefit to the network in freeing up space. So while it does not cost to delete, perhaps no refund either, or maybe a credit to their “PUT” balance of a small amount, but no safecoin amount. And certainly not a 1:1 ratio of deletions to PUT balance increase.

Exactly, and if there are any refunds then a credit to their “PUT” balance, and maybe at the rate of 1/10th so that it is impossible to play the “PUT balance” stock market with it.

And to highlight it: one could store 1 PB while the price is extremely low, and then when the price is high delete some and PUT their data for cheap. If they only get 1/10 of the PUT balance back then the benefit for them is minimal, even if they got it real cheap and it’s 100x the price to PUT. To be clear, buying 1 PB and then deleting in order to PUT cheap is not as bad as it sounds, since there is zero net increase in storage. Just some work.

dirvine · May 17, 2019, 5:52am

Bad wording by me: we have not added refunds yet and may not. It will require an RFC, and these points and more will be discussed. I kinda agree with what you say, but I can see another side as well (compromised ID wishes to remove trace of uploaded unpublished data etc.). The RFC process will flesh it all out if there ever is an RFC for it; I don’t think so before launch in any case.

No, in that case your data is different (as it contains your ID), so dedup is effectively disabled for unpublished data (it wastes the caches in any case).
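A rough way to picture why dedup stops applying: if the address of a chunk is a hash of what is stored, then mixing the owner’s ID into unpublished data gives each owner a different address for the same picture. The sketch below is an assumption for illustration only; the choice of SHA3-256 and the way the ID is combined with the content are not taken from any RFC, and `chunk_address` is a hypothetical helper.

```python
import hashlib
from typing import Optional


def chunk_address(content: bytes, owner_id: Optional[bytes] = None) -> str:
    """Hypothetical address: hash of the content, plus the owner ID if unpublished."""
    hasher = hashlib.sha3_256()
    if owner_id is not None:
        hasher.update(owner_id)  # unpublished: the ID is part of what is hashed
    hasher.update(content)
    return hasher.hexdigest()


picture = b"same holiday photo bytes"

# Published: two uploaders get the same address, so the chunk is stored once.
published_a = chunk_address(picture)
published_b = chunk_address(picture)

# Unpublished: each owner's copy lands at its own address, so no dedup.
private_alice = chunk_address(picture, owner_id=b"alice-id")
private_bob = chunk_address(picture, owner_id=b"bob-id")
```

In this model the second uploader of a private picture never lands on the first uploader’s chunk, so neither the ID nor the content leaks across owners.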

neo · May 17, 2019, 5:59am

What if I “publish” my data later? What happens to the ID, or is it uploaded again and I pay twice? What if it’s a video I upload privately so I can review it to ensure it looks good, and then “publish” it?

joshuef · May 17, 2019, 6:01am

User access controls should be somewhat separate to ‘published or not’.

I see this state as being more about a) whether data is deletable/modifiable (in a non-transparent fashion), and b) whether only authorised parties (currently the owner(s)) can see it.

B) touches on permissions/access control somewhat, but certainly isn’t a real implementation of this (in so far as I’d like to see, at least). So we’re certainly looking at how we can create a more useful permission system, be that via ACLs, or via macaroons (that’s the permission system I think you’re referencing, @happybeing; and yes, we’ve been researching the potential there), or who knows what.
