[RFC] Labelled Data, Indexing and Token Authorisation

joshuef · December 16, 2019, 1:05pm

tags: `rfc` `labels` `tokens`

Status: proposed
Type: new front-end data structures and permissions.
Related components: SCL, safe-api, authenticator
Start Date: 16/12/2019
Discussion: (fill me in with link to RFC discussion - shepherd will complete this)
Supersedes: (fill me in with a link to RFC this supersedes - if applicable)
Superseded by: (fill me in with a link to RFC this is superseded by - if applicable)

Summary

Enable both automatic and optional indexing of data on the account level and prevent data silo-ing by applications, with a system of indexed groups, which increase permission flexibility.

Changes to authentication mechanisms will be needed. A macaroon-like, ‘bearer token’ system is proposed, utilising BLS to manage permissions and advanced caveats for application permissions. (Limitations in the macaroon implementation make actual macaroon use non-viable for SAFE workflows.)

This enables us to validate data access not only at the client handler, against generic permissions, but also at the data handling layer, validating tokens against specifics of that data.

These changes also bring about the possibility of sharing private labelled data with other accounts, without the need to publish it.

Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Assumptions

Knowledge of BLS asymmetric encryption is assumed, as well as SAFE request validation, and ClientHandler, DataHandler and Authenticator flows.

Also, a basic understanding of app ‘containers’ is desirable.

RDF data schemes are not specified here.

Motivation

Ensure that applications can access ‘groups’ of data, as opposed to only data they PUT/manage in a specific container.

Ensure that, by default, all data PUT by any application can be located by the user (automatic indexing).

Enable sharing of private data via these labels.

Enable better data access and discovery in the account.

Detailed design

Requests

PUTting data

Any data type, published or unpublished, MAY have extra labels applied.

Any data type PUT by an application MUST have an app(<appId>) label applied (unless the application has the no-index permission and passes a no-index flag).

The ClientApis MUST apply labels passed to the data, MUST create any index that is missing AND add the data’s XorUrl to this index. If the data has a human-readable name, this may be used as the key.

eg:

App requests the authenticator to create a label.
Auth creates a Token for the app, as above.
Auth creates an index (MData) and PUTs it to the network giving the app the requested access.
App receives updated Authentication Token.

When PUTting the data, the common public key is added to the labels field and the data is PUT to the network. The address of the data is also added to the respective data index.

Note: This burden of indexing can, and possibly should, be moved to the network later on.
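A minimal sketch of the PUT-side labelling and indexing rules, assuming in-memory stand-ins for the account's indexes. `Account`, `PutRequest` and `put_with_labels` are hypothetical names, not part of any SAFE API, and treating an honoured no-index flag as "skip the app label and touch no index" is an assumption about how the flag behaves.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins; none of these names come from the SAFE APIs.
type XorUrl = String;
type Label = String;

#[derive(Default)]
struct Account {
    /// One index per label: entry name -> XorUrl of the data.
    indexes: BTreeMap<Label, BTreeMap<String, XorUrl>>,
}

struct PutRequest {
    app_id: String,
    xorurl: XorUrl,
    /// Optional human-readable name, used as the index key when present.
    name: Option<String>,
    labels: BTreeSet<Label>,
    no_index: bool,
}

/// Applies labels and updates indexes for one PUT; returns the labels that end up on the data.
fn put_with_labels(
    account: &mut Account,
    req: PutRequest,
    has_no_index_permission: bool,
) -> BTreeSet<Label> {
    let mut labels = req.labels;
    // The flag is only honoured when the app also holds the no-index permission.
    if req.no_index && has_no_index_permission {
        return labels;
    }
    // Automatic app label, so the user can always locate this data.
    labels.insert(format!("app({})", req.app_id));
    let key = req.name.unwrap_or_else(|| req.xorurl.clone());
    for label in &labels {
        account
            .indexes
            .entry(label.clone()) // creates the index if it is missing
            .or_default()
            .insert(key.clone(), req.xorurl.clone());
    }
    labels
}
```

An app that sets the flag without holding the permission still gets its data indexed under its app label.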

GETting data

To access any data, an application MUST either a) be a data owner (ie creator of the data), or b) have appropriate permission on that data via a label.

(See Application Permissions, below).

eg:

In order to perform any network action, the app will:

Present the Token alongside any request.

The ClientHandler will then:

Validate the Token’s signature against the PublicId public key, to determine that the app is still authorised.
ClientHandler can then pass on relevant information (LabelPublicKeys) to the DataHandler for verification there.
DataHandler will then enforce permissions based upon the PublicKeys presented (the app’s own, or those of the labels granted to the app).
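The DataHandler step could look roughly like this. It is a sketch only: `PublicKey` is a plain integer standing in for a BLS key, `Data` and `data_handler_allows` are hypothetical names, and ordering the access levels as Read < Write < ManagePermissions is an assumption.

```rust
use std::collections::BTreeMap;

// Placeholder for a BLS public key (assumption for illustration).
type PublicKey = u64;

// Declaration order gives Read < Write < ManagePermissions (assumed ordering).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Access {
    Read,
    Write,
    ManagePermissions,
}

struct Data {
    owner: PublicKey,
    /// Permission set on the data: app keys and label keys alike.
    permissions: BTreeMap<PublicKey, Access>,
}

/// True if any presented key (the app's own or a label's) grants the wanted access.
fn data_handler_allows(data: &Data, presented: &[PublicKey], wanted: Access) -> bool {
    presented.iter().any(|key| {
        *key == data.owner
            || data
                .permissions
                .get(key)
                .map_or(false, |granted| *granted >= wanted)
    })
}
```

Because label keys sit in the same permission set as any other key, the DataHandler needs no label-specific logic here.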

Labels

Each label created will have a corresponding BLS SecretKey to be stored at the account level. The label’s PublicKey will be stored in the data’s permission sets, as any other key.

This allows the DataHandler to verify permission just like any other key, and so labels are an extension of this functionality. This also means there is scope for groups of labelled data to be shared between accounts (more on this later).

Thus a map of LabelName: String will need to be maintained in a LabelStore along with the relevant keys and some metadata (RDF) describing the label use.

The LabelStore is part of the AccessContainer (not accessible to apps).

```rust
LabelId : UniqueId;
// RDF info about the label, includes sharing info, human-readable name, etc.
LabelMetaData: String;

LabelStore : {
    LabelId : (LabelMetaData, SecretKey, PublicKey)
}
```

Thus KeyShares can then be assigned to apps, or indeed other accounts, in order to sign requests for a given label (to be validated by the DataHandler).

This also allows updating the label name without impacting the labelled data. Or indeed having pseudonymous labels.
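As a rough illustration of why renaming is harmless, here is a sketch of the LabelStore as a map keyed by LabelId. The integer key types are placeholders for real BLS keys, and `LabelEntry`, `rename` and `public_key` are hypothetical helpers, not proposed API.

```rust
use std::collections::BTreeMap;

// Placeholders (assumptions): a real store would hold BLS keys and a proper unique id.
type LabelId = u64;
type SecretKey = u64;
type PublicKey = u64;

#[allow(dead_code)]
struct LabelEntry {
    /// RDF metadata, including the human-readable name.
    metadata: String,
    secret_key: SecretKey,
    public_key: PublicKey,
}

#[derive(Default)]
struct LabelStore {
    labels: BTreeMap<LabelId, LabelEntry>,
}

impl LabelStore {
    /// Changes only the metadata; the keys that data permission sets refer to stay put.
    fn rename(&mut self, id: LabelId, new_metadata: &str) -> bool {
        match self.labels.get_mut(&id) {
            Some(entry) => {
                entry.metadata = new_metadata.to_string();
                true
            }
            None => false,
        }
    }

    fn public_key(&self, id: LabelId) -> Option<PublicKey> {
        self.labels.get(&id).map(|entry| entry.public_key)
    }
}
```

Data only ever references the label's PublicKey, so the name in the metadata can change (or be pseudonymous) without touching any labelled data.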

Adding / Removing labels

To Data

An API will be needed to add labels to data (and check that this app has that label permission).
- The update function of our data APIs should allow for --addLabels <label> and --removeLabels <label> additions.
- An API will be needed to remove labels from data (validating that this app has the appropriate label permission).

In general

It is likely to be desirable to remove labels in general. This can be done by simply removing the label from the LabelIndex struct, and from any apps’ permissions.

Sharing Labelled data

SharedTokens can be generated by an account and shared to another user, giving permission for specific data / labels.

Indexing

Indexing should be an opt-out process, (requiring app permission to do so). Otherwise all data should be indexed.

No-index

A no-index permission will need to be added. This permission will allow requests to have a no-index flag set, to avoid indexing.

Without this permission, all data will be indexed.

Indices

Indices are MutableData objects stored in the account’s root container, owned by the account.

```rust
EntryName: String; // could be filename as located on computer, xorurl or other. Unique. Human friendlier the better.

EntryInfo: String; // RDF data about the entry… timestamp / file info / link to data.

Index : MutableData< EntryName, EntryInfo >

RootContainer : {
    IndexList : {
        IndexMetaData // RDF, as per label (is this duplicated?),
        LabelName
    }

    // individual indexes
    <indexId> : MutableData< EntryName, EntryInfo >
}
```

MetaData

An index will be a BTreeMap<String, String>, with the key being either a human-readable entry (filename) or the xorurl. The entry will be a string of RDF metadata including the link to the data itself, and other metadata (timestamp, extension, more things?).

Removing a label from data MUST remove that data from the appropriate index.
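A sketch of how adding and removing a label might keep the index in sync, assuming each index is a plain `BTreeMap<String, String>` keyed by entry name. `Account`, `DataItem`, `add_label` and `remove_label` are hypothetical names, and the permission check is reduced to "the app holds this label".

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
struct Account {
    /// label -> (entry name -> entry info, e.g. RDF containing the link to the data)
    indexes: BTreeMap<String, BTreeMap<String, String>>,
}

struct DataItem {
    entry_name: String,
    entry_info: String,
    labels: BTreeSet<String>,
}

/// Simplified permission check (assumption): the app must hold the label it touches.
fn check(app_labels: &BTreeSet<String>, label: &str) -> Result<(), String> {
    if app_labels.contains(label) {
        Ok(())
    } else {
        Err(format!("app lacks permission for label {}", label))
    }
}

fn add_label(
    account: &mut Account,
    app_labels: &BTreeSet<String>,
    item: &mut DataItem,
    label: &str,
) -> Result<(), String> {
    check(app_labels, label)?;
    item.labels.insert(label.to_string());
    account
        .indexes
        .entry(label.to_string())
        .or_default()
        .insert(item.entry_name.clone(), item.entry_info.clone());
    Ok(())
}

fn remove_label(
    account: &mut Account,
    app_labels: &BTreeSet<String>,
    item: &mut DataItem,
    label: &str,
) -> Result<(), String> {
    check(app_labels, label)?;
    item.labels.remove(label);
    // Removing the label also removes the entry from that label's index.
    if let Some(index) = account.indexes.get_mut(label) {
        index.remove(&item.entry_name);
    }
    Ok(())
}
```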

Example Label Flow

An app X wants to create label L and apply to some data.
App requests Authenticator for permission.
Label L doesn’t exist, so a keypair is created for it in the account.
KeyShares are provided to app X once permission is granted.
Auth permission is granted, within a time limit, and a Token is minted via a BLS key-pair sign. The token says “He who bears this, can manage label L, for two hours from < now >”.
Token is returned to app X
PublicKey is stored in the account AccessContainer for retrieval later.
SecretKey is stored by Authenticator

When X wants to access data, it must:

Make a GET request, and pass along the token.
ClientHandlers validate that the token matches the request type (ie has PUT permission if it is a PUT request).
ClientHandlers validate the token with a PublicKey which has been stored in the AccessContainer.
ClientHandlers validate that things are still within the specified timeframe.
ClientHandlers agree the request is valid and send it to the DataHandler.
DataHandler checks the token’s labels against those on the data. If a PublicKey stored in the data’s Permissions array validates the KeyShare stored in the token, the request is valid and the GET is done
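The ClientHandler checks in the flow above can be sketched as follows. This is illustrative only: keys are plain integers, the actual signature verification is left out, and `Token`, `Rejection` and `client_handler_check` are hypothetical names. How a node obtains `now_secs` is deliberately left open.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RequestKind {
    Get,
    Put,
}

struct Token {
    /// Public key the token claims to be signed with (placeholder integer).
    signer: u64,
    allowed: Vec<RequestKind>,
    expires_at_secs: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Rejection {
    WrongRequestType,
    UnknownKey,
    Expired,
}

/// Runs the three ClientHandler checks in order; signature verification is omitted here.
fn client_handler_check(
    token: &Token,
    request: RequestKind,
    known_keys: &BTreeSet<u64>,
    now_secs: u64,
) -> Result<(), Rejection> {
    if !token.allowed.contains(&request) {
        return Err(Rejection::WrongRequestType);
    }
    if !known_keys.contains(&token.signer) {
        return Err(Rejection::UnknownKey);
    }
    if now_secs > token.expires_at_secs {
        return Err(Rejection::Expired);
    }
    Ok(())
}
```

Only requests that pass all three checks would be forwarded to the DataHandler for the label check.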

Application Permissions

Macaroon inspired bearer tokens will be used as the application authentication system.

Removing Permissions from Client Handler

Token

Upon authorising an application the Authenticator will:

Create a BLS keypair for the application.
Store the app, PK at the ClientHandler, for verification later.
Store the SK in the Authenticator, linked to the appId
Generate and sign the application’s token, with all permissions and caveats stored therein.
Pass this signed token to the app (along with the app’s own sign-keys).

```rust
// Token structure:
CaveatName: String;
PermissionRestriction = Enum< "read" | "write" | "managepermission" >
LabelCaveatContents = ( , , Option);

CaveatContents: String | LabelCaveatContents;

Caveat : ( CaveatName, CaveatContents )

// to locate correct ClientHandlers
SectionInfo: AccountPublicKey as Xorname;

ProtoToken = serde::serialize(( SectionInfo, Vec ));

Sig = secret_key.sign( ProtoToken );

Token = ProtoToken + Sig
```

This application token will hold all permissions granted (eg, balance, transfer etc, currently managed in the ClientHandler)
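To make the mint-and-verify shape concrete, here is a sketch in compilable Rust. Everything in it is an assumption for illustration: the field layout of the label caveat is invented, and `toy_sign` / `verify` use a std hash with an XOR-derived "public key" purely as a stand-in for BLS signing. It is NOT cryptography and must not be read as the proposed scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[allow(dead_code)]
#[derive(Hash, Clone, Copy, Debug, PartialEq)]
enum Permission {
    Read,
    Write,
    ManagePermission,
}

// Hypothetical caveat layout; the label fields are an assumption.
#[derive(Hash, Clone, Debug, PartialEq)]
enum Caveat {
    Flag { name: String, allowed: bool },
    Label { label_id: String, restriction: Option<Permission> },
}

#[derive(Hash, Clone, Debug, PartialEq)]
struct ProtoToken {
    /// Stand-in for SectionInfo, used to locate the correct ClientHandlers.
    section_info: u64,
    caveats: Vec<Caveat>,
}

struct Token {
    proto: ProtoToken,
    signature: u64,
}

// Toy key derivation: NOT secure, only gives the sketch a sign/verify shape.
const TOY_MASK: u64 = 0x5afe_5afe_5afe_5afe;

fn toy_public_key(secret: u64) -> u64 {
    secret ^ TOY_MASK
}

fn toy_sign(secret: u64, proto: &ProtoToken) -> u64 {
    let mut hasher = DefaultHasher::new();
    secret.hash(&mut hasher);
    proto.hash(&mut hasher);
    hasher.finish()
}

/// Signs the proto-token and bundles the signature with it.
fn mint(secret: u64, proto: ProtoToken) -> Token {
    let signature = toy_sign(secret, &proto);
    Token { proto, signature }
}

/// Re-derives the toy signature from the public key and compares.
fn verify(public: u64, token: &Token) -> bool {
    toy_sign(public ^ TOY_MASK, &token.proto) == token.signature
}
```

The point of the shape is that any change to the caveats after minting makes the signature check fail, so an app cannot widen its own permissions.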

Shared Tokens

Ownership tied token implementation idea:

SharedTokens are signed by the Account’s SecretKey, in order to be validated at the DataHandler against both the ownership keys and the appropriate permissions for LabelPublicKeys in the Token.

SharedTokens are treated differently, and not validated at the ClientHandler.

Shared data MUST be identified as such in the LabelMetaData, along with an identifier for the recipient (inbox xorurl or safe-id).

ClientHandler skipping tokens:

SharedTokens are signed by the Account’s SecretKey, but they lack (or have) a caveat to indicate that ClientHandlers should pass this token directly on to DataHandlers, regardless of LabelKey presence for that app.

The Token is signed by a LabelKeyShare, which can easily be verified at the DataHandler against its PublicKeyShares as normal. (ie, instead of verifying a SignedLabelCaveat, the DataHandler verifies the token itself). This means arbitrary tokens can be created/passed + validated, with other caveats.

Caveat Example

Using label identifiers (which could be PK for each label or a unique ID string…)
```rust
// labelId is arbitrary msg to sign for validation at DataHandler
let label_caveat = ('labels', vec![(, , Some<"read">), (, 'signatureForSandwhichSkShare')])

let get_balance_caveat = ('get_balance', false)
```

Token Invalidation

To revoke an application’s permissions, the authenticator needs simply to remove the application’s PublicKey from the app’s ClientHandler, and remove the SecretKey from the authenticator’s data struct, meaning a ClientHandler could no longer verify an already existing Token.

Invalidating SharedTokens will need removal of LabelPublicKey from the data.
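App revocation as described could be sketched like this, assuming the ClientHandler and Authenticator each keep a simple map keyed by appId. The struct and function names are hypothetical and keys are placeholder integers.

```rust
use std::collections::BTreeMap;

type AppId = String;
// Placeholder for BLS keys (assumption for illustration).
type Key = u64;

#[derive(Default)]
struct ClientHandler {
    app_public_keys: BTreeMap<AppId, Key>,
}

#[derive(Default)]
struct Authenticator {
    app_secret_keys: BTreeMap<AppId, Key>,
}

fn authorise(
    app_id: &str,
    public: Key,
    secret: Key,
    client_handler: &mut ClientHandler,
    authenticator: &mut Authenticator,
) {
    client_handler.app_public_keys.insert(app_id.to_string(), public);
    authenticator.app_secret_keys.insert(app_id.to_string(), secret);
}

/// Removes both halves of the app's keypair; returns true if anything was removed.
fn revoke(
    app_id: &str,
    client_handler: &mut ClientHandler,
    authenticator: &mut Authenticator,
) -> bool {
    let had_public = client_handler.app_public_keys.remove(app_id).is_some();
    let had_secret = authenticator.app_secret_keys.remove(app_id).is_some();
    had_public || had_secret
}

/// With no stored key, an existing Token for this app can no longer be verified.
fn verification_key(client_handler: &ClientHandler, app_id: &str) -> Option<Key> {
    client_handler.app_public_keys.get(app_id).copied()
}
```

Note that nothing is done to the token itself: it simply stops verifying once the key it was signed against is gone.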

Drawbacks

The need to send all label-ids to the DataHandler could lead to increased request size.

Alternatives

A Pure BLS implementation could use sign key shares, and pass a key share to an application. This however requires more management and maintenance of ClientHandler data structs than the proposed macaroon-esque setup (which also can gain us the ability to verify specific ‘caveats’ depending on what’s encoded into the token.)

SecretKeyShares could be used as a simple alternative to SharedTokens, needing less validation, but having less potential than passing SharedTokens as they can have caveats attached.

Unresolved questions

Index Metadata

What should be included in an index’s metadata, and the metadata for each entry?

Optimisations

How far to optimise, and what to target. (To be decided as bottlenecks are identified.)
Options:

Transfer, spend, other bools outwith of token to avoid deserialisation at CH?
Client notifying of appropriate labels for known data
DataHandler letting ClientHandler know what’s relevant, allowing Token pruning.
ClientHandler caching known data’s labels (as received from DataHandler)

Changelog

2020-01-29

Added examples of label flow for token issuing / authorization of labels on data.
Add a second SharedToken idea, so not tied to data controlled by issuer per-se.
Improved Summary
Added some thoughts on async token re-issue

lionel.faber · December 16, 2019, 1:12pm

Yes. Otherwise anyone can misuse the token and send a request. The requester must have the matching secret key.

joshuef · December 16, 2019, 1:14pm

But if the token is guarded as a secretKey, does that then need to be signed? Given an app would have both… what’s gained by app signing here? The token can be verified by its own signature. All we’re doing is moving responsibility on to protecting the key instead of the token…

edit: I’ve moved the Q down to the questions section, fyi @lionel.faber

bochaco · December 16, 2019, 2:47pm

This could be optional, if the user doesn’t want the token to be used by any app but only by the one which was specifically authorised by the user. If the user is ok with no app signature it needs a caveat in it for that, e.g. “bearer-token = true”

If we want to have the attenuation support, it sounds to me we’d need something like this instead:

Token = Vec<(Caveats, Signature)>

Which gives you the append-only type of struct for attenuations with their corresponding signature.

joshuef · December 16, 2019, 2:52pm

Yeh I think it’s good to work with attenuation in mind even if we can’t get there straight off the bat (due to some limitations with access to an app’s PK from another app’s ClientHandler).

I still don’t see how that’s necessary. And the conclusion isn’t guaranteed either. The token would be used by anyone with the token + SK. I don’t see why that’s any different to just requiring the token… Both could be passed on. Both could/should be stored securely…

The token itself is the indication of being trusted.

As far as I see, we could add a caveat for appId or something there to the same effect as signing… (But that’s not really more secure either)

joshuef · December 16, 2019, 4:23pm

@bochaco, although it does seem like a smaller change which we could implement when we have attenuation sorted. As at the moment this assumes we’d need an app key for this, which may not be the case in the end.

joshuef · December 17, 2019, 12:53pm

OP has been updated / clarified after some internal discussions.

(Some flow clarifications and data-sharing methods)

JimCollinson · December 17, 2019, 1:07pm

Does an app still need a token to opt-out of indexing, and pass the flag?

Wondering if there would be a spoofing vulnerability there if not.

joshuef · December 17, 2019, 1:13pm

Does an app still need a token to opt-out of indexing, and pass the flag?

If we have to start this client side, this won’t be possible to enforce (though the APIs can still reflect this).

Ideal impl will have indexing done network side so a permission would be needed, yeh. I’ll update the OP with some thoughts on this.

bochaco · December 17, 2019, 2:28pm

I think this probably needs some more clarity: what is the “common pk” referring to? And where does the “label field” belong?

I think the app will actually need ManagePermissions permissions…or both label and ManagePermissions?

Unless I misunderstand this, this won’t work if the owner is different from the account giving permissions for a label, like a new label being added to the content, e.g. User A creates content X with label L1, and gives permissions to User B to ManagePermissions on content X. Then, User B adds label L2 to content X and gives a token to User C to read/write label L2

Plus, tightening owner with other perms doesn’t sound a good idea, they should be decoupled I think.

I guess this is (< labelId >, 'signatureForSandwhichShareKey')]) ?

joshuef · December 17, 2019, 2:48pm

If we need perms to add a label, I think it’d just be that label’s permission (and that permission level), assuming ‘read’ when not specified?

Otherwise adding any label, even read will need ManagePermissions (which loses the impact of manage perms).

Good point though, I think the granularity of labels is missing from this impl’s caveats. Ie: we have the permissions matrix on the data to say X-key can mutate/update permissions or what have you. But what if we want AppY to only be able to read, even if that label might be in a higher perm level for given data… I think we can do this easily in the caveats (as opposed to needing photos:read photos:mutate perms eg.)

Good point. Did we have another alternative impl here that had this covered? I wasn’t sure how to get an account to validate without exposing the labels caveat (which would allow any auth to read/create a token with those in…)

Will update based on the above (where there’s an answer at least)

edit: I’ve updated the OP with some more token detail / perm attenuation as part of the token (ie, it doesn’t matter if DH finds a higher permission level, the token is only valid for read).
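A tiny sketch of that capping rule, assuming the levels are ordered Read < Write < ManagePermissions and that an unspecified restriction falls back to read, as floated above. `Permission` and `effective` are hypothetical names.

```rust
// Declaration order gives Read < Write < ManagePermissions (assumed ordering).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Permission {
    Read,
    Write,
    ManagePermissions,
}

/// The token's restriction caps whatever level the DataHandler finds on the data.
fn effective(data_level: Permission, token_restriction: Option<Permission>) -> Permission {
    // Assumption: no restriction in the token means read only.
    let cap = token_restriction.unwrap_or(Permission::Read);
    data_level.min(cap)
}
```

So a label key sitting at ManagePermissions on some data still only yields read access for an app whose token says read.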

Also wondering on SharedToken, @lionel.faber, @bochaco, as the above is limited to the data owner… Instead of requiring data owner, maybe it’s just a key at / above the required level. So in your example, @bochaco, anything that could ManagePermissions, could sign a token… ? Not sure if that falls down similarly

happybeing · December 17, 2019, 3:57pm

The bits I understand look very nice!

I don’t know if doing this client side precludes using labels in sync with more concrete ‘types’ such as Published, Unpublished, AppendOnly, MutableData, ImmutableData, but if possible I think it would be useful to have those available as labels. For example, to filter a search by SAFE data type and status.

Also, applying the label ‘Published’ could be a useful API to have alongside the lower level call and so on. Labels of this kind could be distinguished using a leading symbol such as an underscore or ‘$’

I’m not sure if extending labels like this is going to confuse the developer and user experience, or make it simpler, but I think it is worth considering. It certainly appeals to me, but well, I’m not typical

I like the idea of having the user’s labels in a readable form, but when they choose to expose a label to keep the label hidden from any app or third party. E.g. I decide to share all data that has the label ‘BestFriends’ but all the recipient knows is that this is a collection of data, rather than the label I’ve applied to it, though that might be visible if I want - say as a default which I can remove/hide/alter. Is that catered for?

For anyone unfamiliar with macaroons, I think this is the starting point (maybe link to something like this in the OP?):

And there’s a Rust macaroon implementation (also Go).

joshuef · December 17, 2019, 4:02pm

The idea is any data you put with your account could be labelled.

I think the neat thing is, as a dev, you could largely ignore labels for a small app. And just be concerned with your apps data. And that would be fine.

Just other apps may be able to find it too!

Yeh, the shared data setup lets the recipient define what they want to call this data, nothing (thus far) is specced out to say “i call this X”. you could do that in a message if you wanted I guess… (or perhaps an option to share label metadata) .

I’ll place that link atop the OP thanks for your thoughts @happybeing!

(and just to note, we did look at rust impls, but they fall short for how we validate requests due to the need to always have the SK to verify macaroons, hence why we’re rolling our own)

happybeing · December 17, 2019, 5:02pm

I’m not sure I was clear enough here. I did realise (expect) labels could be ‘applied’ to any data type. My suggestion was that certain labels be available which correspond to the type. So all MutableData would for example have label ‘$MutableData’ etc. You could then use ‘$MutableData’ as a filter in any UI/API which accepts labels.

Secondly, that by applying the label ‘$Published’ to something that is not yet published, the effect would be to publish that data.

Hope that’s clearer!

joshuef · December 17, 2019, 5:07pm

Ah right. Sorry, misunderstood. Aye, those are good labels/indexes to be having I think aye! And could be easily applied automatically

Ah interesting! That’s not in the current RFC. It wouldn’t work how we have data at the moment (published data is another namespace), though there are some ideas / data changes being discussed that may make this possible.

happybeing · December 17, 2019, 5:58pm

I think it can work if you don’t implement publishing by adding a label, but do the action of “publishing” when the API to apply a label is asked to apply ‘$Published’.

So applying certain labels is an alternative way of invoking the action ‘publish’ rather than just applying a special label.

Hope that’s clear!

tfa · December 20, 2019, 10:25am

From latest weekly update:

This is a bad example because the network doesn’t manage time. But is this doable at the application level?

An application could complete the chain of system caveats controlled by the network by its own caveats. There could be conventions that applications can follow to manage these supplementary caveats like timeout restrictions, or better a client API that helps managing them.

But these restrictions could be bypassed by forking the application and removing the corresponding controls, or in the case of a timeout, simply by changing the system time of the PC running the application.

So, I would say there is no interest in managing application caveats client side, unless I am missing the big picture, like time managed by the network. A hint for this is:

But I don’t see how we can rely on a timestamp that isn’t a network consensual time.

JimCollinson · December 20, 2019, 10:41am

That’s correct for absolute time, but durations may be possible at a network level for a given account, which would work for this usecase. E.g. I’ll give you access to this file until my clock runs down to zero.

joshuef · December 20, 2019, 11:21am

Good Q @tfa. It’s not 100% that it’ll work, but I’m sure we’ll be able to get this eventually, without requiring consensus on the time exactly.

I don’t think it would be for the reasons you outlined.

There’s been some discussion on time in general, and while the network itself won’t (can’t) be aware of time, I think it’s possible for some form of this to work at the client side (Authenticator and Validators). It has yet to be proven, but I think it’ll go.

A ClientHandler node could validate the time as part of its normal checks (is the token revoked? Does it match the request… etc). These ClientHandlers reach consensus on whether they think the request is valid or not, not specifically the current time. It may be that it falls down if a request is made at the expiration time, and therefore consensus isn’t reached. But we could also allow for approximations here.
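Very roughly, and only as a sketch of the idea: each handler checks expiry against its own clock with some tolerance, and the group agrees on validity rather than on the time. The simple-majority threshold and the function names are assumptions, not how the network reaches consensus.

```rust
/// One handler's view: still valid if its local clock is within expiry plus tolerance.
fn handler_accepts(local_now_secs: u64, expires_at_secs: u64, tolerance_secs: u64) -> bool {
    local_now_secs <= expires_at_secs.saturating_add(tolerance_secs)
}

/// The group agrees on "valid or not", never on the time itself.
/// Simple majority is an assumed threshold for illustration.
fn section_accepts(local_clocks: &[u64], expires_at_secs: u64, tolerance_secs: u64) -> bool {
    let yes = local_clocks
        .iter()
        .filter(|now| handler_accepts(**now, expires_at_secs, tolerance_secs))
        .count();
    yes * 2 > local_clocks.len()
}
```

Requests made right at the expiry boundary are where handlers with slightly different clocks would disagree, which is what the tolerance is meant to soften.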

I’m not sure if it’ll work 100% (proof will be in the pudding), but for the purpose of asking “is this application still valid” (as opposed to network critical operations), I think it may well be enough for ClientHandlers to be doing this.

If this approach doesn’t work, we’ll probably look at tying duration to parsec blocks (as has been suggested elsewhere on the forum), so duration could be approximated there. But I’m hoping we won’t need to go there.

jlpell · December 20, 2019, 11:44am

What is the use case for this? Once access is given, the document can be copied to local storage. The copier then has perpetual access.
