[RFC] Labelled Data, Indexing and Token Authorisation

tags: rfc labels tokens
  • Status: proposed
  • Type: new front-end data structures and permissions.
  • Related components: SCL, safe-api, authenticator
  • Start Date: 16/12/2019
  • Discussion: (fill me in with link to RFC discussion - shepherd will complete this)
  • Supersedes: (fill me in with a link to RFC this supersedes - if applicable)
  • Superseded by: (fill me in with a link to RFC this is superseded by - if applicable)

Summary

Enable both automatic and optional indexing of data at the account level, and prevent applications from siloing data, via a system of indexed groups (labels) that increases permission flexibility.

Changes to authentication mechanisms will be needed. A macaroon-like 'bearer token' system is proposed, utilising BLS to manage permissions and advanced caveats for application permissions. (Limitations in the macaroon implementation make actual macaroon use non-viable for SAFE workflows.)

This enables us to validate data access not only at the client handler, against generic permissions, but also at the data handling layer, validating tokens against specifics of that data.

These changes also bring about the possibility of sharing private labelled data with other accounts, without the need to publish it.

Conventions

  • The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Assumptions

Knowledge of BLS signatures is assumed, as well as of SAFE request validation and the ClientHandler, DataHandler and Authenticator flows.

Also, a basic understanding of app 'containers' is desirable.

RDF data schemes are not specified here.

Motivation

Ensure that applications can access 'groups' of data, as opposed to only the data they PUT/manage in a specific container.

Ensure that, by default, all data PUT by any application can be located by the user (automatic indexing).

Enable sharing of private data via these labels.

Enable better data access and discovery in the account.

Detailed design

Requests

PUTting data

Any data type, published or unpublished, MAY have extra labels applied.

Any data type PUT by an application MUST have an app(<appId>) label applied, unless the application has the no-index permission and passes a no-index flag.

The ClientApis MUST apply the labels passed with the data, MUST create any index that is missing, AND MUST add the data's XorUrl to each such index. If the data has a human-readable name, this may be used as the key.

eg:

  • App requests the authenticator to create a label.
  • Auth creates a Token for the app (see Application Permissions, below).
  • Auth creates an index (MData) and PUTs it to the network, giving the app the requested access.
  • App receives the updated Authentication Token.
  • When PUTting the data, the label's (common) public key is added to the labels field and the data is PUT to the network. The address of the data is also added to the respective data index.
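A minimal sketch of the PUT-side labelling rules above, assuming in-memory BTreeMaps in place of the MutableData indices. PutRequest, Indexes and apply_labels are hypothetical names, not existing ClientApis. The sketch also assumes that an honoured no-index flag skips both the app(<appId>) label and all index writes.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical PUT request as seen by the client APIs (not an existing SAFE type).
struct PutRequest {
    app_id: String,
    xorurl: String,
    name: Option<String>, // human-readable name, if the data has one
    labels: Vec<String>,  // extra labels requested by the app
    no_index: bool,       // honoured only when the app holds the no-index permission
}

/// label -> index; each index maps an entry key to the data's XorUrl.
/// BTreeMaps stand in for the MutableData indices.
type Indexes = BTreeMap<String, BTreeMap<String, String>>;

/// Applies the requested labels plus app(<appId>), creates any missing index,
/// and adds the data's XorUrl to each one. Returns the labels applied.
fn apply_labels(req: &PutRequest, app_can_no_index: bool, indexes: &mut Indexes) -> BTreeSet<String> {
    let mut labels: BTreeSet<String> = req.labels.iter().cloned().collect();
    // Assumption: an honoured no-index flag skips the app label and all index writes.
    if req.no_index && app_can_no_index {
        return labels;
    }
    labels.insert(format!("app({})", req.app_id));
    // Use the human-readable name as the entry key when there is one, else the XorUrl.
    let key = req.name.clone().unwrap_or_else(|| req.xorurl.clone());
    for label in &labels {
        indexes
            .entry(label.clone())
            .or_default() // creates the index if it is missing
            .insert(key.clone(), req.xorurl.clone());
    }
    labels
}
```

Without the no-index permission, the flag is simply ignored and the data is indexed as usual. This matches "Without this permission, all data will be indexed."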

Note: This burden of indexing can, and possibly should, be moved to the network later on.

GETting data

To access any data, an application MUST either a) be a data owner (i.e. the creator of the data) or b) have appropriate permission on that data via a label.

(See Application Permissions, below).

eg:

In order to perform any network action, the app will:

  • Present the Token alongside any request.

The ClientHandler will then:

  • Validate the Token's signature against the PublicId public key, to determine that the app is still authorised.
  • Pass on the relevant information (LabelPublicKeys) to the DataHandler for verification there.
  • The DataHandler will then enforce permissions based upon the PublicKeys presented (the app's own, or those of the labels granted to the app).
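The DataHandler rule for GETting data can be sketched as follows. Keys are plain u64 values standing in for BLS public keys. Forwarded, StoredData and may_get are hypothetical names. The sketch assumes the ClientHandler's signature check has already passed.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a BLS public key.
type PublicKey = u64;

/// What a ClientHandler might forward once the token checks out: the app's own key
/// plus the public keys of the labels granted to it.
struct Forwarded {
    app_key: PublicKey,
    label_keys: Vec<PublicKey>,
}

/// Hypothetical stored data: its owner key and the keys in its permission set.
struct StoredData {
    owner: PublicKey,
    permitted_keys: BTreeSet<PublicKey>,
}

/// DataHandler rule: allow the GET if the app owns the data, or if any presented
/// label key appears in the data's permission set.
fn may_get(data: &StoredData, req: &Forwarded) -> bool {
    data.owner == req.app_key || req.label_keys.iter().any(|k| data.permitted_keys.contains(k))
}
```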

Labels

Each label created will have a corresponding BLS SecretKey, stored at the account level. The label's PublicKey will be stored in the data's permission sets, like any other key.

This allows the DataHandler to verify permissions just as it would for any other key, so labels are an extension of existing functionality. It also means there is scope for groups of labelled data to be shared between accounts (more on this later).

Thus a map of LabelName: String will need to be maintained in a LabelStore along with the relevant keys and some metadata (RDF) describing the label use.

The LabelStore is part of the AccessContainer (not accessible to apps).

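```rust
use std::collections::BTreeMap;
// BLS keys; `threshold_crypto` is assumed here as the BLS implementation.
use threshold_crypto::{PublicKey, SecretKey};

// `UniqueId` is not specified further in this RFC; a 32-byte id is assumed.
type UniqueId = [u8; 32];
type LabelId = UniqueId;
// RDF info about the label, includes sharing info, human-readable name, etc.
type LabelMetaData = String;

type LabelStore = BTreeMap<LabelId, (LabelMetaData, SecretKey, PublicKey)>;
```
Thus KeyShares can then be assigned to apps, or indeed other accounts, in order to sign requests for a given label (to be validated by the DataHandler).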

This also allows updating the label name without impacting the labelled data, or indeed having pseudonymous labels.
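Because labelled data only stores the label's PublicKey, renaming a label is a metadata change inside the LabelStore. The sketch below assumes integer ids and keys, and reduces the metadata to a name (the real LabelMetaData is RDF). All names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical: LabelId and keys reduced to plain integers for illustration.
type LabelId = u64;

struct LabelEntry {
    name: String,    // human-readable name, part of the label's metadata
    public_key: u64, // what labelled data actually stores in its permissions
}

struct LabelStore {
    labels: BTreeMap<LabelId, LabelEntry>,
}

impl LabelStore {
    /// Renames a label in place. Labelled data only references the public key,
    /// so it is untouched. Returns false if the label does not exist.
    fn rename(&mut self, id: LabelId, new_name: &str) -> bool {
        match self.labels.get_mut(&id) {
            Some(entry) => {
                entry.name = new_name.to_string();
                true
            }
            None => false,
        }
    }
}
```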

Adding / Removing labels

To Data

An API will be needed to add labels to data (checking that the app has permission for that label).
  • The update function of our data APIs should allow for --addLabels <label> and --removeLabels <label> options.
  • An API will be needed to remove labels from data (validating that the app has the appropriate label permission).
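The add and remove label APIs, with their permission checks, can be sketched as follows. The sketch assumes an app's label permissions are a simple set of label names. It also applies the rule that removing a label from data must remove that data from the label's index. All names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical app state: the labels it has been granted.
struct App {
    label_perms: BTreeSet<String>,
}

/// Hypothetical labelled data.
struct Data {
    xorurl: String,
    labels: BTreeSet<String>,
}

/// label -> XorUrls of the data carrying that label.
type Indexes = BTreeMap<String, BTreeSet<String>>;

/// Rejects the call unless the app holds permission for `label`.
fn check(app: &App, label: &str) -> Result<(), String> {
    if app.label_perms.contains(label) {
        Ok(())
    } else {
        Err(format!("app lacks permission for label {}", label))
    }
}

fn add_label(app: &App, data: &mut Data, label: &str, idx: &mut Indexes) -> Result<(), String> {
    check(app, label)?;
    data.labels.insert(label.to_string());
    idx.entry(label.to_string()).or_default().insert(data.xorurl.clone());
    Ok(())
}

fn remove_label(app: &App, data: &mut Data, label: &str, idx: &mut Indexes) -> Result<(), String> {
    check(app, label)?;
    data.labels.remove(label);
    // Removing a label from data must also remove the data from that label's index.
    if let Some(entries) = idx.get_mut(label) {
        entries.remove(&data.xorurl);
    }
    Ok(())
}
```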

In general

It is likely to be desirable to remove labels altogether. This can be done by simply removing the label from the LabelStore struct, and from the apps' permissions.
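Removing a label altogether might look like the sketch below. It assumes the label store and each app's granted labels are plain maps and sets. The Account type is hypothetical, and the sketch deliberately leaves already-labelled data untouched.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical account-level state (the real LabelStore lives in the AccessContainer).
struct Account {
    label_store: BTreeMap<String, String>,          // label id -> metadata
    app_labels: BTreeMap<String, BTreeSet<String>>, // app id -> label ids granted to it
}

impl Account {
    /// Removes a label from the store and from every app's permissions.
    /// Returns whether the label existed. Labelled data itself is not touched here.
    fn remove_label(&mut self, label: &str) -> bool {
        let existed = self.label_store.remove(label).is_some();
        for granted in self.app_labels.values_mut() {
            granted.remove(label);
        }
        existed
    }
}
```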

Sharing Labelled data

SharedTokens can be generated by an account and shared with another user, giving permission for specific data / labels.

Indexing

Indexing should be an opt-out process (requiring app permission to opt out). Otherwise, all data should be indexed.

No-index

A no-index permission will need to be added. This permission will allow requests to have a no-index flag set, to avoid indexing.

Without this permission, all data will be indexed.

Indices

Indices are MutableData objects stored in the account's root container, owned by the account.

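```rust
use std::collections::BTreeMap;

// Could be the filename as located on the computer, an XorUrl or other. Unique; the more human-friendly the better.
type EntryName = String;

// RDF data about the entry: timestamp / file info / link to data.
type EntryInfo = String;

// A BTreeMap stands in for the network's MutableData here.
type Index = BTreeMap<EntryName, EntryInfo>;

// Identifier of an individual index (type assumed; not specified in this RFC).
type IndexId = String;

struct IndexListEntry {
    index_meta_data: String, // RDF, as per label (is this duplicated?)
    label_name: String,
}

struct RootContainer {
    index_list: BTreeMap<IndexId, IndexListEntry>,
    // individual indexes
    indexes: BTreeMap<IndexId, Index>,
}
```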

MetaData

An index will be a BTreeMap<String, String>

The key is either a human-readable entry (filename) or the xorurl. The entry will be a string of RDF metadata including the link to the data itself and other metadata (timestamp, extension, possibly more?).

Removing a label from data MUST remove that data from the appropriate index.

Example Label Flow

  • An app X wants to create label L and apply it to some data.
  • App requests permission from the Authenticator.
  • Label L doesn't exist, so a keypair is created for it in the account.
  • KeyShares are provided to app X once permission is granted.
  • Auth permission is granted within a time limit, and a Token is minted via a BLS key-pair signature. The token says: "He who bears this can manage label L, for two hours from <now>."
  • Token is returned to app X
  • PublicKey is stored in the account AccessContainer for retrieval later.
  • SecretKey is stored by Authenticator
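The minting step can be sketched with std's SystemTime, leaving out the BLS signature. LabelToken, Permission and mint are hypothetical names. This sketch relies on a local clock, which the discussion below questions as a basis for network-side checks.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Permission {
    Read,
    Write,
    ManagePermission,
}

/// Hypothetical unsigned token body; BLS signing is omitted from this sketch.
struct LabelToken {
    label: String,
    permission: Permission,
    not_after: SystemTime,
}

/// Mints a token for `label`, valid for `ttl` from `now`.
fn mint(label: &str, permission: Permission, now: SystemTime, ttl: Duration) -> LabelToken {
    LabelToken { label: label.to_string(), permission, not_after: now + ttl }
}

impl LabelToken {
    /// True while `now` is no later than the token's expiry.
    fn is_live(&self, now: SystemTime) -> bool {
        now <= self.not_after
    }
}
```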

When X wants to access data, it must:

  • Make a GET request, and pass along the token.
  • ClientHandlers validate that the token matches the request type (i.e. has PUT permission if it is a PUT request).
  • ClientHandlers validate that the token was signed with a PublicKey stored in the AccessContainer.
  • ClientHandlers validate that the request is still within the specified timeframe.
  • ClientHandlers agree the request is valid and send it to the DataHandler.
  • DataHandler checks the token's labels against those on the data. If a PublicKey stored in the data's Permissions array validates the KeyShare stored in the token, the request is valid and the GET is done.
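The ClientHandler checks above can be sketched as follows. Signature verification is reduced to a lookup of the signing key among keys stored in the AccessContainer, and time is an integer number of seconds. All names are hypothetical, and the DataHandler step is left out.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RequestKind {
    Get,
    Put,
}

/// Hypothetical token view: the key that signed it, what it allows, and its expiry (seconds).
struct Token {
    signer: u64,
    allows: Vec<RequestKind>,
    not_after: u64,
}

#[derive(Debug, PartialEq)]
enum Rejection {
    WrongRequestType,
    UnknownKey,
    Expired,
}

/// ClientHandler checks, in the order listed above. Signature verification is reduced to
/// "was it signed by a key stored in the AccessContainer?" in this sketch.
fn client_handler_check(token: &Token, kind: RequestKind, stored_keys: &BTreeSet<u64>, now: u64) -> Result<(), Rejection> {
    if !token.allows.contains(&kind) {
        return Err(Rejection::WrongRequestType);
    }
    if !stored_keys.contains(&token.signer) {
        return Err(Rejection::UnknownKey);
    }
    if now > token.not_after {
        return Err(Rejection::Expired);
    }
    Ok(())
}
```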

Application Permissions

Macaroon-inspired bearer tokens will be used as the application authentication system.

Removing Permissions from Client Handler

Token

Upon authorising an application the Authenticator will:

  • Create a BLS keypair for the application.
  • Store the app's PK at the ClientHandler, for verification later.
  • Store the SK in the Authenticator, linked to the appId.
  • Generate and sign the application's token, with all permissions and caveats stored therein.
  • Pass this signed token to the app (along with the app's own signing keys).
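The authorisation steps above can be sketched as a toy model. The key types are opaque integers, NOT BLS keys. A token records the public key it was 'signed' with rather than carrying a real signature. Authenticator, ClientHandler and authorise are illustrative names only.

```rust
use std::collections::BTreeMap;

/// Toy key types, NOT BLS: opaque ids standing in for a real key pair.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PublicKey(u64);
#[derive(Debug, Clone, Copy)]
struct SecretKey(u64);

/// Hypothetical token: caveats as (name, contents) pairs plus the signing key.
/// A real token would carry a BLS signature instead of `signed_by`.
struct Token {
    app_id: String,
    caveats: Vec<(String, String)>,
    signed_by: PublicKey,
}

#[derive(Default)]
struct ClientHandler {
    app_keys: BTreeMap<String, PublicKey>,
}

#[derive(Default)]
struct Authenticator {
    next_key: u64,
    app_secrets: BTreeMap<String, SecretKey>,
}

impl Authenticator {
    fn authorise(&mut self, app_id: &str, caveats: Vec<(String, String)>, ch: &mut ClientHandler) -> Token {
        // Toy "key generation": real code would generate a BLS key pair here.
        self.next_key += 1;
        let (sk, pk) = (SecretKey(self.next_key), PublicKey(self.next_key));
        ch.app_keys.insert(app_id.to_string(), pk); // PK stored at the ClientHandler
        self.app_secrets.insert(app_id.to_string(), sk); // SK kept by the Authenticator, keyed by appId
        Token { app_id: app_id.to_string(), caveats, signed_by: pk }
    }
}

impl ClientHandler {
    /// Stand-in for verifying the token signature with the app's stored PK.
    fn verify(&self, token: &Token) -> bool {
        self.app_keys.get(&token.app_id) == Some(&token.signed_by)
    }
}
```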

```rust
// Token structure:
CaveatName: String;
PermissionRestriction = Enum< "read" | "write" | "managepermission" >
LabelCaveatContents = ( , , Option);

CaveatContents: String | LabelCaveatContents;

Caveat : ( CaveatName, CaveatContents )

// to locate correct ClientHandlers
SectionInfo: AccountPublicKey as Xorname;

ProtoToken = serde::serialize(( SectionInfo, Vec ));

Sig = secret_key.sign( ProtoToken );

Token = ProtoToken + Sig
```
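The pseudocode's serialise-then-sign step, Token = ProtoToken + Sig, can be sketched without external crates. A hand-rolled, length-prefixed encoding stands in for the serialisation step, and SectionInfo is assumed to be a 32-byte XorName. The signing function is passed in, so no particular BLS library is assumed. All function names are hypothetical.

```rust
/// One caveat: a name plus its contents (both kept as plain strings in this sketch).
type Caveat = (String, String);

/// Hypothetical length-prefixed encoding of (SectionInfo, Vec<Caveat>), standing in for serialisation.
fn encode_proto_token(section_info: &[u8; 32], caveats: &[Caveat]) -> Vec<u8> {
    let mut out = section_info.to_vec();
    out.extend_from_slice(&(caveats.len() as u32).to_be_bytes());
    for (name, contents) in caveats {
        for field in [name, contents].iter() {
            out.extend_from_slice(&(field.len() as u32).to_be_bytes());
            out.extend_from_slice(field.as_bytes());
        }
    }
    out
}

/// Token = ProtoToken + Signature; the ProtoToken length is prefixed so the two can be split again.
fn mint_token(proto: Vec<u8>, sign: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    let sig = sign(&proto);
    let mut token = (proto.len() as u32).to_be_bytes().to_vec();
    token.extend_from_slice(&proto);
    token.extend_from_slice(&sig);
    token
}

/// Splits a token back into (ProtoToken, Signature) so the signature can be verified.
fn split_token(token: &[u8]) -> Option<(&[u8], &[u8])> {
    if token.len() < 4 {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&token[..4]);
    let len = u32::from_be_bytes(len_bytes) as usize;
    let rest = &token[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

In real use, `sign` would be the app's BLS secret key signing over the ProtoToken bytes, and a ClientHandler would verify the split-off signature with the stored PK.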

This application token will hold all permissions granted (eg, balance, transfer etc, currently managed in the ClientHandler)

Shared Tokens

Ownership-tied token implementation idea:

SharedTokens are signed by the Account's SecretKey, in order to be validated at the DataHandler against both the ownership keys and the appropriate permissions for LabelPublicKeys in the Token.

SharedTokens are treated differently, and not validated at the ClientHandler.

Shared data MUST be identified as such in the LabelMetaData, along with an identifier for the recipient (inbox xorurl or safe-id).

ClientHandler skipping tokens:

SharedTokens are signed by the Account's SecretKey, but carry (or lack) a caveat indicating that ClientHandlers should pass the token directly on to DataHandlers, regardless of LabelKey presence for that app.

The Token is signed by a LabelKeyShare, which can easily be verified at the DataHandler against its PublicKeyShares as normal (i.e. instead of verifying a SignedLabelCaveat, the DataHandler verifies the token itself). This means arbitrary tokens can be created, passed and validated, with other caveats.
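The 'ClientHandler skipping tokens' idea can be sketched as a routing decision. A pass-through caveat sends the token straight to the DataHandler, which checks it against the label keys on the data. Keys are u64 stand-ins, and a membership test replaces real signature verification. All names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    ValidatedAtClientHandler,
    ForwardedToDataHandler,
    Rejected,
}

/// Hypothetical token view: whether it carries the pass-through caveat, and which key signed it.
struct Token {
    pass_through: bool,
    signer: u64,
}

/// ClientHandler: shared tokens skip app-key validation and go straight to the DataHandler.
fn client_handler(token: &Token, registered_app_keys: &[u64]) -> Outcome {
    if token.pass_through {
        Outcome::ForwardedToDataHandler
    } else if registered_app_keys.contains(&token.signer) {
        Outcome::ValidatedAtClientHandler
    } else {
        Outcome::Rejected
    }
}

/// DataHandler: stand-in for verifying the token against the label's public key(s) on the data.
fn data_handler_accepts(token: &Token, data_label_keys: &[u64]) -> bool {
    data_label_keys.contains(&token.signer)
}
```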

Caveat Example

Using label identifiers (which could be the PK for each label or a unique ID string…):
```rust

// labelId is an arbitrary message to sign, for validation at the DataHandler
let label_caveat = ("labels", vec![(, , Some("read")), (, "signatureForSandwhichSkShare")]);

let get_balance_caveat = ("get_balance", false);

```
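One way the caveats above might be read once decoded is sketched below. It rests on two assumptions this RFC does not settle. First, a missing account-level caveat such as get_balance means 'not permitted'. Second, a label caveat without a restriction defaults to 'read', as suggested later in the discussion. The Contents enum and both functions are hypothetical, and signatures are omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded caveat contents, keyed by caveat name (e.g. "get_balance", "labels").
enum Contents {
    Flag(bool),
    /// (label id, optional permission restriction such as "read")
    Labels(Vec<(String, Option<String>)>),
}

/// Is an account-level action like "get_balance" permitted?
/// A missing caveat is treated as "not permitted" (an assumption).
fn flag_allows(caveats: &BTreeMap<String, Contents>, name: &str) -> bool {
    match caveats.get(name) {
        Some(Contents::Flag(b)) => *b,
        _ => false,
    }
}

/// Returns the restriction carried for `label`, defaulting to "read" when none is given
/// (an assumption), or None if the token does not mention the label at all.
fn label_restriction<'a>(caveats: &'a BTreeMap<String, Contents>, label: &str) -> Option<&'a str> {
    match caveats.get("labels") {
        Some(Contents::Labels(list)) => list
            .iter()
            .find(|(id, _)| id == label)
            .map(|(_, r)| r.as_deref().unwrap_or("read")),
        _ => None,
    }
}
```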

Token Invalidation

To revoke an application's permissions, the authenticator simply needs to remove the application's PublicKey from the app's ClientHandler and remove the SecretKey from the authenticator's data struct, meaning the ClientHandler can no longer verify an already existing Token.

Invalidating SharedTokens will need removal of LabelPublicKey from the data.
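Both revocation routes can be sketched as a toy model. Removing the app's PK and SK makes an existing token unverifiable, while a SharedToken is revoked by dropping the label's PublicKey from the data's permissions. Keys are u64 stand-ins and 'verification' is a key comparison, not BLS. All names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Toy token: records the id of the key it was "signed" with (not a real signature).
struct Token {
    app_id: String,
    signed_by: u64,
}

struct ClientHandler {
    app_keys: BTreeMap<String, u64>, // app id -> app PK
}

struct Authenticator {
    app_secrets: BTreeMap<String, u64>, // app id -> app SK
}

/// Stand-in for verifying the token with the app's stored PK.
fn verify(ch: &ClientHandler, token: &Token) -> bool {
    ch.app_keys.get(&token.app_id) == Some(&token.signed_by)
}

/// Revoking an app: drop its PK at the ClientHandler and its SK at the Authenticator.
fn revoke_app(app_id: &str, ch: &mut ClientHandler, auth: &mut Authenticator) {
    ch.app_keys.remove(app_id);
    auth.app_secrets.remove(app_id);
}

/// Revoking a SharedToken: remove the label's PublicKey from the data's permission keys.
fn revoke_shared(label_key: u64, data_permission_keys: &mut Vec<u64>) {
    data_permission_keys.retain(|k| *k != label_key);
}
```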

Drawbacks

The need to send all label-ids to the DataHandler could lead to increased request size.

Alternatives

A pure BLS implementation could use signing key shares, and pass a key share to an application. This, however, requires more management and maintenance of ClientHandler data structs than the proposed macaroon-esque setup (which can also give us the ability to verify specific 'caveats', depending on what's encoded into the token).

SecretKeyShares could be used as a simple alternative to SharedTokens, needing less validation, but with less potential than SharedTokens, which can have caveats attached.

Unresolved questions

Index Metadata

What should be included in an index's metadata, and in the metadata for each entry?

Optimisations

How far to optimise, and what to target (to be decided as bottlenecks are identified).
Options:

  • Keep transfer, spend and other booleans outside the token, to avoid deserialisation at the CH?
  • Client notifying of appropriate labels for known data.
  • DataHandler letting ClientHandler know what's relevant, allowing Token pruning.
  • ClientHandler caching known data's labels (as received from the DataHandler).

Changelog

2020-01-29

  • Added examples of label flow for token issuing / authorization of labels on data.
  • Add a second SharedToken idea, so not tied to data controlled by issuer per-se.
  • Improved Summary
  • Added some thoughts on async token re-issue

Yes. Otherwise anyone can misuse the token and send a request. The requester must have the matching secret key.


But if the token is guarded like a SecretKey, does it then need to be signed? Given an app would have both… what's gained by the app signing here? The token can be verified by its own signature. All we're doing is moving responsibility on to protecting the key instead of the token…

edit: I've moved the Q down to the questions section, fyi @lionel.faber


This could be optional, if the user doesn't want the token to be used by any app other than the one specifically authorised by the user. If the user is OK with no app signature, the token needs a caveat for that, e.g. "bearer-token = true".

If we want to have attenuation support, it sounds to me we'd need something like this instead:

Token = Vec<(Caveats, Signature)>

Which gives you the append-only type of struct for attenuations with their corresponding signature.
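A sketch of the append-only Vec<(Caveats, Signature)> idea follows. It assumes each layer's caveats can be read as a set of granted permissions, so every appended layer can only narrow access. Signatures are opaque bytes and are not checked here. The type and method names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical append-only token: each layer is (caveats, signature bytes).
struct Token {
    layers: Vec<(BTreeSet<String>, Vec<u8>)>,
}

impl Token {
    /// Appends a new layer; existing layers are never modified.
    fn attenuate(&mut self, caveats: BTreeSet<String>, signature: Vec<u8>) {
        self.layers.push((caveats, signature));
    }

    /// Granted permissions are the intersection of every layer's set,
    /// so a new layer can only narrow access, never widen it.
    fn granted(&self) -> BTreeSet<String> {
        let mut sets = self.layers.iter().map(|(caveats, _sig)| caveats);
        match sets.next() {
            None => BTreeSet::new(),
            Some(first) => sets.fold(first.clone(), |acc, c| acc.intersection(c).cloned().collect()),
        }
    }
}
```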


Yeh, I think it's good to work with attenuation in mind, even if we can't get there straight off the bat (due to some limitations with access to an app's PK from another app's ClientHandler).

I still don't see how that's necessary. And the conclusion isn't guaranteed either. The token would be usable by anyone with the token + SK. I don't see why that's any different to just requiring the token… Both could be passed on. Both could/should be stored securely…

The token itself is the indication of being trusted.

As far as I see, we could add a caveat for appId or something there to the same effect as signing… (But that's not really more secure either.)


@bochaco, although it does seem like a smaller change, which we could implement once we have attenuation sorted. As it stands, this assumes we'd need an app key for this, which may not be the case in the end.


OP has been updated / clarified after some internal discussions.

(Some flow clarifications and data-sharing methods)


Does an app still need a token to opt-out of indexing, and pass the flag?

Wondering if there would be a spoofing vulnerability there if not.


Does an app still need a token to opt-out of indexing, and pass the flag?

If we have to start this client side, this won't be possible to enforce (though the APIs can still reflect this).

The ideal impl will have indexing done network side, so a permission would be needed, yeh. I'll update the OP with some thoughts on this :+1:


I think this probably needs some more clarity: what is the "common pk" referring to? And where does the "label field" belong?

I think the app will actually need ManagePermissions permissions… or both label and ManagePermissions?

Unless I misunderstand this, this won't work if the owner is different from the account giving permissions for a label, e.g. a new label being added to the content: User A creates content X with label L1, and gives permissions to User B to ManagePermissions on content X. Then User B adds label L2 to content X and gives a token to User C to read/write label L2.

Plus, tying ownership to other perms doesn't sound like a good idea; they should be decoupled, I think.

I guess this is (< labelId >, 'signatureForSandwhichShareKey')]) ?


If we need perms to add a label, I think it'd just be that label's permission (at that permission level), assuming 'read' when not specified?

Otherwise adding any label, even read, will need ManagePermissions (which loses the impact of manage perms).

Good point though, I think the granularity of labels is missing from this impl's caveats. I.e. we have the permissions matrix on the data to say key X can mutate/update permissions or what have you. But what if we want AppY to only be able to read, even if that label might be in a higher perm level for the given data… I think we can do this easily in the caveats (as opposed to needing photos:read, photos:mutate perms, e.g.).

Good point. Did we have another alternative impl here that had this covered? I wasn't sure how to get an account to validate without exposing the labels caveat (which would allow any auth to read/create a token with those in…)

:+1:

Will update based on the above (where there's an answer, at least).

edit: I've updated the OP with some more token detail / perm attenuation as part of the token (i.e. it doesn't matter if the DH finds a higher permission level, the token is only valid for read).

Also wondering about SharedToken, @lionel.faber, @bochaco, as the above is limited to the data owner… Instead of requiring the data owner, maybe it's just a key at / above the required level. So in your example, @bochaco, anything that could ManagePermissions could sign a token…? Not sure if that falls down similarly.
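The attenuation rule from the edit above, where the token caps access even if the DataHandler finds a higher level on the data, can be sketched as a minimum over ordered permission levels. The Read < Write < ManagePermissions ordering is an assumption, and the names are hypothetical.

```rust
/// Assumed ordering: Read < Write < ManagePermissions (derived from declaration order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Read,
    Write,
    ManagePermissions,
}

/// The token can only narrow access: the effective level is the lower of the level
/// granted on the data and the cap carried in the token.
fn effective(data_level: Level, token_cap: Level) -> Level {
    data_level.min(token_cap)
}

/// Whether a request needing `wanted` is allowed.
fn permits(data_level: Level, token_cap: Level, wanted: Level) -> bool {
    wanted <= effective(data_level, token_cap)
}
```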

The bits I understand look very nice!

I don't know if doing this client side precludes using labels in sync with more concrete 'types' such as Published, Unpublished, AppendOnly, MutableData, ImmutableData, but if possible I think it would be useful to have those available as labels. For example, to filter a search by SAFE data type and status.

Also, applying the label 'Published' could be a useful API to have alongside the lower-level call, and so on. Labels of this kind could be distinguished using a leading symbol such as an underscore or '$'.

I'm not sure if extending labels like this is going to confuse the developer and user experience, or make it simpler, but I think it is worth considering. It certainly appeals to me, but well, I'm not typical :crazy_face:

I like the idea of having the user's labels in a readable form, but when they choose to expose a label, keeping the label itself hidden from any app or third party. E.g. I decide to share all data that has the label 'BestFriends', but all the recipient knows is that this is a collection of data, rather than the label I've applied to it (though that might be visible if I want, say as a default which I can remove/hide/alter). Is that catered for?

For anyone unfamiliar with macaroons, I think this is the starting point (maybe link to something like this in the OP?):

And thereā€™s a Rust macaroon implementation (also Go).


The idea is any data you put with your account could be labelled.

:+1:

I think the neat thing is, as a dev, you could largely ignore labels for a small app. And just be concerned with your apps data. And that would be fine.

Just other apps may be able to find it too!

Yeh, the shared data setup lets the recipient define what they want to call this data; nothing (thus far) is specced out to say "I call this X". You could do that in a message if you wanted, I guess… (or perhaps an option to share label metadata).

I'll place that link atop the OP :+1: thanks for your thoughts @happybeing!

(And just to note, we did look at Rust impls, but they fall short for how we validate requests, due to the need to always have the SK to verify macaroons, hence why we're rolling our own.)


I'm not sure I was clear enough here. I did realise (expect) labels could be 'applied' to any data type. My suggestion was that certain labels be available which correspond to the type. So all MutableData would, for example, have the label '$MutableData', etc. You could then use '$MutableData' as a filter in any UI/API which accepts labels.

Secondly, that by applying the label '$Published' to something that is not yet published, the effect would be to publish that data.

Hope that's clearer!


Ah right. Sorry, misunderstood. Aye, those are good labels/indexes to be having, I think! And they could easily be applied automatically :+1:

Ah, interesting! That's not in the current RFC. It wouldn't work with how we have data at the moment (published data is another namespace), though there are some ideas / data changes being discussed that may make this possible.


I think it can work if you don't implement publishing by adding a label, but do the action of "publishing" when the API to apply a label is asked to apply '$Published'.

So applying certain labels is an alternative way of invoking the action 'publish', rather than just applying a special label.

Hope that's clear!
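A sketch of the '$'-prefixed system labels suggested above. Type and status labels are applied automatically, and applying '$Published' dispatches to a publish action instead of storing a label. Setting a flag stands in for real publishing, which, as noted above, would need changes to how published data is namespaced. All names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    ImmutableData,
    MutableData,
    AppendOnlyData,
}

#[derive(Debug)]
struct Item {
    kind: Kind,
    published: bool,
    labels: Vec<String>,
}

/// System labels applied automatically; the leading '$' keeps them apart from user labels.
fn auto_labels(item: &Item) -> Vec<String> {
    let kind = match item.kind {
        Kind::ImmutableData => "$ImmutableData",
        Kind::MutableData => "$MutableData",
        Kind::AppendOnlyData => "$AppendOnlyData",
    };
    let status = if item.published { "$Published" } else { "$Unpublished" };
    vec![kind.to_string(), status.to_string()]
}

/// Applying "$Published" invokes the publish action instead of just storing a label.
fn apply_label(item: &mut Item, label: &str) {
    if label == "$Published" {
        item.published = true; // stand-in for the real publish operation
    } else {
        item.labels.push(label.to_string());
    }
}
```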


From latest weekly update:

This is a bad example because the network doesn't manage time. But is this doable at the application level?

An application could complete the chain of system caveats controlled by the network with its own caveats. There could be conventions that applications follow to manage these supplementary caveats, like timeout restrictions, or better, a client API that helps manage them.

But these restrictions could be bypassed by forking the application and removing the corresponding controls, or in the case of a timeout, simply by changing the system time of the PC running the application.

So, I would say there is no interest in managing application caveats client side, unless I am missing the big picture, like time managed by the network. A hint for this is:

But I don't see how we can rely on a timestamp that isn't a network consensus time.


That's correct for absolute time, but durations may be possible at a network level for a given account, which would work for this use case. E.g. I'll give you access to this file until my clock runs down to zero.

Good Q @tfa. It's not 100% certain that it'll work, but I'm sure we'll be able to get this eventually, without requiring consensus on the exact time.

I don't think it would be, for the reasons you outlined.

There's been some discussion on time in general, and while the network itself won't (can't) be aware of time, I think it's possible for some form of this to work at the client side (Authenticator and Validators). It has yet to be proven, but I think it'll go.

A ClientHandler node could validate the time as part of its normal checks (is the token revoked? Does it match the request? etc.). These ClientHandlers reach consensus on whether they think the request is valid or not, not specifically on the current time. It may be that this falls down if a request is made at the expiration time, and therefore consensus isn't reached. But we could also allow for approximations here.

I'm not sure if it'll work 100% (the proof will be in the pudding), but for the purpose of asking "is this application still valid" (as opposed to network-critical operations), I think it may well be enough for ClientHandlers to be doing this.

If this approach doesn't work, we'll probably look at tying duration to parsec blocks (as has been suggested elsewhere on the forum), so duration could be approximated there. But I'm hoping we won't need to go there.
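The ClientHandler-side expiry check described in this post can be sketched with each handler judging expiry against its own clock plus a tolerance, and the section accepting on a simple majority. The tolerance parameter and the majority rule are illustrative assumptions, not how SAFE consensus actually works.

```rust
/// Each ClientHandler judges expiry against its own clock, allowing some skew.
fn handler_accepts(local_now: u64, expiry: u64, tolerance: u64) -> bool {
    local_now <= expiry.saturating_add(tolerance)
}

/// The section accepts if more than half of its ClientHandlers accept (an assumed quorum rule).
/// No agreement on the current time itself is needed, only on the verdict.
fn section_accepts(local_clocks: &[u64], expiry: u64, tolerance: u64) -> bool {
    let yes = local_clocks
        .iter()
        .filter(|&&t| handler_accepts(t, expiry, tolerance))
        .count();
    yes * 2 > local_clocks.len()
}
```

Near the expiry time, handlers with skewed clocks may disagree. The tolerance is one way of allowing the 'approximations' mentioned above.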


What is the use case for this? Once access is given, the document can be copied to local storage. The copier then has perpetual access.
