[Pre-RFC] Labelled Data

The OP has been updated with some more general info and a couple of typo fixes.

It sounds like a job for an extremely simple graph structure, which would also be more versatile when we move on to solving other problems in the future.

Here is a quick one-page explanation of why it may be a better-suited data structure for this job than a roll-our-own label DB structure.

Now we can structure our data in whatever pattern we want without hitting complicated nesting issues, because we’re just keeping a reference to that object, not the object itself.
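As a rough illustration of the "keep a reference, not the object" point, here is a minimal in-memory sketch. The `LabelGraph` class and its method names are made up for this example and do not come from any particular graph library.

```javascript
// Hypothetical sketch: labels point at items by id (a reference),
// so one item can sit under many labels without being nested or copied.
class LabelGraph {
  constructor() {
    this.items = new Map(); // id -> stored object
    this.edges = new Map(); // label -> Set of item ids
  }

  addItem(id, value) {
    this.items.set(id, value);
  }

  // Attach a label to an existing item by reference.
  label(id, label) {
    if (!this.items.has(id)) throw new Error(`unknown item: ${id}`);
    if (!this.edges.has(label)) this.edges.set(label, new Set());
    this.edges.get(label).add(id);
  }

  // Return every item that carries all of the given labels.
  find(...labels) {
    if (labels.length === 0) return [];
    const sets = labels.map((l) => this.edges.get(l) || new Set());
    const [first, ...rest] = sets;
    return [...first]
      .filter((id) => rest.every((s) => s.has(id)))
      .map((id) => this.items.get(id));
  }
}

const graph = new LabelGraph();
graph.addItem("meInJapan.jpg", { bytes: 1024 });
graph.label("meInJapan.jpg", "images");
graph.label("meInJapan.jpg", "japan");
graph.label("meInJapan.jpg", "josh");

// The order of labels in a query does not matter.
console.log(graph.find("japan", "images").length);
```

The same stored object is returned whichever label you come in through, which is what avoids the nesting problem.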

Here is a quick description of how you would use it which covers the example given in the pre-RFC.

Note I am not suggesting this particular library (there are a few Rust libs that could serve, or we could roll our own for the simple use case of labels); I just selected it because it had a concise description/example.

Thanks @krnelson,

There’s definitely plenty of scope for improving the index data structures. You’re right though, a simple graph may well be the answer there. (Such a thing built atop the key:value store of mutable data may be feasible eg).

Though that’s getting deeper into data structures than I attempted in the OP, as right now I’m thinking in terms of usability (can this idea work for app devs? how?), and this is implementation detail (which, granted, needs to be sorted out).

I’m trying/hoping to find an implementation which might be feasible to get going soon, using our current data structures on the network. For example (if it’s deemed desirable), we could build this out in place of the container structs we’ve had so far (and have yet to build the API for). This way we’re not changing some underlying app APIs again in X amount of time.

That may or may not be possible with a graph in the near term (if that’s the best struct for such indices). If not, I’d hope we build this out in a way that the APIs hold, and the underlying index structs etc. can be improved over time :+1:

First, I generally think it’s a superb idea. Labels will give a lot more freedom in how we organize and access data.

Also, I’ve got a couple of initial questions.

With regards to viewing data in a folder hierarchy, how is this supposed to work?
Let’s use the index and multi-index app(safe-cli)/images/japan/josh with meInJapan.jpg as an example.

Will these labels correspond to folders, or would we explicitly state which labels correspond to folders?
Since the labels are concatenated alphabetically, how is the hierarchy determined?

Or… will we simply by default resolve all combinations and find meInJapan.jpg there?

root/images/japan/josh
root/images/josh/japan
root/josh/japan/images
root/josh/images/japan
root/japan/josh/images
root/japan/images/josh

I think the convention that access to images gives access to any multi-index with the images label in it would indicate that all combinations automatically resolve, as that specific label is then always a top-level folder in the hierarchy.
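To make that convention concrete, here is a minimal sketch. It assumes an index key is just its labels joined with "/", and that holding permission for any one label is enough to reach a multi-index containing it; the `canAccess` helper is hypothetical.

```javascript
// Assumption: an index key is its labels joined with "/", and permission
// for any single label grants access to a multi-index that includes it.
function canAccess(grantedLabels, indexKey) {
  const granted = new Set(grantedLabels);
  return indexKey.split("/").some((label) => granted.has(label));
}

console.log(canAccess(["images"], "images/japan/josh")); // true
console.log(canAccess(["music"], "images/japan/josh")); // false
```

Under this rule the position of a label in the key is irrelevant, which is why every ordering would appear to resolve.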

The thinking here (from a UX POV) would be that a folder structure wouldn’t be automatically created/viewable from a set of labels, but would be created or determined by the user.

In my opinion, metaphors such as folders/containers work on the premise of a piece of data being in only one location in that structure. So the folder is a way for the user to opt in to viewing and structuring their data in a deliberate way.

The labeling sits alongside all that, and allows a lot more flexibility.

There are times when you can mix these metaphors a little, e.g. a ‘Smart Folder’ which the user can use to curate data that stays in its original location, but is still within a virtual ‘container’ of sorts. Labels would enable all this too.
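One way to picture a ‘Smart Folder’ is as a saved label query that is resolved on demand. This is only a sketch under that assumption; the `smartFolder` helper, the item shape and the `safe://` links below are invented for illustration.

```javascript
// Hypothetical sketch: a Smart Folder stores a label query, not the data.
// Listing it filters the existing items, so nothing is moved or copied.
function smartFolder(name, requiredLabels) {
  return {
    name,
    list(allItems) {
      return allItems.filter((item) =>
        requiredLabels.every((label) => item.labels.includes(label))
      );
    },
  };
}

// Made-up items; each keeps its original link.
const items = [
  { link: "safe://example-a", labels: ["photo", "japan"] },
  { link: "safe://example-b", labels: ["photo", "food"] },
  { link: "safe://example-c", labels: ["document", "japan"] },
];

const japanPhotos = smartFolder("Japan photos", ["photo", "japan"]);
console.log(japanPhotos.list(items).map((item) => item.link));
```

The folder is purely a view: deleting it would not touch the underlying data.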

Viewing/managing data within a folder hierarchy is a separate construct to labelling data or indices. That would be a ‘folder’ in terms of the pseudo filesystem, which is what we’re calling a FilesContainer in the APIs. This is the struct that allows websites to have relative contents etc. This data can be managed and created outwith any indexing. But the FilesContainer itself could be labelled (e.g. with Folder).

The labels don’t correspond to folders in the above ‘FilesContainer’ sense. They are more indicative of the index in which a link can be found, and which is used as a method to determine permissions.

These are all the same label index. So only one of these would be valid (images/japan/josh, going by labels in alphabetical order). And within that index, you’d be able to retrieve meInJapan.jpg.
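A small sketch of that alphabetical rule, assuming the multi-index name is simply the sorted labels joined with "/" (the `indexKey` helper is hypothetical):

```javascript
// Assumption: a multi-index is named by its labels sorted alphabetically
// and joined with "/", so the order they were supplied in never matters.
function indexKey(labels) {
  // Copy and de-duplicate before sorting, leaving the input untouched.
  return [...new Set(labels)].sort().join("/");
}

const orderings = [
  ["images", "japan", "josh"],
  ["images", "josh", "japan"],
  ["josh", "japan", "images"],
  ["josh", "images", "japan"],
  ["japan", "josh", "images"],
  ["japan", "images", "josh"],
];

// Every ordering collapses to the same single key.
const keys = new Set(orderings.map(indexKey));
console.log([...keys]);
```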

Does that help clarify @oetyng? Let me know if I misunderstood what you were asking.

Interesting and coming late, nice diagrams! :smile:

Mainly just following but one question. I’m thinking of this as more a way of indexing data alongside use of the containers/content structures people are used to. Otherwise, where is labelling used instead, rather than alongside - any current/past application examples?

My experience is that people are willing to make reasonable use of organising into folders, not perfect, but people are used to this and do organise things like this, so I think containers/objects is useful.

Whereas where labels or tagging are used it tends to require more effort than most people can or are willing to put in. I recall Evernote had a good mix of tree and tagging, but it was a lot of work adding tags, and I never felt I got the value back from that so I’m dubious that:

  • users will label things (especially if containers are there)
  • it is worth the work

Automated labelling might be good, but if this is just based on file extension, well we could just search for those anyway.

Another thought is that this is parallel to RDF semantic description, so the two would best be mirrors of each other or we end up with a mess and it might be hard to build UIs or apps that handle both without causing confusion to users and developers.

So I’m inclined to see this more as behind-the-scenes indexing rather than as a useful alternative to containers and/or semantic web. Which could be very useful! One of the issues with RDF, as it sprawls about from one resource to the next, is going to be how it can be explored, searched and accessed, and I think indexes like this would be useful. But in that case it would be derived from the RDF rather than applied explicitly. And any labels applied by apps/users would end up mirrored in the RDF, rather than only in the label index itself.

Just first thoughts! It is an interesting idea. Now I’m wondering how this will look with/without containers and RDF from an API and a UI perspective. Seems that is almost more important to think about than how it would work under the hood.

Yes, the idea is that these would co-exist. One is not a replacement for the other. Folders and containers are a very useful metaphor that people are used to and we’re not trying to do away with that. But the user could choose to flip between various ways to view the structure of their data, depending on their needs.

Yes, labelling would predominantly be an automated, indexing layer, that would enable all sorts of UIs to be built on top. And it’s a way to stop the siloing of data that could make for a very clunky experience if we bake in a container-based structure, and fail to embrace the possibilities of flat structures + RDF goodness.

Not if the data is siloed away in some app’s container that you don’t know to check.

That’s the crux of this proposal. It’s not removing the ability to have folder structures. It’s more about enabling such data to be located and modified by any app targeting that data label.

Currently (well… previously, as we don’t have this implemented), you’d do something like:

// this is all pseudo API
let myPhoto = <data>;
let app = new Safe(<my app id>);

// saved in the app's own container
app.save('/profile_pic', myPhoto);

// and to retrieve (later, with the same app id)
// retrieved from the app's own container
let photo = app.get('/profile_pic');

Only this app knows about this data. No other app can access this photo.

With this proposal, for apps to manage their own data:

// more pseudocode
let myPhoto = <data>;
let app = new Safe(<my app id>);
// automatically labelled with `appId`, and saved in that index
// ALSO has 'photo' label applied automatically
app.save('/profile_pic', myPhoto);

// and to retrieve
// retrieved from the app's own index
let photo = app.get('/profile_pic');

// BUT ALSO

let someOtherApp = new Safe(<another app id>);

// if another app has 'photo' permissions
let samePhoto = someOtherApp.getFromIndex('photo', '/profile_pic');
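
For anyone who wants to poke at the idea, here is a runnable in-memory mock of the pseudocode above. None of this is a real SAFE API: `MockNetwork`, `MockSafe` and their signatures are invented, the 'photo' label is passed in by hand rather than detected automatically, and the indices hold the data directly instead of links to it.

```javascript
// Hypothetical mock, NOT a real SAFE API: shared label indices with
// per-app read permissions, mirroring the pseudocode in this post.
class MockNetwork {
  constructor() {
    this.indices = new Map(); // label -> Map(path -> data)
  }
  put(label, path, data) {
    if (!this.indices.has(label)) this.indices.set(label, new Map());
    this.indices.get(label).set(path, data);
  }
  get(label, path) {
    const index = this.indices.get(label);
    return index ? index.get(path) : undefined;
  }
}

class MockSafe {
  constructor(network, appId, permittedLabels = []) {
    this.network = network;
    this.appId = appId;
    // An app can always read its own index; other labels must be granted.
    this.permitted = new Set([appId, ...permittedLabels]);
  }
  // Saves under the app's own label plus any extra labels supplied.
  save(path, data, extraLabels = []) {
    for (const label of [this.appId, ...extraLabels]) {
      this.network.put(label, path, data);
    }
  }
  get(path) {
    return this.network.get(this.appId, path);
  }
  getFromIndex(label, path) {
    if (!this.permitted.has(label)) {
      throw new Error(`${this.appId} has no permission for "${label}"`);
    }
    return this.network.get(label, path);
  }
}

const network = new MockNetwork();
const app = new MockSafe(network, "my-app");
app.save("/profile_pic", "<photo bytes>", ["photo"]);

// Another app that has been granted the 'photo' label can read it.
const someOtherApp = new MockSafe(network, "another-app", ["photo"]);
console.log(someOtherApp.getFromIndex("photo", "/profile_pic"));

// An app without that permission is refused.
const thirdApp = new MockSafe(network, "third-app");
try {
  thirdApp.getFromIndex("photo", "/profile_pic");
} catch (err) {
  console.log(err.message);
}
```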


It is semantic web, in that each index could be an RDF struct explaining what’s within, and referencing the data via URL. Just accessible across apps.

Indeed, it could / would be great to have these labels applied from a data’s RDF automatically. In which case, each index is simply a quick reference of all data of a particular type. Though as we don’t have RDF baked in yet, labels are perhaps a shortcut.

Aha, OK thanks.
So, to build a folder view on top of this, we explicitly state that a label corresponds to a folder somehow. But how?
Would it be enough to post-/prefix the label with “folder” (+ delimiter)?

Then, to determine the hierarchy, how is that done?

For example, if I want to have the tree-views “root/photos/food” and “root/photos/animals” would “frog.jpg” have label “folder_root/photos/animals” and “ceviche.jpg” have label “folder_root/photos/food”? Or something else?

In that case, that one label could serve as both the multi-index as well as the tree-view definition.
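
A rough sketch of how a tree-view could be derived from such labels. It assumes the "folder_" prefix convention floated above, with "/" separating path segments; the `buildTree` helper and the file objects are hypothetical.

```javascript
// Assumption: a label starting with "folder_" carries a "/"-separated path.
const FOLDER_PREFIX = "folder_";

function buildTree(files) {
  // Null-prototype objects avoid clashes with names like "constructor".
  const root = Object.create(null);
  for (const { name, labels } of files) {
    for (const label of labels) {
      if (!label.startsWith(FOLDER_PREFIX)) continue; // ordinary label
      let node = root;
      for (const segment of label.slice(FOLDER_PREFIX.length).split("/")) {
        if (node[segment] === undefined) node[segment] = Object.create(null);
        node = node[segment];
      }
      node[name] = null; // null marks a file (leaf)
    }
  }
  return root;
}

const tree = buildTree([
  { name: "frog.jpg", labels: ["folder_root/photos/animals", "photo"] },
  { name: "ceviche.jpg", labels: ["folder_root/photos/food", "photo"] },
]);

console.log(JSON.stringify(tree, null, 2));
```

Labels without the prefix (like "photo" here) are simply ignored by the tree-view, so they can keep serving as plain indices.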

We’d build a folder completely separately to this. safe files put <folder> would create a FilesContainer.

Within that container, whatever data you have can/will be individually indexed/labelled. And then the FilesContainer (which is our folder), would / could be labelled as well.

In terms of drawing out hierarchy from these indices, you wouldn’t necessarily be able to have one true hierarchy. The data is flat within each index (as I imagine atm). You could create some kind of hierarchy via ‘Smart Folders’, as @JimCollinson suggests above, e.g.


That seems correct yeh.


(It may be worth noting that the ‘label-combo’ string such as photos/animals is just a workaround for current limitations with container APIs. Ideally you’d just have an index for each label, and the client libs/network would handle permission crossover for you.)

I was imagining it as this:

  • You upload data with any app, all data is labelled automatically and linked from a Root index container
  • The Root index container data is represented using RDF, you have indexes with URLs to the data
  • You can refer to indexed data by label using a label-URL (also via an API) like safe:///<label>/<file and/or path>. A label could be linked to a FilesContainer, so you can pass the path of a file in such a URL after the label.
  • FilesContainers can also be created where the link to the file is a label-URL rather than an ImmutableData URL (this can create a circular link, which we can solve as OSs do, I guess)
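
To show what resolving such a label-URL might involve, here is a minimal parsing sketch. It only assumes the safe:///<label>/<file and/or path> shape from the list above; the `parseLabelUrl` helper and the example URL are hypothetical.

```javascript
// Hypothetical parser for the label-URL shape suggested above:
//   safe:///<label>/<file and/or path>
function parseLabelUrl(url) {
  const prefix = "safe:///";
  if (!url.startsWith(prefix)) {
    throw new Error(`not a label-URL: ${url}`);
  }
  const rest = url.slice(prefix.length);
  const slash = rest.indexOf("/");
  // Everything before the first "/" is the label; the remainder is the path.
  const label = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash); // keeps leading "/"
  if (label === "") throw new Error(`missing label: ${url}`);
  return { label, path };
}

console.log(parseLabelUrl("safe:///photos/holiday/meInJapan.jpg"));
```

The label would then select the index (or the FilesContainer linked from it), and the path would be looked up inside that.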

Yeh, @bochaco, all that would be grand :+1:

Okok, this is the part I missed.
With hierarchy I meant solely for the tree-view (as it’s basically one and the same) so not for the labeling. But I thought that tree-view over containers would be scrapped and replaced with tree-view emulated on top of labels instead.

Hmm… I’m not done mulching that, but I think I’d prefer the emulation over doing both actually, for simplicity … (if it in fact would make it simpler…).

But by emulation, do you mean every time you query something? I see the FilesContainers as being such an emulation, but persisted on the network (I guess with better performance/efficiency…? …)

At least in the OP I’m suggesting we scrap the root-level ‘containers’ as we had envisaged them previously.

But you could still create your own FilesContainer data struct (that we use for NRS resolution of websites eg).

Smart Folder emulation atop labels could be another (perhaps app level) feature?

So the motivations here are:

  • to overcome a problem caused by apps having their own container which leads to data being known only to the owning app, and
  • to provide a general indexing mechanism that will enhance access to semantically labelled data, or non RDF data that is explicitly labelled by app or user, or labelled automatically according to content type for example

Is that fair / any others?

The first only applies to data created by apps which choose to use their own container, so I’m wondering what the use cases are for that and if it’s still needed, or could be handled in other ways (permissions for example). I can’t remember the discussions on this and I’m not sure I understood it anyway, so can someone give a summary of why we have app containers and some use cases?

I’m wondering if app containers are still needed, and also whether labelling might conflict with the reason an app would use them.

I’m liking the idea of a built in flexible index, and the way this is described lends itself to a good UI/API. The implementation also seems much easier to understand than I’m used to with this kind of feature. :+1:

To me the app container becomes a label (with < app type > ??), which will make more sense when you start sharing data across apps, e.g. I have my chat app customisation created as an RDF by chat-app-A, but I could import that to chat-app-B by simply sharing the data created with label with chat-app-B

They wouldn’t be needed.

If you only label it with your app, it’s effectively the same as your own container. Any other app would still have to specifically ask for permission for the data that an app’s put.


Ah yeh. Missed that bit sorry @happybeing. Yeh I think that’s fair.

  • No silos
  • indexed data
  • flexibility of data access (ie smart folders, etc) maybe being another aim

Thanks (both of you). I think it’s best to drop the idea of app owned data then, though obviously an app can achieve this functionality as you’ve described.

Much better to encourage a chat client/app to use ontologies to store messages, user identity etc and not encourage labelling as ‘RiotChatMessage’ and so on.

It might be worthwhile looking at how these issues are being handled in Solid too, or we might end up with unnecessary incompatibilities, and we might get some useful ideas and feedback. I think it’s an area of active work.
