Live data sharing with CRDTs

gsvarovsky · December 1, 2020, 12:20pm

Hi folks, I’ve come here on the kind suggestion of @happybeing, on the basis that you are “using CRDTs for [your] data types (combined with RDF)”. I’m working on a project in exactly that space which may interest you, called m-ld (http://m-ld.org/).

At heart it’s a technology for sharing a live dataset between collaborating actors, like users or robots. It uses an RDF data representation, with JSON-LD for the interface. It’s decentralised, or rather centralisation-agnostic, so intended to fit well with existing data architectures including client-server and ‘local-first’ (perhaps Safe fits in that category, or perhaps it’s a more ambitious category of its own).

From what I understand of the Safe Network so far, I can picture that CRDTs could be involved in the maintenance of storage network metadata (folder hierarchy perhaps), but also as a file data type – that would be quite aligned with m-ld’s use-cases, at least for the equivalent Safe Network apps.

So, I have some curiosity about the user scenarios you envisage will make your investment in CRDTs (especially combined with RDF) worthwhile. Are you building apps with realtime collaborative editing, for example?

digipl · December 1, 2020, 12:52pm

dirvine · December 1, 2020, 1:00pm

You are more than welcome and thanks for contributing.

So far we have an Lseq and a crdt-tree that allow us to provide a rich API (we hope). @bochaco and @danda are quite involved there. We are also working on Byzantine CRDTs (bft-crdts), which allow us to operate in the most hostile of environments.

We have not set up collaborative editing yet, but there is a neat lib for this, Yjs (GitHub: yjs/yjs, “Shared data types for building collaborative software”), and talk of it moving to Rust. It makes perfect sense to work that into the mix for sure. We just need to work out the intricacies of holding the data in a way that works well for the network.

It’s all exciting stuff as we use CRDTs to handle network partitions (CAP theorem) and, even better, we get offline capabilities that would otherwise be closed to us if we were using a strict-order, consensus-based algorithm (close shave).
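To make the partition/offline point concrete, here is a minimal sketch of a state-based CRDT (a grow-only counter). It is only an illustration of the general idea, not Safe Network code: the class name `GCounter` and the replica names are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge whatever order the merges arrive in.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Two (hypothetical) replicas work independently while partitioned or offline...
alice = GCounter("alice")
bob = GCounter("bob")
alice.increment(2)
bob.increment(3)

# ...then exchange state in either order once they can talk again.
alice.merge(bob)
bob.merge(alice)
```

No ordering or agreement step is needed before either replica accepts a write, which is the property that a strict-order consensus algorithm would give up.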

The links between that and SOLID, though, are almost completely missing right now. Your timing is perfect, and while we are in a bit of a flurry to get a testnet up right now, this is an area where we will be really keen to collaborate, as I feel Solid + CRDT + secured network (no servers) is the holy grail we all need right now.

gsvarovsky · December 1, 2020, 2:39pm

Thanks for the replies. So filesystem-metadata thinking is far advanced already, cool, and you’re definitely ahead of the crowd thinking about Byzantine fault tolerance. Intuitively, bft-crdt is almost a contradiction… I found the project and I’ll have a read with interest.

Yjs is a great library for sure, and mature. I’m sure you’re also aware of automerge. My project is a little different because it starts with RDF as the base data type, with the intention of having natively extensible, declarative semantics. In a nutshell, instead of choosing a List or Map or whatever, with its implementation, at compile-time, you have metadata in the shared dataset which says “I’m a list, and this is my expected merge behaviour”; then, apps can be more dynamic in response to changes. No different to the usual semantic web (and Solid) ambitions, just applied a little further into shared data types. To illustrate, https://yjs.dev/#Y.Text could be a literal type (although a more general IRI that represents the text semantics of Y.Text might be ultimately preferable).
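A rough sketch of that idea, under loud assumptions: the type IRIs, the `value` key and the `MERGE_BEHAVIOURS` registry below are all invented for illustration, and none of this is m-ld’s (or Yjs’s) actual API. The point is only that the merge behaviour is looked up from a type declared in the data, rather than fixed by a class chosen at compile-time.

```python
# Hypothetical merge behaviours. Each is commutative, associative and
# idempotent, which is what lets replicas converge.
def merge_as_set(a, b):
    return sorted(set(a) | set(b))


def merge_as_max(a, b):
    return max(a, b)


# Assumption: made-up type IRIs, not taken from m-ld, Yjs or any real vocabulary.
MERGE_BEHAVIOURS = {
    "http://example.org/types#GrowOnlySet": merge_as_set,
    "http://example.org/types#MaxRegister": merge_as_max,
}


def merge_subjects(local, remote):
    """Merge two replicas' copies of one subject, guided by its declared type."""
    if local["@id"] != remote["@id"] or local["@type"] != remote["@type"]:
        raise ValueError("can only merge copies of the same subject")
    behaviour = MERGE_BEHAVIOURS[local["@type"]]
    return {**local, "value": behaviour(local["value"], remote["value"])}


tags_here = {
    "@id": "http://example.org/doc1#tags",
    "@type": "http://example.org/types#GrowOnlySet",
    "value": ["crdt", "rdf"],
}
tags_there = {
    "@id": "http://example.org/doc1#tags",
    "@type": "http://example.org/types#GrowOnlySet",
    "value": ["rdf", "solid"],
}
merged = merge_subjects(tags_here, tags_there)
```

An app that has never seen a particular subject before can still merge it, provided it recognises the declared type; supporting a new type means adding an entry to the registry rather than changing the app’s data model.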

Early days, which is why I’m keenly looking out for use-cases. Great to know about your “holy grail”! What would you build with it? (Unless it’s under wraps of course.)

dirvine · December 1, 2020, 3:30pm

In terms of lang, then likely Rust. Safe as the secured storage/compute works well and should soon be a testnet. Then with the CRDTs we are in good shape.

The API is interesting though, and a SOLID layer would be really compelling I think. If we can get SOLID secure and take away the requirement to run a server or have a pal do it etc., then we are on track.

This is fascinating, I have some reading to do here

gsvarovsky · December 1, 2020, 4:35pm

Thanks for the detail. You might have read this as “it with”. Also interested in “with it”… i.e. what’s the superpower the end-user will get? (Sorry, this is probably Safe Network 101.)

Sure! This level of thinking is not on the project website yet. I need to author something; happy to elaborate ad hoc in the meantime.

dirvine · December 1, 2020, 4:47pm

Ah, got you. New apps basically. Like Solid, data is yours and in your control, unless it’s public. Then it’s held forever, so no need for Internet Archive etc. All data will last forever and be able to be referenced for the life of the network and beyond.

I say beyond as we have double signature checks on all data: the owner/policy-allowed signature for each mutation, and a network signature to show it was validly held on the network. As these are CRDT types, we don’t mind if the network segments or even vanishes. As long as folk have some data it is valid, even if many users disconnect from each other for a long time. Ultimately we can make the network optional for looking after data (meaning move it to other networks if you wish, or store it on crystals etc.). Of course we hope the network is always there, improving as it goes.

Like Solid, we believe data is “just there” and apps are allowed to read it, but not own it. This means we can look at new apps that interact with data.

So far we have community members working on safe_git, Jams (music) and much more. The beauty of being able to switch apps and keep data is huge, but making that all automatic with no servers is a compelling thing.

I am not sure I can pinpoint a single killer app/superpower, but giving these tools to devs should allow them to create any app and on a level playing field. No need to get massive cash to create projects, none to expand a user base etc. So simply secured data on a privacy enhancing network should allow a whole bunch of new ways of working with our data.

Personally I am keen to see neural network sharing and secured immutable networks that do specific things in a way that it cannot be hacked. Then sharing learning nets to take AI much further and again on a level playing field.

Hope that gives a bit of an insight?

philip_rhoades · December 1, 2020, 9:54pm

Love it! Exactly how I thought my Avatar Phi Rho would work . .

gsvarovsky · December 2, 2020, 8:37am

It does!

I’ve long nurtured a mental metaphor of a long-running MMO containing all game maps ever, which never deletes anything. So you can wander into an alleyway and find Doom, 320x200 resolution, and buy it from John Carmack to make a theme park. Besides just being cool on that level (ahaha), the metaphor at its logical conclusion becomes immutable data = the subatomic particles of a parallel universe (to which we’re not native, but that’s a whole other tack, for which beer is needed).

I don’t want to impose on you with newb questions on the overall vision so I’ll keep reading up on that.

In the meantime I can see how CRDTs fit into it, thanks. Just intuitively, the requirement to tolerate arbitrary byzantine faults requires at least a consensus protocol (I see it here in bft-crdts), which means that allowing for network segments or disappearance will require that consensus can be deferred, which could make the eventual merge, um, interesting. Also being aware that preserving user intent in CRDTs is still a matter of active research, it might be an idea to linearise history when you get the chance, since you have consensus for the byzantine tolerance anyway, and fall back to CRDT behaviour when partitioned. Apologies for the half-formed thoughts.

To relate this to m-ld, I would paint the following picture. m-ld is currently intended for small-corpus replicated data sets, to support realtime collaboration. To the Network it could be nothing more than an opaque application-level file type that utilises the network messaging to synchronise, with its own protocol. At this level the value to the end-user is to make collaborative editing pervasive in Safe Network apps. Shooting myself in the foot, you could use Yjs or automerge for that.

More magic happens when m-ld data is integrated, or subsumed, into a wider network of RDF data, in which ‘datasets’ or ‘files’ disappear in favour of a global semantic graph, which is always available and always writeable. Semantics, including integrity and intent preservation, are a matter of declaration, not of a parochial technology choice in an app.

This is your level playing field, I think, and it’s level at all levels: the superpower it gives is the freedom to climb the data-information-knowledge-wisdom hierarchy. Humans and AIs are free to process any data and infer any wisdom, and have it be available to all, safely.

gsvarovsky · December 17, 2020, 10:11am

Hi folks, I’ve written an article that expands on some of the ideas discussed in this thread:

The Data Æther

dirvine · December 17, 2020, 12:28pm

Fascinating stuff and just at the right time. I need to dig more into how you have achieved this, but love the fact this exists. Hats off to you!!!

gsvarovsky · December 17, 2020, 12:45pm

Thanks for the shout @dirvine! I wouldn’t say I have achieved this … I’ve made a start on the long road to a vision, which I think quite a few of us are looking at, from different directions. So there’s also lots of work to do for us to cohere around common narratives, hence the article. Sounds like it resonates…

dirvine · December 17, 2020, 1:22pm

100% it does for sure. It’s the depth of thinking I love here: the solutions appear simple, but the thinking behind them is extremely deep, truly wonderful and very natural too.

gsvarovsky · February 4, 2021, 8:35am

Hi Folks

Since I wrote the Data Æther I’ve been hurrying to take the next step and show some more concrete ideas supporting the idea of a pervasive data abstraction.

One of the most important features of such an abstraction, in my view, is how well you can describe the semantics of your data structures. Following the lead of JSON-LD, I’ve wanted first-class support for Lists in m-ld for a while, so, time to put my money where my mouth is…

Truth and Just Lists: multi-collaborator editable Lists in m-ld’s JSON-LD interface

This included supporting work in m-ld to implement the List type as an extension, showing a path to how other data structure extensions might work.
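For readers wondering what “multi-collaborator editable” demands of a list, here is a small sketch of one well-known approach, an RGA-style sequence (insert-only, no deletes). To be clear, this is not how m-ld implements its List type; it is a swapped-in textbook technique, and the class and method names are made up. It also assumes operations are delivered in causal order, i.e. an insert never arrives before the element it refers to.

```python
class RGAList:
    """Insert-only replicated list; element ids are (counter, replica) pairs."""

    def __init__(self, replica):
        self.replica = replica
        self.clock = 0
        self.items = []  # [(element_id, value), ...] in list order

    def _index(self, element_id):
        return next(i for i, (eid, _) in enumerate(self.items) if eid == element_id)

    def insert_after(self, ref_id, value):
        """Local insert after ref_id (None = list head); returns the op to broadcast."""
        self.clock += 1
        op = (ref_id, (self.clock, self.replica), value)
        self.apply(op)
        return op

    def apply(self, op):
        """Integrate a local or remote insert."""
        ref_id, new_id, value = op
        self.clock = max(self.clock, new_id[0])
        idx = 0 if ref_id is None else self._index(ref_id) + 1
        # Skip over concurrent inserts with a greater id, so that every
        # replica places concurrent siblings in the same order.
        while idx < len(self.items) and self.items[idx][0] > new_id:
            idx += 1
        self.items.insert(idx, (new_id, value))

    def values(self):
        return [value for _, value in self.items]


a, b = RGAList("a"), RGAList("b")
first = a.insert_after(None, "milk")
b.apply(first)

# Both replicas insert after "milk" at the same time...
op_a = a.insert_after(first[1], "eggs")
op_b = b.insert_after(first[1], "bread")

# ...and converge once they exchange operations.
a.apply(op_b)
b.apply(op_a)
```

Both replicas end up with the same order without coordinating, because ties between concurrent inserts are broken deterministically by element id.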

I’m really happy (and relieved) to say, it’s working.

As usual, feedback, thoughts, and next step ideas super-welcome!

gsvarovsky · July 14, 2021, 2:57pm

Hi listeners, a quick update from me on m-ld!

Here are the headlines; visit the website for more details…

sharing our vision with the world at the NLnet Linked Data webinar and the SEMANTiCS conference in September
a new starter project for web apps
a new messaging plug-in using Socket.io
a new release of the Javascript engine
threat modelling in our EU-funded security research project

The long road to the vision continues… how are you folks? I see lots and lots of activity, as usual!

gsvarovsky · February 21, 2022, 3:04pm

Hi @dirvine and others in the Safe Network team, hope you’re having a great start to 2022.

At m-ld we’re also trucking along with our vision, please do stop by our news page to see the latest (tl;dr more security work & new release).

We’re keen to understand better the kinds of problems that people who’ve expressed interest in m-ld are trying to solve, the communities around those problems, and how we may be able to align. So we’d love to have a bit more of an in-depth chat with you folks if possible. Could we arrange a <1h call in the next couple of weeks?

Best wishes,

George
