Code: SAFE.Thingies, EventSourcing on SAFENetwork using PARSEC

Code can be found here:

Back when I started the topic How will Parsec be used? in early November, I had the drive for an idea:

I haven’t continued with the idea since, and I wanted to get further with it before I posted anything.
But better to just get it out there for anyone to look at, as I might not follow up on it for a while :slight_smile:

Parsec for a decentralized event sourced application

So, what I wanted to do is to use Parsec for a decentralized event sourced application. Any event sourced application, actually.

An event sourced application - following basic DDD and CQRS principles - basically consists of different streams of events, produced by a construct called an Aggregate, which protects invariants.
Now, the idea was that a binary could be running on machines, forming a network much like how SAFENetwork is formed, but that in addition to this, Parsec is used to reach consensus on the actual domain events, which in effect makes the logic network agreed, i.e. a decentralized application.

For more details on EventSourcing and CQRS, see this post SAFE Crossroads Podcast #38, Event Sourcing on the SAFE Network - #14 by oetyng with a detailed explanation of some of the basic concepts.

Example application

So, I did what I usually do:
A simple example using extremely simplified mocks. (Bear with the extreme simplification of Parsec, it is meant to just roughly show the principle.)

We have from DDD, EventSourcing and CQRS:

  • Aggregate
  • EventRouter
  • ProcessManager
  • Projections

We have from SAFENetwork:

  • Parsec

We have the obligatory example:

  • Notebook, which is an Aggregate.

And finally, we have a client for this, that combines these parts to a running program.

  • DecentralizedClient

All this runs in a console application, with an utterly simple end to end interaction.
Naturally, this is such a naïve implementation that it only serves as a draft to display the concept.
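
To make the sections below a bit more concrete, here is a rough idea of what the Notebook aggregate could look like. This is just a sketch from my side - the AddNote cmd and NoteAdded event names are assumptions, not taken from the actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cmd and event for the Notebook aggregate (names are assumptions).
public class AddNote
{
    public Guid NotebookId { get; }
    public string Text { get; }
    public AddNote(Guid notebookId, string text) { NotebookId = notebookId; Text = text; }
}

public class NoteAdded
{
    public Guid NotebookId { get; }
    public string Text { get; }
    public NoteAdded(Guid notebookId, string text) { NotebookId = notebookId; Text = text; }
}

// The Aggregate: evaluates cmds against its own state, protecting invariants,
// and applies the agreed events to that state.
public class Notebook
{
    private readonly List<string> _notes = new List<string>();

    public Guid Id { get; }

    public Notebook(Guid id) { Id = id; }

    public IEnumerable<NoteAdded> Handle(AddNote cmd)
    {
        if (string.IsNullOrWhiteSpace(cmd.Text))
            return Enumerable.Empty<NoteAdded>(); // invariant: no empty notes

        return new[] { new NoteAdded(Id, cmd.Text) };
    }

    public void Apply(NoteAdded evt)
    {
        _notes.Add(evt.Text);
    }
}
```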

Code examples

Program.Main()

Now, what happens here is that we run the console app, which sets up a small group of decentralized clients in memory, takes a random client and requests a change, which is then propagated through the group via Parsec, and finally agreed on.
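
The original post showed this code as an image; since that isn’t included here, the following is only a rough sketch of what Main could look like - MockParsec and the exact constructor shapes are assumptions of mine, not the actual code.

```csharp
// Sketch only - assumes the Notebook/AddNote and DecentralizedClient types
// used in the rest of the post, plus a MockParsec stand-in.
static void Main(string[] args)
{
    // Set up a small group of decentralized clients in memory,
    // all sharing one extremely simplified Parsec mock.
    var parsec = new MockParsec();
    var clients = Enumerable.Range(0, 5)
        .Select(_ => new DecentralizedClient(parsec))
        .ToList();

    foreach (var client in clients)
        Task.Run(() => client.Run()); // each client joins the group and starts polling

    // Take a random client and request a change, which is then
    // propagated through the group via Parsec and finally agreed on.
    var initiator = clients[new Random().Next(clients.Count)];
    initiator.InitiateChange(new AddNote(Guid.NewGuid(), "my first note"));

    Console.ReadLine(); // keep the console app alive while consensus is reached
}
```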


DecentralizedClient.Run()

Run

In the Run method of a DecentralizedClient, we see that first of all the client joins the parsec group.
Next, it enters a loop, where it keeps polling parsec for new events that have been agreed upon by the group.
After receiving any such events, they are applied to the local state and then passed on to the EventRouter, which makes sure they are dispatched to subscribing ProcessManagers. A ProcessManager is able to issue new cmds as a result of any event it picks up. You can think of it as a state machine.
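
Roughly sketched, it could look like this (the field names besides _parsec, such as _router, are my own guesses):

```csharp
// Sketch of Run - _parsec is the mocked Parsec; _router is the EventRouter.
public async Task Run()
{
    _parsec.Join(this); // first of all, join the parsec group

    while (true)
    {
        // Keep polling parsec for new events that have been agreed upon by the group.
        foreach (var evt in _parsec.Poll())
        {
            Apply(evt);            // apply to local state and persist
            _router.Dispatch(evt); // EventRouter dispatches to subscribing ProcessManagers,
                                   // which may issue new cmds as a result
        }

        await Task.Delay(100); // simple pause between polls
    }
}
```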


DecentralizedClient.InitiateChange()

The cmds that are produced are then applied to the locally held Aggregate, and if it was a valid request, the change is also requested on all the other nodes in the Parsec group, as well as voted for. (So, again, bear with the extreme simplification of Parsec.)
Now, this is the fun part: what happens here is the very thing that will result in the _parsec.Poll on line 30 in the previous image, DecentralizedClient.Run(), yielding a result - if consensus is reached!
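
Sketched, with the same assumptions as above:

```csharp
// Sketch of InitiateChange - again, bear with the extreme simplification of Parsec.
public void InitiateChange(AddNote cmd)
{
    // Apply the cmd to the locally held Aggregate, which protects the invariants.
    var events = _notebook.Handle(cmd).ToList();
    if (!events.Any())
        return; // not a valid request, nothing to propagate

    // Request the change on all the other nodes in the Parsec group, and vote for it.
    // If consensus is reached, _parsec.Poll() in Run() will start yielding these events.
    _parsec.RequestChange(cmd);
    _parsec.Vote(this, cmd);
}
```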


DecentralizedClient.ReceiveChange()

The _parsec.RequestChange(cmd) on line 53 in the previous image will result in this method being called on all nodes in the group.
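
What happens inside is not shown here; a plausible sketch (the validate-then-vote logic is my assumption) is that each node evaluates the cmd against its own aggregate instance and votes for it:

```csharp
// Sketch of ReceiveChange - called on every node when another node
// requests a change via _parsec.RequestChange(cmd).
public void ReceiveChange(AddNote cmd)
{
    // Evaluate the requested change against this node's own aggregate instance,
    // and vote for it in the (mocked) Parsec if it is valid here as well.
    var events = _notebook.Handle(cmd);
    if (events.Any())
        _parsec.Vote(this, cmd);
}
```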


DecentralizedClient.Apply()

When events have been agreed upon, and received through polling Parsec, we reach line 36 in DecentralizedClient.Run(), which calls the Apply method. This is what actually applies the events to the local aggregate state, as well as persists them to a db (probably a SAFENetwork db).
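
Sketched (with _store being an assumed event store abstraction):

```csharp
// Sketch of Apply - reached from Run() once the group has agreed on events.
public void Apply(NoteAdded evt)
{
    _notebook.Apply(evt); // apply the event to the local aggregate state
    _store.Save(evt);     // persist the event (probably to a SAFENetwork db)
}
```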

Grasping what this is

So what’s happening here?
Well, we have a binary, a program, and we ask machines to run this and to form a network using Parsec, so as to reach a decentralized consensus (network agreed logic) on what the results of running this program are.

An aggregate is identified by its type (stream name) and id (stream id), together forming the stream key. In this example it could be Notebook@fe240945-4a94-e093-b044-989a1a7ecdd5. The XOR address of that unique stream key will be used to locate the group responsible for it, just like with data and vaults in SAFENetwork.
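
As a rough illustration (SHA-256 here is just a stand-in of mine for however the XOR name would actually be derived):

```csharp
// Sketch: forming the stream key and deriving an address for it.
var streamName = "Notebook";                                        // aggregate type
var streamId = Guid.Parse("fe240945-4a94-e093-b044-989a1a7ecdd5");  // aggregate id
var streamKey = $"{streamName}@{streamId}";

byte[] xorName;
using (var sha = System.Security.Cryptography.SHA256.Create())
    xorName = sha.ComputeHash(System.Text.Encoding.UTF8.GetBytes(streamKey));

// The group closest to xorName in XOR space would be responsible for this stream,
// just like data and vaults in SAFENetwork.
```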

And so, a very fine partitioning is achieved, since the growth of data in a typical EventSourced application is preferably in the number of streams, rather than in the number of events in any particular stream - each stream representing an instance of something, with boundaries so defined that the invariants are protected within that aggregate (remember, aggregate : stream is a 1:1 mapping).

Problems

Now, this is for all clever people out there to crunch:

  • Data sensitivity. Everything that happens to an aggregate is open and readable to the nodes responsible for that stream. How can that be solved?
  • Projections. These are by definition a state that is built from events from any stream, which means a projection possibly needs to subscribe to events from all groups, i.e. the entire network. David mentioned secure relay messaging. What options do we have to make this feasible?
  • Everything else. Yes, there’s virtually no end to the possible problems. If you want you can fill in here :slight_smile:

Next

Not much is planned here. This is, as usual, mostly a thought experiment from my side. :slight_smile:


Hi @oetyng. Noob question: In a non-decentralised architecture, would the aggregate normally be maintained in memory rather than on disk and only written periodically? If so, would this be desirable in a future iteration of SAFE?


Hi @JPL! I’m glad that you asked.

My answer might be a bit long, but the concrete answer comes at the end :slight_smile:

The fine thing about aggregates is that they are a way to handle complexity, which can be applied to any domain. So, the key is that there’s some inherent complexity.
Often you tend to define an aggregate such as “Billing” rather than “Invoice”, i.e. we’re looking for verbs rather than nouns when modelling them. But that really depends on how complex the lifecycle of, for example, an invoice is (it might be a good boundary in some cases).

When you call a method on an aggregate (send it a cmd), it needs to evaluate its own state and the request to determine what the outcome of the request is. It might be invalid, or it might lead to some change.
A change is an event that is raised/applied. Raised/applied in this case means applied to the current state of the aggregate. But generally there are two additional important things happening there, or shortly after: persisting the event, and dispatching it to subscribers.

And so normally, you want these three things to be atomic, to avoid inconsistencies. So that you don’t risk having some event being dispatched that is later not found in db, for example. Getting 100% atomicity for such a thing is probably quite hard in this scenario, and it all comes down to reaching a good enough value.
It’s possible to imagine that batching up the writes could be a valid compromise in certain domains, if frequent persisting of small chunks of data is an expensive operation, in comparison to the increased risk of inconsistencies when batching.
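
In code, the pattern could look roughly like this (a sketch, with _store and _dispatcher being assumed abstractions):

```csharp
// Sketch of the three things discussed above. Ideally they would be atomic;
// here they are simply sequential, so inconsistencies are possible in between.
private void Raise(NoteAdded evt)
{
    Apply(evt);               // 1. apply the event to the aggregate's current state
    _store.Save(evt);         // 2. persist the event
    _dispatcher.Publish(evt); // 3. dispatch it to subscribers (projections, process managers)
}
```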

So, that I think answers your question.

In this example of a decentralised architecture, it actually works in the same way as for a centralised one. The difference is that now each node has its own instance of the same aggregate - but they still cache it, and they still persist it somehow, as close to atomically as possible in conjunction with the actual event being raised. Perhaps this requirement could be loosened a bit, so that if only a majority is able to store the fact, we are good.

The persisting part could perhaps be one single SAFENetwork store, or one per node, I haven’t delved into all the aspects of that yet. One idea is that they randomly take the cost of storing to the network, while being rewarded by longevity in the group (so reward like with vaults and Safecoin).
This example though has them all keeping their own store.


Yes, thanks. I was wondering how that would work. I don’t want to reopen an old debate, but since SAFE stores everything forever and writing new data has a monetary cost (it can’t be overwritten), presumably the optimal solution will be different. So keeping it in memory for as long as possible will make more sense. But this would require some sort of distributed memory, and how that would work in SAFE I don’t know.


In this kind of solution, the aggregate instances on every node are an in-memory cache. So that’s there.

Besides that, I would say that SAFENetwork is the distributed memory, especially if this solution was to use a shared event store (as opposed to one per node).

If you have a domain where it is unclear whether the data is worth the cost of anonymous, secure and backed up persistence, then you maybe want to use another storage solution than SAFENetwork. From the calculations I’ve seen, the cost looks to be competitive in the long run. A bigger issue would IMO be latency and performance when working with the data. That’s where cost will increase, in the form of hours building up.
Overwriting or not is not an issue for event sourcing, since the idea is append only. In rare cases you want to edit streams (bugs), but that can be done with pointers to other events, in case immutable data has been used (sometimes anyway, for audit purposes).
