Introducing SAFE EventStore (event sourcing database)

Just listened to the SAFE Crossroads podcast and wanted to say I appreciated it!

1 Like

Btw, something I noticed closer to the circles I develop in - Apache Kafka seems to be targeting event sourcing in a recent presentation I watched. The idea that the message stream itself becomes the database, rather than it feeding others, seems to be an interesting correlation.

Perhaps the fact that applications are breaking down into smaller microservices, and thus becoming more dependent on event streams, is pushing technology like this forward. I think this bodes well for the SAFE Network and the overlay networks that provide a simple interface for this.

2 Likes

Listening to a podcast about Urbit. Your EventStore database sounds like just the thing they need for their system. My understanding is that Urbit is a VM that records the history of events such that the current state is computed based on past events. They mentioned backing up the event log on different cloud providers for redundancy. Your database seems like a better solution. https://urbit.org/posts/objections/#log

Edit: actually, I think they already have their own storage, so probably disregard. I’m having a pretty hard time trying to understand their system.

3 Likes

Great job, that must have taken a lot of dedication!

2 Likes

I have been back to this little project now to add another protocol for the event store.

The first protocol was basically doing this:

Create a database object and store it serialized in the app container.
Add category entries to this database object, where the category name (stream name) is the key and the xor address of an MD is the value.
Each category MD has the individual stream keys (stream name + stream id) as keys, and the xor addresses of the stream MDs as values.
Each stream MD has the event batch sequence range as key and the serialized events as value. If the stream exceeds the MD size or entry count, it will link to the next MD (or derive the next address from a hash of its own id).
(An event batch is a sequence of events raised as a result of a single cmd. Each event in the sequence has a sequence nr in the stream.)
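The layout above can be sketched with plain dictionaries. This is only an illustration in Python, not the library's C# code: `network`, `new_md`, `append_batch` and the address strings are made-up names, an MD is modelled as a simple key/value map, and overflow into a next MD is left out.

```python
import json

# Hypothetical in-memory stand-in for the network: xor address -> MD entries.
network = {}

def new_md(address):
    """Create an empty MD (modelled as a key/value map) at the given address."""
    network[address] = {}
    return address

# The database object (serialized into the app container in the real setup).
database = {"categories": {}}   # stream name -> address of the category MD

def append_batch(stream_name, stream_id, events):
    """Append one event batch (the result of a single cmd) to a stream."""
    categories = database["categories"]
    if stream_name not in categories:
        categories[stream_name] = new_md(f"md:category:{stream_name}")
    category_md = network[categories[stream_name]]

    stream_key = f"{stream_name}@{stream_id}"
    if stream_key not in category_md:
        category_md[stream_key] = new_md(f"md:stream:{stream_key}")
    stream_md = network[category_md[stream_key]]

    # Sequence numbers continue after the events already in the stream.
    first = sum(len(json.loads(batch)) for batch in stream_md.values())
    last = first + len(events) - 1
    stream_md[f"{first}-{last}"] = json.dumps(events)
    return first, last
```

Reading a stream back is then three lookups: database object to category MD, category MD to stream MD, and stream MD to the batches keyed by their sequence range.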

(Instead of adding a database object with categories to the app container, a proper MD for a database, (hash name of database to get MD xor address), could be created.)

It is often practical and useful to be able to edit streams / events (most often due to bugs, unforeseen situations or other kinds of mistakes - which always happen), and this is possible with the first protocol.

Second protocol will be almost the same, except that instead of having the event batches serialized in the stream MD, it will have a collection of objects with metadata and xor address to immutable data, for each event in the batch. The immutable data will store the actual payload of the event, which is all domain logic related data.
The metadata will contain things such as:

  • event clr type (used for deserialization),
  • sequence nr,
  • event name,
  • causation id,
  • correlation id,
  • event id,
  • time stamp.

Stripping the event of these unique property values, we can many times take advantage of the deduplication of immutable data (naturally depending on what other unique data is part of the payload). Some basic searching will be a tad more effective too.
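A small sketch of why separating metadata from payload helps deduplication. It is written in Python for illustration and makes some loud assumptions: `immutable_store`, `put_immutable` and `to_stream_entry` are invented names, and a SHA-256 of the serialized payload stands in for however the network actually addresses immutable data.

```python
import hashlib
import json

immutable_store = {}   # hypothetical stand-in for the network: address -> bytes

def put_immutable(payload):
    """Store a payload under the hash of its content and return that address."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    address = hashlib.sha256(data).hexdigest()
    immutable_store[address] = data   # identical payloads land on the same key
    return address

def to_stream_entry(metadata, payload):
    """Build a stream MD entry: event metadata plus the payload's address."""
    return {"metadata": metadata, "payload_address": put_immutable(payload)}

# Two distinct events (different ids and sequence nrs) with the same payload.
first = to_stream_entry(
    {"event_id": "e1", "sequence_nr": 0, "event_name": "NoteAdded"},
    {"text": "buy milk"},
)
second = to_stream_entry(
    {"event_id": "e2", "sequence_nr": 1, "event_name": "NoteAdded"},
    {"text": "buy milk"},
)
```

Both entries point at the same address, so the payload is stored once. Had the event id or time stamp been part of the payload, the two hashes would differ and nothing would be shared.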

One other significant difference is that the stream MD will have append only permissions. (Event sourcing is in theory supposed to use immutable streams). The ability to correct mistakes and bugs will be there by the concept of RestructureEvents.
When an error has occurred in the stream, one that does not reflect something real but rather corrupts state, a RestructureEvent can be appended, with instructions on how to handle the stream.
When reading the stream, it is first searched for any RestructureEvents, then all changes in these are applied to the stream, then the resulting stream is replayed (applied to an instance of the aggregate) to get to current state.
The restructure event will simply say which key shall point to some other immutable data instead of the corrupt one.
This way the history is kept, but our application can still get instructions how to avoid corrupt state.
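The read path described above (find the RestructureEvents, apply them, then replay) might look roughly like this. It is a Python sketch under assumptions: the entry shape with `type`, `address` and `redirects` fields is invented for the example, and the actual replay onto an aggregate is left out.

```python
def resolve_stream(entries):
    """Apply all RestructureEvents, then return the payload addresses to replay.

    `entries` is a list of (key, entry) pairs in append order.
    """
    addresses = {}   # key -> address of the originally appended payload
    redirects = {}   # key -> replacement address from RestructureEvents
    for key, entry in entries:
        if entry["type"] == "RestructureEvent":
            redirects.update(entry["redirects"])
        else:
            addresses[key] = entry["address"]
    # The appended history is untouched; only the view used for replay changes.
    return [redirects.get(key, address) for key, address in addresses.items()]

stream = [
    (0, {"type": "event", "address": "good-0"}),
    (1, {"type": "event", "address": "corrupt-1"}),
    (2, {"type": "RestructureEvent", "redirects": {1: "fixed-1"}}),
]
```

Here `resolve_stream(stream)` yields the addresses `good-0` and `fixed-1`, while the corrupt entry is still present in `stream` for anyone who wants to inspect the history.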

The repository on GitHub has not been updated with the above yet (will get there when I have some more time over), but some general cleanup and some CQRS examples (cmd handler, cmd + events) have been added, to show how to use it.

It’s likely that this library will eventually only be using the second protocol, with additional improvements.

Soon I might be announcing some other project, which will make use of the event sourcing database technique enabled by SAFE EventStore library :slight_smile:

17 Likes

Great work, you should be proud of this contribution! :smiley:

3 Likes

The IData protocol (mentioned in previous post as Second protocol) has now been implemented and the PR merged into master. Check out the code here: https://github.com/oetyng/SAFE.EventStore

It is now possible to use SAFE.EventStore with the alpha-2 network.

However, you will still need to do it via debugger in VS, since I have not yet configured paths to match with the release binaries. I’ll update this topic when I’ve fixed that.

Currently you can:

  • Create databases in alpha-2 through the web UI at localhost (only when debugging solution in Windows).
  • Use the TestCQRSApp console UI to write notes into an example NoteBook, stored in a database created through the web UI (or implicitly create a new, with name specified in Startup.cs).
  • Browse the databases, the streams and the events in the streams, through the web UI.

Coming up:

  • Download the binaries and run also on OS X, Linux and even ARM.
  • Include SAFE.EventStore in any C# project, and start building event sourced applications, with SAFENetwork as backing data storage solution.
  • A small secret surprise project based on SAFE.EventStore :wink:

If there is interest I might be able to do a live chat, share screen and go through the various parts and discuss what is going on and generally explain how this can be used. Would be fun, so just let me know and we’ll try to arrange something with as many as possible.

18 Likes

Wow! Very cool! I am going to have to find some time to test this out. I linked the repo url on reddit too.

It is great to see these examples of advanced SAFENetwork use cases!

5 Likes

Could you explain like I’m 5?

I feel dumb and don’t understand.

2 Likes

Hey @anon81773980

I’ll give it a try :slight_smile:

(There’s also a podcast which goes through some of it: SAFE Crossroads Podcast #38, Event Sourcing on the SAFE Network)

SAFE.EventStore library contains tooling to be used by software developers.
It is an implementation of a specific database type - a certain way to organize data. It adds a layer of abstraction on top of current SAFENetwork API, which means: it makes it easier for developers to interact with the network.
This library can be used as foundation for data storage as is, both according to Event Sourcing practices but also in other ways, if used under additional layers of abstraction.
Simply put: a protocol to store data in a specific structure, which makes it easier for new developers to get started, without knowing much of the details of SAFENetwork implementation.

There is also a SAFE.CQRS library which is yet another layer of abstraction enabling developers to build applications according to patterns and practices commonly known as CQRS and DDD (Command and Query Responsibility Segregation and Domain Driven Design). Those patterns are ways to handle scalability as well as improve development experience (CQRS) and to be able to handle complex business domains (DDD).
It is supposed to make it easier to build robust, scalable applications that are easier to maintain and that are more apt to handle the problems they are built to solve.

But there is more to be done on those areas. For those who are somewhat familiar with application design patterns it can be interesting to look into DCI (Data Context Interaction). It is yet another level of design practices that I will eventually use for applications that will make use of the libraries mentioned above. (Here is a very good 43 minute talk by Andreas Söderlund on DCI).

All this talk of layers and abstractions is maybe a bit confusing if you’re not a software developer, but each layer of abstraction is supposed to make programming easier in various ways. A good abstraction doesn’t lose much power in exchange for this simplification, but generally you could say that you have more freedom/power with less abstractions, but also a much steeper learning curve and many more open pitfalls.

15 Likes

An update!

There was an addition to the family recently and the workload has been well above normal, so the progress with the previously mentioned milestones is not that fast :slight_smile:

But I can tell you a bit about projections, and the upcoming protocol for it.


Projections

Projections are the way we make the event streams accessible for a client, such as an app or some component in the system. A projection is the read model, or the Query part in CQRS (Command and Query Responsibility Segregation). We can also call a projection a view model.

So, when you have an event sourced system (i.e. you store the data in some sort of event store) you want to be able to access it in a performant way for the relevant context.

This is the beauty of it. One set of events, from the same or different streams, can build different views. So the same event can contribute with its data, to build up any number of different projections.
For example, we have the famous account example:

  • You can have one viewmodel that represents the balance of your account. For that you need the AmountDeposited and AmountWithdrawn events from your own account stream.
  • There can also be another viewmodel, with the amount of deposits coming in for a specific date. Here, all the AmountDeposited from all users, are applied.

Let’s say we also store the accountid in this viewmodel. And voilà, we have something we can query for, say… give me all users that made a deposit today.

What we realize is that we can build up queryable datasets from the events, which will be much (MUCH) more performant, than iterating through all eventstreams and trying to calculate this data - over and over.
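The account example can be sketched as two projections built from one list of events. This is illustrative Python: the event field names and the sample data are invented, and only the event names AmountDeposited and AmountWithdrawn come from the example.

```python
from collections import defaultdict

# Invented sample events, in the order they were raised.
events = [
    {"type": "AmountDeposited", "account_id": "alice", "amount": 100, "date": "2018-03-01"},
    {"type": "AmountWithdrawn", "account_id": "alice", "amount": 30, "date": "2018-03-01"},
    {"type": "AmountDeposited", "account_id": "bob", "amount": 50, "date": "2018-03-01"},
    {"type": "AmountDeposited", "account_id": "alice", "amount": 20, "date": "2018-03-02"},
]

def project_balance(events, account_id):
    """Viewmodel 1: the balance of one account, from its own events."""
    balance = 0
    for event in events:
        if event["account_id"] != account_id:
            continue
        if event["type"] == "AmountDeposited":
            balance += event["amount"]
        elif event["type"] == "AmountWithdrawn":
            balance -= event["amount"]
    return balance

def project_deposits_by_date(events):
    """Viewmodel 2: deposits per date across all accounts, with account ids."""
    view = defaultdict(lambda: {"total": 0, "account_ids": set()})
    for event in events:
        if event["type"] == "AmountDeposited":
            day = view[event["date"]]
            day["total"] += event["amount"]
            day["account_ids"].add(event["account_id"])
    return dict(view)
```

The same AmountDeposited event feeds both views, and "all users that made a deposit today" becomes a single lookup on the date instead of a scan over every stream.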

Now, for our SAFE EventStore, the basic version of projections, is executed locally.
A local agent will run in the background, and process the event streams specified by the user, into the projections wanted. Problem here is that this data, with no further connection to outside world from the local machine, is only available locally.

What we need is a protocol to store these projections back into SAFENetwork, so that they can be queried from other locations.

The protocol

When we save a projection, it means we save the current state of something, to the network.
This state is arbitrarily large, and for that reason we use ImmutableData. Also, it is a representation of current state, as we knew it at that time. We get a version of the state.
Now, every update to this view model, say one property Amount is increased by 100, will lead to the entire viewmodel being saved again. Since we use ImmutableData, this is a new data structure stored, and a new data map retrieved.
What we have now is an older and a more recent version.
Wanting this is actually not uncommon, for example for account balances at end of year. That means you want to go back and access a certain version.

The picture is coming together here, and what is interesting is, that it is strikingly similar to an event stream.

Our event streams, as implemented in SAFE EventStore, use ImmutableData for storing the payload (which can be anything…) and chain the events together in a stream represented by at least one MD.

Do you see it? :slight_smile:

We can actually re-use the EventStore, for our projections.
One stream would be one instance of a viewmodel, with its entire change history, saved as individual events, with metadata and datamap to the immutable data in the network.
Current state would be the very last event in the stream.

And there we have the projections for our event sourced system, implemented in SAFENetwork with our own EventStore.
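A sketch of a projection kept as a stream of versions, where the current state is simply the last entry. This is Python for illustration only: `ProjectionStream` and its methods are hypothetical names, the full state is kept in memory instead of behind a datamap, and ISO date strings are used as timestamps so they compare correctly as text.

```python
class ProjectionStream:
    """One instance of a viewmodel together with its whole change history."""

    def __init__(self):
        self.entries = []   # (timestamp, full state), in append order

    def save(self, timestamp, state):
        """Append a complete copy of the state as a new version."""
        self.entries.append((timestamp, dict(state)))

    def current(self):
        """The current state is the very last entry in the stream."""
        return self.entries[-1][1] if self.entries else None

    def as_of(self, timestamp):
        """Return the most recent version saved at or before `timestamp`."""
        result = None
        for saved_at, state in self.entries:
            if saved_at <= timestamp:
                result = state
        return result

balance_view = ProjectionStream()
balance_view.save("2017-12-31", {"Amount": 100})
balance_view.save("2018-01-15", {"Amount": 200})
```

`balance_view.current()` gives the state with Amount 200, while `balance_view.as_of("2017-12-31")` gives the end-of-year version with Amount 100.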

What we need to do next, is to organise this for querying. This I still have to design, but stay tuned, both for the code update in GitHub, to see the actual implementation of projections, and the quest for a queryable store! :slight_smile:

15 Likes

Whatever it is you are doing. Keep at it! In your own time of course. Precious little bundles are number one priority. Work comes second!!

5 Likes

Congratulations on the new arrival! :baby:

I’d love to be able to say yes, but I’m not sure I do :thinking:
Let me have a shot and you can put me right.

The stream of events is stored on the network as ImmutableData, so each new event (say buying a coffee) is added to the pile of events stored.

At the same time a new projection is created and also stored as ImmutableData. A projection is a snapshot of the current state of the event stream, taking in chosen parameters (amount spent, item bought, etc).

When this happens a pointer to the new projection is stored in a datamap which is a MD. A pointer to the old version is also stored - i.e. the new entry is appended to the MD rather than overwriting the old entry.

So the MD is essentially a record of everything that has happened in the stream (data and metadata), and it allows applications to access and query the stream of events very rapidly as the current state is the last entry and previous states are all there in one place as pointers accessible without having to loop through the data … (I’m floundering now).

Am I anywhere near?

2 Likes

Haha, I’ll keep at it :slight_smile: Yes, precious little bundles indeed!

Thanks JPL!

You know, that is pretty near spot on. You do see it :slight_smile:

I’ll just clarify one part and then expand with an example also:

With this organisation, there will be two types of streams (a stream is a chain of MDs, or at least one MD as long as it is not full).

The event stream, which keeps every incremental change.

  1. CoffeePouredUp
  2. MilkAdded
  3. SugarAdded
  4. SipTaken

The event CoffeePouredUp might contain info on volume, type of coffee and what not. Let’s say we just store the fact that there is a cup of coffee, so nothing more detailed than that.

So then there is the projection stream. It will, at each entry, keep the state as it was at that time.

  1. A cup of coffee
  2. A cup of coffee with milk
  3. A cup of coffee with milk and sugar
  4. A cup of coffee with milk and sugar, less one sip

or… when looking at the implementation details:

  1. Metadata + datamap
  2. Metadata + datamap
    … and so on

And in the metadata we have date time for example, and things like that.

Our current version of the coffee is the number 4, this is what we are holding in our hand. Maybe this info was stored at 15:20.
If we want to see what our coffee cup looked like before 15:20, then we look at number 3.

The model maybe looks like this

class CoffeeCup
{
    public string Content { get; set; }
}

and at 3. we have

new CoffeeCup
{
    Content = "coffee with milk and sugar"
}

The difference between these streams is that the event stream is immutable by design. What happened happened. Every event represents a fact, and we cannot go back in time and change the fact.

If there is anything we found out later, then we add that to the stream:

  5. FoundOutTheCoffeeCupWasJustAnImagination

and the current state in projections stream would perhaps then be

  5. null

(meaning, we just deleted it)

But, the difference: the projections stream can be recalculated.
Because we can at any time decide that we want to include more data in our viewmodel.

Say, we also include temperature of coffee, because we happened to have a thermometer that we dipped into the coffee.

Maybe those events come from the CoffeeThermometerStream:

  1. TemperatureRose
  2. TemperatureSank
  3. TemperatureSank
  4. TemperatureSank

They would contain an identifier so we know where this was (a geolocation specifying the exact location of the coffee cup, or an id for this cup), and the temperature change in degrees.

So, our viewmodel maybe looks like this:

class CoffeeCup
{
    public string ContentType { get; set; }
    public string Volume { get; set; }
    public string Temperature { get; set; }
}

now we want to see how this looked at 3., and if we rebuild the projection and include the events from the thermometer, we get:

new CoffeeCup
{
    ContentType = "coffee with milk and sugar",
    Volume = "250 ml",
    Temperature = "35°C"
}

The first model at 3., with fewer properties, still exists in the network, since it is an ImmutableData, but we wiped the MD clean and added new entries with metadata and datamaps.

That is the big difference compared to the event streams, with regards to their lifecycle.
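Rebuilding can be sketched as replaying the events from scratch with a different apply function. This is illustrative Python, not the library's code: `rebuild` and the two apply functions are made-up names, and the temperature deltas and the starting temperature of 20 are invented sample values.

```python
def rebuild(events, apply, initial):
    """Replay events from scratch and return a fresh projection stream."""
    stream, state = [], initial
    for event in events:
        state = apply(state, event)
        stream.append(state)
    return stream

ADDITIONS = {"CoffeePouredUp": "coffee", "MilkAdded": "milk", "SugarAdded": "sugar"}

def apply_content(content, event):
    """First viewmodel: only what is in the cup."""
    if event["type"] in ADDITIONS:
        return content + [ADDITIONS[event["type"]]]
    return content

def apply_content_and_temperature(state, event):
    """Extended viewmodel: contents plus temperature from thermometer events."""
    content, temperature = state
    return (apply_content(content, event), temperature + event.get("degrees", 0))

# Coffee events and thermometer events merged in time order (invented deltas).
events = [
    {"type": "CoffeePouredUp"},
    {"type": "TemperatureRose", "degrees": 60},
    {"type": "MilkAdded"},
    {"type": "TemperatureSank", "degrees": -25},
    {"type": "SugarAdded"},
    {"type": "TemperatureSank", "degrees": -20},
]

coffee_only = [e for e in events if not e["type"].startswith("Temperature")]
first_projection = rebuild(coffee_only, apply_content, [])
rebuilt_projection = rebuild(events, apply_content_and_temperature, ([], 20))
```

The event list is never modified. Only the projection stream is thrown away and refilled, which is the lifecycle difference between the two kinds of streams.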

Feel free to ask more! :slight_smile:

3 Likes

Thanks for the great explanation. I was stuck on what was happening with the MDs, but I think I get it now. I also went back to your previous explanation of projections to refresh my memory. So it’s the ability to use the same raw material (streams of events going back in time) to build multiple updatable projections in full confidence that the underlying historic data is reliable, with at the same time (if the projection is specified correctly) the ability to query data very efficiently.

1 Like

Aah, yes, I had forgotten about that post. It is very detailed indeed. I recommend it to others who want to know more!

Very much so. It is of big importance that rebuilding is easy, because you want to be able to adapt the projections to the ever changing needs of a consumer (like a view in a website or a new feature etc.).
What I drafted above was how to store the projections. The actual querying will still need some design.

2 Likes

I don’t remember the exact phrasing, but David Irvine mentioned some boxing analogy about the patience being the important part, winning the long game.

So, we have another update :slight_smile:

Actually, there will come a day in a not too distant future, when I can actually spend more than a day here and there a month on SAFENetwork related coding. We will get there.

The big news for this update, is that SAFE.EventStore is now upgraded to be used with safe_app_0.6.0 and the latest csharp bindings!
This Saturday I refactored the interaction with the bindings and did some final cleanup and testing yesterday.

(SAFE.EventStore repo is here)

There’s been a lot of nice simplification and elegance introduced with the new csharp bindings so it’s getting easier to work with. Very nice! I like that there are now fewer static classes and more instantiation, like I had suggested once earlier (I don’t know if my call was heeded or if it was already thought of).

Additionally for the EventStore, I have increased the capacity of the db. I wasn’t happy with 1k dbs, 1k stream categories and 1k stream instances per category.
So I introduced some sharding by non-cryptographic consistent hashing.
Each category will have almost 1k shards (minus a couple reserved entries per MD for other stuff) which each hold 1000 streamkeys, instead of each category holding the streamkeys directly.
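Picking the shard for a stream key could be as simple as the following Python sketch. The post does not show the actual hashing scheme, so this is an assumption: CRC32 stands in for "some non-cryptographic hash", the shard count is rounded to 1000, and a plain modulo is used instead of a hash ring, which works only as long as the shard count stays fixed.

```python
import zlib

SHARDS_PER_CATEGORY = 1000   # the post says "almost 1k"; rounded here

def shard_index(stream_key, shard_count=SHARDS_PER_CATEGORY):
    """Map a stream key to a shard with a stable, non-cryptographic hash."""
    return zlib.crc32(stream_key.encode("utf-8")) % shard_count
```

Because the hash depends only on the key, any client can compute which shard MD holds a given stream key without scanning the category.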
So far, I have just multiplied stream instance capacity by 1000, but I will do the same to the events.
In effect, we will then get a total of 1000 trillion events per database id, and 1 million trillion events per safe account (so, with total of 1k dbs).

I will have to see how the performance is when having any substantial number of events in there though… I mean, the sharding still gives fast key value access, but if I would ever need a full scan, then that would probably be completely insane. But we’ll get there :slight_smile:

So if you have an app (1 database id) with expected life time of 10 years, you could fill it with more than 3 million events per second, and still not fill the quota during its lifetime. So… it will take a lot of devices to be able to push that many events to the network per second :upside_down_face:
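The capacity figures can be checked with a few lines of Python. All factors are rounded to 1000 as in the post, and the split of events per stream into 1000 x 1000 is an assumption about how "doing the same to the events" would work out.

```python
DATABASES = 1_000
CATEGORIES_PER_DB = 1_000
SHARDS_PER_CATEGORY = 1_000
STREAMS_PER_SHARD = 1_000
EVENTS_PER_STREAM = 1_000 * 1_000   # assumed: 1k entries, each sharded 1k times

events_per_db = (
    CATEGORIES_PER_DB * SHARDS_PER_CATEGORY * STREAMS_PER_SHARD * EVENTS_PER_STREAM
)                                                   # 10**15, "1000 trillion"
events_per_account = DATABASES * events_per_db      # 10**18, "1 million trillion"

seconds_in_10_years = 10 * 365 * 24 * 60 * 60       # 315,360,000 (no leap days)
events_per_second = events_per_db / seconds_in_10_years   # roughly 3.17 million
```

So the quota of one database id lasts ten years at a little over 3 million events per second, which matches the estimate above.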

(I guess, eventually, I’ll have it expanding to be “unlimited” by some nifty way.)

There’s still garbage collection of delegates in the csharp bindings, which causes NullReferenceExceptions. This happens inevitably after doing some writing or reading of data. But I think the major bug that was causing me problems with safe_app_0.4.0 is now gone, and that is good, because it was an inexplicable one. Had no idea what was causing it. The garbage collection of delegates is well defined, even though I don’t currently know exactly how we should solve it.

So, that’s that for now. As soon as the garbage collection of delegates is solved, I think we have a functional (albeit very simple) eventstore for SAFENetwork :slight_smile:

32 Likes

This is a superb thing to have at this time, huge congrats on this so far. Superb !!

17 Likes

Yay, I’ve been anticipating an announcement from the forests of Sweden for some time. I felt sure you were still beavering away out there, not gnawing down trees but instead refining some valuable backend infrastructure for SAFE. :smile:

12 Likes

Very cool! Glad to see this project progressing well!

3 Likes