SAFE Crossroads Podcast #38, Event Sourcing on the SAFE Network

That’s right. And an EventStore is a database, while a queue and a dictionary are data structures. Using the former is far less common and has a greater impact on application design than the latter two. So, I could have been more specific, but what I meant was something like implementing IDictionary, or maybe IReliableDictionary (from the Service Fabric framework), which are interfaces in C#. That way existing applications could switch out their current implementations without breaking anything.
This will enable coders to use the network without changing anything they are doing, and without needing to learn the details of, for example, the mutable data implementation.
I.e. quickly get up to speed with producing SAFE Network based apps.
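As a very rough sketch of what I mean (the ISafeStore interface below is just a hypothetical placeholder for a network-backed key/value store, not an actual SAFE API), something like this lets existing code that depends on IDictionary<string, string> keep working unchanged:

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical placeholder for a network-backed key/value store (not a real SAFE API).
public interface ISafeStore
{
    void Upsert(string key, string value);
    void Delete(string key);
}

// Implements the standard IDictionary<string, string>, so an application coding
// against that interface can swap Dictionary<string, string> for this class
// without any calling code changing. Reads are served from a local cache;
// writes also go to the network store.
public class SafeBackedDictionary : IDictionary<string, string>
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();
    private readonly ISafeStore _store;

    public SafeBackedDictionary(ISafeStore store) => _store = store;

    public string this[string key]
    {
        get => _cache[key];
        set { _cache[key] = value; _store.Upsert(key, value); }
    }

    public ICollection<string> Keys => _cache.Keys;
    public ICollection<string> Values => _cache.Values;
    public int Count => _cache.Count;
    public bool IsReadOnly => false;

    public void Add(string key, string value) { _cache.Add(key, value); _store.Upsert(key, value); }
    public void Add(KeyValuePair<string, string> item) => Add(item.Key, item.Value);

    public void Clear()
    {
        foreach (var key in _cache.Keys) _store.Delete(key);
        _cache.Clear();
    }

    public bool Contains(KeyValuePair<string, string> item) => _cache.ContainsKey(item.Key);
    public bool ContainsKey(string key) => _cache.ContainsKey(key);

    public void CopyTo(KeyValuePair<string, string>[] array, int arrayIndex) =>
        ((ICollection<KeyValuePair<string, string>>)_cache).CopyTo(array, arrayIndex);

    public bool Remove(string key)
    {
        if (!_cache.Remove(key)) return false;
        _store.Delete(key);
        return true;
    }

    public bool Remove(KeyValuePair<string, string> item) => Remove(item.Key);

    public bool TryGetValue(string key, out string value) => _cache.TryGetValue(key, out value);

    public IEnumerator<KeyValuePair<string, string>> GetEnumerator() => _cache.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```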

The short answer would be yes. The longer answer would be that what you store in a stream is not necessarily limited to everything that happens to a particular object. How you partition your streams is an application design choice. An aggregate of objects could have one stream, but a single IoT device could have its own stream as well.
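Just to illustrate the design choice (the naming convention here is made up, nothing mandates it):

```csharp
using System;

// Illustrative only: how streams might be partitioned.
public static class StreamNames
{
    // One stream for a whole Account aggregate.
    public static string ForAccount(Guid accountId) => $"Account-{accountId}";

    // One stream for a single IoT device.
    public static string ForDevice(string serialNumber) => $"IotDevice-{serialNumber}";
}
```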

Yes almost correct, with a slight modification :slight_smile:
Current state is not stored; instead, all the changes to its state are stored as separate events.
Usually in an application, an object (for example an Account, for which we have chosen to have one stream) that represents the current state as the result of all events would be cached, so replaying events would not be necessary when the object is accessed often. Any additional change is applied directly to the current state, and the new events are appended to the event store. That way you keep the application responsive. (There is also something called snapshotting, but it complicates things, and it is best to avoid it by design if you can.)
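A minimal sketch of that pattern in C# (the names here are just illustrative):

```csharp
using System;
using System.Collections.Generic;

// An event: a fact about a change, never the current state itself.
public class MoneyDeposited
{
    public decimal Amount { get; set; }
    public DateTime TimeStamp { get; set; }
}

// The Account aggregate: current state is the result of applying all events in its stream.
public class Account
{
    public decimal Balance { get; private set; }
    private readonly List<MoneyDeposited> _uncommitted = new List<MoneyDeposited>();

    // Rebuild current state by replaying the stream (done once, then the instance is cached).
    public static Account FromHistory(IEnumerable<MoneyDeposited> history)
    {
        var account = new Account();
        foreach (var e in history) account.Apply(e);
        return account;
    }

    // A new change is applied directly to the in-memory state, and the resulting
    // event is queued up to be appended to the event store.
    public void Deposit(decimal amount)
    {
        var e = new MoneyDeposited { Amount = amount, TimeStamp = DateTime.UtcNow };
        Apply(e);
        _uncommitted.Add(e);
    }

    // Events waiting to be appended to the stream; only these get persisted.
    public IReadOnlyList<MoneyDeposited> UncommittedEvents => _uncommitted;

    private void Apply(MoneyDeposited e) => Balance += e.Amount;
}
```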

Yes, an event is a fact. And it is supposed to be the single source of truth.

Yes and yes. Depending on how you design your application though, you could be using events that say something about a different date than the actual store date of the event. It is all about how you interpret the data when you then calculate current state. As an example:
We have customers that make deposits. If the data from the bank has not been read the same day that the deposit was registered at the bank, then it is useful to add a property, say BankReceiveDate, which can differ from the event TimeStamp, so that when our system records this event the day after, we are still building up current state with correct information (for the accounting, for example). So if you then want to display the balance of every day, you would have the correct balance for each day, even though the events were not stored in the order that what they model “happened” (what an event is actually supposed to reflect is a matter of design; DepositMade and BankDepositRegistered reflect two different things).
If you want to record something that happened before a previously stored event, it can become complicated, but it is solvable. If you have only stored current state, and overwritten it with new state, then you are totally lost when you try to go back in time and apply something as if you were at that point in time.
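Something like this, to sketch the deposit example (again, just illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// BankReceiveDate is the date the bank registered the deposit (what the event models),
// TimeStamp is when our system stored the event. They can differ by a day or more.
public class BankDepositRegistered
{
    public decimal Amount { get; set; }
    public DateTime BankReceiveDate { get; set; }
    public DateTime TimeStamp { get; set; }
}

public static class DailyBalances
{
    // Balances per day are calculated from BankReceiveDate, so an event that was
    // stored a day late still ends up on the correct day.
    public static Dictionary<DateTime, decimal> Calculate(IEnumerable<BankDepositRegistered> events)
    {
        var depositsPerDay = events
            .GroupBy(e => e.BankReceiveDate.Date)
            .ToDictionary(g => g.Key, g => g.Sum(e => e.Amount));

        // Running balance: each day's balance includes everything up to and including that day.
        var balances = new Dictionary<DateTime, decimal>();
        decimal running = 0;
        foreach (var day in depositsPerDay.Keys.OrderBy(d => d))
        {
            running += depositsPerDay[day];
            balances[day] = running;
        }
        return balances;
    }
}
```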

Yes, you could have massive scalability and very fast writes if designed correctly. As for the queries part, the statement is only partly true.
The separation of reads and writes can actually be especially beneficial for fast queries.
This requires the use of projections (i.e. modeled, as you say), which are a way to make use of event-sourced data, but in no way a requirement for an event store.

I did not get very far into the projections subject. Projections can be built and scrapped at will; you always have the events to rebuild any projection you want. You just choose any events and stream them through a function that uses the data from the events to build up some state (which could be in memory, or could be persisted). The events need not come from the same stream or stream category; most often projections use events from various streams and categories. These are then used for querying data. You can create very efficient queries this way. If you want to build an OLAP cube by projecting the events, you can do this too, and then you get the powerful ad hoc queries you would be used to with relational databases.
So the event streams themselves are not efficient to query, and the process of setting up projections is additional work - true. But once you have constructed whatever projection you want, there is nothing saying that the queries will be slower. On the contrary, you can have much faster queries.
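To give a rough idea of what a projection could look like (illustrative names; the events could come from any stream or category):

```csharp
using System.Collections.Generic;

// An event from some stream; which stream it came from does not matter to the projection.
public class CustomerDeposited
{
    public string CustomerId { get; set; }
    public decimal Amount { get; set; }
}

// A projection: events are streamed through a function that folds them into a
// read model used for queries. It can be scrapped and rebuilt from the events at any time.
public class DepositsPerCustomerProjection
{
    private readonly Dictionary<string, decimal> _totals = new Dictionary<string, decimal>();

    // Fold one event into the read model.
    public void When(CustomerDeposited e)
    {
        _totals.TryGetValue(e.CustomerId, out var current);
        _totals[e.CustomerId] = current + e.Amount;
    }

    // Build (or rebuild) the whole projection by streaming events through When.
    public static DepositsPerCustomerProjection Replay(IEnumerable<CustomerDeposited> events)
    {
        var projection = new DepositsPerCustomerProjection();
        foreach (var e in events) projection.When(e);
        return projection;
    }

    // Queries read the prepared state, never the raw streams.
    public decimal TotalFor(string customerId) =>
        _totals.TryGetValue(customerId, out var total) ? total : 0m;
}
```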

I would say that it is true for any database. You don’t need to move data around to access it; you just add permissions. Backups are unnecessary. You can have high availability by always being able to spin up an instance of your application somewhere and have instant access to the data (as long as a good internet connection is as mundane as steady electricity). But even with an intermittent or weak internet connection it has similar benefits.
Event sourced applications often use a lot of messaging, and I can see how messaging infrastructure could be cut down on, since you don’t need to report to some other physical location what is stored at this physical location.

Yes, partly because of that. But also because any database can be created and accessed, and it can be of any size it needs to be, and because we could (depending on app logic design) cut down on messaging. What we are saying is that, as long as a good internet connection is as mundane and taken for granted as a good electricity source, you can basically have one shared “infinite” hard drive, which you can use for keeping databases on, for example.

Yes!
(There are situations when editing and deleting in streams are justified - although event sourcing theory states streams shall be immutable. It mostly has to do with correcting bugs. It can be solved by redirecting pointers to events if you have designed events to be immutable.)

It could be designed this way, which could minimize data storage needs when you expect a large number of events to be identical. It would however require that things like Id and TimeStamp be considered metadata and stored separately from the event body, to be able to take advantage of that feature.
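A sketch of what that separation could look like (hypothetical types; the content addressing is only there to show the principle):

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// The event body carries no Id and no TimeStamp, so two identical readings
// produce identical payloads that the network could deduplicate.
public class TemperatureRead
{
    public string DeviceId { get; set; }
    public decimal Celsius { get; set; }
}

// Id and TimeStamp live in a separate metadata record that references the body.
public class EventMetadata
{
    public Guid Id { get; set; }
    public DateTime TimeStamp { get; set; }
    public string BodyAddress { get; set; }  // e.g. a content hash of the body
}

public static class EventEnvelope
{
    // A content address for the body; identical bodies map to the same address.
    // (A real implementation would use a proper canonical serialization.)
    public static string AddressOf(TemperatureRead body)
    {
        var canonical = body.DeviceId + "|" + body.Celsius.ToString(CultureInfo.InvariantCulture);
        using (var sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(canonical)));
        }
    }
}
```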

About right!

Event sourcing theory builds on the assumption that the streams are modeling facts. Things that happened. So you cannot change what happened. But you can add new things that happened which could change the meaning of previous things that happened. Compensating actions revert state. Like with accounting. You do not erase in the accounting ledger.
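In code, a compensating action is just another event appended to the stream (illustrative names):

```csharp
using System;
using System.Collections.Generic;

// The original fact. It is never edited or deleted.
public class DepositMade
{
    public Guid DepositId { get; set; }
    public decimal Amount { get; set; }
}

// The compensating action: a new fact that corrects the meaning of the old one,
// like a correcting entry in an accounting ledger.
public class DepositReversed
{
    public Guid DepositId { get; set; }  // points at the deposit it compensates
    public decimal Amount { get; set; }
    public string Reason { get; set; }
}

public static class BalanceCalculator
{
    // Current state is still derived from all events; nothing is erased.
    public static decimal Balance(IEnumerable<object> events)
    {
        decimal balance = 0;
        foreach (var e in events)
        {
            if (e is DepositMade made) balance += made.Amount;
            if (e is DepositReversed reversed) balance -= reversed.Amount;
        }
        return balance;
    }
}
```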

When we are interested in the deltas in our reality, which is the case wherever we are measuring things for example, event sourcing lets you build a good representation of this and maintain useful information.
What applications it is suited for really comes down to how much and what information you need to store (and, at some point after it is created, also use).
If you only need to store a form, with the name and address fields of a user for example, then an event store is not needed.
If you want to track the orbit of something real or abstract, and later correlate various influences on the orbit (NB: this is in very abstract terms and can refer to almost anything), then event sourcing is just the sort of storage solution you would use.

In customer support, we get great value from storing every action some of our customers take, since it can later be correlated with the probability of them reaching out to customer support. We can devise help messages and nudging hints before the users themselves know that they need help.

In fraud detection, we can analyze behavior.

And financial applications, of course, with their transactions and requirements for accounting, audit etc., are very well suited for it.

My first suggestion for an implementation of an event store makes use of the entries in a mutable data structure. Each entry is a potential holder of a set of events resulting from one command.
In this version, we do not make use of the deduplication of the network.
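Roughly like this (IMutableData below is a made-up stand-in for a SAFE mutable data object, not the real API, and the newline framing is only for brevity):

```csharp
using System.Collections.Generic;
using System.Text;

// Made-up stand-in for a SAFE mutable data object (not the real API).
public interface IMutableData
{
    ulong EntryCount { get; }
    byte[] GetEntry(ulong index);
    void InsertEntry(ulong index, byte[] value);
}

public class StreamStore
{
    private readonly IMutableData _md;

    public StreamStore(IMutableData md) => _md = md;

    // Append all events resulting from one command as a single entry.
    // (A real implementation would use proper serialization/framing, not newlines.)
    public void AppendBatch(IReadOnlyList<string> serializedEvents)
    {
        var payload = string.Join("\n", serializedEvents);
        _md.InsertEntry(_md.EntryCount, Encoding.UTF8.GetBytes(payload));
    }

    // Read the whole stream back by walking the entries in order.
    public IEnumerable<string> ReadAll()
    {
        for (ulong i = 0; i < _md.EntryCount; i++)
        {
            var payload = Encoding.UTF8.GetString(_md.GetEntry(i));
            foreach (var e in payload.Split('\n')) yield return e;
        }
    }
}
```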

Generally though, I would not say that PUT costs undermine the efficiency gains. There will be certain drawbacks and benefits to using the network, and I am absolutely sure there will be plenty of cases where the benefits heavily outweigh the drawbacks. But as always, some cases might be better solved with other approaches.

If, for example, you need low latency a lot more than you need the security, accessibility and so on, then you might want some solution where the data is physically closer.
