With the SAFE Network Alpha 2 release moving forward, interfaces stabilizing, and a stable network to test against, application developers are starting to see, and to exploit, what is possible with such a network once it is ultimately in place.
Today we have a very interesting talk with Edward Holst, Software Architect and Lead Developer for a small fintech firm in Sweden. In his spare time he has begun to explore the guts of the SAFE Network and see the huge advantages SAFE will have for hosting the type of Event-Sourced databases he is familiar with.
We explore that area of tech, as well as get a feel for where SAFE currently stands from the perspective of a seasoned developer.
Music for this episode: Arrival, an original piece composed and performed by Nicholas Koteskey of Two Faced Heroes
@fergish, once again great work. Your contribution to the community may seem small to you compared to that of the engineering beasts at Maidsafe, but in a few years you’ll look back with fondness and satisfaction on your role in bringing SAFE to a wider audience.
Something as simple yet important as sharing knowledge at a grass-roots level makes you very valuable.
@oetyng nice to put a voice to the posts, thanks for sharing your ideas, (the shared mutable data bit had me very interested, will be really interested how you approach this).
Look forward to hearing plenty more SAFE voices in the months to come.
Outstanding quality and content John and Edward, I really enjoyed this episode thank you very much!
I think you covered several important aspects of SAFEnetwork both at the high levels in a way most people will be able to get and recognise the significance of, and enough technical meat and depth for us geeks to enjoy without confusing things for those who are not technical. Amazing job! Really interesting and significant points throughout.
I had read a little about event sourcing before listening, but think that really wasn’t necessary to understand both that there are different ways of organising data and to understand the fundamental differences between them, and why Edward is excited about the possibilities of combining these two innovative approaches (Event Sourcing and SAFE decentralised storage) to organising and using data in applications.
Maybe at some point @oetyng, you could come back and go into the application side, and outline some more of your vision with use cases, and maybe what’s needed to get from here to there. Just an idea - I’d certainly be interested.
Folks, this is a great listen - I recommend it highly
Great stuff @fergish. It’s really valuable for the community to have someone like you monitoring and reasoning and communicating about the SAFENetwork, its ecosystem and what happens there, as well as bringing forth and conveying thoughts from those who are part of it. I am happy to have participated!
Hey @beermenow and thanks! Yes, if I’m going to use my voice over this medium more, I probably need to learn from John and start to articulate and slow down a bit on each word.
@happybeing, thanks for the encouraging and kind words! It can be hard to keep the balance between tech and translation of tech; I always struggle with communicating such things to nontechies in a way that doesn’t mess up the message (to the point that it doesn’t even make sense to techies). I had good help from John there.
Yes that’s a very good idea, it would be fun. Let’s see, maybe we can do that
Great job @fergish A very interesting listen, nicely structured and paced.
In summary (just because I find it aids my understanding to break things down):
Storage on SAFE currently offers few abstractions. It is a low-level data store, so @oetyng is looking to apply structures or abstractions on top that developers can use, one example being an event sourcing database. Others (later) include queues and dictionaries.
An event sourcing (ES) database is used to store everything that happens to a particular object as a stream. State is not stored, but instead, the state of a particular stream (eg a bank account) at a particular time can be calculated by ‘replaying’ all the events (eg deposits and withdrawals) prior to that time.
The fact there are no duplicates means that the database serves as a single source of the truth, similar to blockchains.
ES databases are useful for storing events in the order they happen (they’re append only) and have a built-in audit log (again like blockchains).
Advantages of ES include massive scalability and fast writes. However, queries are more difficult than in a relational db and have to be modelled (so slower, less precise).
The main advantages SAFE can offer as a backend to an ES database :
It acts as a single storage source (a virtual hard drive) instead of multiple distributed nodes, removes the need for replicated copies, simplifies management. This is important because an event sourcing database stores more data (i.e. deltas of every event going back through time) than would typically be the case with a relational database, so avoiding duplication for redundancy is an important efficiency saving.
The architecture is also simpler as there is effectively just one database that developers need to address.
The data cannot be tampered with – therefore it is reliable. This is vital if you are looking to build a single source of the truth. There is no central control and no-one can remove the data. Removing data would break the entire stream in an ES db.
If the file already exists on SAFE you won’t need to store it again.
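The replay idea from the summary above can be sketched as a simple fold over a stream (a minimal Python sketch; the event names and shapes are invented for illustration, not taken from any real implementation):

```python
# Hypothetical event stream for a bank account: current state is never
# stored, only the events; the balance is derived by replaying them in order.
events = [
    {"type": "DepositMade", "amount": 100},
    {"type": "WithdrawalMade", "amount": 30},
    {"type": "DepositMade", "amount": 50},
]

def replay(events):
    """Fold the event stream into current state (here, a balance)."""
    balance = 0
    for e in events:
        if e["type"] == "DepositMade":
            balance += e["amount"]
        elif e["type"] == "WithdrawalMade":
            balance -= e["amount"]
    return balance

print(replay(events))  # 120
```

Replaying only the events before a given point in time would give the state at that time, which is what makes the stream a built-in audit log.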
Is that about right, as a 101?
If so have a couple of questions, if you wouldn’t mind, @oetyng – just to satisfy my curiosity.
First, what are the main use cases? You say you develop fintech applications. For what type of projects do you use an ES database? Presumably, ES databases only work with immutable data?
Second, even if a file already exists on SAFE you still have to pay for the PUT in order to protect the anonymity of the original uploader. Would this undermine the efficiency gains offered by SAFE?
Just trying to get my head around all this. It’s a really interesting project and best of luck with it.
That’s right. And an EventStore is a database, while a queue and a dictionary are data structures. Using the former is far less common, and has a greater impact on application design, than the latter two. So, I could have been more specific, but what I meant was something like implementing IDictionary, or maybe IReliableDictionary (from the ServiceFabric framework), which is an interface in C#. That way existing applications could swap out current implementations without breaking anything.
This will enable coders to use the network without even changing anything they are doing, and without needing to learn the details of the mutable data implementations for example.
I.e. quickly get up to speed with producing SAFE Network based apps.
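The drop-in idea can be sketched in Python with the standard mapping interface (in C# it would be IDictionary, as mentioned above). Everything here is illustrative: `backend` is a plain dict standing in for a hypothetical SAFE mutable-data client, and none of these names come from the real SAFE API:

```python
from collections.abc import MutableMapping

# Sketch: the app keeps coding against the standard mapping interface,
# while the backing store could later be swapped for SAFE mutable data.
class SafeDictionary(MutableMapping):
    def __init__(self, backend=None):
        # a plain dict stands in for a (hypothetical) SAFE client here
        self._backend = backend if backend is not None else {}

    def __getitem__(self, key):
        return self._backend[key]      # would be a network read

    def __setitem__(self, key, value):
        self._backend[key] = value     # would be a network mutation

    def __delitem__(self, key):
        del self._backend[key]

    def __iter__(self):
        return iter(self._backend)

    def __len__(self):
        return len(self._backend)

d = SafeDictionary()
d["account:1"] = {"owner": "alice"}
print(d["account:1"]["owner"])  # alice
```

Because the application only ever sees the standard interface, swapping the backend for a network-backed one would not require any change to calling code.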
The short answer would be yes. The longer answer is that what you store in a stream is not necessarily limited to everything that happens to a particular object. It is an application design choice how you partition your streams. An aggregate of objects could have one stream, but a single IoT device could have its own stream as well.
Yes, almost correct, with a slight modification: current state is not stored, but instead all the changes to its state are stored as separate events.
Usually in an application, an object (for example an Account, where we have chosen to have one stream for it), which represents the current state as a result of all events, would be cached. So replaying events would not be necessary when accessing this object often. Any additional change would be applied directly to current state, and the new events appended to the event store. That way you keep the application responsive. (There is also something called snapshotting, but it complicates things, and it is best if you can avoid it by design.)
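The caching pattern just described might look like this (a Python sketch with invented names; real implementations differ):

```python
# The aggregate is rebuilt from its stream once, then kept in memory;
# new commands mutate the cached state AND append events, so no replay
# is needed on every access.
class Account:
    def __init__(self, stream):
        self.balance = 0
        self.stream = stream       # the append-only event store
        for e in stream:           # one-time replay on load
            self._apply(e)

    def _apply(self, event):
        if event["type"] == "DepositMade":
            self.balance += event["amount"]
        elif event["type"] == "WithdrawalMade":
            self.balance -= event["amount"]

    def deposit(self, amount):
        event = {"type": "DepositMade", "amount": amount}
        self.stream.append(event)  # persist the fact
        self._apply(event)         # update cached state directly

stream = [{"type": "DepositMade", "amount": 100}]
acct = Account(stream)   # replayed once, on load
acct.deposit(50)         # no replay needed afterwards
print(acct.balance)      # 150
```

The event store remains the source of truth; the cached object is just a convenience that can always be rebuilt from the stream.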
Yes, an event is a fact. And it is supposed to be the single source of truth.
Yes and yes. Depending on how you design your application, though, you could be using events that say something about a different date than the actual store date of the event. It is all about how you interpret the data when you later calculate current state. As an example:
We have customers that make deposits. If the data from the bank has not been read the same day that the deposit was registered at the bank, then it is useful to add a property, say BankReceiveDate, which could differ from the event TimeStamp. That way, when our system records this event the day after, we are still building up current state with correct information (for the accounting, for example). So if you then want to display the balance for every day, you would have the correct balances for each day, even though the events were not stored in the order that what they model “happened”. (What the event is actually supposed to reflect is a matter of design; DepositMade and BankDepositRegistered reflect two different things.)
If you want to record something that happened before the facts of a previous event that has already been stored, then it can become complicated. But it is solvable. If you have just stored current state, and overwritten it with new state, then you are totally lost when you try to go back in time and apply something as if you were at that point in time.
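The BankReceiveDate idea can be sketched like this (Python; the field names and event shapes are invented for illustration): each event carries both the technical store timestamp and a business date, and daily balances are built from the business date, so an event recorded a day late still lands on the correct day.

```python
from collections import defaultdict

events = [
    {"type": "BankDepositRegistered", "amount": 100,
     "timestamp": "2018-03-02T09:00", "bank_receive_date": "2018-03-01"},
    {"type": "BankDepositRegistered", "amount": 40,
     "timestamp": "2018-03-02T09:01", "bank_receive_date": "2018-03-02"},
]

def daily_balances(events):
    per_day = defaultdict(int)
    for e in events:
        # interpret by business date, not by when the event was stored
        per_day[e["bank_receive_date"]] += e["amount"]
    # running balance per day, in date order
    balance, result = 0, {}
    for day in sorted(per_day):
        balance += per_day[day]
        result[day] = balance
    return result

print(daily_balances(events))  # {'2018-03-01': 100, '2018-03-02': 140}
```

Note that the first event was stored on March 2nd but still counts toward the March 1st balance, which is the point made above.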
Yes, you could have massive scalability and very fast writes if designed correctly. And to the queries part, the statement is partly true.
However, the separation of reads and writes can be especially beneficial for fast queries.
This requires the use of projections (i.e. modeled, as you say), which is a way to make use of event-sourced data, but in no way a requirement for an event store.
I did not get very far into the projections subject. Projections can be built and scrapped at will; you always have the events to rebuild any projection you want. You would just choose any events and stream them through a function that uses the data from the events to build up some particular state (which could be in memory, or persisted). The events need not come from the same stream or stream category; most often, projections use events from various streams and categories. These are then used for querying data. You could create very efficient queries this way. If you want to build an OLAP cube by projecting the events, you can do this too, and then you get the kind of powerful ad hoc queries you would be used to with relational databases.
So the event streams themselves are not efficient to query, and the process of setting up projections is additional work input - true. But once you have constructed whatever projection you want, there is nothing saying that the queries would be slower. On the contrary, you could have much faster queries.
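A projection is essentially a fold of chosen events into a purpose-built read model that is cheap to query. A minimal Python sketch (all event shapes invented for illustration):

```python
# A projection streams events (possibly from several streams/categories)
# into a read model. It can be scrapped and rebuilt from the events at
# any time, since the events remain the source of truth.
def project_totals_per_customer(events):
    read_model = {}
    for e in events:
        if e["type"] == "DepositMade":
            cust = e["customer"]
            read_model[cust] = read_model.get(cust, 0) + e["amount"]
    return read_model

events = [
    {"type": "DepositMade", "customer": "a", "amount": 10},  # from stream A
    {"type": "DepositMade", "customer": "b", "amount": 5},   # from stream B
    {"type": "DepositMade", "customer": "a", "amount": 7},
]

totals = project_totals_per_customer(events)
print(totals["a"])  # 17 - now a constant-time lookup, no replay needed
```

Once built, querying the read model is a plain lookup, which is why queries against a well-chosen projection can be faster than against a relational schema.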
I would say that it is true for any database. You don’t need to move data around to access it; you just add permissions. Backups are unnecessary. You can have high availability by always being able to spin up an instance of your application somewhere and have instant access to the data (as long as a good internet connection is as mundane as steady electricity). But even with an intermittent/weak internet connection it has similar benefits.
Event sourced applications often use a lot of messaging, and I can see how messaging infrastructure could be cut down on, since you don’t need to report to some other physical location, what is stored on this physical location.
Yes, partly because of that. But also because any database can be created and accessed, and it can be of whatever size it needs to be, and because we could (depending on app logic design) cut down on messaging. What we are saying is that, as long as a good internet connection is as mundane and taken for granted as a good electricity source, you can basically have one shared “infinite” hard drive, which you can use for keeping databases on, for example.
(There are situations when editing and deleting in streams are justified - although event sourcing theory states streams shall be immutable. It mostly has to do with correcting bugs. It can be solved by redirecting pointers to events if you have designed events to be immutable.)
It could be designed this way, which could minimize data storage needs where you expect a large number of events to be identical. It would, however, require that things like Id and TimeStamp be considered metadata, and stored separately from the event body, to take advantage of that feature.
Event sourcing theory builds on the assumption that the streams are modeling facts. Things that happened. So you cannot change what happened. But you can add new things that happened which could change the meaning of previous things that happened. Compensating actions revert state. Like with accounting. You do not erase in the accounting ledger.
Wherever we are interested in the deltas of our reality, for example anywhere we are measuring things, event sourcing gives a good representation of this and maintains useful information.
What applications it is suited for really comes down to how much information, and what kind of information, you need to store (and, at some time after its conception, also use).
If you only need to store a form, with the name and address fields of a user for example, then an event store is not needed.
If you want to track the orbit of something real or abstract, and later correlate various influences on the orbit (NB: this is in very abstract terms and can refer to almost anything), then event sourcing is just the sort of storage solution you would use.
In customer support, we have great use of storing every action some of our customers take, since it can later be correlated with the probability of reaching out to customer support. We can devise help messages and nudging hints before the users themselves know that they need help.
In fraud detection, we can analyze behavior.
And financial applications, of course, with their transactions and the requirements of accounting, auditing etc., are very well suited to it.
My first suggestion for an implementation of an event store, makes use of the entries in a mutable data structure. Each entry is a potential holder of a set of events resulting from one command.
In this version, we do not make use of the deduplication of the network.
Generally though, I would not say that PUT costs undermine the efficiency gains. There will be certain drawbacks and benefits to using the network, and I am absolutely sure there will be plenty of cases where the benefits heavily outweigh the drawbacks. But as always, some cases might be better solved with other approaches.
If, for example, you need low latency much more than you need the security, accessibility and so on, then you might want a solution where the data is physically closer.
Mutable data structures do not have the inherent de-dup simply because de-dup of SAFE uses the immutable chunk’s address. For mutable objects the address is the address of the object no matter the contents.
Great post BTW, and it answers a lot of things. Didn’t realise that some of the stuff I (a lot of people, really) did in the 70s/80s has a name. Relational databases for microprocessor devices didn’t exist, so a form of event store was the useful way to deal with events etc.
Thanks again for your work and discussions on this topic.
Yes, thanks for clarifying this, I left out the details.
The details of how you would go about it: if someone wanted the dedup, you would store event bodies in immutable data structures, and the metadata somewhere else. You would then have a map to the events; the map is what makes these events a stream. (This is the part I referred to when saying that you keep events immutable and, for that reason, change pointers to events in the rare cases when you need to edit/delete an event, mostly when there was a bug. You would simply change the map to point to a replacement event.)
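The map-of-pointers idea can be sketched like this (Python; a local dict and a content hash stand in for SAFE immutable storage, and all names are illustrative, not from any real API):

```python
import hashlib, json

# Stands in for SAFE immutable data: content-addressed, so identical
# bodies deduplicate to a single address, and bodies are never mutated.
immutable_store = {}

def put_immutable(body):
    addr = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    immutable_store[addr] = body
    return addr

# The stream itself: a mutable map from sequence number to the address
# of an immutable event body. This map is what makes the events a stream.
stream_map = {
    0: put_immutable({"type": "DepositMade", "amount": 100}),
    1: put_immutable({"type": "DepositMade", "amount": 999}),  # buggy event
}

# Correction: store a replacement body and redirect the pointer.
# The original buggy body still exists; only the map changed.
stream_map[1] = put_immutable({"type": "DepositMade", "amount": 99})

events = [immutable_store[stream_map[i]] for i in sorted(stream_map)]
print(sum(e["amount"] for e in events))  # 199
```

The event bodies are never edited in place; a "deletion" or correction is just a change in the map, which keeps the immutability assumption of event sourcing intact.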
And we don’t do this now, since we only use the mutable data structure for all events (which will be a bit faster), so that is why we are currently not using the dedup feature of the network.