ELI5 how the MaidSafe Network would resolve the issues described here?

I was reading this HN post “What should we consider when moving to a service mesh architecture?” [0] & found the circumstances described in this one comment [1] interesting.

It got me thinking that the issues brought up would likely have been encountered in dev here as well. Could anyone here explain to a non-dev (me) how these would be handled?

[0]
https://news.ycombinator.com/item?id=17415421

[1]
https://news.ycombinator.com/item?id=17417046

Overall I don’t think ‘service mesh architecture’ is similar to SAFE, because SAFE is not really a set of different meshed services; it’s a single service (the network) utilizing many identical processes (the vaults). But here are my thoughts:

“Inter-service communication adds a ton of complexity and isn’t a totally solved problem. The biggest issue is distributed transactions.”

SAFE sort of solves the issue of distributed transactions by having ‘an owner of mutable data’, so it only takes a single change to update both the sender and the receiver.

But there is an unanswered question of how to keep flow-on effects (eg updating wallet balances) all in sync when ownership changes. The only current answer is ‘it just works’ (not really that satisfying I know but the feature hasn’t been designed yet). The reliability of this ‘flow’ should be extremely high considering the robust consensus and communication protocols being incorporated into the network.

So transactions a) work differently to a typical database and b) use extremely reliable ‘components’ to satisfy each step of the transaction.
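To illustrate the single-owner idea with a hypothetical sketch (the class and field names here are my own simplification, not the actual SAFE API): a coin is a piece of mutable data with one ‘owner’ field, so a transfer is a single write to that field rather than a separate debit and credit that could get out of sync.

```python
# Hypothetical sketch: a coin is mutable data with a single 'owner'
# field. Transferring it is one atomic assignment, so there is no
# separate sender-write and receiver-write that could diverge.

class Coin:
    def __init__(self, owner):
        self.owner = owner      # current owner (simplified to a string)
        self.version = 0        # bumped on every mutation

    def transfer(self, signer, new_owner):
        # Only the current owner may mutate the data.
        if signer != self.owner:
            raise PermissionError("only the owner can transfer")
        self.owner = new_owner  # the whole transaction is this one write
        self.version += 1

coin = Coin(owner="alice")
coin.transfer(signer="alice", new_owner="bob")
assert coin.owner == "bob" and coin.version == 1
```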

Rollbacks are another aspect. I imagine this will mostly be up to the client to manage correctly, but it needs a bit more exploration of the types of rollbacks being considered (not done here).

“What happens if one of the writes succeeds and the other fails?”

The client app doing the actions must detect and manage the failure.

If this is due to the network itself failing, the network can manage it by punishing the misbehaving nodes and replacing them with (presumably) reliable ones.

If it’s due to the client failing then the client can be improved to handle errors correctly (eg resume the data process from where it failed and try again).

The basic network operations are all atomic by nature, ie each is a single operation, but there are layers on top which involve multiple steps, and there’s no info about what happens when those fail since they don’t exist yet.
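A hedged sketch of the client-side resume idea (the helper names are hypothetical, not a real SAFE client API): the client records which chunks have already been stored, so after a failure it can pick up from where it stopped instead of starting over.

```python
# Hypothetical sketch: upload data in chunks, persisting which chunks
# succeeded, so a failed run can resume rather than restart.

def upload(chunks, put_chunk, done):
    """put_chunk(i, chunk) stores one chunk (may raise);
    'done' is the persisted set of indices that already succeeded."""
    for i, chunk in enumerate(chunks):
        if i in done:
            continue            # stored during a previous attempt
        put_chunk(i, chunk)     # a single atomic network operation
        done.add(i)             # record progress before moving on

stored = {}
done = set()

def flaky_put(i, chunk, fail_once={1}):
    # Simulates a transient failure on chunk 1, first attempt only.
    if i in fail_once:
        fail_once.remove(i)
        raise IOError("transient network failure")
    stored[i] = chunk

try:
    upload(["a", "b", "c"], flaky_put, done)
except IOError:
    pass                                        # first attempt fails
upload(["a", "b", "c"], flaky_put, done)        # resume from chunk 1
assert stored == {0: "a", 1: "b", 2: "c"}
```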

“What if you get a service loop?”

This is specifically referencing messaging, which won’t happen on the SAFE network due to the design of the secure messaging protocol. Every message has a clear route from the source to the destination, and the message will terminate at that destination (for more info read about secure messaging in xor space).
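A toy illustration of why greedy xor-space routing can’t loop (this is a generic sketch of xor routing, not the exact SAFE implementation): each hop must strictly reduce the xor distance to the destination, so no node can ever be visited twice and the message must terminate.

```python
# Illustrative sketch: greedy XOR routing. Each hop strictly shrinks
# the XOR distance to the destination, so a loop is impossible.

def route(nodes, source, dest):
    """nodes: set of node addresses (ints). Returns the hop sequence."""
    path = [source]
    current = source
    while current != dest:
        # candidates strictly closer to dest in XOR distance
        closer = [n for n in nodes if (n ^ dest) < (current ^ dest)]
        nxt = min(closer, key=lambda n: n ^ dest)
        assert (nxt ^ dest) < (current ^ dest)  # distance always shrinks
        path.append(nxt)
        current = nxt
    return path

nodes = {0b0001, 0b0101, 0b0111, 0b1100, 0b1111}
hops = route(nodes, source=0b1100, dest=0b0111)
assert hops[-1] == 0b0111
assert len(hops) == len(set(hops))  # no node repeats: no loop possible
```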

If computation is introduced (instead of just storage) then there could be issues with looping messages, but Ethereum has the same issue and has found ways to manage it (eg gas limits on execution), so when that part of the SAFE design is approached there is precedent for how to handle it.

“you end up writing your own global database to synchronize everything”

Yes, this is SAFE. The synchronization happens via a combination of mutable data versioning, consensus protocols, and secure messaging through xor space… I think those are the main things.
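A minimal sketch of what mutable-data versioning buys you for synchronization (my own simplification, assuming a compare-and-swap-style rule rather than the exact SAFE semantics): every mutation states the version it expects, so a stale writer is rejected instead of silently overwriting newer data.

```python
# Hypothetical sketch: versioned mutable data. A mutation must name
# the version it expects; stale writes are rejected, not applied.

class MutableData:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def mutate(self, expected_version, new_value):
        if expected_version != self.version:
            raise ValueError("stale write: reread and retry")
        self.value = new_value
        self.version += 1

md = MutableData("v0")
md.mutate(0, "v1")              # succeeds: expected version matches
try:
    md.mutate(0, "v1-late")     # a second writer still holding version 0
except ValueError:
    pass                        # rejected; newer data is preserved
assert md.value == "v1" and md.version == 1
```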

“why not just build a monolith that can horizontally scale easily?”

From a software architecture point of view, sure. From a distributed, permissionless, Byzantine-fault-tolerant point of view, a monolith has too many flaws.

This is a very interesting part of developing on SAFENetwork.
You actually don’t even need to be on the consuming end of the API before seeing examples of it (NFS uses both MDs and ImDs in a “single” write).
But you will definitely see it when building some higher-level structures, like trying to implement a SQL DB (there is actually at least one popular SQL implementation over a key-value store out there: CockroachDB — not counting that the lowest levels of disk storage are actually key-value too).
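To give a flavour of the CockroachDB-style idea (a toy sketch, nothing like the real encoding): rows of a “SQL” table can be stored as entries in a plain key-value store, with each key built from (table, primary key, column).

```python
# Illustrative sketch: a relational row flattened onto a key-value
# store, keyed by (table, primary_key, column).

kv = {}  # stand-in for the underlying key-value store

def insert_row(table, pk, row):
    for column, value in row.items():
        kv[(table, pk, column)] = value

def select_row(table, pk, columns):
    return {c: kv[(table, pk, c)] for c in columns}

insert_row("users", 1, {"name": "alice", "age": 30})
assert select_row("users", 1, ["name", "age"]) == {"name": "alice", "age": 30}
```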

“The reliability of this ‘flow’ should be extremely high considering the robust consensus and communication protocols being incorporated into the network.”

With what we have today, it requires an additional layer with cross-MD/ImD transactions implemented. That would mean building up a persisted log of actions and corresponding counter actions, then working down the log until something fails, and in that case working down the counter actions for the steps that did succeed.
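That log-of-actions-and-counter-actions idea can be sketched like this (a generic compensating-transaction sketch under my own assumptions, not an existing SAFE layer): work down the log; if a step fails, run the counter actions of the completed steps in reverse.

```python
# Hypothetical sketch: a transaction as a log of (action, counter)
# pairs. On failure, compensate the steps that already succeeded.

def run_transaction(log):
    completed = []
    for action, counter_action in log:
        try:
            action()
            completed.append(counter_action)
        except Exception:
            for undo in reversed(completed):  # roll back what succeeded
                undo()
            return False
    return True

state = {"a": 0, "b": 0}

def fail():
    raise IOError("write failed")

ok = run_transaction([
    (lambda: state.update(a=1), lambda: state.update(a=0)),
    (fail,                      lambda: None),
    (lambda: state.update(b=1), lambda: state.update(b=0)),
])
assert ok is False
assert state == {"a": 0, "b": 0}  # the first write was compensated
```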

There will be constraints. If the transaction includes an operation that writes irreversibly (change ownership of an MD, create an ImD), then it cannot be rolled back. So that sets some boundaries for what a higher-level construct can do within a transaction.
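Following that constraint, a transaction layer might refuse up front any transaction containing a step it cannot undo. A small sketch (the operation names are hypothetical labels, not real SAFE API calls):

```python
# Hypothetical sketch: reject a planned transaction if any step is
# irreversible, since rollback could not be guaranteed mid-transaction.

IRREVERSIBLE = {"change_ownership", "create_immutable_data"}

def validate(operations):
    """Refuse a transaction plan containing a step that cannot be undone."""
    for op in operations:
        if op in IRREVERSIBLE:
            raise ValueError(f"'{op}' cannot be rolled back mid-transaction")

validate(["write_md", "write_md"])  # fine: every step is reversible
rejected = False
try:
    validate(["write_md", "change_ownership"])
except ValueError:
    rejected = True                 # irreversible step, plan refused
```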

It would be great to see some more discussion and exploration of this area.
