Limitations of autonomous networks / decentralized apps

I’d like to have a serious discussion about a few limitations. They may be covered to varying degrees in other posts, but I want to analyse the current state of these two, hopefully as technically as possible.

  1. Databases
  2. Code execution

Databases

The most common database type is the relational database (“SQL”). I have seen production-grade SQL dbs built on key-value storage (CockroachDB; read this very interesting blog post about it), but this still seems like a very difficult thing to solve on SAFENetwork while keeping it ACID and maintaining any resemblance to the performance people are used to today. (I have started an extremely rudimentary attempt to implement SQL over SAFENetwork using the CockroachDB approach; ACID and performance problems quickly became very evident.)
If we want to be truly serverless, the db servers have to be redesigned into decentralized processes. Not only is the coding for that very complicated, I can only imagine the performance drop you’d see.
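
To make the key-value idea concrete, here is a minimal sketch of storing table rows as key-value pairs, loosely in the spirit of the CockroachDB approach mentioned above. The key layout, the function names and the in-memory dict standing in for the store are all assumptions for illustration, not the SAFENetwork API or CockroachDB’s actual encoding.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical in-memory stand-in for a key-value store; not the SAFE API.
KV = Dict[str, str]


def row_key(table: str, pk: int, column: str) -> str:
    # Zero-pad the primary key so lexicographic key order matches numeric order.
    return f"/{table}/{pk:010d}/{column}"


def insert_row(store: KV, table: str, pk: int, row: Dict[str, str]) -> None:
    # One key-value pair per column. Nothing here is atomic across the pairs,
    # which is exactly where the ACID problems start on a distributed store.
    for column, value in row.items():
        store[row_key(table, pk, column)] = value


def scan_table(store: KV, table: str) -> Iterator[Tuple[int, Dict[str, str]]]:
    # A "SELECT * FROM table" becomes an ordered scan over a key prefix.
    prefix = f"/{table}/"
    rows: Dict[int, Dict[str, str]] = {}
    for key in sorted(store):
        if key.startswith(prefix):
            _, _, pk, column = key.split("/")
            rows.setdefault(int(pk), {})[column] = store[key]
    yield from sorted(rows.items())
```

Even in this toy form, a single row insert turns into several independent writes, and a table scan into many reads, each of which would be a network operation rather than a local disk access.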

And so the future for SQL on SAFENetwork seems cloudy to me (no pun intended). I don’t have information at hand that would tell me “no probs, it’ll be easy peasy” or even “it should be doable”.

Even though there has been a lot of hype around “NoSQL databases” for many years now, they simply do something else and have other use cases. I don’t see relational databases going out of fashion for a long time.
(Mind you, I am saying this as a non-conventional database user. In our business we do not rely on relational data models.) And still, even a NoSQL db of the kind people are used to would have many of the same requirements as the above.

Code execution

There’s often talk in the community of replacing Amazon and Azure etc.
As someone who uses Azure every day, I see that as a very distant goal.
The largest problem is that everything needs to be a decentralized application, and the number of applications and features (very advanced ones) in Azure is staggering and growing insanely fast.
Just as with the dbs, we have the very same problems:
First, the problem of how to code these things in a decentralized manner must be solved.
Second, these processes currently run on centralized servers, where code execution is fast.
The performance drop we would see would be big.

I think that if network latency were to drop massively, we could get closer to consensusMajority * jobExecutionTime, which would be a huge improvement, but I don’t know what the prospects for that are.

Current knowledge

What do the core people in MaidSafe, and other thinkers in the community, think about these things? I mean in depth: what’s the take on it?
Do we see a realistic way to implement the db standards that exist, for example, or do we see them being replaced with something else that - realistically - actually does what people require them to do? (In the latter case, it would need to be quite well defined to be considered realistic IMO - i.e. I’d love to hear more about it.)
How do you envision a SAFENetwork Azure? What are your concerns and solutions?

I know we might be so damn early in the evolution of this technology (autonomous networks, decentralized apps, all of it) that we simply cannot know that much now, and have no other choice than to plod on and solve it as we get there. But still, if anyone out there has come further in these thoughts and would like to elaborate on the details, it would be really nice to start talking more about it!

22 Likes

I’ll continue / expand:

What I’d like to see is that the car rental company (or the airline, or the fire department, etc.) can realistically use a SAFENetwork db.
What do we need to make that happen? I mean, would we change how data is stored and accessed, or is it possible to implement current data storage standards?

The same goes for a company hosting its applications. A billing company (or an accounting service, a call center, or whatever) should be able to just code up its applications and host them on SAFENetwork (instead of Azure). How can we do that? In light of current technology, what would that look like, and is it feasible?

3 Likes

I’m not able to add depth I’m afraid, but I suspect it will be a bit of both, and that decentralizing applications, and separating applications from data will at the same time lead to different needs with corresponding different solutions.

Naturally, where established approaches still work these will be used first and may continue, and where they don’t new things will come into play (both new needs and new solutions).

I can’t add depth because database implementation isn’t something I know much about, but it does intrigue me so I hope some people can get into this here, even just floating ideas, but then thinking a bit into the detail.

One area I’m interested in and encouraged by is SPARQL, a relational-like query language used by Solid to run queries across different data sets on different servers. This is interesting because it is, I think, a proper relational-style query language, and it can be implemented on either the client or the server. No doubt there will be limits of scale, particularly on the client, but I’m hopeful this will still lead to an explosion in use cases and application models that make it useful in SAFENetwork.

10 Likes

I’d say it will be a combination of user data and company data. For example, orders that customers place will usually be in company-owned MDs, since it is unreasonable for customers to be able to modify orders/picking lists/invoices after an order is firmly placed.

My 2 bits’ worth for the discussion.

5 Likes

I’m a clueless consumer here, but most server-related work seems to be reduced to functions and APIs. Honestly, I don’t even know what I’m talking about; in one of his interviews seneca said something about servers, so he might be better at giving technical feedback… All big tech companies have a serverless option.

Maybe tag @dirvine, @Viv. I don’t know if they have seen this thread, as it is placed in the uncategorized section. There are some very interesting questions in this thread about databases and code execution.

1 Like

I think once data and communication have formed a sturdy foundation, executing code at vaults will become a natural priority.

Even if we can start with basic search/retrieval operations supported at the vault level, it would be a huge benefit. Being able to search through data without having to retrieve it would help data mining enormously. Naturally, this would benefit database management systems substantially.
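
As a rough illustration of why searching at the vault helps, here is a toy comparison of the bytes that leave a vault when the filter runs at the vault versus at the client. The `Vault` class, its methods and the byte accounting are hypothetical and only model the idea; they are not the real vault interface.

```python
from typing import Callable, Dict, List


class Vault:
    """Hypothetical vault holding chunks by name; not the real vault interface."""

    def __init__(self, chunks: Dict[str, bytes]) -> None:
        self.chunks = chunks
        self.bytes_sent = 0  # bytes that had to leave the vault

    def get(self, name: str) -> bytes:
        # Plain retrieval: the whole chunk travels to the client.
        data = self.chunks[name]
        self.bytes_sent += len(data)
        return data

    def search(self, predicate: Callable[[bytes], bool]) -> List[str]:
        # The predicate runs where the data lives; only matching names are sent.
        names = sorted(n for n, d in self.chunks.items() if predicate(d))
        self.bytes_sent += sum(len(n.encode()) for n in names)
        return names
```

With two chunks of about a kilobyte each, a client-side filter has to pull both chunks over the network, while a vault-side `search` only returns the name of the match.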

The real big step forward would be running arbitrary code on behalf of users. This would open the flood gates to commodity distributed super computer power, metered by resource use. Combined with the above, a really powerful system could be realised.

Moreover, being able to chain scripts together and trigger events that are simply executed by the network will start to provide flexibility more akin to server-side processing.

In the shorter term, I think programming styles will adapt to work around some of the limitations. A combination of new design patterns and scripts run by users, in concert with other user input, could be powerful. Being able to fulfil an order by running an admin app on your phone could be handy. Likewise, apps could still have components running on traditional servers, which respond to network events; a sort of hybrid design.

Plenty of challenges to overcome, but the spine of data and comms needs to form the foundation first. Then we can see what happens next.

7 Likes

These are my theories…

At first, the old idea of ‘servers’ will simply change to becoming ‘another client’. I’m going to use ecommerce as the example but it’s similar for all cases. Instead of consumers interacting directly with an ecommerce server, consumers will upload orders to the SAFE network for ecommerce clients to discover and respond to. It is a bit like consumers emailing back and forth with the ecommerce client, with SAFE as the technology that ensures the interaction stays secure and unbiased. So the main difference will be the model of data ownership (which is a very important difference!). But the processing of data is still essentially happening on a server acting as a SAFE client.
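
A minimal sketch of this first step, where the merchant is ‘just another client’ that discovers orders in shared storage and responds by writing back. The `network` dict, the `orders/` key layout and the status values are invented for illustration and do not reflect any real SAFE data type.

```python
from typing import Dict, List

# Hypothetical stand-in for shared network storage; layout invented for illustration.
network: Dict[str, dict] = {}


def place_order(order_id: str, item: str) -> None:
    # The consumer writes the order to the network instead of calling a server.
    network[f"orders/{order_id}"] = {"item": item, "status": "new"}


def merchant_poll() -> List[str]:
    # The merchant is just another client: it discovers new orders
    # and responds by writing the result back to the same storage.
    handled = []
    for key in sorted(network):
        entry = network[key]
        if key.startswith("orders/") and entry["status"] == "new":
            entry["status"] = "confirmed"
            handled.append(key)
    return handled
```

Note that the merchant process still has to run somewhere and poll (or be notified), which is why this step is really a change in data ownership rather than in where the computation happens.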

The second step will be to outsource processing to special-purpose compute clients, ie have the ecommerce processing done by a ‘rented client’. This could be as simple as a second-layer protocol that allows renting clients for general-purpose compute jobs. Many different ecommerce operators could be running on a single compute client. This has the benefit that ecommerce clients no longer need to run 24/7. They can pay a compute client to do their processing for them, and presumably get the benefit of the high uptime and faster processing speed compared with every ecommerce operator trying to run that all themselves.

The obvious disadvantage of the second step is reduced privacy and need for trust. Ecommerce clients would be uploading business logic and code to third parties and relying on them to do operations on data that’s specific and sensitive to the ecommerce operator. So this step will take some time to develop, starting with benign public services and gradually moving into private services as homomorphic encryption develops. The eventual shape of distributed computation on SAFE depends almost entirely on the way homomorphic encryption develops.

A possible third step is that vaults incorporate the existing second-layer compute services and offer them natively within the SAFE network itself. It seems sensible to have the various second-layer services compete separately and then look at what works and what doesn’t. Maybe the second layer becomes a separate de facto standard anyhow and there’s no need to combine the vault and compute concepts.

Another possible path to compute may be that SAFE becomes a place to initiate and coordinate p2p connections (eg webrtc) directly between consumers and merchants, with most data management and computation happening off the network. Is this webrtc relationship a client/server model? Or a client/client model? Or a server/server model? It’s a bit of all of that. I suspect this will be the dominant mode for real time interaction with background sync of the essential data of the interaction onto the safe network. So SAFE becomes primarily a sort of dns / lookup tool plus a backup solution to store the essence of each interaction, and compute never becomes a significant part of SAFE.

Regarding databases and sql queries on safe, I see that as being the same conceptual development process as ecommerce.

The main concept that needs exploration is that of ‘client’. When is a client no longer a client? And what is it then? Do we consider the current idea of a ‘server’ as simply a ‘special client’? If not, why not?

Maybe another good idea to explore is 1) how would my app work if clients were only able to use an ftp server for storing data? 2) how would that change if the ftp server could execute code for clients? SAFE is basically an ftp server. Should it try to be more than that?

How do you envision a SAFENetwork Azure?

MapReduce + homomorphic encryption + webhooks

MapReduce: allows calculations on local chunks to avoid bandwidth bottlenecks and enables massive parallelism.

Homomorphic encryption: allows calculations on private encrypted data by third parties.

Webhooks: allows ‘data to be a client’. The results of mapreduce (ie a new piece of data, which is specially designed to also act as a client) can be saved to the network in a meaningful way by the compute-node generating it, and then further action taken if required. Also a lot like the pubsub concept.
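
To illustrate how two of these pieces could fit together, here is a small sketch of a MapReduce-style word count where each chunk is processed on its own and a callback plays the role of the webhook. The function names and the callback signature are assumptions, and the homomorphic encryption part is deliberately left out: this runs on plaintext.

```python
from collections import Counter
from functools import reduce
from typing import Callable, List


def map_chunk(chunk: str) -> Counter:
    # Map step: runs independently per chunk, ideally on the node storing it.
    return Counter(chunk.split())


def merge(a: Counter, b: Counter) -> Counter:
    # Reduce step: combine two partial results.
    return a + b


def run_job(chunks: List[str], on_result: Callable[[Counter], None]) -> Counter:
    # In this sketch the map calls run sequentially; on a network they could
    # run in parallel, with only the small partial results being transferred.
    partials = [map_chunk(c) for c in chunks]
    total = reduce(merge, partials, Counter())
    # Webhook-like callback: whoever subscribed gets told about the new result.
    on_result(total)
    return total
```

For example, `run_job(["a b a", "b c"], print)` would count words across both chunks and hand the combined result to `print` as the subscriber.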

10 Likes

Cosmos DB on Azure implements an SQL-like db on top of what is basically a key-value store. I think some of the concepts they use might be useful. It is basically all about building various indexes for the queries you want.

Here’s some info on how the cosmos db/documentdb indexes work. Something similar to what is described here could be interesting to explore on SAFE.

What you could do is download the indexes that you query. Indexes on SAFE would need to be optimized differently from most indexes today, though. If you query an index and then find a pointer to another index that you need to download to execute the next step in the query, there is much more latency than if it simply existed on the same drive, or on another machine in the same data center connected with a 10 Gb connection. Downloading things in parallel should be fast though. The indexes should be designed around the constraints of SAFE.
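
A toy model of that constraint: count network round trips for an index whose pages point to each other, versus one whose root lists all pages so they can be fetched in one parallel batch. The `Network` class and the page layout (`rows`, `next`, `pages`) are hypothetical; the only assumption that matters is that a batch of parallel fetches costs about one round trip.

```python
from typing import Dict, List


class Network:
    """Hypothetical network model: each batch of parallel fetches costs one round trip."""

    def __init__(self, objects: Dict[str, dict]) -> None:
        self.objects = objects
        self.round_trips = 0

    def fetch_batch(self, names: List[str]) -> List[dict]:
        self.round_trips += 1
        return [self.objects[n] for n in names]


def lookup_chained(net: Network, root: str) -> List[str]:
    # Follow "next" pointers one index page at a time: one round trip per page.
    rows: List[str] = []
    name = root
    while name is not None:
        (page,) = net.fetch_batch([name])
        rows.extend(page["rows"])
        name = page.get("next")
    return rows


def lookup_flat(net: Network, root: str) -> List[str]:
    # The root lists every page, so all pages are fetched in one parallel batch.
    (page,) = net.fetch_batch([root])
    rows: List[str] = []
    for p in net.fetch_batch(page["pages"]):
        rows.extend(p["rows"])
    return rows
```

In this model the chained layout costs one round trip per page, while the flat layout stays at two round trips no matter how many pages there are, at the price of a larger root object.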

For fuzzy queries I think it could be very useful to have a method to retrieve a list of MDs by xor distance. You can then use various ways to do dimensionality reduction of your data to fit core features into the MD key/name and then you can do a query to find a list of MDs that contains data with certain features or that is similar to some other data.
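
A minimal sketch of what such a query could return, assuming names are plain integers whose bits encode features. The function name is made up, and this is a local sort over a known list rather than anything the network offers today.

```python
from typing import Iterable, List


def closest_by_xor(names: Iterable[int], target: int, k: int) -> List[int]:
    # XOR distance: differing high bits dominate, so the most important
    # features should be encoded in the most significant bits of the name.
    return sorted(names, key=lambda n: n ^ target)[:k]
```

With 8-bit names `0b10100000`, `0b10100001`, `0b01010000` and `0b10110000`, a query for the 3 names closest to `0b10100000` returns the first, second and fourth: they share the high bits with the target, while the third differs in the top four bits and ends up furthest away.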

3 Likes

Nice @mav, I suspect we can add STARKs and bulletproofs here as well to allow secure and private querying of data elements from personal containers. This could be a big boost for medical research etc., whilst giving no knowledge of the individual’s other data.

I would add that genetic algorithms, evolutionary approaches (like NEAT with some newer notions), deep learning and simpler neural networks could all do much better if they shared nets between them, or at least were able to securely and privately query data from each other.

I like your ideas prior to this quote; it is certainly possible to see those steps happening.
I hope that makes sense.

7 Likes

You’re speaking of zk-STARKs, correct? Is bulletproofs something else?? Very cool stuff to read about :smiley:

Edit: ahhhh, perhaps this is it for bulletproofs; very much of the same ilk. Even better, there is a pure-Rust implementation from Dalek crypto: GitHub - dalek-cryptography/bulletproofs: A pure-Rust implementation of Bulletproofs using Ristretto.

4 Likes

Yes, bulletproofs/STARKs are like zk-SNARKs, but simpler and with no trusted setup.

4 Likes

But for that you would need to support transactions (at least bundling multiple PUTs into one atomic operation, which would fail as a whole if one of the PUTs failed). Is this a planned feature for the SAFENetwork?
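
To spell out the semantics I mean, here is a single-process sketch of an all-or-nothing batch of PUTs. The `put_all` name, the `PutError` type and the rule that a PUT fails when the key already exists are assumptions for illustration; on a real network the hard part is getting many nodes to agree on this, which the sketch does not attempt.

```python
from typing import Dict


class PutError(Exception):
    """Raised when any PUT in a batch cannot be applied."""


def put_all(store: Dict[str, bytes], batch: Dict[str, bytes]) -> None:
    # Check every PUT before applying any of them.
    # Assumed failure rule: a PUT fails if the key already exists.
    conflicts = sorted(k for k in batch if k in store)
    if conflicts:
        raise PutError(f"keys already exist: {conflicts}")
    # Only reached if no PUT would fail, so the batch is applied as a whole.
    store.update(batch)
```

If one key in the batch conflicts, the store is left untouched; otherwise all the PUTs land together.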

3 Likes

There’s lots of academic research into efficient ways of querying data in peer-to-peer networks. I found a couple of papers; maybe something could be useful.

A Content-Addressable Network for Similarity Search in Metric Spaces

Resource Location in P2P Systems

Approximate Matching for Peer-to-Peer Overlays with Cubit

Magnolia: An Efficient and Lightweight Keyword-based Search Service in DHT based P2P Networks

Range-capable Distributed Hash Tables

Distributed Pattern Matching: A Key to Flexible and Efficient P2P Search

5 Likes

Thanks a lot @intrz, and thanks for all the ideas in this topic.

I was very interested in hearing how MaidSafe sees these things with specific regard to SAFENetwork implementation.
I am sure they have read some of the research in the area, and they know their code and ideas best, so that’s the combination I was hoping to get more details on: SAFENetwork, SQL, ACID, transactions. First of all, have there been any thoughts on it? And if so, what are they, and what possibilities, limitations, and ambitions are there?

I mean, even a “sorry, we have not considered it at all” is a perfectly good statement about current status.

3 Likes