Privacy-preserving distributed computing with side effects – is that even possible?

lightyear · June 21, 2016, 6:17pm

One big problem I personally have with blockchains – and many other people have too – is their all-is-public approach. This makes such concepts (including Ethereum) a tough case for any sensitive data. But in a server-free SAFENet (rather than the new buzzword “serverless” – yuck) we will eventually need to think about concepts like distributed computing, because otherwise our functionality would again depend on specific devices/servers to do certain jobs. So, I’ve been pondering this a lot, but I can’t find the answer to a question which I feel hasn’t been investigated much yet (even in research): can we provide privacy-protected distributed computing (with side effects)?

Let me explain what I mean by that:

Let’s assume we have some form of distributed computing on SAFE which guarantees, through consensus mechanisms, that we can trust the computed results, and let’s further assume we have some homomorphic encryption system in place which, by sharding and obscuring the data, ensures that none of the computing parties understands it in full.

Now let’s take the following rather simple scenario of a “MailingList”: I want to host a service (a computation with side effects) that is triggered by an incoming message, looks up a list of recipients and “forwards” that message to each of them, maybe with some filtering, but that isn’t important. Clearly, the list itself is sensitive information I’d rather not have leaked. In the best-case scenario this list would also be stored on SAFEnet somewhere but would never be accessible in plain text, not even to the computation itself. But would that even be possible?
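To make the scenario concrete, here is a minimal sketch of the naive, non-private version. `RECIPIENTS`, `on_incoming_message` and the `send` callback are hypothetical stand-ins, not a SAFE API. It shows the crux of the question: whoever executes the handler has the recipient list in plaintext in memory.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the recipient list stored on the network.
# In this naive version the executing node reads it in plaintext.
RECIPIENTS: Dict[str, List[str]] = {
    "my-list": ["alice-id", "bob-id", "carol-id"],
}


def on_incoming_message(list_name: str, message: str,
                        send: Callable[[str, str], None]) -> int:
    """Forward `message` to every member of `list_name` and return the count.

    `send` is the side effect: it pushes a message back into the network.
    """
    recipients = RECIPIENTS.get(list_name, [])
    for recipient in recipients:
        # The node running this loop sees every recipient ID.
        send(recipient, message)
    return len(recipients)


# Usage: collect the "sent" messages in a local outbox instead of a network.
outbox: List[Tuple[str, str]] = []
count = on_incoming_message("my-list", "hello",
                            lambda to, msg: outbox.append((to, msg)))
```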

As far as I understand homomorphic encryption so far, each individual computation might not understand what the data actually is, but there must be some other controlling party which gets these results back and can actually understand them. Furthermore, these models usually do not guarantee that you can’t snoop on internal information in memory during the computation (or am I wrong?). But that would mean I have to supply that list of contacts in plain text to some computing system at some point, wouldn’t I? And if I can’t trust that system (not to do a bad computation, but not to leak any information), how could I?

I know this is highly hypothetical at this point. But I am wondering about the boundaries of the server-free approach we are aiming for with SAFEnet.

Looking forward to discussing this!

Edit: Changed the title to “privacy-preserving”, as that seems to be the buzzword researchers use in this context.

dirvine · June 21, 2016, 6:42pm

I see it like this: initially, compute will be for public data and possibly smart contracts (but initially very dumb contracts, starting behind a DSL-type approach).

Homomorphic encryption is still in its infancy and can do some things, like adding numbers. It cannot (yet) do much more, like recognising names. The ability to compute over sensitive data, like medical records, is still a hard problem with many folks working on it, and some research looks promising so far.

If/when it gets smart enough that we can embed information in a readable format in a record, then we will be in a much better position.
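To illustrate what “can add numbers” means, here is a toy sketch of Paillier encryption, one well-known additively homomorphic scheme (chosen here for illustration; the post does not name a scheme). Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a node could total values it cannot read. The primes are tiny and insecure on purpose.

```python
import math
import secrets

# Toy key material: far too small to be secure, chosen only for readability.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1                       # common simplification: generator g = n + 1
LAM = math.lcm(P - 1, Q - 1)    # private: Carmichael function of n
MU = pow(LAM, -1, N)            # private: valid shortcut because g = n + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N with fresh randomness r coprime to N."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Needs the private values LAM and MU."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N2


a, b = encrypt(20), encrypt(22)
total = decrypt(add_encrypted(a, b))
```

Decryption still needs the private values `LAM` and `MU`, so whoever holds the key reads the final result, which is the “controlling party” the opening post asks about.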

The other thing to watch is zk-SNARKs, though not for ensuring something has been done: we cannot control the physical environment, so a SNARK can prove it did something, but to what, really? A mock or a fake? Instead, we should use zk-SNARKs for what is (IMO) their big property, and that is the solution to the halting problem, i.e. we can tell how long a program should run for and prevent runaway recursion etc.

This is what Ethereum gas is also trying to do (and possibly some more, but I am not so close to that), but SNARKs should be able to provide that exact feature (it’s the core purpose) and allow us to send computation to a group, get consensus and make sure the computation was pre-measured by that group before running it.
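Gas-style metering, which the post compares SNARKs against, can be sketched as a step budget. The mini instruction set, `run_metered` and `OutOfGas` below are hypothetical and not Ethereum's actual gas schedule: each executed instruction costs one unit, and execution aborts when the budget is spent. This bounds execution rather than predicting it; the node never decides whether a program halts, it simply refuses to run it past the pre-agreed number of steps.

```python
from typing import List, Tuple


class OutOfGas(Exception):
    """Raised when a program exceeds its pre-agreed step budget."""


def run_metered(program: List[Tuple[str, int]], gas_limit: int) -> int:
    """Run a tiny hypothetical instruction list under a hard step budget.

    ("add", k): add k to the accumulator and move to the next instruction.
    ("jmp", i): jump to instruction i.
    Every executed instruction costs one unit of gas.
    """
    acc, pc, gas = 0, 0, gas_limit
    while pc < len(program):
        if gas <= 0:
            raise OutOfGas(f"budget of {gas_limit} steps exhausted at pc={pc}")
        gas -= 1
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc = arg
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return acc


print(run_metered([("add", 2), ("add", 3)], gas_limit=10))  # prints 5

try:
    run_metered([("add", 1), ("jmp", 0)], gas_limit=100)  # would loop forever
except OutOfGas as err:
    print("rejected:", err)
```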

I hope some of that makes sense? It’s me in a rush as usual.

cretz · June 21, 2016, 7:48pm

Yeah, homomorphic encryption isn’t practical yet, but I think it’s only beneficial for things like search. In your case, I think you just need layered encryption. But the problem here is that someone has to know who you are sending it to; they just don’t have to know why. By saying you want the list hidden, you’ve essentially said “I want them to send something to someone, but not to know who it was”, which isn’t really possible with direct sending.

The key to doing things like this is to make it a pull model, not a push one per se. Have what you want sent put somewhere and have the recipient listen for it. The “rendezvous” point can be as obscure/abstract as you want.
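The pull model can be sketched as follows, assuming sender and recipient have already agreed on a shared secret out of band. Both derive the same opaque rendezvous address with HMAC-SHA256; the sender writes there and the recipient polls, so no forwarding node ever needs the recipient's identity. `STORE`, `put` and `poll` are hypothetical stand-ins for network storage, the `epoch` counter rotates the address, and messages are left unencrypted only for brevity.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical stand-in for network storage: address -> queued messages.
STORE: Dict[str, List[bytes]] = {}


def rendezvous_address(shared_secret: bytes, epoch: int) -> str:
    """Opaque address both parties can compute; without the secret it is noise."""
    tag = hmac.new(shared_secret, f"rendezvous:{epoch}".encode(), hashlib.sha256)
    return tag.hexdigest()


def put(shared_secret: bytes, epoch: int, message: bytes) -> None:
    """Sender side: leave the message at the rendezvous point instead of pushing it."""
    STORE.setdefault(rendezvous_address(shared_secret, epoch), []).append(message)


def poll(shared_secret: bytes, epoch: int) -> List[bytes]:
    """Recipient side: pull whatever is waiting at this epoch's address."""
    return STORE.get(rendezvous_address(shared_secret, epoch), [])
```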

Also, I don’t see why you need distributed computing for your example. The only two benefits I see of distributed computing are consensus and load distribution.

lightyear · June 21, 2016, 8:29pm

That does make sense, but it again focuses on how to “securely provide” computing power, with ‘GAS’ and basic payment per CPU cycle. That is all well and fine if you are a research institute wanting to farm out a big computation over non-sensitive information (as has been done before for protein calculations and the like). I am pondering the other part: how could I give sensitive information to that “computing cloud” without having to fear it might be leaked/compromised?

The “need” only comes from the idea of providing services without having to host a server (which can go down etc.). You don’t need “distribution” for it; the question is how to provide information to an untrusted third-party computer.

Of course there are different ways to solve this specific example; I just took “doing something potentially sensitive triggered by an incoming message” as a general example. It could also be an “edit” message sent to a Wikipedia-style system which, if the sender matches a list of accepted “authors”, updates the content. That would also require at least the private key to be stored somewhere the update procedure can access it. The general procedure and problem are the same.

If workflows as simple as these still rely on a “central” server somewhere, then SAFENet wouldn’t be as “server-free” and “always on” as it aims to be/could be. That’s why I am pondering this.

Edit: investigating the zero-knowledge stuff mentioned above. Learning new things!

lightyear · June 22, 2016, 4:43pm

Update: I am learning about FairplayMP now. Their work clearly goes further in the direction I am talking about, yet their functions, too, don’t work if they have side effects like the ones described above.

I’ll keep digging.

dirvine · June 22, 2016, 4:52pm

I think zk-SNARKs are a bit further advanced (the creators won the Turing Award for solving the halting problem). Here’s an old fork I had (to work cross-platform)

And here is an early Rust version (not by me either)

whiteoutmashups · June 22, 2016, 9:17pm

tl;dr for anyone reading this and confused:

@Lightyear is asking how distributed computing can be private. How can you ask someone (a computer) to solve a problem for you without telling them what the problem is?

whiteoutmashups · June 22, 2016, 9:21pm

Not my area, but I have faith it can work if you break the total problem down into such tiny parts that they become meaningless.

Might be further helped with encryption.

Perhaps there’s a way to create a type of encryption that lets you perform computation on it without being able to interpret the original data? Perhaps a framework could be made for this. Would be quite complex.
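One established way to make the pieces meaningless is additive secret sharing, a building block of secure multi-party computation. A value is split into random shares that sum to it modulo a fixed number; any subset short of all the shares looks uniformly random, yet the parties can still add shared values locally. A minimal sketch follows (function names are illustrative); addition is the easy case, and multiplication or side effects need considerably more machinery.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # all share arithmetic is done modulo this (prime) number


def share(secret: int, parties: int) -> List[int]:
    """Split `secret` into `parties` shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Only the sum of all shares reveals the value."""
    return sum(shares) % MODULUS


def add_shared(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its own two shares locally, never seeing a or b."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


a, b = share(20, 3), share(22, 3)
print(reconstruct(add_shared(a, b)))  # prints 42
```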

bluebird · June 22, 2016, 10:09pm

I’d appreciate a definition of side effects in this context; searching leaves me grasping at emptiness.

neverending_manga · June 23, 2016, 3:09am

I’m not a computer science expert yet, but could you also lock down the compute program so that the client can’t store the results of the computation through foreign code?

Just throwing it out there.

lightyear · June 23, 2016, 9:28am

Not quite…

I am asking how I can ask a potentially compromised computer to do a computation – I know the problem to solve and also its solution – on privacy-sensitive data without it ever knowing/understanding that data itself. So even if it were compromised, no privacy-sensitive information would be leaked.

In some ways we are already doing that, in a highly specialised form, with self-authenticating logins. There we store self-encrypted, privacy-sensitive information with permission checks (the computation) that know neither the data being stored nor the actual login in question; they only know the procedure for checking this information, rather than the actual content.
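A simplified sketch of that self-authentication pattern, assuming the credentials are a keyword, a PIN and a password: the storage address is a hash of keyword and PIN, and a password-derived key (PBKDF2) encrypts the account packet, so the network stores and hands out an opaque blob without learning the login or the content. All names are hypothetical, and the XOR keystream is a toy stand-in for a real cipher.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical network storage: opaque address -> opaque blob.
NETWORK: Dict[str, bytes] = {}


def account_address(keyword: str, pin: str) -> str:
    """The network only ever sees this hash, never the keyword or PIN."""
    return hashlib.sha256(f"{keyword}:{pin}".encode()).hexdigest()


def derive_key(password: str, salt: bytes) -> bytes:
    """Password-derived key; computed on the client and never sent."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHAKE-256 keystream XOR). Illustration only, not secure."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(x ^ y for x, y in zip(data, stream))


def store_account(keyword: str, pin: str, password: str, packet: bytes) -> None:
    addr = account_address(keyword, pin)
    NETWORK[addr] = toy_xor(derive_key(password, addr.encode()), packet)


def login(keyword: str, pin: str, password: str) -> Optional[bytes]:
    """Wrong keyword/PIN finds nothing; a wrong password only yields garbage."""
    addr = account_address(keyword, pin)
    blob = NETWORK.get(addr)
    if blob is None:
        return None
    return toy_xor(derive_key(password, addr.encode()), blob)
```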

We could have similar mechanisms for message sending, too, where we give a computing entity some information which allows it to eventually issue a message without ever knowing the actual source account, the recipient or the data itself. We could model that already today (as part of the protocol), but I am wondering whether we can generalise it, to effectively allow privacy-preserving secure computation on top of SAFEnet rather than having to bake every case into the protocol.

@dirvine zk-SNARKs sound super interesting indeed. I’ll dig deeper into that!

bluebird · June 23, 2016, 10:20am

What do you mean by side effects?

lightyear · June 23, 2016, 1:23pm

In computer science, a side effect refers to a computation that affects the outside world through its activity, or that depends on outside state for its outcome. That often means that, unlike a usual mathematical function, such a function can give different outputs for the same input parameters. Functions that don’t require any external state and don’t have any side effects are often referred to as pure: every time you supply the same parameters, they return the exact same result.

A simple example of a stateful function would be todayAsDayOfWeek(), which checks the current date and, depending on it, tells you it is Monday or Tuesday: the result doesn’t only depend on the parameters I supply but also on outside state. An example of a function with a side effect would be closeDoor() on my elevator (pressing a button), which has the side effect of actually closing the door, if it was indeed open.
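The three kinds of function described above can be sketched in Python. The names are adapted to Python style from the post's `todayAsDayOfWeek()` and `closeDoor()`; `add` and the `Elevator` class are hypothetical additions for illustration.

```python
import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def add(a: int, b: int) -> int:
    """Pure: the same inputs always give the same output; nothing else changes."""
    return a + b


def today_as_day_of_week() -> str:
    """Stateful: the result depends on outside state (the system clock)."""
    return DAYS[datetime.date.today().weekday()]


class Elevator:
    """Hypothetical elevator whose door state lives outside any single call."""

    def __init__(self) -> None:
        self.door_open = True

    def close_door(self) -> None:
        """Side effect: changes the world (the door) instead of returning a value."""
        if self.door_open:
            self.door_open = False
```

Calling `add(2, 3)` twice always gives `5`, whereas `today_as_day_of_week()` can change between calls, and `close_door()` returns nothing at all: its whole purpose is the change it makes.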

The example I am describing has the side effect of being expected to send a message back into the network. That is important because, while you could fairly easily hide the actual information from a pure computation, e.g. by multiplying the parameters with a value only you know and dividing by it after receiving the result (super simplified), doing the same to a function with side effects could easily lead to an undesired side effect rather than the intended one. Furthermore, for a side effect inside the SAFEnet network to take place you need credentials, which somehow need to be supplied to the function so that it can execute said side effect. It’s quite a problem…
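The “super simplified” blinding trick for pure computations can be sketched for a sum, which works because summing commutes with scaling: the sum of `k * v` is `k` times the sum of `v`. This is an illustration under that assumption, not a secure scheme (ratios between inputs still leak). It also shows why the trick does not carry over: there is no factor you can apply to a recipient ID and then undo after a `send` has already happened.

```python
import secrets
from typing import List

SEEN_BY_WORKER: List[List[int]] = []  # what the untrusted party observed


def untrusted_sum(values: List[int]) -> int:
    """Pure function run by someone else; it only ever sees blinded inputs."""
    SEEN_BY_WORKER.append(list(values))
    return sum(values)


def blinded_sum(values: List[int]) -> int:
    k = secrets.randbelow(10**6) + 2      # secret blinding factor, never sent
    masked = [v * k for v in values]      # the only thing the worker sees
    return untrusted_sum(masked) // k     # exact: sum(v * k) == k * sum(v)
```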

bluebird · June 23, 2016, 1:58pm

Thanks for the explanation.

That led me to read up (again) on functional programming, which is supposed to handle side effects by the use of monads.

And after I read an explanation of monads… I wondered what the heck I’d just read…

dirvine · June 23, 2016, 2:05pm

You should also check out Elixir if you are thinking monadically

bluebird · June 23, 2016, 2:23pm

How interesting about Elixir; I was studying Erlang before getting distracted by SAFEnet. Erlang is supposed to be super-efficient for concurrent programming, having been designed from the ground up for running telephone exchanges.

bluebird · June 23, 2016, 4:02pm

What might the simplest possible case look like? A privacy-preserving, distributed-computing, hello world?

lightyear · June 23, 2016, 7:04pm

That is interesting indeed. However, as I understand it, the computation and the sensitive information are still with me, on my side, and the only thing it proves to a third party is that said computation happened over that data, am I correct? (Also, how do you control the completeness of that data? In their example of a watch checking your walking habits and giving you credits with your insurer for good behaviour, how can they ensure you didn’t only supply a subset, claiming the watch ‘was off’ or ‘didn’t record’, although you have those records but don’t supply them because they would lower your score? Well, an issue for another time.)

Which means it wouldn’t solve the issue if I don’t control the place of computation, which in a distributed network I wouldn’t necessarily. Or at least I don’t see how you could apply zk-SNARKs to this case.

I think the case I was making is quite a good ‘hello world’, but I could boil it down further:

The hidden forward address
A message arrives and a function is executed which unpacks and forwards that message to a private ID, concealing my real identity, under which I’d actually look up that message.

Even if I bundled that ID inside the code, it would be readable in memory by the executing party. So, if I am not sure I can trust that computer, how could I trust it not to leak that information? Only if the information is useless outside the context of that specific function. Similar to how a signature is useless without the public key and the content, could I provide something to that function which allows it to send the message to me without actually giving away my ID?
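One possible shape for such a “something” is an opaque alias: the recipient derives a fresh token from a secret only they hold and hands the forwarding function that token instead of their ID. The sketch below assumes HMAC-SHA256 for the derivation and a hypothetical `MAILBOXES` dict as network storage. It hides who is behind an alias and keeps aliases unlinkable to each other, but the executing node still learns which mailbox it wrote to, and the message itself would additionally need encrypting to the recipient.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List

MAILBOXES: Dict[str, List[bytes]] = {}  # hypothetical network storage


class Recipient:
    def __init__(self) -> None:
        self._secret = secrets.token_bytes(32)  # never leaves the recipient

    def alias(self, n: int) -> str:
        """n-th opaque alias; without the secret it cannot be linked to the others."""
        return hmac.new(self._secret, n.to_bytes(8, "big"), hashlib.sha256).hexdigest()

    def collect(self, upto: int) -> List[bytes]:
        """Pull (and remove) messages left under aliases 0 .. upto-1."""
        out: List[bytes] = []
        for n in range(upto):
            out.extend(MAILBOXES.pop(self.alias(n), []))
        return out


def forwarding_function(alias: str, message: bytes) -> None:
    """Runs on the untrusted node, which learns only `alias`, not the identity."""
    MAILBOXES.setdefault(alias, []).append(message)
```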

dirvine · June 23, 2016, 7:17pm

Ah, this is my point: yes, you can prove a SNARK ran X etc., but that is not the key issue for us. We have group consensus, so we can get agreement on something, including physical changes outside the environment of the SNARK (a big issue: if a SNARK proves a computation was done, it does not prove what the API etc. it called actually was).

The big issue is the halting problem one. For nodes to accept a code segment, they need to know it will not run forever and be able to measure its life. SNARKs do this really well, probably better than gas (as we have perhaps seen). They also enforce a particular language so far (gcc register code anyway).

So the key is being able to know the size of the code to be executed in terms of cycles.

So this is computation, but not computing over homomorphic encryption (but it can be).

What I am saying is stage 1: get a language that works, and do so with the halting problem fixed.

Stage 2 → well, that is more difficult: what to compute, why and how. Fixing stage 1 is pretty big, mind you. Doing that gives us the platform. Group consensus gives us the checks to ensure the answer is correct.
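The group-consensus check can be sketched as a majority vote over the results reported by the members of a group. This assumes each member runs the same deterministic computation; `group_consensus`, the group size and the strict-majority rule are illustrative assumptions, not the actual SAFE consensus rules.

```python
from collections import Counter
from typing import Hashable, List, Optional


def group_consensus(results: List[Hashable], group_size: int) -> Optional[Hashable]:
    """Return the result reported by a strict majority of the group, else None.

    `results` holds one entry per member that answered; members that did not
    answer simply count as no vote. Requiring a strict majority means two
    different answers can never both be accepted.
    """
    quorum = group_size // 2 + 1
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None
```

For example, in a group of five, four matching answers of `42` and one stray `7` yield `42`, while two matching answers out of a group of four are not enough.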

And on it goes from there.

whiteoutmashups · June 23, 2016, 9:46pm

Was attempting to boil it down to a sentence or two for people… I’ll give it another go
