How will AI be trained in SAFE Net?

Hey,

I work in an AI startup/consulting firm, and I was wondering how one would train an AI on the SAFE Net other than with public data (which might not be sufficient)?

There is some research on machine learning over encrypted data using fully homomorphic encryption, but it is still very slow, even for very small data sets and simple ML algorithms.
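To make "computing on encrypted data" concrete, here is a minimal sketch of a linear model score computed over encrypted features. It uses Paillier, which is only additively homomorphic, not fully homomorphic, so it is a much weaker stand-in for FHE. The primes, function names, and feature values are illustrative assumptions, and the key size is far too small for real use (Python 3.8+ is assumed for the modular inverse via `pow`).

```python
import math
import random

# Toy parameters (assumption): two known Mersenne primes, far too small for real security.
P = 2**31 - 1
Q = 2**61 - 1

def keygen(p=P, q=Q):
    """Return (public n, private (lam, mu)) for Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt a non-negative integer m < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # a real system would use a CSPRNG
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so g^m needs no exponentiation.
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def encrypted_dot(n, enc_x, weights):
    """Compute Enc(sum(w_i * x_i)) from encrypted x_i and plaintext integer weights."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(enc_x, weights):
        # Multiplying ciphertexts adds plaintexts; raising to w multiplies by w.
        acc = acc * pow(c, w, n2) % n2
    return acc

n, priv = keygen()
enc_features = [encrypt(n, x) for x in [3, 5, 2]]  # done by the data owner
enc_score = encrypted_dot(n, enc_features, [2, 1, 4])  # done by the untrusted party
score = decrypt(n, priv, enc_score)  # 3*2 + 5*1 + 2*4 = 19
```

The party holding the weights never sees the features, but only additions and multiplications by plaintext constants are possible here. Training a real model needs far more than that, which is where the cost of FHE comes in.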

Any thoughts?

7 Likes

Private data is by definition private: inaccessible unless you have the datamap (or the keys, for mutable data). Randomly encrypted data is very difficult to gain information from, and SAFE is designed from the ground up to be very, very good at this.

Now if you could use every available computational node (when SAFE implements it) to scan the network, then you might be able to gain some info on the private data and maybe even figure out the data map for some very small amount of it. But even that is not certain and would likely cost you every SAFEcoin that exists. In other words, this too is extremely unlikely.

So then you either hack everyone's computer to get credentials and then scan their accounts, OR you create an open-source scanner that can be verified not to retain specific data, which people can run across selected files in their private data for you. Basically you ask the public to help you and you prove their privacy is not compromised. Since the APP runs on their own computer, they know from that and from the code audit that only processed data will be given to the AI, and that the processed data does not retain any personally identifying info. Not sure that is possible, but it's really the only way I can see people giving the AI (limited) access to (some of) their private data.
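A rough sketch of what such a local scanner might look like, assuming the user picks the files and the app only ever emits counts over a fixed, public vocabulary. The function names, file names, and vocabulary are all hypothetical, and restricting output to aggregate counts does not by itself prove that nothing identifying leaks.

```python
import re
from collections import Counter

def extract_features(text, vocabulary):
    """Count only words from a fixed public vocabulary; everything else is dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in vocabulary)
    return {w: counts[w] for w in sorted(vocabulary)}

def scan_selected_files(file_contents, allowed_names, vocabulary):
    """Process only the files the user explicitly allowed, returning aggregate counts."""
    total = Counter()
    for name, text in file_contents.items():
        if name in allowed_names:
            total.update(extract_features(text, vocabulary))
    return dict(total)

# Hypothetical private files living on the user's own machine.
files = {
    "diary.txt": "I love sci-fi books and history books",
    "tax.txt": "books I deducted this year",
}
report = scan_selected_files(files, {"diary.txt"}, {"books", "history", "poetry"})
```

Only `report` would leave the machine. The raw text and the file that was not allowed are never read by anything outside the audited function.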

For mutable data you can scan all the addresses for any MD data that is unencrypted. That is only any good if you have billions of years, though. So instead you have to follow leads to addresses where an MD object should exist, and this might lead to some useful info, since some people will hide their data rather than encrypt it.
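A back-of-the-envelope version of the "billions of years" point, assuming a 256-bit address space and a scanner doing 10^12 lookups per second. Both figures are assumptions picked for illustration, not numbers from this thread.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_scan(address_bits, lookups_per_second):
    """Years needed to probe every address once at a constant lookup rate."""
    return 2**address_bits / lookups_per_second / SECONDS_PER_YEAR

years = years_to_scan(256, 1e12)
```

Under those assumptions `years` comes out on the order of 10^57, so "billions of years" is, if anything, a generous understatement.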

5 Likes

Mmm, thanks for your reply neo. I think this is an important matter.

One can discuss whether Amazon's personalized recommendations, photo recognition by Facebook, or usual-itinerary recognition by Google Maps are really that useful. I think they are, though the way personal data is used today is not right.

The problem is that none of these services would have been possible if the data had not already been there, so the difficulty will be bootstrapping the development of such applications.

This is where all the blockchain/token stuff comes to the rescue: we can imagine bootstrapping such developments by spelling out precise data collection/analysis conditions that can be audited and verified like any smart contract. However, this will certainly require collecting and processing personal information, because targeting, as I understand it, cannot work without it. But the balance of power is reversed: you first have to authorize a specific entity to use your data, and you can control its use in a very precise way, whereas today regulations are trying to protect users who have nearly no power to protect their own data.

5 Likes

Just to be sure, you would be able to trawl all the public data including any public forum/blog/social media content. I am sure a lot of people will post a level of personal material in the public arena. Photos, stories, etc.

Yes, we agree. But I was wondering whether protection of data can be hard-coded in some way into the infrastructure.

Indeed, if we imagine a Facebook on the SAFE Net, they could use obscure data protection/sharing conditions (similar to the current ones) and download all users' data to their own private storage (which is distributed on the SAFE Net). End users would then gain nothing in terms of data protection from the SAFE Net when using everyday digital services.

A form of centralization tied to private keys instead of physical servers would emerge, along with the power that goes with it.

Also, until the compute module is developed, any program running on SAFE is running on the user’s computer.

So the APP could be allowed access to their private data, and if suitable rules are set up then you could ask them which files the APP can process.

Your AI is only going to be running on computers that run the AI APP (or a utility for the AI).

Yes, I understand your point, but if we think of the finalized SAFE Net, I am still unsure how to avoid the new form of centralization I described.

So there probably won’t be any social network company on Safe, not in a traditional sense anyway. Just a bunch of people agreeing on how to share data among themselves.

For example, on Safe you don’t use a company's server to interact with a social network. Instead, you upload your data in a way that respects a specific protocol for social networking, so your data can be understood by the other participants of the social network. It’s like a big puzzle where everyone brings their own piece, carved in a way that is compatible with the other pieces.

Then how do you train an AI? Well it depends what you’ll want to achieve. If you want a recommendation AI like Amazon you are gonna have to convince people to send you their book list. If you want a roadmap/traffic AI you are gonna have to ask people to give you their locations.

Point is, on Safe you’ll need to ask. And since people will pay to upload their data, you’ll have to ask nicely :wink:

4 Likes

Mmm, so the safecoin mechanism makes it expensive to centralize. Is it true that you’d need nearly all available safecoins if you tried to pay for all private data transfer in a controlled way?

Private data is signed by its owners, so all the safecoin won't help you there. The data is separate from the currency. If you had all the safecoin and the farmers got none, you would disincentivise the farmers and the network would fail.

Not quite. It’s just the nature of the network that makes it impossible to centralize data. If a user decides not to share their data with you, no amount of wealth will be able to change that.

And that’s the magic of Safe, even the creator of an app has no way to know who uses the app and where the data is.

Even if the data is public, as long as you don’t share the address with anyone, you can consider that data almost as hidden as private data. (AFAIK)

I guess but good luck with that. And as @dirvine says, at that point the network is probably just dead with nobody interested in farming and in using it anymore. People will just flock to another iteration of the Safe network.

1 Like

A Facebook on SAFE could actually work in a way where each user owned their own data. For training an AI to do, for example, facial recognition, that would have some disadvantages, though perhaps there might be a way around them.

Google Photos can train on all the faces in all the photos of millions of Android users. Apple Photos takes a different approach, doing it locally in the app, and is thus more privacy-preserving. It doesn’t work as well as Google’s, but it does work.

So in practice training an AI on SAFE could be kinda like Apple Photos: it would use the user's own computer with the user's own data. Perhaps a way could be found to extract features from a trained network and combine them with those of other networks in a privacy-preserving way. It would take a bit of research to figure out whether that is possible or not, I guess.

Once SAFE gets compute capabilities, the network itself could be used for the training instead of only the user's own computer, but then the data would have to be decrypted, so the nodes doing the computation could potentially see it. Fully homomorphic encryption could be a way around this if someone finds a way to do it efficiently. Perhaps another way could be to split the data into small parts and spread them across random nodes to do the computations. You would decrypt the data on a trusted node that you run, then split it into small parts and do non-sensitive computations whose results could be combined again on the trusted node.
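One existing approach along these lines is federated averaging: each user trains on their own machine and only the model parameters are combined. Below is a minimal sketch with a one-variable linear model. The datasets, learning rate, and function names are illustrative assumptions, and shared parameters can still leak information unless further protections are added.

```python
def local_train(data, w, b, lr=0.1, epochs=50):
    """Gradient descent on mean squared error, run on one user's own machine."""
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = w * x + b - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

def federated_average(updates):
    """Average (w, b) across users, weighted by how many samples each one holds."""
    total = sum(n for _, _, n in updates)
    w = sum(w_i * n for w_i, _, n in updates) / total
    b = sum(b_i * n for _, b_i, n in updates) / total
    return w, b

def federated_training(user_datasets, rounds=30):
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = []
        for data in user_datasets:
            # Raw data never leaves this loop body; only (w, b, count) is shared.
            local_w, local_b = local_train(data, w, b)
            updates.append((local_w, local_b, len(data)))
        w, b = federated_average(updates)
    return w, b

# Three hypothetical users whose private points all follow y = 2x + 1.
users = [
    [(x, 2 * x + 1) for x in (-2, -1, 0)],
    [(x, 2 * x + 1) for x in (0, 1, 2)],
    [(x, 2 * x + 1) for x in (-1, 1, 3)],
]
w, b = federated_training(users)
```

With this noiseless toy data the combined model ends up very close to `w = 2`, `b = 1`, even though no user's points were ever pooled in one place.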
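The split-and-recombine idea can be sketched with additive secret sharing, a standard technique swapped in here for illustration and not something SAFE is known to provide. Each value is split into random-looking shares, each node only sums the shares it receives, and the trusted node combines the partial sums. The modulus, node count, and function names are assumptions, and this only covers sums, not general training.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this (assumed) prime

def split_into_shares(value, n_nodes):
    """Split value into n_nodes shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_nodes - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def node_partial_sum(shares_for_node):
    """Work done on an untrusted node: it sees only random-looking shares."""
    return sum(shares_for_node) % MODULUS

def private_sum(values, n_nodes=3):
    """Trusted node: distribute shares, collect partial sums, and combine them."""
    per_node = [[] for _ in range(n_nodes)]
    for v in values:
        for node_inbox, share in zip(per_node, split_into_shares(v, n_nodes)):
            node_inbox.append(share)
    partials = [node_partial_sum(inbox) for inbox in per_node]
    return sum(partials) % MODULUS

total = private_sum([12, 30, 7, 51])  # 12 + 30 + 7 + 51 = 100
```

No single node learns any input value unless all nodes collude. The inputs must be non-negative integers whose true sum stays below `MODULUS`.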

2 Likes

Yes indeed, efficient FHE is how I imagine decentralized AI training.

I understand David’s point about nobody being interested in using the network if someone has all the safecoins. It’s just a physicist's professional habit to think of edge cases, a bit like the theorem saying that for a certain spacetime curvature, though you can get back to your starting point by moving straight ahead, you would only arrive approximately when the universe has shrunk to the size of a point :stuck_out_tongue:

I am not sure Apple trains their AI locally, but I know they use some form of differential privacy on what people type in messages to improve text suggestions.
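For reference, the textbook mechanism behind local differential privacy is randomized response: each user flips their true answer with some probability before reporting it, and the collector corrects for the noise in aggregate. This is a minimal sketch of that idea and not a description of what Apple actually ships. The epsilon value, population, and function names are assumptions.

```python
import math
import random

def randomized_response(truth, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1), else its negation."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if rng.random() < p_truth else not truth

def estimate_true_rate(reports, epsilon):
    """Debias the observed fraction of True reports to estimate the real rate."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    # observed = p * rate + (1 - p) * (1 - rate), solved for rate
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
# Hypothetical population: 30% of 100,000 users have the sensitive attribute.
truths = [i < 30_000 for i in range(100_000)]
reports = [randomized_response(t, epsilon=1.0, rng=rng) for t in truths]
estimate = estimate_true_rate(reports, epsilon=1.0)
```

Any single report is deniable, yet the aggregate estimate lands close to the true 30% because the noise averages out over many users.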

Anyway, I think the SAFE Network poses some real challenges to the AI world. These were some first thoughts, it would be interesting if people keep bringing ideas into this topic on how future AI services can be imagined.

Worth looking into this project. DEPRECATED/readme.md at master · iamtrask/DEPRECATED · GitHub

I haven’t had a deep dive on it yet but looks promising and for those involved in this thread likely to stimulate some ideas.

I’m sure SAFE could be an alternative to IPFS for this project.

Yeah, but there are no commercial benefits, and that is why FB will not work on the SAFE Net.

My thinking is that we will have to look at AI from another angle. In other words, we will have to use a different approach to understanding and implementing AI on the SAFE Net.

Users could use a GAN (generative adversarial network) to anonymize some properties of a feature vector and preserve the others, while keeping the GAN's weights hidden. You, as a dev, can train another network to decide whether the user has done it correctly, without knowing the GAN's weights, before the result is collected into datasets.

I don’t expect HE (homomorphic encryption) to have any breakthrough any time soon.

4 Likes