How good is SAFE Network really for anonymity?

Hi to all veterans here,

Today I just found out about SAFE network (while doing something for Solid) and it immediately attracted my attention with its set of promises over decentralisation, anonymity & censorship resistance. I think you guys did a great job at nailing one of the weak points I too realized with Solid - the issue of Pod being a potentially insecure link in its system for data storage (however I understand your project preceded the publication of Solid).

While I appreciate the many goals you are trying to achieve and admire your work towards such a grand ideal, the explanations on your website under ‘Core Technologies’ / ‘Network Fundamentals’ left me with questions and wanting more detailed material. For example, there is a brief mention of ‘IP masking’ & ‘encryption of all communication including handshake message’, but I’m not sure how this is implemented in practice. In particular:

  • if I run any Safe app (e.g., Safe browser), how can you prevent me from capturing the tcp data it’s sending / receiving with other participants of the network and thus discovering those people’s IPs?
  • given the packets are probably encrypted in some similar format, is it possible for an attacker to differentiate the traffic via deep packet analysis and thus develop strategies to deny such connections? (Note: they might not be able to understand the content, but they can try to drop it.)
  • I haven’t taken a deep look into the crust library yet, however I presume it has to do some bookkeeping about the network topology to be able to route traffic efficiently; how do you prevent people from finding a way to decrypt this topology data (either from persistent data or memory) and thus gaining info about other nodes?

Sorry if these look like naive questions or have already been answered somewhere - I just can’t find it. Maybe a suggestion is to have some whitepapers outlining the technical principles underneath so it’s easier for people to get an intuition about it and make contributions asap :slight_smile:

19 Likes

Hi @lynx, these are good questions and folk here will be happy to help you understand how the network promises to handle these issues. To start though, it would probably save some time if you first take a look at the Primer (https://primer.safenetwork.org/), which is a technical description of the network written by members of the community to help in just this situation. I think it will answer some questions and give you a much better picture of how things fit together, and how the network can fulfil its very ambitious goals.

If it doesn’t help please let us know too, and we’ll try another approach, as well as answering any follow up questions you have.

The Primer is a little out of date, but still a good technical introduction. The main thing that isn’t in there, I think, is PARSEC, a new consensus algorithm, but there are videos, slides and other descriptions of that if you want to look into it. Searching the forum should help there.

I also recommend reading this post which one of our members updates from time to time, which gives a good introduction to the capabilities from different perspectives.

14 Likes

Hi @lynx, indeed these have been answered before, but let me see if I can help you understand further.

Your PC will communicate with a node. Then that node passes on your request to the next node that is closer in XOR address space to the node (section really) that can respond. This continues till your request arrives at a node in the section that can handle the request.

The number of hops will be on the order of log of the size of the network.
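
As a rough illustration of the two points above (forwarding towards the XOR-closest node, and a hop count that grows with the log of the network size), here is a minimal Python sketch. The function names, the tiny integer addresses and the base-2 logarithm are assumptions made for illustration; this is not the network's actual routing code.

```python
import math


def xor_distance(a: int, b: int) -> int:
    """Distance between two addresses in XOR space."""
    return a ^ b


def next_hop(known_peers: list[int], target: int) -> int:
    """Pick the known peer whose address is XOR-closest to the target."""
    return min(known_peers, key=lambda peer: xor_distance(peer, target))


def estimated_hops(network_size: int) -> int:
    """Order-of-magnitude estimate: log2 of the node count (base is assumed)."""
    return max(1, math.ceil(math.log2(network_size)))
```

Under these assumptions a network of a million nodes would need on the order of 20 hops, which is why the hop count stays manageable as the network grows.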

Now the part that answers the query :slight_smile:

Each node will not be passing your IP address on, nor any of the other nodes’ IP addresses. The only IP address a node sees is that of the immediate node it communicates with, since this is a part of the TCP/IP protocol. This is often referred to as IP scrubbing.

So how does the section that has the information get it back to you? Well, what is passed is the XOR address, and the reverse is used to get the info back to you. In this way no node past the first node knows the IP address of the originator of the request or information. And in fact the first node doesn’t even know whether you were the originator or were just passing on the information/request.
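
To make the idea concrete, here is a small Python sketch of hop-by-hop relaying in which each node only ever sees the IP of its immediate peer, while the message itself carries nothing but XOR addresses. The `Message`, `Node` and `relay` names are hypothetical and the IPs are from documentation ranges; it is a toy model of the description above, not the real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    source_xor: int   # XOR address used to route the reply back
    target_xor: int   # XOR address of the section that can answer
    payload: bytes    # would be encrypted in practice


@dataclass
class Node:
    ip: str
    seen_peer_ips: list = field(default_factory=list)

    def receive(self, message: Message, from_ip: str) -> Message:
        # The transport exposes only the immediate peer's IP. It is noted
        # locally and never copied into the message that gets forwarded.
        self.seen_peer_ips.append(from_ip)
        return message


def relay(client_ip: str, path: list, message: Message) -> Message:
    """Pass the message along the path, one hop at a time."""
    previous_ip = client_ip
    for node in path:
        message = node.receive(message, from_ip=previous_ip)
        previous_ip = node.ip
    return message
```

In this model only the first node on the path ever records the client's IP, and it has no way to tell whether that peer originated the request or was merely forwarding it.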

Now the problem of associating XOR addresses with IP addresses. Well, I guess if you could monitor all ISPs/routers in the world then you might be able to know the associations for a short period, until nodes are reallocated and clients change their XOR addresses. So yes, it is conceptually possible to unmask all IP addresses, but it is a practical problem due to privacy laws and a lack of information sharing.

Then the bigger problem with doing this is that it’s all encrypted, so monitoring all routers/ISPs does not reveal your XOR address anyhow.

In theory, yes. But deep packet analysis relies on what it can read from the packet (destination address only), the structure of the packet (length etc.), and the quantity/frequency of sending/receiving such packets.

Now since SAFE traffic will basically look like any other HTTPS traffic (i.e. all sorts of sizes/destinations), any deep packet analysis will have problems telling the difference and succeeding in isolating SAFE packets from the others. Imagine a 5% error rate and killing packets to your bank or company. There would be outrage. The issue then is that even if they can identify the packets (never 100% sure), it would be dangerous to their business to stop/slow those packets.

The other point I ignored above is quantity/frequency/patterns. Since the packets are going to all sorts of IP addresses, there is no sure way to identify them via destination IP address. As to quantity, well, they may slow you down because of that, but again this can be a prickly one, as Australian ISPs often find out when they try this. They lose business.

I guess the only pattern could be when a series of full chunks is being sent. They would be identified by approx 1MB of data sent via multiple packets to the one IP address. But since a series of chunks would likely be sent via different routes (IPs), it’s not a clear indication.

So clients (users) are not at risk, since their traffic is like normal browsing etc. Nodes may have some issues in some places due to the volume of traffic, but previous tests showed the traffic to be less than 5Mb/s (maybe less than 2Mb/s) on average, and this is like watching a movie or uploading a movie to Dropbox etc.

Yes, it has information about nodes within the section and you could extract this. But again, you need to have enough of these nodes reporting IP addresses around the network in order to make a map.

But even if you did, there are a few issues that limit the effectiveness of this:

  • The pure cost of running tens to hundreds of thousands of these nodes for a mature network, or thousands to tens of thousands for a network just out of its initial stages.
  • Nodes are routinely relocated around XOR space in the network. This makes any mapping incomplete at best at any instant in time, and could see your “mapping” nodes bunch up at times.
  • When your node joins the network it does not choose where it goes but is given an XOR address by the network, so you would have no control over where your “mapping” nodes go. Random placement would see many of your “mapping” nodes doubling up in the same sections, which increases the cost because more nodes are needed for the mapping.
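
The doubling-up effect in the last point can be put into rough numbers. Assuming (purely for illustration) that each “mapping” node lands in one of `sections` equally likely sections, independently of the others, the expected number of distinct sections covered by `mapping_nodes` nodes is `sections * (1 - (1 - 1/sections) ** mapping_nodes)`. The function name and the uniform-placement model are assumptions, not something taken from the network design.

```python
def expected_sections_covered(sections: int, mapping_nodes: int) -> float:
    """Expected number of distinct sections holding at least one mapping node,
    assuming uniform, independent random placement (an assumed model)."""
    return sections * (1 - (1 - 1 / sections) ** mapping_nodes)
```

Under that model, 100 mapping nodes spread over 100 sections would cover only about 63 of them on average, so an attacker needs noticeably more nodes than there are sections, and relocation keeps reshuffling them.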

So yes, in my responses there is an element of uncertainty about how well your node/client is anonymised. In a small network of a few thousand nodes it would be easy to reveal IP addresses and follow a portion of the traffic. Even in a small testnet-sized network you would have a measure of anonymity, but not a guaranteed one. Once you get to a reasonable starting size of tens of thousands of nodes, you have a reasonable chance of anonymity, but again not certainty.

Once the network has hundreds of thousands of nodes or more (hopefully in the first year), anonymity would be expected.

But there is no guarantee ever. But it should be close to certain with a largish network.

15 Likes

Thanks for your illuminating answers and links, these definitely have helped me supercharge my learning :slight_smile:

I’ve just got a few more questions on the topic though:

  1. from the whitepaper it looks like there is a reliance on ‘bootstrap servers’ - since these are hard-coded an authority could just block access to those, and/or monitor anyone who tries to connect to those for blacklist?
  2. how would SAFE network stand up against network partition? Many features of reliability / trustworthiness seem to depend on having a sizable number of nodes, like collusion prevention, proof of honesty for vaults, and the points mentioned above on randomised relocation and multi-hop routing; if a malicious attacker gets to control an ISP / network boundary, he can potentially ‘localize’ a part of the network and inject enough dishonest nodes to manipulate the individuals in that area, no? (I guess having bootstrap servers is partly meant to solve this issue by providing a true picture of the network at the start?)

6 Likes

CRUST has a bootstrap cache, like Skype’s host cache. So nodes should never use the hard-coded endpoints except the first time after downloading. Even then they can be given endpoints by a friend.
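
A minimal sketch of that fallback order in Python, assuming the cache is tried first, then friend-supplied endpoints, and the hard-coded ones only as a last resort. The function names and the `try_connect` callback are hypothetical and are not CRUST's API; the exact ordering between cache and friend endpoints is also an assumption.

```python
def bootstrap_candidates(cached, from_friend, hard_coded):
    """Order endpoints: cache, then friend-supplied, then hard-coded.
    Duplicates are dropped, keeping the earliest position."""
    ordered = []
    for endpoint in [*cached, *from_friend, *hard_coded]:
        if endpoint not in ordered:
            ordered.append(endpoint)
    return ordered


def bootstrap(candidates, try_connect):
    """Return the first endpoint that accepts a connection, or None."""
    for endpoint in candidates:
        if try_connect(endpoint):
            return endpoint
    return None
```

With a populated cache, blocking the hard-coded endpoints only affects nodes that have never connected before and have no friend to ask.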

There is a better approach. Nodes are given genesis information (they know the original genesis of the network/sections), so they can check that the section they connect to (again) can prove cryptographically that the network has moved forward from their last known point. It is like a blockchain/linked list signed by the Elders of the sections they know, if that makes sense. So if you know key ABC as the last known section authority, the network nodes can give you the current key in the list, signed by the last Elders you knew. As they sign via consensus, you can be sure you have connected to the correct section and network.
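
The linked-list idea can be sketched in a few lines of Python. To keep the example self-contained, a Lamport one-time signature built from `hashlib` stands in for the Elders' consensus signature, and a single signer stands in for a whole section; both are substitutions made for illustration, and every name here is hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Return (secret, public) for a Lamport one-time signature."""
    secret = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in secret]
    return secret, public


def _bits(message: bytes):
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(secret, message: bytes):
    # Reveal one secret value per bit of the message digest.
    return [secret[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        _h(sig) == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _bits(message)))
    )


def encode(public) -> bytes:
    return b"".join(a + b for a, b in public)


def verify_chain(last_known_public, links) -> bool:
    """links: list of (next_public, signature_made_with_previous_key).
    Walk forward from the last key we trust; every link must check out."""
    current = last_known_public
    for next_public, signature in links:
        if not verify(current, encode(next_public), signature):
            return False
        current = next_public
    return True
```

A node that last knew key ABC would ask for the links from ABC to the current key and accept the section only if `verify_chain` succeeds; each key here signs exactly one successor, which is all a one-time signature allows.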

Clients do the same, but they have an added check: can I see my data and friends on this network?

Partitions

We have not fully decided how to handle true partitions. For example, in a 50% split both sides would stall (at the moment) until they have seen more than 2/3 reconnect. So in a partition we currently forsake liveness, at least in terms of network mutations; reads (Gets) are fine though.
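
The stall condition can be written as a one-line check. This is only a sketch of the rule as described above; the function name is made up, and whether the threshold is counted over all nodes or only over a section's Elders is an assumption left open here.

```python
def can_mutate(reachable: int, total: int) -> bool:
    """Mutations proceed only when strictly more than 2/3 are reachable.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    return 3 * reachable > 2 * total
```

With 100 participants, a 50/50 split gives `can_mutate(50, 100) == False` on both sides, so neither half accepts mutations until enough of the other side reconnects.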

8 Likes

Thanks for the high-level pointers. I think some of the concepts are still a bit foreign to me, so I’ll spend more time in the coming days reading up on these to deepen my understanding. Still, I’m already greatly impressed by the level of technical depth that goes into this!

15 Likes

As someone who has dabbled in code in the past but is more of a user than a coder, I have found myself asking ‘how’ quite a lot as I’ve studied ‘Safe’. My own experience tells me that there are lots of people who are technically brilliant that are not teachers. Or to put it another way, they may know exactly what they are talking about, but they haven’t a clue how to convey that knowledge to other people in a way that less brilliant people can understand.

The above paragraph is by way of a thank you for the above explanations. I now have a lot better understanding of how encrypted information is retrieved. Something I had been struggling with till now.

If Safe is really going to take off and be used by ‘ordinary’ members of the public, then the details (as well as the concepts) need to be explainable to them.

5 Likes

I think it’s inevitable that some group(s) will find value in the methods used. They’ll either contribute their own high powered machines or make a similar network with the same principles regardless of whether this catches on with average people.

3 Likes