SAFE versus passive surveillance

These questions are bound to be raised sooner or later by others who audit the protocol. Who knows, maybe some of this might make it into the FAQ.

Don’t be so concerned, Mr Original Thinker, the question was already asked, and answered, on this forum.

@anon81773980

How exactly does the IP address get scrubbed?

It doesn’t. You just cannot tell which IP does what on SAFE unless you control a large majority of nodes and the target has a very distinct traffic pattern whose flow you can monitor. That is very hard on a large network, and also hard to prove. If countermeasures are added it gets even harder.

None of this is actually strong privacy protection the way a project such as I2P or even Tor works. I think it’s safer to say that there’s some pseudo-anonymity, but that against state-sponsored surveillance all bets are off. Tor is also weak when it comes to a state-level group monitoring traffic flows.

Lol! You amuse me. If you read the OP, you would know that I stated there was a possibility that this had already been brought up. But, taking a jab in the dark is your specialty. You’re like a mole. I whack you in one thread and you pop up in another. Is there no end to this game. :disappointed_relieved: I want a refund…

—Loudspeaker—

Attention, attention, there is shit on the bathroom floor in the biology department. I repeat, there is shit on the floor!

Yo janitor, you’re being summoned. Stop auditing the class and get to it…

No doubt I left it there for you. I’m such a primate. :joy:

I think Janitor must be a Stack Overflow mod… :stuck_out_tongue:

Lol!!! :sob:

Yup. You definitely got me thinking!

I believe it’s in crust where this gets accomplished. Unfortunately, I am not cognizant of exactly how this occurs. However, it is mentioned many times by @dirvine in these forums.

Disclaimer: I am not sure this is how it works, and would welcome any critique or clarification that any devs would have to give

My best guess is that since the network operates on XOR space (and not IP addresses like TOR), the first hop is to another IP address chosen by how close its XOR location is to the piece of information being retrieved.

This first hop will then be to a vault, and all others after as well. Keep in mind that vaults are stateless, and only retain information for the time it takes for them to perform whatever process they need to do on it.

After the first hop, vault1 queries their DHT for the node closest to the hash of that piece, and sends the request on. Now that request (on vault2) is comprised of:

  • Hash of data requested
  • recipient XOR

That continues on and on until the vault is found with the piece of data. At that point, vault9 says “I have this piece of data, let me see who in my hash table is closest to the intended recipient”.

The path is not necessarily the same, because “Every computer has a different view of the network”. In reality, it is most likely going to be a completely different path.

Vault9 will then send it to vault8, who receives, similarly:

  • data
  • recipient XOR

and on and on until the recipient is found and the data delivered.
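The hop-by-hop idea above can be sketched in a few lines. This is a toy model, not SAFE's actual routing code: the 64-bit IDs, the `node_id` and `route` helpers, and the way each node's partial table is built are all assumptions made for illustration. It only shows that every node forwards to the contact in its own table that is XOR-closest to the target, so the path depends on each node's view of the network.

```python
import hashlib
from typing import Dict, List

def node_id(label: str) -> int:
    """Hash a label to a 64-bit ID (illustrative; not SAFE's real address size)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: bitwise XOR of two IDs, read as an integer."""
    return a ^ b

def route(start: int, target: int, tables: Dict[int, List[int]]) -> List[int]:
    """Greedy forwarding: each node hands the request to the contact in its
    own table that is closest to the target, and stops when no contact is
    closer than the node itself."""
    path = [start]
    current = start
    while True:
        nxt = min(tables[current], key=lambda n: xor_distance(n, target))
        if xor_distance(nxt, target) >= xor_distance(current, target):
            return path
        path.append(nxt)
        current = nxt

# Hypothetical network of 20 vaults; each one knows a different subset of peers.
nodes = [node_id(f"vault{i}") for i in range(20)]
tables = {
    n: [m for j, m in enumerate(nodes) if m != n and (i + j) % 3 != 0]
    for i, n in enumerate(nodes)
}

chunk_name = node_id("some chunk")  # stands in for the hash of the requested data
path = route(nodes[0], chunk_name, tables)
print(len(path) - 1, "hops")
```

Starting the same lookup from a different node, or giving the nodes different tables, generally produces a different path, which is the point made above about every computer having a different view of the network.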

So now that you (hopefully) see how the network works in XOR space, one final note about Kademlia that may tie this all together. Here’s a graphic (video starts at correct time for reference). The DHT that’s held by all nodes does show the IP address of other nodes. The routing however, is done on a step-by-step basis.

This simple explanation may have holes that the maidsafe team has confronted and solved, but I am not aware of the details, and only hoped to convey an elementary view of the network…the only type that I am able to provide with my limited knowledge.

Just a couple thoughts:

Since data is split into chunks, there is no one piece of data being retrieved from one specific entity. That would make it incredibly hard to trace an actual file rather than just a chunk.

Chunks can also be de-duplicated, so who knows if that chunk is being requested for file1 or file2. Each may contain data that hashes to the same chunk, but there are two reasons for requesting it.
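A minimal sketch of why de-duplication blurs which file a request belongs to. It assumes a plain content-addressed store (chunk name = SHA-256 of the chunk) and a tiny 4-byte chunk size for readability; the real network also encrypts chunks, which is left out here, and `store_file` is a hypothetical helper.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose; the real chunk size is much larger

def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def store_file(store: dict, data: bytes):
    """Store each chunk under the hash of its content and return the list of
    chunk names needed to rebuild the file."""
    names = []
    for chunk in split_chunks(data):
        name = hashlib.sha256(chunk).hexdigest()
        store[name] = chunk  # an identical chunk lands on the same key
        names.append(name)
    return names

store = {}
file1 = b"HEADbody-one"
file2 = b"HEADbody-two"
map1 = store_file(store, file1)
map2 = store_file(store, file2)

# Chunks that appear in both files are stored once and requested by either.
shared = set(map1) & set(map2)
print(len(shared), "shared chunks,", len(store), "stored in total")
```

An observer who sees a request for one of the shared names cannot tell from the name alone whether file1 or file2 is being fetched.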

As far as timing attacks go, the path to retrieve the data, and the path to return the data are not necessarily (nor likely) the same paths.

Also, while a monitor at one end will see incoming data, there is no way to see which vault it came from. Even the recipient doesn’t know any but the last hop it took. And since even one file will be coming from many different vaults, it would be infeasible to correlate any timing attacks (IIRC).

Also, chunks are cached as well, so it may not be the original storage vault who ultimately returns the chunk, but rather an intermediary close to the chunk that has cached it.

@Tonda, to learn more about the network I would invite you to view @dirvine’s whiteboard speech. (even though it’s not great audio and he tends to let his thoughts wander around a bit. In fact, I would absolutely love it if he did another whiteboard explanation for the current state of the project and have time to go through all of the elements of the network that he thought were important.)

As I am mostly unfamiliar with TOR and its functionality, I would request some clarification on this statement if I may.

In light of the numerous reports on TOR vulnerabilities, in what ways and to what extent is the protocol weak in regards to traffic flow or other analyses?

Another question to consider is: What base assumptions are made of the TOR protocol that are not necessarily similar for the SAFE Network?

Thank you so much. You are quick, helpful and to the point. 1+ respect for smacz for real. I look forward to more of your scalpel-like responses. I will definitely check out that speech. Thanx again. :smile:

This might also help, the FAQ 33 videos

@smacz , here’s a good article from Tor itself which explains their weaknesses – https://blog.torproject.org/blog/one-cell-enough (though the latter half of the article gets in depth on one particular attack they’re partially debunking). Basically when you’re watching traffic end-to-end, even though the middle hops are a black box, if you see what goes in and out of that black box (even if it’s encrypted), you can make some good guesses as to who’s doing what on Tor.
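To make the "watch what goes in and what comes out" idea concrete, here is a minimal sketch of traffic confirmation. The traffic counts are made up, the `pearson` helper is written out by hand, and real attacks use far more careful statistics; the only point is that matching volume-over-time shapes on both sides of the black box stand out even when the content is encrypted.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical packet counts per time window (invented numbers).
entry      = [3, 0, 7, 1, 0, 9, 2, 0]   # seen leaving the suspect's machine
exit_match = [4, 1, 7, 2, 0, 10, 2, 1]  # seen arriving at destination A
exit_other = [5, 5, 1, 6, 4, 0, 5, 6]   # seen arriving at destination B

score_a = pearson(entry, exit_match)
score_b = pearson(entry, exit_other)
print(f"A: {score_a:.2f}  B: {score_b:.2f}")
```

Destination A scores close to 1 and B does not, so the observer's hypothesis "the suspect is talking to A" gains support without decrypting anything.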

Here’s a recent article on a new project which offers better anonymity but at quite a performance cost: 'Dissent,' a New Type of Security Tool, Could Markedly Improve Online Anonymity

One feature of Tor which I’m not sure SAFE has, is that traffic is intentionally sent through random hops. On SAFE it’s not quite random, it’s a DHT lookup.

I2P messages contain additional protections, such as the ability to specify delays on when messages should be sent or to specify additional hops and routing instructions. I2P: A scalable framework for anonymous communication - I2P

The part I quoted was your claim that someone (a lesser thinker, of course) may come up with the same or similar question (months later, when they catch up with the foremost thinker of the forum).

So, for the benefits of the community you nominated your unique question for the FAQs. :smile:
True team player!

That’s not necessarily true. I can download top 10 anti-Communist videos to SAFE and read them to learn the patterns and flows (for example, how many chunks there are requested and how they are delivered). After a lot of learning it may be possible to narrow down the list of suspects to some large number that would have to be cross referenced with a bunch of other things from other sources.

Chunks: in videos and all compressed docs, if there are repeating deduplicated chunks, those are very likely the same files. Not that it matters in terms of files, really, since you cannot really see what’s inside as they are encrypted on the way to the downloader.

It won’t be easy to detect SAFE users who apply simple measures of protection.

That article focuses mainly on what they describe as “tagging attacks”, so I’ll focus on those for this post. Feel free to expand the surface area if you wish. Also, as an aside, some of the PDFs for the other three attacks that they link to are unavailable. I guess someone didn’t want to renew their DNS lease. Another problem defeated by the SAFE Network…

Sorry for the long post. Don’t worry though, I repeat myself for clarity often enough. I swear it started out a lot smaller. Also, once again, I am not an expert. YMMV



The way we generally explain it is that Tor can try to protect against traffic analysis, where an attacker tries to learn whom to investigate.

For future comments, I want to re-emphasize this point. This is not a discussion about an attacker trying to learn whom to investigate. It is exactly about the “other things”.

However, Tor can’t protect against traffic confirmation (also known as end-to-end correlation), where an attacker tries to confirm a hypothesis by monitoring the right locations in the network and then doing the math. -Emphasis mine

Before we dive in, (breaking my own rule above) I did want to mention that setting up such a tagging attack would be so utterly massive in its scope that it would be relatively infeasible. This is mainly due to the number of farmers required to serve up any meaningful bit of data. But feasibility never stopped the NSA before.

So chunks are served by farmers. Those chunks aren’t even necessarily a unique piece of data. For instance, I don’t know what kind of file to exemplify here, but imagine a type of file that contained a huge (1MB+) header - even when compressed. If that header uses default values and many people store a file of that nature with different content but with the same header, that header chunk will be brought down for many different reasons, without the corresponding exit of that chunk on the target client.

Also, if that chunk is popular, it will be cached on an intermediate node. So now the “server” of that chunk has gone from 4-6 vaults to a ${probability} number of vaults - the probability depending on just how popular that chunk is. That also varies with time. One day a chunk may not be there, another it is. And then the next it’s not again.

On top of all that, think about churn. As nodes enter and exit the network, the same chunk of data is copied to multiple machines. It may be stored offline while a farmer is offline, but there are always 4 live copies of the chunk available somewhere. So when a chunk is re-replicated and stored on another vault, the attacker would then have to figure out which new one it went to before they can continue the analysis. That is if they don’t have to start it all over again.


Another (less convincing) part to this is that vaults are not serving specific content. They could be content for any number of applications. For instance, the original Silk Road servers were hacked. If they did this type of attack before hacking it, they could attempt to correlate the content that was put out by these particular servers.

Now what if the servers served several services?[1] They wouldn’t be able to tell which services were requested when - thus instilling plausible deniability. Also, since with the SAFE Network port numbers are randomized uniformly(?), there’s no way to say: “That came out of port 80 for an HTTP request,” or “That came out of port 21, that must be an FTP request.” Data is just data. Nothing more, nothing less. Every single thing would have to be taken into account. Even checks from all of the other managers that are in charge of that particular vault.

What if the known servers only hosted part of the service? Then there’s a good chance that the known nodes would not be contacted for all of the services, and they would miss many correlations of that same service, just because it wasn’t being served by those known machines, even though the request from the client would be indistinguishable from one that would be eventually served by those servers. Enough of a chance, I think, to establish plausible deniability.


The basic idea is that an adversary who controls both the first (entry)
and last (exit) relay that Alice picks can modify the data flow at one
end of the circuit (“tag” it), and detect that modification at the other
end — thus bridging the circuit and confirming that it really is Alice
talking to Bob. This attack has some limitations compared to the above
attacks. First, it involves modifying data, which in most cases will
break the connection; so there’s a lot more risk that he’ll be noticed.
Second, the attack relies on the adversary actually controlling both
relays. The passive variants can be performed by an observer like an ISP
or a telco.

In general I’d say that these limitations hold, and are amplified to some extent. Especially with the difficulty of controlling, hell, even knowing which “relays” to control being orders of magnitude greater.



An interesting question here is: “If a successful GET triggers a reward to the farmer, how does the network know which farmer served up that data, and can that mechanism be exploited?” That is something that I have not researched yet. Anyone else care to expand upon that? If not I guess I’ll just keep digging (I got nothing but time anyways).



[1] How many services can a service server serve when a service server serves services? Several.

An interesting question here is: “If a successful GET triggers a reward to the farmer, how does the network know which farmer served up that data, and can that mechanism be exploited?”

The network knows because it picked one of the replica hosting farmers (vaults) to deliver.
There may be several ways to exploit this. For example, one can seed a lot of content and correlate his earnings with traffic observed on residents’ clients. As you watch a video, the number of GETs may be the same as the number of MBs of sequentially downloaded content in a given time frame on the watched client. This assumes no caching and more.

Again this too can be made harder by the client (download some random shit at the same time, etc.), and the same farmer won’t get all requests so it won’t be that easy.

This stuff concerns only a tiny minority (say 0.0001%) of users. There is virtually nothing that users cannot stealthily download today. Unless you are in Iran or N. Korea, why worry?
Governments cannot even bust “illegal” dark web sites on Tor (with 10-20K users), let alone find some “illegal” SAFE downloader hiding in a crowd of 5 million normal SAFE users.

By the time they get their act together and arrest the first SAFE pedos, less illegal users will have to slightly improve their privacy to buy themselves another 2 years of worry free time. And does anyone expect SAFE devs will not keep improving the software?

Does the network choose one, or is it the chunk that arrived at the client’s machine the fastest?

All vaults holding the chunk will get the request, the fastest to deliver has a chance to get the reward. I imagine the fastest will be selected by the client’s manager group, which tells the vault’s manager group that that vault won.

Thanks @Seneca. I always seem to be fuzzy about the roles that the various Managers take on.

Hmm, David on several occasions emphasized that speed will not be the main factor, because that would contribute to farmer centralization. @smacz I would suggest you check David’s posts on this topic (I am writing from memory so I can’t provide specific links).

If it’s true that all vaults would deliver requests, that would be a disaster. Farmers already don’t get paid for PUTs, so this would mean more work (and precious egress bandwidth) for uncertain gain - cross that - certain loss (if you’re a home user who is fast, you can be pretty sure you’ll almost never be the fastest). With 4 replicas, the average farmer would have to serve 4 chunks to earn 1 coin; all those below average could conceivably store and serve data for free on a permanent basis. Some investment! (If what Seneca said were to be true. Which I think it isn’t.)
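The arithmetic behind that worry, as a minimal sketch. It models only the hypothetical scenario being argued against: every replica holder answers every GET and exactly one of them is rewarded. The `chunks_served_per_reward` helper and the 1-in-100 win rate are assumptions made for illustration, not a description of the actual farming algorithm.

```python
from fractions import Fraction

REPLICAS = 4  # replica count used in the post above

def chunks_served_per_reward(win_share: Fraction) -> Fraction:
    """If a vault answers every GET for a chunk it holds but is rewarded only
    for the share of races it wins, it serves 1 / win_share chunks per reward."""
    return 1 / win_share

# Average farmer: wins an equal share of the races among the replicas.
average = chunks_served_per_reward(Fraction(1, REPLICAS))

# Hypothetical slow home connection that wins only 1 race in 100.
slow = chunks_served_per_reward(Fraction(1, 100))

print(average, slow)
```

Under these assumptions the average vault uploads four chunks per reward, and a vault that rarely wins uploads far more, which is why a pure fastest-wins rule would look like a bad deal for home farmers.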

IIRC a combo score (“reputation”, which consists of age, uptime, performance, etc.) will play a very important role.

Edit: I found one of previous discussions on this topic here.

Also of interest to @Tonda for the OP…

There have been at least a couple of threads discussing these kinds of attacks (in 2014 I think), and at least one in relation to Tor (will SAFE be better than Tor? How do they compare? Kind of thing). I also think there is something in the wiki FAQ on that.

Anyway, the forum discussions have gone into some detail in terms of attack, network ability to evade them, and further measures that David anticipates to improve things even more.

I might have noted one or two links, but searching for posts about Tor and attacks/surveillance in @dirvine’s posts should find them. If not, DM me and I’ll check when I have access.

Vault’s own speed is only a (small) part of the overall speed, because the chunk has to be routed through the DHT before it arrives at the client.

It kinda baffles me that after so many discussions on this topic you still don’t get how the farming issuance algorithm works. It’s not static or linear like that at all.