How does SAFE protect whistleblowers?

The cost is not just monitoring all routers though; there is no MiTM attack in MaidSafe. The cost would be monitoring all routers etc. plus creating a huge number of nodes (I think we can calculate that figure to be about 25% of all nodes, give or take). That is what it would take to randomly land a node near an existing node (not a group, as a single node is enough for this attack).

Interesting to look more at Tor and I2P. I think the connection you first make to a relay node has to be trusted. I2P also carries your IP address all the way along, by the looks of it. I may be wrong here, so happy to know more. If the first connection does not need to be trusted then we should dive deep into that and see why; it could be very beneficial for sure. I cannot see how right now, but I am very interested to see.

If store requests in SAFE were encrypted to the Data Managers then it would evade all of this as far as I can see, but I need to think more about that. Then it would be possible to mask even stores of known data. This could take a few shapes, like direct N + P messages to the DM nodes. Then if routed via nodes with random-size parts of the data, it would become near impossible to detect. I have not thought too deeply, but this is the kind of thing we love in house. It’s an issue, but I don’t see it as a huge issue so far. It could prove a really nice addition though: randomising MaidManager types to route N+P type data to data managers, who could then assemble it back into a chunk whilst having no knowledge of the uploader, could be possible.

As we do not reward on PUTs for tokens then this is possible. Good one for the Attacks section of SystemDocs for sure. I will try and add it this weekend and we can analyse it during roll out. Could have ramifications on upload caps if done very securely though, interesting angle. These chunks could be more expensive so paid in an anonymous currency perhaps :slight_smile: Just an idea. Brainstorming here for sure.

The answer is no. It’s not safe enough for most whistleblowers at this time. It does depend on how sophisticated the attacker is. I don’t think it’s safe enough to protect someone from the NSA.

I don’t think it’s safe enough to protect someone like Snowden. I think we are years away (and lots of auditing) before you can make statements like that.

VPN would just give adversaries another step but they’d probably monitor all VPNs so that traffic to and from them can be intercepted. VPNs will not provide whistleblower level security.

If you’re talking about whistleblowers you have to consider the fact that any vulnerability in the chain could unravel the entire network. Encryption and chunking don’t do much good if the random number generator is compromised, if there are unknown bugs in the code, and so on.

In most cases if SAFE Network works as intended then over time it could become safe enough but I don’t think it alone would protect whistleblowers.

Whistleblowers need specialized apps on top of SAFE Network along with specific training. If they have the apps but use them wrong they’ll be identifiable, and SAFE Network alone isn’t going to do anything except distribute and host the data.

SAFE Network would have to be ubiquitous. I just don’t see how you would be able to make SAFE Network secure if only several thousand people use it and all of them are under surveillance. There are so many possible attacks and so much attention on SAFE Network already that I don’t think it’s going to be any more secure than Tor or Freenet already are.

Now if there were secure gateways in and out of SAFE Network that might be a way to make it more secure, but then you have to trust the operators of those gateways. So in theory yes, SAFE Network is secure for storage and distribution should it work, but it’s not a whistleblower’s paradise.

I am security wary of many things, but every single product can only say it was safe, not that it is safe, as you say. This means looking back is the only way to tell. So yes, I agree to an extent, but does that not also imply that any change to any system would take years of auditing first (i.e. recent SSL in Apple, recent I2P issue, recent SSH vulnerability etc.)? We can only really audit in the wild, so it needs to be in use and then it needs to be constantly under attack and checked.

These days my preference is lots and lots of people looking at code and measuring. The side effects of all tech, including long-lived tech like TrueCrypt which was thought to be (and may have been) safe, are a good example of my thoughts on this.

So yes lots of auditing for sure, test in wild for sure, lots of eyes on the code who understand it for sure (this to me has been a missing ingredient in many of the issues with secure products).

It is hard to prove security, as we all agree, myself especially, but without attack vectors it’s hard to prove insecurity as well. So when we get to beta I would be dead keen on all pods trying to attack from all angles (social as well) and seeing what we can measure. We perhaps won’t match the scale of agency money, but we can sure try to match the capability of their analysis if we get enough people involved. To me this includes the Tor/I2P folks as well if we get enough groundswell.

So I agree, but with caveats. I think your statement about a few thousand nodes not being enough is a very good start. The more ubiquitous it is, the more secure we may feel. The recent revelations have shaken the entire industry and serve as a great warning to us all. I think though that they equally serve as a great call to action to improve the situation. So we start, then grow and learn.


I don’t understand your mention of MiTM attacks.

I’m just saying that the capabilities, through fibre optic taps and taps in every ISP, mean that governments can act as a global passive adversary for a relatively low cost.

Spawning a few thousand evil nodes also will not cost much.

I don’t know where you pulled the 25% figure from. Combined with more sophisticated attacks against your Kademlia-based DHT, they would not need so many.

The fact that, for every sensitive file, K chunks need to be stored / requested also makes the attack K times easier.

I’m just trying to get the point across that anonymity is hard and that researchers spend years on it, so please don’t expect to have solved it by magic. Having no servers is not enough.

Just make it clear to your users that there are no anonymity guarantees, that is all!


Hope you don’t see me as actively and defensively protecting an idea, I continually search for the truth and fact where possible. Sorry for delay also, I went out for my social life (2 hours in the pub once a week now :frowning: )

This point answers the first point. If an adversary sits in the middle (on lines and routers) and all that is seen is encrypted data, then subversion of the MiTM attack is applicable. That is why I mention it. Replay attacks are also prevented by the accumulator in the vault nodes.

25% is easy enough: if you imagine nodes are equally distributed across a range and you want to be part of a group of 4, then 25% of all nodes would do that (if you ignore rank/churn etc., which you need to in order to look at the worst case). As for the cost of 25% of the nodes, it’s variable at the moment how much that will be. It is relevant though in this specific attack.
[edit] Really interested to hear of the Kademlia-based attacks. We actively analyse attacks, so please post them here, PM me or post them in the SystemDocs and we will definitely look over such attacks and see if they are relevant. This is hugely important to us all.
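
For anyone who wants to play with the 25% figure, here is a rough sketch of the arithmetic. It is only a back-of-envelope model under the same assumptions as above (uniformly distributed nodes, no rank or churn, each group member independently attacker-owned with the same probability); the function names are made up for illustration and are not part of any MaidSafe code.

```python
def expected_attacker_nodes(fraction: float, group_size: int) -> float:
    """Average number of attacker-owned nodes in a close group."""
    return fraction * group_size


def prob_at_least_one(fraction: float, group_size: int) -> float:
    """Chance that a close group contains at least one attacker-owned node."""
    return 1.0 - (1.0 - fraction) ** group_size


if __name__ == "__main__":
    # With 25% of all nodes and groups of 4, the attacker averages one node per group.
    print(expected_attacker_nodes(0.25, 4))
    # Averaging one node per group is not the same as being in every group.
    print(prob_at_least_one(0.25, 4))
```

Under these assumptions 25% gives an average of one attacker node per group of 4, while the chance of being in any particular group comes out at roughly 68%.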

I agree with this, to an extent, but it’s out of context. If you are going through the same close nodes it won’t matter; if not, then the distribution of data matters a lot.

We don’t expect to solve anything by magic and we know it’s hard :slight_smile: Honestly, this is not news.

There are no guarantees to be had. Is AES256 secure? RSA? Is the speed of light really a constant? Is there such a thing as exactly 1 meter? We all thought NIST was on our side, we figured the world was the centre of the universe, and much more. I can guarantee very little, especially as I do not believe in definites, either in positive or negative proofs.

Again, I am not actively defending something to say it’s perfect, the end of innovation or anything like that, far from it. I do believe this is a step change and very different in approach from many other experiments. After a few years in operation we will know better.


The more eyes there are auditing and the easier it is to read the code the better. It’s very difficult to determine exactly how secure something is in practice even if we believe it’s secure in theory.

I may be a bit paranoid about security so my views may not be common. When I thought about this question I thought is it something I’d be willing to use if my life were at risk? Do I trust the technology that much? If that is the standard then no.

But I do think it’s going to be safer than Dropbox and probably better than the ordinary web where any teenager can break into millions of accounts.

We can aim for making SAFE Network “reasonably secure”. That’s not absolute and it’s not definite. It’s just an attempt to be more secure than everything else out there.


Never something to be ashamed of; many of us are, and it’s correct to be that way inclined I think. We all know deep down we have great algorithms and they may even be secure, but is there a little door open somewhere waiting for an exploit? History shows there nearly always is. I know we work very hard to engineer this properly, so that many can understand the code (easily) and can audit it. I believe code complexity is as bad as badly implemented algorithms and way more dangerous at times. We know for sure there will be bugs (we run every tool we can against the code though, and it is very clean so far), but we need to find them.

Stay paranoid, we need that around us for sure!


I for one am really excited by this project, a decentralised platform upon which we can construct a new generation of privacy preserving apps and services. It will be revolutionary.

It does not need to aim to replace Tor or I2P in order to be hugely important, so here I was just being a little pedantic about it, purely because the whistleblower use case was mentioned in the blurb :smile:

Over the past year I have seen a few groups of people attempt to solve the decentralized storage problem, but it seems clear to me that your team are on to a winner with MaidSafe, I can’t wait to try it out!


I agree with this, to an extent, but it’s out of context. If you are going through the same close nodes it won’t matter; if not, then the distribution of data matters a lot.

My understanding (maybe flawed!) was that by splitting up a document into K chunks, where each is encrypted (and its hash is randomly distributed), would mean that they would be fairly uniformly distributed across the network.

This would give an attacker K opportunities, instead of just one, to try and spot the classified document being stored, since any chunk would give the game away to an evil node.

So if there are 10 chunks for the sensitive file, then the attacker would not need to control 25% in order to be reasonably sure that an evil node sees at least one chunk being stored; it would be a lot less, i.e. around 2.5%.
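
To put rough numbers on the "K opportunities" argument, here is a small sketch. It assumes each chunk independently lands on an evil node with probability equal to the fraction of evil nodes, which is a simplification of how chunks are actually routed; the function names are hypothetical.

```python
def prob_seen(fraction: float, chunks: int) -> float:
    """Chance that at least one of `chunks` independent stores hits an evil node."""
    return 1.0 - (1.0 - fraction) ** chunks


def fraction_needed(target: float, chunks: int) -> float:
    """Fraction of evil nodes needed to see at least one chunk with probability `target`."""
    return 1.0 - (1.0 - target) ** (1.0 / chunks)


if __name__ == "__main__":
    # One chunk, 25% evil nodes: the baseline case.
    print(prob_seen(0.25, 1))
    # Ten chunks, 2.5% evil nodes: close to the baseline, slightly below it.
    print(prob_seen(0.025, 10))
    # Fraction that matches the baseline exactly when there are ten chunks.
    print(fraction_needed(0.25, 10))
```

Under this model 2.5% with 10 chunks gives roughly a 22% chance of seeing at least one chunk, and matching the single-chunk 25% case takes a little under 3% of nodes, so the "divide by K" estimate is a fair approximation for small fractions.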

This is true but as I noted earlier it doesn’t amount to an attack because knowing your node received a chunk from a particular file doesn’t reveal where it came from or who stored it.

If you have several chunks, that may reveal something (?not sure about that one!) but again not the identity of the uploader (e.g. IP of the node).

My point in saying this is that if you believe this to be a weakness, I’d like to hear the rest of the attack. What you’ve stated is true, but not an attack as far as I can tell.

Don’t take any of this as criticism (I know I come across like that sometimes), I’m really keen for us to come up with as many attack scenarios as we can and test our ideas and code against them. So please, full steam ahead! :slight_smile:

This is bad, for you and Project Safe. It is not enough for most humans, and I think you are human ergo it is not enough. You have my permission :wink: to spend more leisure time and not all of it in the pub, and I hope the community’s permission ;-), and maybe even your team’s.

I hope you give them more leisure time than that. At least one of them has a life, I know, he went on a canal boat recently :-). Hope that was fun BTW, didn’t hear of any sinkings anyway.

I think I have a fundamental misunderstanding about how the network works! This is how I see it…

So this Whistleblower guy has split up the super-sensitive-file into K chunks. No other node has them yet.

He wants to STORE the chunks onto the network so he must find and connect with appropriate nodes to do so.

Any node that he connects with and sends the chunk to, can d0x him.

The very fact that he has to perform K store operations to upload a single document, means that there are K opportunities for an attacker to d0x him.

Yes, here’s the misunderstanding. I can’t explain how it actually works, but I know it is not this, and I know that it is not possible for a node to know where a chunk came from.

What I can say is that the routing is not from uploader to storer. The address of a chunk is equal to its hash in XOR space, and using magic (to me) called DHTs (distributed hash tables) and Kademlia, a chunk moves through N nodes until it arrives at its address, where a manager node ensures copies are stored on four vault nodes (which can be geographically anywhere).

You’ll need to dig if you want to understand this, or I recommend David Irvine’s presentation at the Seattle Conference on Scalability (from 2008, but still largely correct). It is an hour long, but I found it a great intro to the main features.
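
For readers new to XOR addressing, here is a minimal sketch of the idea that a chunk’s address is its hash and that it ends up with the nodes closest to that address. SHA-512 as the hash, the node naming and the helper names are all assumptions made for illustration, not taken from the MaidSafe code.

```python
import hashlib
from typing import List


def address_of(data: bytes) -> int:
    """Map content (or a node name) to a point in the address space (SHA-512 assumed)."""
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the XOR of two addresses, compared as an integer."""
    return a ^ b


def closest_nodes(target: int, node_ids: List[int], count: int = 4) -> List[int]:
    """Return the `count` node ids nearest to `target` in XOR space."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:count]


if __name__ == "__main__":
    # Hypothetical network of 100 nodes with ids derived from made-up names.
    nodes = [address_of(f"node-{i}".encode()) for i in range(100)]
    chunk = b"example encrypted chunk"
    holders = closest_nodes(address_of(chunk), nodes)
    print([hex(h)[:12] for h in holders])
```

The point of the sketch is that the destination depends only on the chunk’s content, not on who uploaded it or where they entered the network.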

Thanks for the link!

I understand Kademlia fairly well. But you are now suggesting that your system does something significantly different, almost magical, which is bizarre, and not mentioned in your paper that I can see.

You must understand that the whistleblower HAS to send the chunk to some other node at some point?

Otherwise his document will just sit on his computer and not be shared!

You also must understand that as soon as he sends the chunk via RUDP he reveals his IP to the recipient.

What you might be suggesting is that the whistleblower does not upload direct to the Storing node, and the chunk is passed on over a few hops to the final destination? A mixnet of sorts?

This is true @willish, nobody stores direct but via intermediaries. The IP address is scrubbed in hop 1. This is fine, but there is still an issue to solve: the nodes the person actually does connect to. So we have a window of opportunity for an attack and this is what we need to look deeper at. I think we can subvert this one, but at the moment the window would exist. I think if we wanted to we could close that window via a scatter/gather approach to uploading all files.

Thanks for looking so far, keep at it and we will close as many windows as possible, no matter how small.


Yes, that’s right, so the storing node and the uploading node have no way to identify each other. The magic is I think in the use of the chunk’s hash as an address. This way the chunk can join the network from anywhere and end up at the right address. Only at the first “hop” is the uploader’s IP address exposed, after that the chunk is clean so to speak.


Cool, thanks for clearing up my misunderstanding. Please excuse my spammy posts, I’m just genuinely psyched by this project.

So we have a way to kind of ‘launder’ the chunk, the IP is scrubbed in hop-1, this is great.

Does the node in hop-1, who is being contacted directly by the whistleblower, know that it is hop-1 and not hop-X?

i.e. is there a kind of plausible deniability? or can the hop-1 node always d0x the whistleblower?

Hop-1 would I think know it is hop-1 because it scrubs the IP address of the sender. If it knows what the chunk is (through a plaintext attack), and it knows the IP of the machine that uploaded it, that’s pretty definitive (although it doesn’t yet know the identity of the uploader), but as @dirvine mentioned, we should look into ways of preventing this attack.

Man, I just can’t let this go.

I’ve spent a little bit of time reading through your docs and specifically the MaidSafe-dht paper.

What I want to make clear is that I am only talking about the ‘router’ level. So let’s not talk about vaults and whatnot, just what happens at the MaidSafe-dht level, i.e.

  • STORE
  • FIND_NODE
  • FIND_VALUE
  • GIVE_VALUE

RPC commands, issued over RUDP.

The DHT is loosely based on Kademlia with the most significant changes being the introduction of ‘managed-connections’. This allows the network to react to churn far quicker, and also allows you to optimise some parameters such as the replication factor.

Another important change, as you guys alluded to earlier, is the modification from iterative lookups to recursive ones.

Instead of Bob sending a FIND command to one node at a time and getting the answer straight back, here Bob asks Alice, who asks Claire, who asks Dave, etc.

The result of the recursive FIND_VALUE or FIND_NODE is that Bob will receive a contact tuple.

If Bob wants to STORE, he will perform a FIND_NODE followed by a STORE, using the contact tuple he previously received.
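
To make the recursive lookup concrete, here is a toy sketch of how I picture it. The routing table, the small integer node ids and the `recursive_find` name are all invented for illustration; this is not the MaidSafe-dht implementation, just a greedy "forward to the closest peer you know" model.

```python
from typing import Dict, List, Tuple


def recursive_find(start: int, target: int,
                   routing: Dict[int, List[int]]) -> Tuple[int, List[int]]:
    """Forward a lookup hop by hop; return (closest node reached, path taken)."""
    path = [start]
    current = start
    while True:
        # Each node only considers the peers it already knows about.
        peers = routing.get(current, [])
        best = min(peers, key=lambda n: n ^ target, default=current)
        if (best ^ target) >= (current ^ target):
            return current, path  # no known peer is closer, so stop here
        current = best
        path.append(current)


if __name__ == "__main__":
    # Hypothetical peer lists: Bob is node 1 and is looking for address 13.
    routing = {1: [8, 4], 8: [12, 9], 12: [13, 14], 13: []}
    found, path = recursive_find(1, 13, routing)
    print(found, path)  # 13 [1, 8, 12, 13]
```

In this model only the second entry in the path (node 8 here) ever talks to Bob directly; every later hop just sees the request arriving from the previous hop.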

I suggest that every time he issues a STORE, FIND_VALUE or FIND_NODE, he is compromising his identity, since the node he is communicating directly with knows his IP address and also the information he is holding (STORE) or likely looking to store (FIND_NODE).

Attackers will be on the lookout for any IP address that issues RPC commands that reference blacklisted chunk hashes. It is trivial to create such a blacklist from the plaintext which is published on MaidSafe.

Since a large sensitive document (1 GB) is split into 1000 chunks, the whistleblower will need to send 1000 separate STORE messages. If any of these messages is seen by an evil node, he is toast. Likewise for FIND_NODE.

Another important but separate question is how you deal with malicious nodes who are trying to subvert your FIND_NODE process? Do you have a way to detect such bad eggs?

Kademlia does not. And so attacks can be made easier since not so many evil nodes are needed, lookups are just redirected to colluding, evil, nodes.

An interesting paper - Octopus: A Secure and Anonymous DHT Lookup
