I posted this subject already on Reddit. Here again, hoping to see some more people look into the subject.
One thing I really don’t get is the fact that the same file is encrypted using the same technique and still nobody’s able to link a file to an IP address.
Let’s say I join the network. I have 4 close peers based on XOR. When I ask the network for a file, what happens? Do I ask the 4 closest peers (let’s call them my group) for these files? Does any peer in my group receive my request for a file? And if yes… What about a corrupt peer who does exactly what the network expects it to do, but stores the IP addresses and the chunks that are requested by the nodes it’s connected to? The corrupt peer will store something like:
Now, because all the files are encrypted using the same set of rules, a corrupt peer may look the chunk up in an index and find that my chunk with hash: aGjHkljeBjklHikHLjjlhHHjkljhLLL is part of a file called: LinuxMint-17-32bit.iso
So the corrupt peer didn’t behave weirdly in any way, but it sniffed the traffic and found out that I downloaded a part of Linux Mint 17. How is the network protected against this sort of privacy attack?
And what about popular websites on the network? Websites will become faster when more people request them. So I’m in China, running MaidSafe, and I’m visiting a website about the “Tankman”. A lot of Chinese people in the network find out about that site, so more of them request it. The Chinese government is sniffing the SAFE network; it finds the site with the famous picture of a man in front of a tank. They know the chunks and hashes which are used. So they create 100 nodes, sniff all the data requests and find out about nodes in China requesting a site about Tankman.
Is there any way to prevent this? Isn’t it true that using the same self-encryption can save up to 90% of data but creates a weak point in the SAFE-net at the same time?
Really would like to learn about the SAFE-net and how this can be prevented.
Moreover, every node has partial knowledge of the local information of its close nodes (neighbouring ID space). The information stored at every node contributes to the message-passing infrastructure of routing. Exchanging information in routing typically involves traversing a number of intermediate nodes.
What index is that?
It seems to me they cannot arbitrarily look up the hash of any chunk on the network.
If you want to anonymize your access to data on the SAFE network, you should probably use Tor or similar technology, as well as private MaidSafe shares. I don’t think the network was designed to provide anonymous browsing of public data on the network, but I may be wrong.
They would need to get to a node close to each person they want to snoop on. Never mind the enormity of the hash list; this is extremely difficult (and can be made near impossible). The only way to find an IP now is to get close to that node. To figure out which node, you need to guess the ID the person uses for getting data (hint: it’s not a username). Then you need to be one of the 4 nodes close to that ID (which may change) and so on. It gets very mad very fast, to the point where you may need to create network_population - 3 nodes to be sure of getting that person. There is more to it than this again, so it’s not an easy task. We need to keep checking though.
No reason people cannot use different Mpid Getter IDs per session; then this gets crazy hard. Of course nobody is saying we have thought of 100% of the cases, but so far so good.
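To put rough numbers on the "network_population - 3" remark, here is a back-of-the-envelope sketch. It assumes node IDs are uniformly random and that an attacker cannot choose where its nodes land in the ID space; neither is confirmed in this thread. The function name and its parameters are made up for illustration.

```python
from math import comb


def p_attacker_in_close_group(honest: int, attacker: int, group_size: int = 4) -> float:
    """Probability that at least one of the `group_size` nodes closest to a
    target ID belongs to the attacker.

    ASSUMPTION: every node ID is uniformly random, so each node is equally
    likely to end up in the close group. This is a simplified model, not a
    description of the real network.
    """
    total = honest + attacker
    if total < group_size:
        raise ValueError("network is smaller than the close group")
    # Chance that every slot in the close group is taken by an honest node.
    p_all_honest = comb(honest, group_size) / comb(total, group_size)
    return 1.0 - p_all_honest


if __name__ == "__main__":
    # Hypothetical population of one million honest nodes.
    for attacker_nodes in (100, 10_000, 1_000_000):
        p = p_attacker_in_close_group(honest=1_000_000, attacker=attacker_nodes)
        print(f"{attacker_nodes:>9} attacker nodes -> {p:.4%}")
```

Under this simplified model the probability only reaches 1 once fewer than 4 honest nodes are left, which is the same idea as needing network_population - 3 nodes to be sure. Churn and per-session IDs are not modelled here.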
I would say this would be an important question to answer. Perhaps @dirvine could enlighten us? Even if we can communicate and share files privately if our browsing of public data isn’t private… well the NSA would just LOVE that.
Unless somebody is in the 4 close nodes to you, this will be anonymous browsing. The ID to retrieve data is a hash and not tied to you. I will go over it more, but you have many identities on the net: your Public one is known to folk you have told (and then leaked etc.), all others are private, including browsing.
I’ll admit this is an unlikely security hole, but it is a POSSIBLE security hole. Someone could set up a node in the local wifi neighbourhood and become the closest geographical node, correct? Thereby allowing them access to read neighbouring data.
No, not really; the closeness is XOR and not geographical. It is possible, but would be extremely rare, and it’s also very likely the node would not know who it was monitoring, unless it was geo close and crypto (XOR) close at the same time. I do think this is similar to it being possible to have the same bitcoin private key as somebody else (although here it is a factor of network nodes and not a fixed number like 2^160). We can work out the maths here, I think, to perhaps make it more clear. It will be a difficulty based on network population and churn though.
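To make "closeness is XOR and not geographical" concrete, here is a minimal sketch of picking the 4 closest nodes by XOR distance. The ID derivation (SHA-512 of a seed string), the node names and the helper functions are all assumptions for illustration and are not taken from the MaidSafe code.

```python
import hashlib


def node_id(seed: str) -> int:
    """Derive an illustrative 512-bit ID by hashing a seed string (assumption)."""
    return int.from_bytes(hashlib.sha512(seed.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """XOR distance between two IDs; it says nothing about physical location."""
    return a ^ b


def closest_nodes(target: int, nodes: dict[str, int], count: int = 4) -> list[str]:
    """Return the names of the `count` nodes whose IDs are XOR-closest to `target`."""
    return sorted(nodes, key=lambda name: xor_distance(nodes[name], target))[:count]


if __name__ == "__main__":
    # Hypothetical network of 50 nodes with hash-derived IDs.
    nodes = {f"node-{i}": node_id(f"node-{i}") for i in range(50)}
    me = node_id("my-session-id")
    print(closest_nodes(me, nodes))
```

Because the IDs come out of a hash, a node on the same wifi network has no better chance of landing in the close group than a node on another continent.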
Due to the de-duplication design, if an attacker wanted to identify people who retrieved a known document, the attacker would calculate the hashes for each chunk in the document. Any request for the document should involve a request for the hash of each chunk. If an attacker could map an IP to a GET request containing the hash of a single chunk, the attacker could prove the IP requested the entire document. This is more likely to happen with public or shared documents, and could be an issue for those wishing to create “whistleblower” functionality on the network.
I think it’s also worth mentioning that nearly all messages between peers are encrypted, which includes GET requests for documents. I think some messages in the initial connection phase cannot be encrypted, but that should be the only time (I haven’t scoured the initiation code much at all). Encrypting all messages between peers prevents attackers sniffing on the edge network in routers, etc, like @Blindsite2k mentioned. So an attacker in that position cannot map GET <--> safe_id or a safe_id <--> IP since the messages are an opaque encrypted blob.
The next question is whether any safe node could map IP <--> GET request easily. Based on the design, the GET messages could be encrypted to the DMs, which means any intermediate nodes (including nodes with direct connections to the client making the request) would not see the GET request. Since the DM is also unlikely to have a direct connection to the client, it is unlikely to know the IP. Thus an attacker would have to control both a peer of the targeted client and a DM of the file being requested. This would be harder to control, but the probability should increase as the file being stored increases in size (more chunks being stored). Unfortunately, the code appears to be sending GET requests directly to the closest match in its routing table, which means the DM is directly connected to the client, allowing it to map IP <--> GET. However, I’ve only grokked a small portion of the code thus far, so I could’ve easily missed something.
This network should be able to provide the privacy and anonymization that TOR provides. More comparative analysis will need to be done, though, so that issues TOR had to correct aren’t duplicated in this network.
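A rough sketch of that matching step is below. It is a simplification: real self-encryption also encrypts and obfuscates each chunk before naming it, whereas this stand-in just takes the SHA-512 of fixed-size plaintext chunks. The chunk size, the function names and the (ip, chunk_name) log format are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size, not taken from the SAFE code


def chunk_names(data: bytes, chunk_size: int = CHUNK_SIZE) -> set[str]:
    """Deterministic names for each chunk of a document.

    Stand-in for self-encryption: the only property that matters here is that
    the same document always yields the same chunk names.
    """
    return {
        hashlib.sha512(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def requesters_of(
    document: bytes,
    observed_gets: list[tuple[str, str]],
    chunk_size: int = CHUNK_SIZE,
) -> set[str]:
    """IPs from a hypothetical (ip, chunk_name) log that asked for any chunk
    belonging to `document`."""
    wanted = chunk_names(document, chunk_size)
    return {ip for ip, name in observed_gets if name in wanted}
```

A single matching chunk name is enough to flag an IP, which is why this only works if the attacker can tie an IP to a GET request in the first place.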
Of course, but that’d be futile and most likely unfeasible.
A simple “rule” for any privacy-supporting site would be to add or change a pixel or frame in every image, and soon there would be no deduplication for any such content; this could be created as a server-side ImageMagick/ffmpeg-powered plugin that would automatically inject randomness into all uploaded content.
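As a toy illustration of why a tiny change defeats deduplication, the sketch below changes one random byte and compares content hashes. It assumes content is named by a plain SHA-512 of its bytes, which is a simplification, and the function names are made up. A real plugin would alter a pixel through an image library rather than flip a raw byte, since a raw flip can corrupt the file.

```python
import hashlib
import secrets


def content_name(data: bytes) -> str:
    """Illustrative content name: plain SHA-512 of the bytes (assumption)."""
    return hashlib.sha512(data).hexdigest()


def with_random_tweak(data: bytes) -> bytes:
    """Return a copy of `data` with exactly one byte changed at a random position."""
    if not data:
        raise ValueError("cannot tweak empty content")
    tweaked = bytearray(data)
    pos = secrets.randbelow(len(tweaked))
    # XOR with a value in 1..255 always produces a different byte.
    tweaked[pos] ^= secrets.randbelow(255) + 1
    return bytes(tweaked)


if __name__ == "__main__":
    original = b"pretend these are image bytes" * 100
    copy_a = with_random_tweak(original)
    copy_b = with_random_tweak(original)
    print(content_name(original)[:16])
    print(content_name(copy_a)[:16])
    print(content_name(copy_b)[:16])
```

Each upload then gets its own name, so a precomputed hash list for the original file no longer matches, at the cost of losing the storage savings.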
We are aware of the technique and how it was used against certain users of a certain “Web disk service”, so we all know that no informed person would do the same thing again.