At the moment every message is firewalled (to prevent repeats). De-ranking happens where nodes deliver incorrect data (we check hashes and signatures). The rules here will grow a lot after launch, but the key is simplicity, and that's hard: it's easy to think of a rule, but it's incredible how quickly quick-fix rules can blow up the system. This is a real area where the fundamental types have to be rigorously gone over to ensure no hasty rules are ever implemented (many folks quickly come up with a rule and the side effects manifest in really invisible and bad ways).
Thanks, David. Where can I read information on what happens when a file or safecoin address is legitimately overwhelmed? I can’t remember where I read it, but additional copies are made to be able to handle the requests. But in cases like safecoin addresses, which are maintained by transaction managers, how many can be added? And how many of them need consensus on the balance of an address? How do they deal with high traffic?
No worries, if what you are after is not in the system docs then we will update them to make sure it is.
I will look forward to it. Meanwhile, would you mind a 3-sentence explanation? Just in layman’s terms, what happens when a file or safecoin address becomes overwhelmed with retrievals/transactions?
It does not get overwhelmed; the XOR space means that as nodes vanish there is another right there (poking a finger in the sea again). So to overwhelm a node you need to be connected to it (unlikely), and if you are, then you use all your bandwidth trying to DoS it while another 31 nodes try to speak to you. If you knock one off, another takes its place and you need to speak to it instead; you are then surrounded by a (somewhat) different bunch of nodes, so you are in trouble.
In terms of DDoS, you really need to DDoS the entire network, and you are not connected to all of it, so you need to hop across nodes, which will filter you, etc.
This is all XOR networking etc., like trying to bring down BitTorrent's DHT, but with an awful lot more security in mind and no nodes accepting connections from nodes that are not near them or that do not improve the routing table. If you check my blog I do try to explain some of this. More to follow when I am finished with routing_v2, which will show people very clearly what can be done with a secured DHT and why even that is not enough, but it is eons better than what exists.
Amazing, thanks for the explanation. That does indeed sound like a very good solution. I’ll definitely be doing more research on xor networking!
Additional question about xor networking: according to the lecture posted by @ioptio, each chunk is stored on the node(s) most closely matching the hash of the chunk. If that is the case, it seems that all four copies of a chunk would be stored on machines that are close to each other, possibly having the same close group of data holder managers. Surely that can’t be right, so what is the piece of information I’m missing here? How is the node on which a chunk should be stored determined?
Thanks in advance.
Actually, I think I just figured it out. Correct me if I’m wrong here:
A chunk is NOT stored on the nodes most closely matching the chunk's hash. Instead, it is passed on to data managers whose IDs are closest to the chunk. They then distribute each chunk more or less randomly on the network (taking vault ranks into consideration), and it's their job to record the exact node IDs that store their chunks. So all requests for a particular chunk will always go through the same data manager group, as they are the only ones that know the location of that chunk.
This is quite correct. But when it comes to closeness, keep in mind we're talking about closeness based on bitwise XOR. So, you connect to some IP addresses, then you get your personal file with the data atlas inside. This is a personal file only you can decrypt. In that file there's your routing table with, say, 64 node IDs. You connect to the four closest nodes (probably a bigger number with routing V2) based on XOR. So you might be in the US while your four closest nodes are in Germany, Japan, the UK and Belgium. This has nothing to do with geographical location; it's closeness based on some math. All the nodes and vaults have an ID in XOR space.
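To make the "closeness based on some math" concrete, here is a minimal sketch (illustrative only, with made-up 8-bit IDs; real network IDs are far longer): the distance between two IDs is simply their bitwise XOR, interpreted as a number, so your closest nodes have nothing to do with geography.

```python
def xor_distance(a: int, b: int) -> int:
    """XOR metric: smaller result means 'closer' in XOR space."""
    return a ^ b

# Hypothetical 8-bit node IDs for illustration.
my_id = 0b10110010
routing_table = [0b10110000, 0b00010010, 0b11110010, 0b10100010, 0b01110010]

# The four closest nodes are simply those with the smallest XOR distance.
closest_four = sorted(routing_table, key=lambda n: xor_distance(my_id, n))[:4]
print([bin(n) for n in closest_four])
```

Note that 0b10110000 differs from my_id in only one low bit (distance 2), so it is closest, while 0b01110010 differs in the two highest bits (distance 192) and is farthest, regardless of where either machine physically sits.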
So, if you want to store a chunk, a lot of things happen. Your balance is checked, etc. But when you have enough money and you pass a chunk to your client managers (again, your four closest nodes), they will just forward it for you. A chunk with a name like 5555 should ideally be stored at node 5555, but with addresses of something like 256 bits, the chance is quite low that that exact node exists. So each node receiving a chunk will pass it on to a node that's closer to that address than it itself is. Finally, after a couple of steps, a node will find that no other node it knows of is closer to 5555 than it is. At that moment it will store the chunk. Because XOR is indeed quite random (although it is very precisely calculated), the chunk can go around the world twice before it finds the right vault. When you request the chunk again, you ask your four closest nodes for a chunk stored at address 5555. They will ask their closest nodes to 5555 for the chunk, and these closest nodes will do the same. Again, in a number of steps the right vault is contacted and the chunk will come your way. The chunk is protected by a number of managers and stored on different vaults, so if one vault goes down, the managers will make sure the file is copied to another machine.
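The hop-by-hop forwarding described above can be sketched as a toy simulation (my simplification, not MaidSafe's code: each node here knows only a small random routing table, and a chunk is greedily forwarded to any known node strictly closer to the chunk's address until no closer node is known):

```python
import random

random.seed(7)

NODE_BITS = 8
nodes = random.sample(range(2 ** NODE_BITS), 32)  # 32 toy node IDs
# Each node only knows a small random subset of the network.
tables = {n: random.sample([m for m in nodes if m != n], 8) for n in nodes}

def route_chunk(chunk_addr: int, start: int) -> list[int]:
    """Greedy XOR routing: each hop forwards to a known node closer to
    chunk_addr; the node that knows no closer peer stores the chunk."""
    path = [start]
    current = start
    while True:
        best = min(tables[current], key=lambda n: n ^ chunk_addr)
        if (best ^ chunk_addr) >= (current ^ chunk_addr):
            return path  # no known node is closer: store here
        current = best
        path.append(current)

path = route_chunk(0x55, nodes[0])
print([hex(n) for n in path])
```

Because the XOR distance to the target strictly decreases at every hop, the forwarding can never loop and always terminates, which is the same property that makes retrieval (asking your closest nodes, who ask theirs) converge in a bounded number of steps.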
Either you oversimplified the nodes, or I’m now misunderstanding something.
you pass a chunk to your data managers (again, your four closest nodes)
You mean your client managers?
A chunk with a name like 5555 should be stored most ideally at node 5555
By “stored”, you actually mean “managed” by data managers, correct? So a chunk with a name like 5555 will be stored BY data manager 5555. It will send the chunk to a random node (with sufficient rank), which could be anything, like node 3137. But it won’t send it directly to node 3137; rather, to 3137’s data holder managers, who will then pass it on to 3137. (However, 3137 is first chosen to hold the chunk, and only then are that node’s data holder managers located. Is that correct?)
They will ask their closest nodes to 5555 for the chunk and these closest nodes will do the same.
Again, they will ask data managers closest to 5555, not data holders closest to 5555, correct?
Again, in a number of steps the right vault is contacted and the chunk will come your way.
What happens if a few years have gone by since you stored the chunks, and now there are a lot more machines that are even closer to 5555 than there were in the past? It could be that some of the old machines are still up, and compared to other nodes, their IDs aren’t even close to 5555 anymore (but they were in the past). How long will the data managers continue to ping each other to find the node holding your chunk?
Any change to any group at any time kicks off account transfer info (i.e. a node goes offline). So you will see a lot of work from us on reducing the traffic costs there. The faster we can transfer, the faster churn we can handle, and therefore smaller devices can contribute effectively.
Yes you’re right. I meant client-managers.
Client manager Vaults receive the chunks of self-encrypted data from the user’s Vault.
Data manager Vaults manage the chunks of data from the Client manager Vaults. They also monitor the status of the SAFE Network.
Data holder Vaults are used to hold the chunks of data.
Data holder managers monitor the Data holder Vaults. They report to the Data manager if any of the chunks are corrupted or changed. They also report when a Data holder has gone offline.
The Vault manager keeps the software updated and the Vault running, restarting it on any error.
The Transaction manager helps to manage safecoin transfers.
What I got from reading the system docs etc. is that if you run MaidSafe, a vault is created. Probably even more vaults are created, something like 16 of them, each with its own XOR address. As far as I know, if the address of one of the vaults is something like 5555, a chunk with the name 5555 will be stored in that vault (when the ranking is OK). The data holder managers will make sure your vault is online and storing the chunk. They will ask for proof of resource. Besides your vault, the chunk is stored on at least 3 other vaults as well. So I think a random place is not chosen by the data managers; the chunk is actually saved on the vault with the closest address. To be 100% sure about that, maybe @dirvine can reply about that part of the network.
Thanks for the clarification. I’m assuming the rest of my comment was correct?
So hypothetically (very unlikely, but just to make sure I understand), if all four data managers you used haven’t been shut off for a period of years and the network has grown substantially, then it would be possible not to retrieve your chunk again? If there are many newer data managers with closer IDs than the original group you used, then when requesting the original chunk, will the new data managers give up finding the original ones, since they’re not even close anymore?
I think you’re incorrect. Check out this video:
It explains that the chunk with the name 5555 will be managed by data managers with that ID, and those managers will select (random?) vaults around the network to store it. Would be great for @dirvine to clarify. And @dirvine, I apologize for bombarding you with questions, but I have to
Yes, the closest nodes are the data managers (the data is also cached from that point as well). They then manage the presence and integrity of the data via the vault managers (pmid managers), who can penalise any data losses or corruption. So vault managers manage vaults and can be sure of data requests and integrity checks from the group near the data key (the data managers). It all feels complex, but it does eventually tie up.
Don’t worry, I get pummelled with questions anyway. I am kinda used to it now
Any churn changes the groups of managers around the churn event continually. The beauty is they are a small part of a larger network, and the whole network is balanced by the account transfers and relocation of the group near the churn event (new nodes in a group cause churn by pushing old nodes out). That is OK, as the pushed-out nodes get pushed nearer other data and managers to manage. This is the kind of multi-directional, or multi-dimensional, thinking that is the vault network. It is a huge spaghetti bowl of (incredibly simple) rules that all interact in XOR space in a really cool way, creating sophistication from apparent mayhem and apparent confusion if looked at linearly.
Thank you, David, I finally feel like it’s all coming together now! Today and yesterday have been filled with the glorious “aha” moments.
Thanks for asking; glad I’m not the only one responsible for bombarding David and the others. To me it’s still not clear how the data managers decide where to put the chunks (which data holders are chosen?). If there’s a chunk in my vault, the closest nodes are data managers and are responsible for checking whether my vault still has the chunk and whether I’m online. But when I store a file myself on the network, it goes like this:
The self-encrypted chunks go to the client manager. From there they go to the data managers, and they’ll store it on at least 3 other data holders. Not sure how the data holders are chosen.
At the moment it is a random connected node in your routing table. So each node does a store, and the vault managers then send back “stored” to the group of data managers. Then the chunk is considered stored. If there are fewer than 3 copies, another 4 are made (one each), so we can end up with 6 copies of data (and we do a lot), so the number of copies can be from 2 to 6, plus offline copies (16). So data managers can make a decision themselves that can only be agreed by the return message, which goes to the group, and consensus is achieved.
So when we say 4 copies, it can really be 2–6, plus 16 offline. It’s just easier to say 4.
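That copy-count rule can be boiled down to a tiny sketch (my reading of the description above, not actual vault code): nothing happens while 3 or more live copies remain, and once the count drops below 3, four more store requests go out, which is why the observed live count ranges from 2 to 6.

```python
def replenish(live_copies: int) -> int:
    """Data managers' copy rule as described: if fewer than 3 live
    copies remain, another 4 are made (one per manager); otherwise
    the count is left alone. Offline copies are tracked separately."""
    if live_copies < 3:
        return live_copies + 4
    return live_copies

# A vault drops offline leaving 2 live copies -> 4 more are stored.
print(replenish(2))  # 6, the upper end of the 2-6 range
print(replenish(4))  # 4, no action while >= 3 copies exist
```

This also shows why a momentary reading of 2 copies is not alarming: it is exactly the trigger state that immediately produces 6.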