Datachains for dummies

I am trying to get a good understanding of datachains, and even though I have been reading a lot I still have noob questions. I would really appreciate it if someone could help me understand.

  1. Where is the datachain hosted? Is it public like the blockchain? Is there just one datachain hosted on the SAFE network, or is it duplicated so that each close group maintains its own copy? My understanding is that the datachain is scattered among groups within the network, and that nodes know part of the chain and validate it, but nobody except the network itself has a clear picture of the datachain as a whole. Am I correct?

  2. The datachain is about data identifiers, not the data itself. I think I understand this. Still, if the network grows a lot, how do we make sure the datachain does not grow so big that it becomes difficult to handle, takes a lot of resources, or weighs on performance? How does the datachain scale compared to a blockchain? I read David’s answer that some old blocks of the datachain can be deleted, but what if some data identifiers within these blocks need to be kept alive because they are still relevant? Do we then need to keep the full block? The datachain could become a very complex thing to maintain.

  3. The concept of nodes of groups lost me. A group can be made of up to 32 computers. How many groups does it take to make a node? Will some nodes only have the task of maintaining the identifiers and not the data? In that case, would these nodes be crucial to maintaining the network's consistency, so that attacking them could have a more severe impact than just data loss?

  4. The attack we had with alpha1 is easy to prevent in beta, and spamming the network won’t be possible once the safecoin implementation adds friction. Attacking the network by adding a massive number of nodes and then, once they have a good rank, killing them all seems much more dangerous to me. Is the node rank public?

  5. On a more general note, I have only been around for 2-3 months, so there are things I am missing. How come something as crucial to the SAFE network as the datachain, something that sets SAFE apart so distinctively and is at the heart of the network, is only being developed and tested now? Couldn’t the datachain algorithms have been built earlier?
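
To make the idea in question 2 concrete, here is a minimal Python sketch of a chain entry that records only a data identifier and the group members who vouched for it. Everything in it is a hypothetical illustration, not the actual data chain crate: the `Block` class, the `data_identifier` helper, the node names, and the strict-majority rule are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


def data_identifier(data: bytes) -> str:
    """Return a fixed-size identifier (SHA-256 hex digest) for a chunk of data."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Block:
    """Hypothetical chain entry: keeps the identifier and the voters, never the data."""
    identifier: str
    voters: set = field(default_factory=set)

    def add_vote(self, member: str) -> None:
        # Record that a node vouched for this identifier.
        self.voters.add(member)

    def is_valid(self, group: set) -> bool:
        # Assumed rule: a strict majority of the current group must have voted.
        # Votes from nodes outside the group are ignored.
        return len(self.voters & group) * 2 > len(group)


# A small hypothetical group (the question mentions groups of up to 32).
group = {"node-a", "node-b", "node-c", "node-d", "node-e"}

chunk = b"x" * 1_000_000  # one megabyte of data
block = Block(data_identifier(chunk))
for member in ("node-a", "node-b", "node-c"):
    block.add_vote(member)
```

The point of the sketch is the size difference: the chunk is a megabyte, while the block holds a 64-character identifier plus a handful of names, so the chain grows with the number of entries rather than with the volume of data stored.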

Please understand that my tone is not aggressive at all. I am just trying to understand and catch up with all of you.


Here is the original blog post, which links to the crate and the initial proof of concept for data. I think this should answer your questions 1-4. As for 5, it’s more like an evolution. For a while we did have restartable nodes which held on to their addresses, but this was deemed insecure and prone to a human attack (advertising your ID on a web site to share with others, etc.). A long time back we moved to not having persistent vault IDs, for security, and through this post we came to understand a lot of the underlying rules more clearly. We could have written some complex code on top of an already hard-to-understand code base, but instead we used the language of the network to better understand the security.

This was a long journey. As you can see, data chains date back over a year, with the initial code on 11th May 2016, but a lot of things had to happen to the code base first, and we are also on an iterative rollout. So it is not that it was only recently considered; it is more that it could only be built on a network that did not yet exist, but would in a few months (a year).

Hope this helps.