Data chains: what? why? how?



I wonder if there will be an equivalent of a block explorer, but anonymous unless made public by those who choose? Just curious.


Certainly possible for blocks that are kept (so not safecoin), and indexing also becomes a possibility over time. These blocks, though, do not necessarily hold the bit you want, which is the public data map.

For SD (structured data) ledger types (new; I still need to complete the RFC for them, but DataChains can handle this easily) a block explorer could certainly be built, for things like real estate transactions, government spending, budgets, speeches etc.

An SD ledger type is basically versioned everything. It works like a normal SD, so you Put it on the network; however, with the ledger bit set, it will not be deleted. To update it you cannot Post; it requires a Put (as you are adding data). This Put is a normal Put, except that it is not SD version 0, it is an update. So these immutable updates are paid for each time.

A normal SD, though, is removed from disk and from the chain when a new version arrives. Otherwise it would cost too much to maintain and would be a security hole, potentially for things like safecoin.
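
To make the Put/Post distinction concrete, here is a minimal sketch in Python. It is only a toy model of the behaviour described above, not the network's actual API: the class `StructuredDataStore`, its methods and the `ledger` flag are all hypothetical names, and payment is not modelled.

```python
class StructuredDataStore:
    """Toy model of normal SD vs ledger SD. All names here are hypothetical."""

    def __init__(self):
        self._versions = {}  # name -> list of (version, data) still held
        self._ledger = {}    # name -> True if the ledger bit is set

    def put(self, name, data, ledger=False):
        """Put: creates version 0, or appends a new version to a ledger SD."""
        if name not in self._versions:
            self._versions[name] = [(0, data)]
            self._ledger[name] = ledger
            return 0
        if not self._ledger[name]:
            raise ValueError("normal SD already exists; update it with post()")
        version = self._versions[name][-1][0] + 1
        self._versions[name].append((version, data))  # history is kept
        return version

    def post(self, name, data):
        """Post: update a normal SD; the previous version is dropped."""
        if self._ledger[name]:
            raise ValueError("ledger SD cannot be Posted; use put() to add a version")
        version = self._versions[name][-1][0] + 1
        self._versions[name] = [(version, data)]  # old version removed
        return version

    def history(self, name):
        """Return the data of every version still held for this name."""
        return [data for _, data in self._versions[name]]


store = StructuredDataStore()
store.put("land-deed", "owner: alice", ledger=True)
store.put("land-deed", "owner: bob")   # an update, but still a Put
store.put("profile", "draft")
store.post("profile", "final")         # replaces the draft
print(store.history("land-deed"))      # both versions are kept
print(store.history("profile"))        # only the latest version is kept
```

The point of the sketch is the asymmetry: a ledger item only ever grows, which is what would make a block explorer over it possible, while a normal SD keeps a single current version.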


It sounds like a great addition to the software. If it solves the problem of data persisting between reboots, disconnects, without corruption/disputes etc, then it would make the network far more robust.


I can see a problem: how do you get it to scale in proportion to the amount of data while still including all the data? Ideally, you would have an algorithm that can somehow “walk” through every piece of data once and only once (or twice and only twice, if you want that level of redundancy).


That’s the point: you do not need to know all the data. Each group looks after its own data and they climb up the binary tree. You find data via the DHT, so it’s really that simple. Think of it as it is now, but secured and re-publishable.


OK, so there’s another (perhaps naively conceived) problem, of recursion: isn’t the chain itself data? So what if network damage loses chain data. Would you need a chain of the chain data?


It would revert back to all previous verifiable chains wouldn’t it?


This would require losing a whole group, and those are geographically distributed. Group size should be chosen to ensure that no churn between refreshes loses all nodes; our refresh is seconds, as we are directly connected. So this is extremely unlikely to happen. However, if it did (and we are in infeasible land here), they could only lose a small part of the chain, and again they can restart and republish.

So a loss would take an infeasible situation followed by never starting again. I suppose if that happened data is probably our last concern :wink:


Ah, it’s kinda-sorta like RAID striping for a group, with each vault a disk.

EDIT: Or, closer to the mark, like git version trees.


Yes, that’s it. It gets better the deeper you go. Say a group loses members; then it falls back down the chain (each branch is a binary branch in a binary tree). So you have

group A (left)                 group B (right)
      |                               |
      |                               |
         Checkpoint (last link seen by both groups before the split)

If, say, A loses nodes and drops below group size, its remaining nodes are all connected to B, so B sees that A has lost consensus.

Both groups A + B now fall back to the consensus point (the checkpoint) and republish their data. Together they form group C, which waits for enough nodes to form group A again.

So the chain gets smaller and fatter at this checkpoint, creating a larger group. B can split but won’t until A has enough members. Then they do split and grow their own parts of the chain. In a growing network this process continues.
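
The fall-back and re-split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real routing logic: groups are keyed by a binary prefix string, members are bit-string ids, `GROUP_SIZE = 2` is an arbitrary small value for the example, and the helper names `fall_back` and `try_split` are made up here.

```python
GROUP_SIZE = 2  # assumed minimum group size, kept tiny for the example


def sibling(prefix):
    """The other branch at the same depth of the binary tree."""
    return prefix[:-1] + ("1" if prefix[-1] == "0" else "0")


def fall_back(groups, prefix):
    """If the group at `prefix` is too small, merge it with its sibling
    at their shared checkpoint (the parent prefix)."""
    if prefix == "" or len(groups[prefix]) >= GROUP_SIZE:
        return groups
    merged = dict(groups)
    members = merged.pop(prefix) | merged.pop(sibling(prefix), set())
    merged[prefix[:-1]] = members
    return merged


def try_split(groups, prefix):
    """Split a group in two only when both halves would be large enough."""
    members = groups[prefix]
    bit = len(prefix)  # the next bit of the id decides left or right
    left = {m for m in members if m[bit] == "0"}
    right = members - left
    if len(left) < GROUP_SIZE or len(right) < GROUP_SIZE:
        return groups
    result = dict(groups)
    del result[prefix]
    result[prefix + "0"] = left
    result[prefix + "1"] = right
    return result


# Group A is "0", group B is "1"; A has just lost a member.
groups = {"0": {"000"}, "1": {"100", "101", "110"}}
groups = fall_back(groups, "0")   # A and B merge into group C at ""
groups = try_split(groups, "")    # no split yet: the left side is too small
groups[""].add("011")             # a new node arrives on A's side
groups = try_split(groups, "")    # now A and B can form again
print(sorted(groups))
```

Note that B on its own always had enough members to stand alone; in this sketch it is the size check on both halves in `try_split` that keeps it merged until A's side has recovered.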

The key is the ability for any node in A that still exists to republish the whole chain again. Even if B did not exist and a member from A alone restarts, it can send the chain to any network nodes and those nodes can agree the data is valid (checking from genesis) and, as a group, republish on its behalf, or the node can republish itself.

This shows an important point: the chain is double ended and each end can be validated on its own. So a node can start and rejoin a group it was previously in, and the nodes can confirm the chain from the top, as the majority of them are in that chain, or anyone can confirm it from genesis.

Go a tiny bit deeper: if each node now gets all checkpoint links, it can accept any chain from that point forward. So a node can say “I am in group X” and send the linkChain (data chain links only), and the recipient can be guaranteed that the node is valid on the network and is cryptographically certain to be in group X.
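
One possible reading of the "double ended" validation, sketched in Python. This is a heavily simplified model and an assumption on my part: each link is reduced to the set of node names in the group at that point, a link is accepted if a majority of the previous link's members carry over into it, and the signatures that would make this cryptographically certain are left out entirely. The function names are hypothetical.

```python
def majority_overlap(prev, nxt):
    """True if more than half of the members of `prev` also appear in `nxt`."""
    return len(prev & nxt) * 2 > len(prev)


def links_consistent(chain):
    """Every link must carry over a majority of the link before it."""
    return all(majority_overlap(a, b) for a, b in zip(chain, chain[1:]))


def valid_from_genesis(chain, genesis):
    """Anyone can check the chain starting from the known first link."""
    return bool(chain) and chain[0] == genesis and links_consistent(chain)


def valid_from_top(chain, current_group):
    """A current group can check the chain from its newest link, because
    most of the group's own members appear in that link."""
    return (bool(chain)
            and majority_overlap(current_group, chain[-1])
            and links_consistent(chain))


genesis = {"n0"}
link_chain = [
    {"n0"},
    {"n0", "n1"},
    {"n0", "n1", "n2"},
    {"n1", "n2", "n3"},
]
print(valid_from_genesis(link_chain, genesis))
print(valid_from_top(link_chain, {"n1", "n2", "n3", "n4"}))
print(valid_from_top(link_chain, {"a", "b", "c"}))  # a group not in the chain
```

In this model a restarted node only needs to present the link chain: a group it used to belong to can anchor the check at the top, and any other group can anchor it at genesis.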

There is a lot more to this though, an awful lot more.


Which is why we’re all here :slight_smile:


It is doubtful the full capability of such a system can be quantified easily and certainly not in a single blog post like this, but now it’s time to imagine what we can do?

Congratulations David, Data Chains are another big breakthrough for the project.

Now we get our own chain that could serve as a bridge of understanding to the world of Bitcoin; we didn’t have that via the consensus mechanism.


Very impressive and congrats on another breakthrough in development it seems! Will datachains be implemented before the alpha network is released or later?


Yes during Alpha phases I think.


As I noted long ago, with SAFE the network is its own blockchain; I didn’t realize at the time that, with datachains, that would be almost literally true.


I’ve just read the blog post… So actually we’re speaking of Factom on steroids here, but with a split ledger and not a linear one like Bitcoin.

Data Chains seem to have many many benefits as outlined by @dirvine … 2 Cool !
Is it mutable or immutable (I assume immutable? // or mutable by the creator(s) multisig style?)


Immutable by design, and it may be validated from genesis (the network’s first node) or from the current close group. So two-way validity of an immutable ledger that can hold any data or transactions. It has a lot of nice features for sure.


What are the differences between a blockchain and data chains?


A DataChain is a secured list of “stuff” that is secured via group consensus. A blockchain is a secured list of transactions (with some extra fields that can be used for squeezing small stuff in).

BlockChains are secured via proof of work, while a DataChain is secured in a decentralised way by groups that form a more tree-like structure (a binary tree).

The DataChain is designed for data and blockchains are designed for transactions. Transactions are small and many can fit in a chain. DataChains are larger, but split up across the network. As DataChains are designed for data, they specifically hold data. The data can represent transactions when the data in question is structured data.

In SAFE the data is secured on the network and recorded in the DataChain; both the data and the identifier are secured.

So the similarities are there for sure. If you record data (hashes etc.) in a blockchain and that chain is later trimmed or improved for reduced storage (which is very valid), then the transactions holding those hashes can be removed, as they are not required to be maintained for a historical view of transactions. The hashes (which on their own are not enough for data identification) may then be removed with them.

So really horses for courses. Data -> DataChains. Transaction in a currency and POW -> blockchain.

Hope that helps.