Along the same lines as OP, here are some thoughts I had to better get my head around the concept:
There are really two fundamental data structures defined in the Datachain proposals.
- The structure used to track and validate section membership. I'll call this the section-chain.
- The structure that tracks and validates information about the data stored in the network. I'll call this the info-chain.
These structures perform very different functions and have very different requirements. The current discussions about datachains (options A and B) are primarily about the section-chain.
The section-chain is the root of trust for the entire network. It defines the nodes that make up a section at any given moment in time. A more accurate name for the entire structure would be "section-graph". It is a complex network of inter-related blocks, which define section membership events in the system. However, if you take any given leaf block in the graph, you can create a "section-chain" by following its link to its parent block all the way back to the genesis block.
You can validate a given block using only its section-chain. You don't need the entire graph. As a result, any node that needs to validate the membership of a section only needs the relatively small section-chain for that section. No node ever needs the entire section-graph, only the section-chains of the nodes it communicates with.
In other words, this structure can scale in ways that a classic block-chain cannot.
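To make the "follow the parent links back to genesis" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration only: the `Block` fields, the payload strings, and the use of SHA-256 as the block identifier are mine, not anything taken from the Datachain proposals.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    # Hypothetical block layout: a membership event plus a link to the parent.
    payload: str
    parent: Optional[str]  # digest of the parent block, None for genesis

    @property
    def digest(self) -> str:
        data = f"{self.parent}|{self.payload}".encode()
        return hashlib.sha256(data).hexdigest()


def section_chain(graph: Dict[str, Block], leaf: str) -> List[Block]:
    """Walk parent links from a leaf back to genesis; return genesis-first."""
    chain = []
    current: Optional[str] = leaf
    while current is not None:
        block = graph[current]  # KeyError if a link is missing
        if block.digest != current:
            raise ValueError("block does not match the digest that links to it")
        chain.append(block)
        current = block.parent
    chain.reverse()
    return chain


# A tiny section-graph: one split produces two branches.
genesis = Block("genesis", None)
a1 = Block("add node-a", genesis.digest)
left = Block("split: prefix 0", a1.digest)
right = Block("split: prefix 1", a1.digest)
left2 = Block("add node-b", left.digest)
graph = {b.digest: b for b in (genesis, a1, left, right, left2)}

chain = section_chain(graph, left2.digest)
print([b.payload for b in chain])
```

The point is that the chain for the `left2` leaf never touches the `right` branch, so a node checking that section only needs those four blocks rather than the whole graph.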
There is a lot of subtlety here, but the basic idea is that scaling anything of this type (datachains, blockchains, etc.) is always a tradeoff between security and performance. Bitcoin applies 100% of the network's available security to every piece of data in the blockchain. As a result, scaling above a certain limit is effectively impossible. This is why you hear talk about "off chain" transactions.
With datachains, the network dynamically distributes security evenly across all sections. As the network grows, the total available security and total available performance grow with it. The sections split to maintain a constant level of security and performance in each section. 
The info-chain protects the integrity of some amount of data. It hasn't been completely specified, but it is probably an actual chain (a one-to-one relationship from parent to child). It defines the state of the data at any given point in time.
An info-chain references a section-chain throughout the chain. A section-chain has no knowledge of info-chains and will never reference one.
The info-chain uses the section-chain for validation. Its proof of validity is directly tied to the validity of the section-chain it references. When an info-chain references a section-chain, it is using it as proof that the nodes signing the info-chain block have the authority to validate and sign that block.
So, you can think of the info-chain as riding on top of the section-chain. The section-chain is a lower level piece of infrastructure that the info-chain is depending upon.
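Here is a rough sketch of that one-way dependency. The type names, fields, and the simple-majority rule are all assumptions of mine, since the info-chain hasn't been fully specified, and "signatures" are modelled as plain node names rather than real cryptographic signatures.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class SectionBlock:
    # Hypothetical: a validated section-chain block and the members it defines.
    # Note it holds no reference to any info-chain.
    block_id: str
    members: FrozenSet[str]


@dataclass(frozen=True)
class InfoBlock:
    # Hypothetical: an info-chain block pointing down at a section-chain block.
    data_state: str
    section_ref: str
    signers: FrozenSet[str]


def info_block_is_valid(
    block: InfoBlock,
    validated_sections: Dict[str, SectionBlock],
    quorum: float = 0.5,  # assumed threshold: strictly more than half
) -> bool:
    section = validated_sections.get(block.section_ref)
    if section is None:
        return False  # no validated section-chain block to lean on
    if not block.signers <= section.members:
        return False  # someone signed without authority in that section
    return len(block.signers) > quorum * len(section.members)


sections = {"s7": SectionBlock("s7", frozenset("abcde"))}
good = InfoBlock("v2", "s7", frozenset("abc"))
outsiders = InfoBlock("v2", "s7", frozenset("axy"))
too_few = InfoBlock("v2", "s7", frozenset("ab"))
unknown = InfoBlock("v2", "s9", frozenset("abc"))

for b in (good, outsiders, too_few, unknown):
    print(b.section_ref, sorted(b.signers), info_block_is_valid(b, sections))
```

Only the first block passes: the signers are all members of the referenced section and there are enough of them. The info-chain block is only as trustworthy as the section-chain block it points to.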
In order to validate a block in the info-chain, you only need the latest subset of that chain, back to where it references a validated section-chain block. Only nodes involved in managing a piece of data need the info-chain for that data. This will probably be a subset of nodes within a section.
Therefore, info-chains can scale to a greater degree than even section-chains. Only a small set of nodes needs to track and manage any given info-chain, because it uses the more widely distributed section-chain as its root of trust.
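As a sketch of "only the latest subset of the chain", the function below walks back from the newest info-chain block and stops at the first block that references an already-validated section-chain block. The block layout and ids are hypothetical, and I'm assuming that only some info-chain blocks carry a section-chain reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class InfoBlock:
    # Hypothetical layout; section_ref is None when a block carries no reference.
    block_id: str
    parent: Optional[str]
    section_ref: Optional[str] = None


def blocks_needed(
    chain: Dict[str, InfoBlock], tip: str, validated: Set[str]
) -> List[str]:
    """Return the block ids needed to validate `tip`, newest first."""
    needed = []
    current: Optional[str] = tip
    while current is not None:
        block = chain[current]
        needed.append(block.block_id)
        if block.section_ref in validated:
            return needed  # anchored to a validated section-chain block
        current = block.parent
    raise ValueError("no validated section-chain reference found")


blocks = [
    InfoBlock("i0", None, "s0"),
    InfoBlock("i1", "i0"),
    InfoBlock("i2", "i1", "s5"),
    InfoBlock("i3", "i2"),
    InfoBlock("i4", "i3"),
]
chain = {b.block_id: b for b in blocks}

print(blocks_needed(chain, "i4", validated={"s0", "s5"}))
```

Here validating `i4` only needs `i4`, `i3` and `i2`; the older blocks `i1` and `i0` never have to be fetched.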
The subtle genius of the system is that you can scale the security of the data on an info-chain by info-chain basis. An info-chain protecting safecoin may be more secure, by requiring more signatures, than an info-chain protecting the integrity of some random mutable data. So we get to choose the security/performance tradeoff for every piece of data managed in the network.
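A quick sketch of what a per-info-chain tradeoff could look like. The policy names and the 3/4 and 1/2 thresholds are made-up numbers for illustration; nothing in the proposals fixes them.

```python
import math
from fractions import Fraction

# Hypothetical per-data-type policies: fraction of the section that must be exceeded.
POLICIES = {
    "safecoin": Fraction(3, 4),      # assumed: stricter, more signatures
    "mutable_data": Fraction(1, 2),  # assumed: simple majority
}


def required_signatures(section_size: int, threshold: Fraction) -> int:
    """Smallest signature count strictly greater than threshold * section_size."""
    return math.floor(threshold * section_size) + 1


section_size = 8
for kind, threshold in POLICIES.items():
    print(kind, required_signatures(section_size, threshold))
```

With a section of 8 nodes, the stricter policy needs 7 signatures and the looser one needs 5. More signatures means more messages and more waiting, which is exactly the security/performance dial being chosen per info-chain.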
Footnotes:
- On "section-graph": the technical term is a DAG (directed acyclic graph). You can also think of it as a tree, if it helps you grok what is going on, but this isn't strictly accurate.
- On "leaf block": a leaf is any block that is not the parent of any other block, i.e. the latest, most current block in a chain.
- On which nodes need to validate the membership of a section: this is probably only nodes directly connected to that section (as dictated by the routing layer).
- On no node ever needing the entire section-graph: this is a bit more complicated. There are optimizations that may be desired, for instance to allow us to throw away very old blocks. These optimizations may need to see a larger part of the section-graph, or even all of it.
- On security and performance growing with the network: this is really only true of very large, mature networks, where performance and security both grow in proportion to network size.