SAFE Network Explained: Architecture

To address an ongoing demand for an overview document of the SAFE network, I have written what I believe to be a reasonably succinct summary of the architecture of the network. Sure it’s a complex system, but I believe it can be explained in a way most people can understand. Here’s my attempt at doing that:

https://safe-network-explained.github.io/architecture

The focus is on conceptual understanding rather than technical exactitude. There’s no mathematics or code in the document.

I hope the document allows readers to grasp the possibility and implications of the network. I want them to think ‘huh, I guess that should work pretty well’.

The document is designed to be similar in scope to Satoshi’s Bitcoin whitepaper (but this definitely isn’t a whitepaper). It’s about 3460 words (the Bitcoin whitepaper is about 3490) and aims for about the same balance between concepts and implementation. The target audience is assumed to have a similar degree of prior knowledge to those who might read the Bitcoin whitepaper.

There are two main sections - Client Operations and Network Operations. This should give an overview of the features relevant for the purposes of both end users and vault operators.

It’s a work in progress and suggestions are most welcome. To reiterate, I’ve mainly tried to balance conceptual understanding against implementation detail. Please let me know what you think.

68 Likes

Really nice job @mav, a lot of work here which will hopefully be utilised as the community helps us improve network documentation. Maybe refer to the network as ‘data and communications’ as opposed to ‘storage and retrieval’?

I’ve just skimmed the first couple of pages, but will read in more detail today. Thanks again!

10 Likes

Wow, this is great! I’ve also only skimmed it so far, but it looks good. May I suggest a small additional section about the differences from “similar” projects (other distributed storage systems, and block chains vs. data chains)? Forgive me if I just missed it. :wink:

2 Likes

I have not posted in a long while (though I read everyday), but this goose-bumps-inducing-document put my fingers in auto-mode to address a big thank you @mav. I found myself with the same feelings I experienced when first learning about the existence of safe :joy:

Concise and precise, simple yet thorough in its explanation of each concept (at least according to my perfectible understanding), I think this should go right on top of the doc list proposed by @polpolrene; it could be the missing piece showcasing to technical people how the concepts on which this project is based are rock solid, and at the same time revolutionary.

11 Likes

Wow! Great work! Just read through over breakfast and will be forwarding to a few other people who are interested in (but overwhelmed by) the technology.

This is a really good overview of the architecture from 1000ft. Enough detail to see the concept has been thought through, but not so much that you get stuck in the weeds.

One suggestion I would have, would be to compare to current infrastructure and platforms out there, in a similar way to the blockchain references. People know what infrastructure as a service and platform as a service are (or sound like) and it may help to explain ‘what’ safe net is. E.g. there is a platform API and the self healing, self scaling is self managing infrastructure. Which is all very lovely, ofc! :slight_smile:

6 Likes

Thanks @mav this is superb and valuable. As I’m reading I have a few queries - I don’t know the detail so you no doubt have this correct but I want to ask just to get confirmation before committing your descriptions to memory.

Oh, and the odd suggestion too :slight_smile:

Intro

  • How about “data storage and communications network” in the first sentence (‘retrieval’ seems redundant).
  • “Network tokens are distributed to vault operators by the network for providing these resources.” How about “Network tokens called Safecoin…”?

Self Encryption

  • “The datamap also acts as an encryption key for the chunks it refers to.” Sounds very neat, but is this right, or does the data map include / hold the key?
  • “… the file may be encrypted by the client before being uploaded using the encryption option built-in to the client software.” I thought all files were encrypted, but that private files were made private by encrypting the data map, and public files shared by sharing the unencrypted data map.

Resource Identifiers

I’m a bit vague about what qualifies as a resource identifier and what a resource is. I’m thinking that they are an address of something, such as a data map (eg for an immutable data item /file) but maybe not always. Could you explain that a bit more here?

Mutable Data

David (on the forum - I can find if you need it) has outlined how smart contracts can be implemented with existing MD, so maybe add this to the use cases with a reference or short explanation?

Vault Naming

  • “All vaults are allocated a random unique 256 bit identifier by the network upon joining.” Maybe add “… or rejoining”.

Churn

Maybe also mention natural churn, as vaults are switched off by their owners, or become cut off due to connection problems etc, and point out that this assists security as you’ve explained, but that data are still preserved.

I hope some of that helps, and thank you again. Great to have this level of explanation all in one place. Lots of work obviously; comprehensive, clear, and very well written. Really well done IMO. :clap:

13 Likes

This reads so well. Way to go @mav, this is great.

5 Likes

Just brilliant! Well done @mav, great work. It reads very well and if a non-technical person like me can understand it all then you’ve done a fantastic job of keeping it clear and simple. You’re a huge asset to this community dude.

'Nuff respect.

11 Likes

That must have been a lot of work :+1:
2 minor things that could be corrected:

  • enpoints → endpoints
  • Is this sentence correct?:

The identifier for these resources are SHA3-256 hashes of on the resource content.

EDIT: to be clear, I suspect the ‘on’ in the sentence above must be removed (nitpicking, I know).

5 Likes

Yes, the sha3_256 hash of the content == name (of Immutable Data). For mutable data we can name it many ways though.
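For readers who prefer code, the "hash of the content == name" rule for Immutable Data can be sketched in a few lines. This is only an illustration using Python's standard `hashlib`; the function name `immutable_data_name` is made up for the example and is not part of any SAFE library.

```python
import hashlib


def immutable_data_name(content: bytes) -> bytes:
    """Return the 32-byte (256-bit) SHA3-256 digest of the content.

    Hypothetical helper: it only illustrates that the name is derived
    from the content and nothing else.
    """
    return hashlib.sha3_256(content).digest()


chunk = b"hello safe network"
name = immutable_data_name(chunk)
print(name.hex())

# The same content always maps to the same name...
assert immutable_data_name(chunk) == name
# ...and changing the content changes the name.
assert immutable_data_name(chunk + b"!") != name
```

One consequence is that anyone holding the content can recompute the name and check that what they received is what they asked for.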

5 Likes

Very complete description of safe network!!!

A few details to be corrected though:

  • What you call group is now called section. See the change of terminology in MaidSafe Dev Update – December 6, 2016. A group is now a subset of a section of exactly 8 nodes. It contains nodes that are the closest ones to a specific address.

  • What you call section is in fact called prefix (this is what you defined as “the leading bits of their identifier”)

  • Section size is >= 8 but is not necessarily <= 16, because a section is split in two only if both halves would have at least 11 nodes. This means it splits when it reaches 22 nodes if it is well balanced, but can grow above 22 if it is unbalanced. The margin (11 – 8 = 3) is a hysteresis factor added to avoid a merge shortly after a split.

  • Permissions are not associated with specific keys but are applicable to all keys of a Mutable Data.
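The terminology above (prefix, section, group) and the split rule can be sketched roughly as follows. This is a simplified model, not the routing code: names are plain integers, the constants 8 and 11 come from the points above, and the function names (`has_prefix`, `closest_group`, `should_split`) are invented for the example. The `bits` parameter is 256 on the network; the usage below uses 8 to keep the numbers readable.

```python
from typing import Sequence, List

GROUP_SIZE = 8       # a group is the 8 nodes closest to an address
MIN_SPLIT_SIZE = 11  # each half needs at least 11 nodes (8 plus a margin of 3)


def has_prefix(name: int, prefix: str, bits: int = 256) -> bool:
    """True if the leading bits of `name` equal `prefix` (a string of '0'/'1')."""
    return format(name, f"0{bits}b").startswith(prefix)


def closest_group(address: int, names: Sequence[int]) -> List[int]:
    """Return the GROUP_SIZE names closest to `address` by XOR distance."""
    return sorted(names, key=lambda n: n ^ address)[:GROUP_SIZE]


def should_split(names: Sequence[int], prefix: str, bits: int = 256) -> bool:
    """A section splits only if BOTH halves would have >= MIN_SPLIT_SIZE nodes."""
    zeros = sum(has_prefix(n, prefix + "0", bits) for n in names)
    ones = sum(has_prefix(n, prefix + "1", bits) for n in names)
    return zeros >= MIN_SPLIT_SIZE and ones >= MIN_SPLIT_SIZE


# 11 names starting with bit 0 and 11 starting with bit 1: 22 nodes, balanced.
balanced = list(range(0, 11)) + list(range(128, 139))
# 20 names starting with bit 0 and only 5 starting with bit 1: 25 nodes, unbalanced.
unbalanced = list(range(0, 20)) + list(range(128, 133))

print(should_split(balanced, "", bits=8))    # True
print(should_split(unbalanced, "", bits=8))  # False
```

The unbalanced case shows why a section can grow past 22 nodes: the smaller half would fall below 11, so no split happens yet.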

14 Likes

Amazing that you both picked up on a point that I really labored on while writing. For me, ‘data and communications’ seemed too broad to capture the reader’s interest, so using ‘storage and retrieval’ seemed focused and precise. Having seen this feedback I think it’s best changed to ‘data storage and communications’.

This is a good idea, but I think the document would become too lengthy. I also think talking about what’s broken in existing systems doesn’t actually help in understanding the safe network in the context of this particular document.

Good one, I’ve updated this.

Datamap is defined in maidsafe/self_encryption as “Holds the information that is required to recover the content of the encrypted file.” The data it contains is just the list of chunks. There’s no indication of encryption for private data (although from memory this is not yet implemented).

Each datamap chunk “Holds pre- and post-encryption hashes” (L20), implying the chunk is what is encrypted (via self-encryption, not to differentiate public / private data).
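To make the "pre- and post-encryption hashes" idea concrete, here is a toy sketch of a data map. It is emphatically not the maidsafe/self_encryption algorithm: the XOR keystream below is a stand-in for a real cipher, the tiny chunk size is for readability, and the names `ChunkDetails`, `self_encrypt` and `self_decrypt` are used loosely for illustration. It only shows how a list of hashes can be enough both to locate the encrypted chunks and to decrypt them.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ChunkDetails:
    pre_hash: bytes   # hash of the chunk before encryption (used here as key material)
    post_hash: bytes  # hash of the chunk after encryption (used here as its name)


def _keystream(seed: bytes, length: int) -> bytes:
    # Toy keystream from SHAKE-256; an assumption standing in for a real cipher.
    return hashlib.shake_256(seed).digest(length)


def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, key))


def self_encrypt(content: bytes, chunk_size: int = 4) -> Tuple[List[ChunkDetails], Dict[bytes, bytes]]:
    """Split content into chunks, encrypt each one, and return (data map, chunk store)."""
    data_map: List[ChunkDetails] = []
    store: Dict[bytes, bytes] = {}
    for i in range(0, len(content), chunk_size):
        chunk = content[i:i + chunk_size]
        pre = hashlib.sha3_256(chunk).digest()
        encrypted = _xor(chunk, _keystream(pre, len(chunk)))
        post = hashlib.sha3_256(encrypted).digest()
        store[post] = encrypted  # what would be uploaded
        data_map.append(ChunkDetails(pre, post))
    return data_map, store


def self_decrypt(data_map: List[ChunkDetails], store: Dict[bytes, bytes]) -> bytes:
    """Fetch each encrypted chunk by post_hash and decrypt it using pre_hash."""
    out = b""
    for details in data_map:
        encrypted = store[details.post_hash]
        out += _xor(encrypted, _keystream(details.pre_hash, len(encrypted)))
    return out


content = b"some private file content"
data_map, store = self_encrypt(content)
assert self_decrypt(data_map, store) == content
```

In this toy model, whoever holds the data map can recover the file, which is one way to read the phrase about the datamap acting as a key. How public and private data differ in the real implementation is the open question above.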

So I’m not totally clear on the specific detail and would benefit from further clarification from maidsafe devs about how public / private data differs (preferably a link to an existing document).

I’d like the explanation in the document for private / public data to be a little clearer.

Yes I feel there’s not enough clarity in my use of the terms chunks vs resources vs files etc. I’m pretty sure the usage is consistent, but it’s not clear. I’ll try to improve it.

Yes, smart contracts are briefly mentioned in the Messaging section of the document, but I think it could benefit from a little more detail. I’ll see about adding some more info.

Good idea. I’ve added this.

Good catch on the typo. Fixed.

One detail I’m not clear about is how mutable data names are defined. The Mutable Data RFC doesn’t seem to specify this. Can anyone link to a document or code that clarifies how mutable data names are determined?

Thanks, I’ll fix this terminology. I tried to be consistent with the terms as used in Close Group Consensus vs Disjoint Sections… maybe some guidelines around when to use Group vs Section could be helpful to me.

Good catch. I’ve changed this.

I’ll update this as per your detail. Great use of the word hysteresis too!

My phrasing is probably a little ambiguous, but is still correct. The Permissions Section of the Mutable Data RFC says MD can have multiple users with multiple permissions. So permissions are applied per key, not to all keys of a MD.

To clarify via the code for MD permissions, permissions is a BTreeMap<User, PermissionSet>, which “Maps an application key to a list of allowed or forbidden actions”. This uses different terms for the same concept of User and Application Key.

Permissions as designed in the RFC allow multiple permissions per user with the definition permissions: BTreeMap<User, BTreeSet<Permission>> but as implemented in code is defined as BTreeMap<User, PermissionSet>. Just a subtle inconsistency in the definition of ‘multiple permissions’.

This also means permissions are allocated to users, not users allocated to permissions.

… but I’m getting stuck in the weeds here… I think the phrasing in the document is adequate to convey the concept! If you can think of a more suitable way to word it I’d be glad to know.


Thanks everyone for the feedback. This is such a great community and project to be part of.

21 Likes

Nice architecture overview! I tried to find a discussion of the performance aspects of the SAFE network but couldn’t find one, or maybe I overlooked it (apologies in that case). Could someone explain how the performance would compare to the traditional client-server model? The files in the SAFE network are not only divided into multiple chunks and stored on multiple nodes, they are encrypted as well. The response to a resource request will have to grab all those chunks, decrypt, assemble and serve the resource to the client. Wouldn’t this whole process be a costly operation in terms of disk IO, and therefore increase the response time significantly? I would appreciate some discussion/comments on this subject.

3 Likes

Ahh, that was an enjoyable read. Really outstanding work! Finally something perfect for showing around! Thanks thanks thanks!

3 Likes

The network doesn’t define it. It is chosen by the client app and can be anything, for example a random xor name or a hash of an application identifier.

While true, this doesn’t imply the following:

It only means that for example, user1 can have the right to create new entries and to delete any entries, while user2 can have the right to update any entries.
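A small model may help pin down this reading of the permissions: each user has a set of allowed actions, and those actions apply to every entry of the Mutable Data rather than to particular keys. The class and action names below are invented for the example and only loosely mirror the `BTreeMap<User, PermissionSet>` shape quoted earlier; this is not the SAFE API.

```python
from enum import Enum
from typing import Dict, Set


class Action(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


class MutableData:
    """Toy Mutable Data: permissions are per user and cover ALL entries."""

    def __init__(self, permissions: Dict[str, Set[Action]]):
        self.permissions = permissions        # user -> allowed actions
        self.entries: Dict[bytes, bytes] = {}

    def _check(self, user: str, action: Action) -> None:
        if action not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not {action.value}")

    def insert(self, user: str, key: bytes, value: bytes) -> None:
        self._check(user, Action.INSERT)
        if key in self.entries:
            raise KeyError(f"entry {key!r} already exists")
        self.entries[key] = value

    def update(self, user: str, key: bytes, value: bytes) -> None:
        self._check(user, Action.UPDATE)
        if key not in self.entries:
            raise KeyError(f"entry {key!r} does not exist")
        self.entries[key] = value

    def delete(self, user: str, key: bytes) -> None:
        self._check(user, Action.DELETE)
        del self.entries[key]


md = MutableData({
    "user1": {Action.INSERT, Action.DELETE},
    "user2": {Action.UPDATE},
})
md.insert("user1", b"k1", b"a")
md.insert("user1", b"k2", b"b")
md.update("user2", b"k1", b"a2")  # user2 may update any entry,
md.update("user2", b"k2", b"b2")  # not just one particular key
md.delete("user1", b"k2")         # user1 may delete any entry
```

Note there is nowhere in this structure to say "user2 may update k1 but not k2", which is the distinction being drawn.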

2 Likes

Maybe Profiling vault performance could be a good start. It’s not exactly a comparison to traditional client-server.

The chunks are assembled by the client, so the network only needs to know how to deliver chunks. This allows for very simple caching rules that should give high performance, especially for popular chunks.
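As a rough sketch of why caching can be simple here: a chunk is requested by name and reassembled by the client, so a cache only has to map names to chunks. The small LRU cache below is a generic illustration, not the network's caching logic; `ChunkCache`, `assemble` and the `fetch` callback are hypothetical names, and decryption is left out for brevity.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class ChunkCache:
    """Tiny LRU cache keyed by chunk name (illustrative only)."""

    def __init__(self, fetch: Callable[[bytes], bytes], capacity: int = 2):
        self._fetch = fetch          # how to get a chunk on a cache miss
        self._capacity = capacity
        self._cache: "OrderedDict[bytes, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name: bytes) -> bytes:
        if name in self._cache:
            self._cache.move_to_end(name)    # mark as most recently used
            self.hits += 1
            return self._cache[name]
        self.misses += 1
        chunk = self._fetch(name)
        self._cache[name] = chunk
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used chunk
        return chunk


def assemble(chunk_names: Iterable[bytes], get_chunk: Callable[[bytes], bytes]) -> bytes:
    """Client-side step: fetch each chunk by name and join them in order."""
    return b"".join(get_chunk(n) for n in chunk_names)


backing_store = {b"name-a": b"hel", b"name-b": b"lo"}
cache = ChunkCache(backing_store.__getitem__)

assert assemble([b"name-a", b"name-b"], cache.get) == b"hello"  # two misses
assert assemble([b"name-a", b"name-b"], cache.get) == b"hello"  # two hits
print(cache.hits, cache.misses)
```

If a chunk's name is the hash of its content, as described earlier in the thread for Immutable Data, a cached copy cannot go stale, which is part of what keeps the caching rules simple.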

What happens if this is chosen deliberately by the client app to be the same as an existing xor name on the network?

Or slightly extending that idea, what if a user computes the xor name for an immutable data chunk before uploading to the network, creates an MD at that same xor name, then tries to upload the immutable data? Won’t this cause conflicts? I’m just not quite clear on how names can be set by someone other than the network but still be secure.

Thanks for clarifying about MD. I really appreciate your in-depth knowledge of the code.

5 Likes

For MDs it’s XOR name + type tag that’s used. I’m not sure of the details of how this works, but I think Immutable Data has one address space and Mutable Data has one address space for each type tag.

If you try to upload an MD with the same XOR name and type tag as an already existing MD I believe you’ll just get an error message.
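Here is a toy model of the addressing described in the last two paragraphs, taking the "I think" and "I believe" parts as assumptions: Immutable Data lives in its own address space keyed by content hash, Mutable Data is keyed by (XOR name, type tag), and creating an MD at an occupied address fails. `ToyNetwork` and its methods are invented for the example, and the type tag value is arbitrary.

```python
import hashlib
import os
from typing import Dict, Tuple


class ToyNetwork:
    """Illustrative in-memory stand-in; not how vaults actually store data."""

    def __init__(self) -> None:
        self.immutable: Dict[bytes, bytes] = {}            # name -> content
        self.mutable: Dict[Tuple[bytes, int], dict] = {}   # (name, type_tag) -> entries

    def put_immutable(self, content: bytes) -> bytes:
        # The name is derived from the content, never chosen by the client.
        name = hashlib.sha3_256(content).digest()
        self.immutable[name] = content
        return name

    def put_mutable(self, name: bytes, type_tag: int) -> None:
        # The client chooses the name; the address is the (name, type_tag) pair.
        address = (name, type_tag)
        if address in self.mutable:
            raise ValueError("mutable data already exists at this address")
        self.mutable[address] = {}


net = ToyNetwork()
TYPE_TAG = 15000  # arbitrary example value

# Two ways a client app might pick an MD name, as mentioned above:
random_name = os.urandom(32)
app_name = hashlib.sha3_256(b"com.example.my-app").digest()
net.put_mutable(random_name, TYPE_TAG)
net.put_mutable(app_name, TYPE_TAG)

# Deliberately reuse the name of an immutable chunk for an MD:
content = b"some immutable chunk"
chunk_name = hashlib.sha3_256(content).digest()
net.put_mutable(chunk_name, TYPE_TAG)
assert net.put_immutable(content) == chunk_name  # no conflict in this model

# Reusing an existing (name, type tag) pair is rejected:
try:
    net.put_mutable(app_name, TYPE_TAG)
except ValueError as err:
    print(err)
```

Under these assumptions, squatting on an immutable chunk's name with an MD does nothing to the chunk, because the two kinds of data never share an address.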

1 Like

Do you have a Maidsafe address for donations? :stuck_out_tongue:
That goes for you also @polpolrene

Great work gents, keep it up

3 Likes

Thank you @mav! I haven’t actually read the document yet, but I ran it through a simple spell-checker in LibreOffice and found at least the following:
eg → e.g.
ie → i.e.

I would recommend using another spell-checker to get rid of other possible mistypings. I think the one in MS Word may be decent for English. (I don’t have Word myself although I was on the team that created the checkers for e.g. Swedish.) Spelling and grammar checkers can be great tools for spotting mistakes, but never authorities on correctness.

2 Likes

Using eg and ie for e.g. and i.e. is perfectly acceptable - there’s no real right or wrong. I find Word always tries to Americanise words. For example Americanise → Americanize! e.g. and i.e. are more common in the US where they tend to use more punctuation than Brits and Aussies do - for example U.S.

3 Likes