SAFE Network Explained: Architecture

Thanks @mav, this is superb and valuable. As I’m reading I have a few queries. I don’t know the detail, so you no doubt have this correct, but I want to ask just to get confirmation before committing your descriptions to memory.

Oh, and the odd suggestion too :slight_smile:

Intro

  • How about “data storage and communications network” in the first sentence (‘retrieval’ seems redundant).
  • Network tokens are distributed to vault operators by the network for providing these resources. How about Network tokens called Safecoin…?

Self Encryption

  • The datamap also acts as an encryption key for the chunks it refers to. Sounds very neat, but is this right, or does the data map include / hold the key?
  • … the file may be encrypted by the client before being uploaded using the encryption option built-in to the client software. I thought all files were encrypted, but that private files were made private by encrypting the data map, and public files shared by sharing the unencrypted data map.

Resource Identifiers

I’m a bit vague about what qualifies as a resource identifier and what a resource is. I’m thinking that they are the address of something, such as a data map (eg for an immutable data item/file), but maybe not always. Could you explain that a bit more here?

Mutable Data

David (on the forum; I can find the link if you need it) has outlined how smart contracts can be implemented with existing MD, so maybe add this to the use cases with a reference or a short explanation?

Vault Naming

  • All vaults are allocated a random unique 256 bit identifier by the network upon joining. Maybe add … or rejoining.

Churn

Maybe also mention natural churn, as vaults are switched off by their owners, or become cut off due to connection problems etc, and point out that this assists security as you’ve explained, but that data are still preserved.

I hope some of that helps, and thank you again. Great to have this level of explanation all in one place. Lots of work obviously; comprehensive, clear and very well written - really well done IMO. :clap:

This read so well, way to go @mav this is great

Just brilliant! Well done @mav, great work. It reads very well and if a non-technical person like me can understand it all then you’ve done a fantastic job of keeping it clear and simple. You’re a huge asset to this community dude.

'Nuff respect.

That must have been a lot of work :+1:
Two minor things that could be corrected:

  • enpoints -> endpoints
  • Is this sentence correct?:

The identifier for these resources are SHA3-256 hashes of on the resource content.

EDIT: to be clear, I suspect the ‘on’ in the sentence above must be removed (nitpicking, I know).

Yes, the sha3_256 hash of the content == name (of Immutable Data). For Mutable Data, though, the name can be chosen in many ways.
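For Immutable Data the naming rule is simple enough to show directly. A minimal Python sketch using the standard library's `hashlib.sha3_256`; the helper name `immutable_data_name` is hypothetical and this is an illustration, not the network's (Rust) code:

```python
import hashlib


def immutable_data_name(content: bytes) -> bytes:
    """Hypothetical helper: the 32-byte name of an Immutable Data chunk is
    the SHA3-256 hash of its content, so identical content gets the same name."""
    return hashlib.sha3_256(content).digest()


chunk = b"some encrypted chunk bytes"
name = immutable_data_name(chunk)
print(len(name) * 8, "bit name:", name.hex())

# Any change to the content gives a different name, so a fetched chunk is
# self-verifying: re-hash it and compare the result with the requested name.
assert immutable_data_name(chunk) == name
assert immutable_data_name(chunk + b"!") != name
```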

Very complete description of safe network!!!

A few details to be corrected though:

  • What you call group is now called section. See the change of terminology in MaidSafe Dev Update – December 6, 2016. A group is now a subset of a section of exactly 8 nodes. It contains nodes that are the closest ones to a specific address.

  • What you call section is in fact called prefix (this is what you defined as “the leading bits of their identifier”)

  • Section size is >= 8 but is not necessarily <= 16, because a section is split in two only if both halves are greater than or equal to 11. This means it splits when it reaches 22 if it is well balanced, but can grow above 22 if it is unbalanced. The margin (11 – 8 = 3) is a hysteresis factor added to avoid a merge quickly after a split.

  • Permissions are not associated with specific keys but are applicable to all keys of a Mutable Data.
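The prefix and split rule in the list above can be sketched as a toy model. This is not MaidSafe's routing code: node names are treated as 256-bit Python integers, a prefix is a string of '0'/'1' bits, and the constants `MIN_SECTION_SIZE = 8` and `SPLIT_BUFFER = 3` (so each half needs at least 11 nodes) are taken from the figures in this post. Merging and every other routing rule are left out.

```python
NAME_BITS = 256
MIN_SECTION_SIZE = 8                               # a section never drops below this
SPLIT_BUFFER = 3                                   # hysteresis margin (11 - 8 = 3)
MIN_HALF_SIZE = MIN_SECTION_SIZE + SPLIT_BUFFER    # 11


def leading_bits(name: int, length: int) -> str:
    """First `length` bits of a 256-bit name, as a '0'/'1' string."""
    return format(name, f"0{NAME_BITS}b")[:length]


def matches_prefix(name: int, prefix: str) -> bool:
    """A node belongs to the section whose prefix equals its leading bits."""
    return leading_bits(name, len(prefix)) == prefix


def should_split(section_prefix: str, members: list[int]) -> bool:
    """Split only if BOTH child sections would have at least MIN_HALF_SIZE nodes."""
    zero_half = [n for n in members if matches_prefix(n, section_prefix + "0")]
    one_half = [n for n in members if matches_prefix(n, section_prefix + "1")]
    return len(zero_half) >= MIN_HALF_SIZE and len(one_half) >= MIN_HALF_SIZE


# Hypothetical members of the root section (empty prefix).
zeros = list(range(1, 12))                         # 11 names with leading bit 0
ones = [(1 << 255) | i for i in range(1, 12)]      # 11 names with leading bit 1
print(should_split("", zeros + ones))              # True: balanced, 22 nodes

unbalanced = list(range(1, 21)) + [(1 << 255) | i for i in range(1, 11)]
print(should_split("", unbalanced))                # False: 30 nodes, but 20 + 10
```

The unbalanced case shows why a section can grow past 22: the larger half keeps growing until the smaller half also reaches 11.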

Amazing that you both picked up on a point that I really labored on while writing. For me, ‘data and communications’ seemed too broad to capture the reader’s interest, so using ‘storage and retrieval’ seemed focused and precise. Having seen this feedback I think it’s best changed to ‘data storage and communications’.

This is a good idea, but I think the document becomes too lengthy. I also think talking about what’s broken in existing systems doesn’t actually help in understanding the safe network in the context of this particular document.

Good one, I’ve updated this.

Datamap is defined in maidsafe/self_encryption as “Holds the information that is required to recover the content of the encrypted file.” The data it contains is just the list of chunks. There’s no indication of encryption for private data (although from memory this is not yet implemented).

Each datamap chunk “Holds pre- and post-encryption hashes” (L20), implying the chunk is what is encrypted (via self-encryption, not to differentiate public / private data).

So I’m not totally clear on the specific detail and would benefit from further clarification from maidsafe devs about how public / private data differs (preferably a link to an existing document).

I’d like the explanation in the document for private / public data to be a little clearer.

Yes I feel there’s not enough clarity in my use of the terms chunks vs resources vs files etc. I’m pretty sure the usage is consistent, but it’s not clear. I’ll try to improve it.

Yes, smart contracts are briefly mentioned in the Messaging section of the document, but I think it could benefit from a little more detail. I’ll see about adding some more info.

Good idea. I’ve added this.

Good catch on the typo. Fixed.

One detail I’m not clear about is how mutable data names are defined. The Mutable Data RFC doesn’t seem to specify this. Can anyone link to a document or code that clarifies how mutable data names are determined?

Thanks, I’ll fix this terminology. I tried to be consistent with the terms as used in Close Group Consensus vs Disjoint Sections… maybe some guidelines around when to use Group vs Section could be helpful to me.

Good catch. I’ve changed this.

I’ll update this as per your detail. Great use of the word hysteresis too!

My phrasing is probably a little ambiguous, but is still correct. The Permissions Section of the Mutable Data RFC says MD can have multiple users with multiple permissions. So permissions are applied per key, not to all keys of a MD.

To clarify via the code for MD permissions, permissions is a BTreeMap<User, PermissionSet>, which “Maps an application key to a list of allowed or forbidden actions”. This uses different terms for the same concept of User and Application Key.

Permissions as designed in the RFC allow multiple permissions per user with the definition permissions: BTreeMap<User, BTreeSet<Permission>> but as implemented in code is defined as BTreeMap<User, PermissionSet>. Just a subtle inconsistency in the definition of ‘multiple permissions’.

This also means permissions are allocated to users, not users allocated to permissions.

… but I’m getting stuck in the weeds here… I think the phrasing in the document is adequate to convey the concept! If you can think of a more suitable way to word it I’d be glad to know.


Thanks everyone for the feedback. This is such a great community and project to be part of.

Nice architecture overview! I tried to find discussion of the performance aspects of the SAFE network but couldn’t find any, or maybe I overlooked it (apologies in that case). Could someone explain how performance would compare to the traditional client-server model? Files in the SAFE network are not only divided into multiple chunks and stored on multiple nodes, they are encrypted as well. The response to a resource request will have to fetch all those chunks, then decrypt and assemble them to serve the resource to the client. Wouldn’t this whole process be costly in terms of disk I/O and therefore increase response time significantly? I would appreciate some discussion/comments on this subject.

Ahh, that was an enjoyable read. Really outstanding work! Finally something perfect for showing around! Thanks thanks thanks!

The network doesn’t define it. It is chosen by the client app and can be anything, for example a random xor name or a hash of an application identifier.
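To make the two options concrete, here is a Python sketch. The function names and the app identifier are hypothetical, and hashing with SHA3-256 is an assumption for the second option; the network itself does not prescribe either.

```python
import hashlib
import secrets


def random_md_name() -> bytes:
    """Option 1: a random 256-bit XOR name (unpredictable, not reproducible)."""
    return secrets.token_bytes(32)


def md_name_from_app_id(app_id: str) -> bytes:
    """Option 2: hash an application identifier, so every client of the same
    app can recompute the same name without storing it anywhere."""
    return hashlib.sha3_256(app_id.encode("utf-8")).digest()


print(random_md_name().hex())                          # different on every run
print(md_name_from_app_id("net.example.notes").hex())  # same on every run
```

The derived option is convenient but predictable: anyone who knows the app identifier can compute the same name.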

While true, this doesn’t imply the following:

It only means that for example, user1 can have the right to create new entries and to delete any entries, while user2 can have the right to update any entries.
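This can be modelled roughly as below: a loose Python analogue of `BTreeMap<User, PermissionSet>`, not the real API. The `Action` names and the `MutableData` class are hypothetical, a plain dict stands in for the Rust map, only allowed (not forbidden) actions are modelled, and entry versions are ignored. The point it shows is that each user's permission set applies to every entry of the MD, not to individual entry keys.

```python
from enum import Enum, auto


class Action(Enum):
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()


class MutableData:
    """Toy MD: entries plus a user -> allowed-actions map (hypothetical model)."""

    def __init__(self) -> None:
        self.entries: dict[bytes, bytes] = {}
        self.permissions: dict[str, set[Action]] = {}

    def _check(self, user: str, action: Action) -> None:
        # Depends only on the user and the action, never on the entry key.
        if action not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not {action.name.lower()}")

    def insert(self, user: str, key: bytes, value: bytes) -> None:
        self._check(user, Action.INSERT)
        if key in self.entries:
            raise KeyError("entry already exists")
        self.entries[key] = value

    def update(self, user: str, key: bytes, value: bytes) -> None:
        self._check(user, Action.UPDATE)
        if key not in self.entries:
            raise KeyError("no such entry")
        self.entries[key] = value

    def delete(self, user: str, key: bytes) -> None:
        self._check(user, Action.DELETE)
        del self.entries[key]


md = MutableData()
md.permissions["user1"] = {Action.INSERT, Action.DELETE}
md.permissions["user2"] = {Action.UPDATE}

md.insert("user1", b"a", b"1")   # allowed: user1 may insert any entry
md.update("user2", b"a", b"2")   # allowed: user2 may update any entry
try:
    md.delete("user2", b"a")     # refused: user2 has no delete permission
except PermissionError as err:
    print(err)                   # user2 may not delete
```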

Maybe Profiling vault performance could be a good start. It’s not exactly a comparison to traditional client-server.

The chunks are assembled by the client, so the network only needs to know how to deliver chunks. This allows for very simple caching rules that should give high performance, especially for popular chunks.
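A small sketch of why client-side assembly keeps caching simple: because an Immutable Data chunk's name is the SHA3-256 hash of its content, a cached copy can never go stale, and it can be checked by re-hashing. `CachingChunkStore` and `fetch_file` are hypothetical names, an in-memory dict stands in for the network, the LRU policy is an arbitrary choice for illustration, and decryption is left out entirely.

```python
import hashlib
from collections import OrderedDict


class CachingChunkStore:
    """Stand-in for the network: serves chunks by name, with an LRU cache."""

    def __init__(self, backing: dict[bytes, bytes], capacity: int = 128) -> None:
        self.backing = backing
        self.cache: OrderedDict[bytes, bytes] = OrderedDict()
        self.capacity = capacity
        self.misses = 0

    def get(self, name: bytes) -> bytes:
        if name in self.cache:
            self.cache.move_to_end(name)          # mark as recently used
            return self.cache[name]
        self.misses += 1
        chunk = self.backing[name]
        # Content addressing: a chunk must hash to its own name, so cache
        # entries never need invalidating and corruption is detectable.
        if hashlib.sha3_256(chunk).digest() != name:
            raise ValueError("chunk does not match its name")
        self.cache[name] = chunk
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)        # evict least recently used
        return chunk


def fetch_file(store: CachingChunkStore, chunk_names: list[bytes]) -> bytes:
    """Client side: fetch each chunk by name and join them in order."""
    return b"".join(store.get(name) for name in chunk_names)


chunks = [b"first part, ", b"second part"]
names = [hashlib.sha3_256(c).digest() for c in chunks]
store = CachingChunkStore(dict(zip(names, chunks)))
print(fetch_file(store, names))   # b'first part, second part'
fetch_file(store, names)          # second fetch is served from the cache
print(store.misses)               # 2
```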

What happens if this is chosen deliberately by the client app to be the same as an existing xor name on the network?

Or slightly extending that idea, what if a user computes the xor name for an immutable data chunk before uploading to the network, creates an MD at that same xor name, then tries to upload the immutable data? Won’t this cause conflicts? I’m just not quite clear on how names can be set by someone other than the network but still be secure.

Thanks for clarifying about MD. I really appreciate your in-depth knowledge of the code.

For MDs it’s XOR name + type tag that’s used. I’m not sure of the details of how this work, but I think Immutable Data has one address space and Mutable Data has one address space for each type tag.

If you try to upload an MD with the same XOR name and type tag as an already existing MD I believe you’ll just get an error message.
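The addressing described above can be pictured with a toy in-memory model. It encodes only the behaviour stated here (which the post itself is not fully sure of): Immutable Data keyed by name alone, Mutable Data keyed by name plus type tag, and an error on a duplicate MD. The class, method names and tag values are hypothetical. It also speaks to the earlier question about an MD created at an Immutable Data chunk's name: in this model the two never collide.

```python
import hashlib


class ToyNetwork:
    """Hypothetical model of the address spaces described in this post."""

    def __init__(self) -> None:
        # Immutable Data: one address space, keyed by name alone.
        self.immutable: dict[bytes, bytes] = {}
        # Mutable Data: keyed by (name, type tag), i.e. one space per tag.
        self.mutable: dict[tuple[bytes, int], dict[bytes, bytes]] = {}

    def put_immutable(self, content: bytes) -> bytes:
        name = hashlib.sha3_256(content).digest()
        self.immutable[name] = content   # identical content maps to the same name
        return name

    def put_mutable(self, name: bytes, type_tag: int) -> None:
        address = (name, type_tag)
        if address in self.mutable:
            raise ValueError("an MD with this name and type tag already exists")
        self.mutable[address] = {}


net = ToyNetwork()
chunk_name = net.put_immutable(b"hello")
net.put_mutable(chunk_name, 15000)       # same XOR name as the chunk: no clash
net.put_mutable(chunk_name, 15001)       # same name, different type tag: fine
try:
    net.put_mutable(chunk_name, 15000)   # same name and same tag: rejected
except ValueError as err:
    print(err)   # an MD with this name and type tag already exists
```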

Do you have a Maidsafe address for donations? :stuck_out_tongue:
That goes for you also @polpolrene

Great work gents, keep it up

Thank you @mav! I haven’t actually read the document yet, but I ran it through a simple spell-checker in LibreOffice and found at least the following:
eg -> e.g.
ie -> i.e.

I would recommend using another spell-checker to catch other possible typos. I think the one in MS Word may be decent for English. (I don’t have Word myself although I was on the team that created the checkers for e.g. Swedish.) Spelling and grammar checkers can be great tools for spotting mistakes, but never authorities on correctness.

Using eg and ie for e.g. and i.e. is perfectly acceptable - there’s no real right or wrong. I find Word always tries to Americanise words. For example Americanise -> Americanize! e.g. and i.e. are more common in the US, where they tend to use more punctuation than Brits and Aussies do - for example U.S.

I mainly wanted to point out the usefulness of automatic checkers. I stick to American spelling, because I learned English in the US, but I guess documents for Safenet should be spelled according to the British standard. Of course one has to choose the desired dictionary in Word manually. The problem with Word is that so many functions are active by default. The problem is with the settings - not the checkers themselves, if used correctly.

(I also localized Clippy, the famous Office Assistant that everybody hated, but that doesn’t mean I like the implementation of it.) :wink:

“Enpoints” would also have been picked up automatically.

I’m no authority on English, but I still think those abbreviations should have periods (full stops). When it comes to spelling, the important thing is to be consistent, I think.

https://en.oxforddictionaries.com/definition/i.e.
http://dictionary.cambridge.org/dictionary/english/ie

Good and logical choice to use what is already out there and proven thoroughly.
I don’t know the finer technical details of sha3_256 (Keccak), but I see that Belgians are involved in its creation, so it can’t be that bad :wink:
Probably better than making your own hash function, like IOTA: https://medium.com/@neha/cryptographic-vulnerabilities-in-iota-9a6a9ddc4367

Exactly. This means that several mutable data having the same name but different tags can be uploaded in the network. They all will be stored in the same group of 8 vaults (the closest ones to the common name).
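A sketch of "the closest ones to the common name": closeness is the XOR distance between 256-bit names read as integers. The vault names are random (as vaults get random 256-bit identifiers), the group size of 8 comes from this post, and `close_group` and `holders_of` are hypothetical helpers. Because the type tag plays no part in the distance, every MD sharing a name lands on the same group.

```python
import secrets

GROUP_SIZE = 8


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two 256-bit names, as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def close_group(target: bytes, vaults: list[bytes]) -> list[bytes]:
    """The GROUP_SIZE vaults whose names are closest to `target`."""
    return sorted(vaults, key=lambda v: xor_distance(v, target))[:GROUP_SIZE]


def holders_of(address: tuple[bytes, int], vaults: list[bytes]) -> list[bytes]:
    """Vaults holding an MD at (name, type_tag): only the name matters."""
    name, _type_tag = address
    return close_group(name, vaults)


vaults = [secrets.token_bytes(32) for _ in range(100)]   # random vault names
md_name = secrets.token_bytes(32)

group = holders_of((md_name, 15000), vaults)
print(len(group))                                          # 8
# Same name, different (hypothetical) type tag: same 8 vaults.
assert holders_of((md_name, 15001), vaults) == group
```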

To avoid collisions with existing MDs having the same tag an application should use a random nonce concatenated to the source identifier and then hash the result to compute the name. This also renders the name unpredictable, which prevents an attacker or a competing app from squatting a name that the app will need in the future.
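The naming scheme suggested above might look like this. SHA3-256 and a 32-byte nonce are assumptions (the post names neither a hash nor a nonce length), and `derive_md_name` and the source identifier are hypothetical. Note that the app has to keep the nonce somewhere if it wants to recompute the name later.

```python
import hashlib
import secrets


def derive_md_name(source_id: bytes, nonce: bytes) -> bytes:
    """name = SHA3-256(nonce || source_id). The nonce has a fixed length,
    so the concatenation is unambiguous."""
    return hashlib.sha3_256(nonce + source_id).digest()


source_id = b"my-app/shared-notes"    # hypothetical source identifier
nonce = secrets.token_bytes(32)       # random; must be kept to recompute the name
name = derive_md_name(source_id, nonce)

# Without the nonce, hashing the bare identifier gives a different name,
# so another party cannot compute and squat this name in advance.
assert name != hashlib.sha3_256(source_id).digest()
assert derive_md_name(source_id, nonce) == name
```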

I share your concern about this. If all clients use a hash function to generate MD names then they will be uniformly spread in the xor namespace, which is good. But the problem is that apps are not forced to do that. I see a potential attack based on the creation of a set of MDs having a specific name to overload a group of nodes in the network, for example to get control of a section by eliminating these nodes.

Won’t this at worst only mean that someone might be able to fill the hard disk space shared by some node(s) by creating lots of MDs with almost the same names (maybe just increasing the name by 1 bit each time)? And wouldn’t some other random(?) nodes then come and take their place?

Americanisation and Britishification of text is a challenge, even for those of us with a couple of decades in both countries.

Using tap instead of faucet, or boot instead of trunk, just takes some getting used to.

The really tricky situations come into play when someone uses a word common to both that means different things. The spell checker won’t fail it and eyebrows will raise - especially if you use fanny when you mean butt (or buttocks or bottom). It’s not the same part of the anatomy in Britain.

Just remember to splash in plenty of extraneous “u’s” and remove all “z’s” and your American English becomes British English, largely :wink: