Update 09 June, 2022

We had planned to go deeper into the governance issues we discussed last week but unfortunately haven’t been able to do so due to a couple of key members of the team needing to take unplanned time off. All being well, we’ll come back to this next week.

In the meantime, please read @jimcollinson’s post on our strategic aims, and be aware that our objectives and vision have not changed one bit. And, please remember to keep the discussions, however passionate and heated, always respectful. We have a forum code of conduct that all members are expected to uphold.

This week we’ll look at data on the network, and what it means for files to be public or private.

General progress

Yogesh has been looking at databases to replace sled db, which is buggy and doesn’t seem to be actively maintained. So far the prime candidates appear to be Persy, a transactional database that optimises for consistency, and Cacache which @yogesh says “seems to offer the best speed out of the lot with built-in metadata creation and handling”. Neither is perfect but both would probably do the job. Testing continues.

Thanks to @josh for organising the DBC comnet last week. As @Chriso mentioned, depositing owned DBCs isn’t working yet but this is what he’s been working on this week, and @Qi_ma is looking into a DBC reissue bug and also working on spentbook integration.

Meanwhile, @davidrusu continues to work on getting membership information to adults in order to ensure membership and network knowledge (via the signed Section Authority Provider) are in sync across the section.

Public and private data on the Safe Network

What is a file on Safe Network? Simple enough question but the answer is a bit more involved. The basic answer is “content + metadata + datamap” - but what does that mean?

Content

Content is the raw material of the file, the basic binary information. Once this gets to more than 1 MB it is automatically self-encrypted to produce chunks and a datamap. Because of the way self-encryption works, this is deterministic, i.e. self-encrypt the same content any number of times and you’ll get the same chunks. Its security is largely independent of the encryption algorithm (we use AES256) meaning that if the algo is cracked the chunks are still secure.
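To make the determinism point concrete, here is a toy sketch in Python. It is not the real self-encryption algorithm: the tiny chunk size, the XOR keystream standing in for AES, and the name `toy_self_encrypt` are all assumptions made for illustration. The only idea it keeps is that each chunk's key is derived from the content itself, so identical content always yields identical chunks.

```python
import hashlib

CHUNK_SIZE = 8  # toy value; the post describes real chunks as around 1 MB


def toy_self_encrypt(content: bytes) -> list[bytes]:
    """Encrypt each piece with a key derived from the content's own hashes."""
    pieces = [content[i:i + CHUNK_SIZE] for i in range(0, len(content), CHUNK_SIZE)]
    hashes = [hashlib.sha3_256(p).digest() for p in pieces]
    n = len(pieces)
    chunks = []
    for i, piece in enumerate(pieces):
        # Key material: this piece's hash plus its two predecessors (wrapping round).
        key = hashes[i] + hashes[(i - 1) % n] + hashes[(i - 2) % n]
        stream = hashlib.shake_256(key).digest(len(piece))  # stand-in for AES
        chunks.append(bytes(a ^ b for a, b in zip(piece, stream)))
    return chunks


song = b"the same bytes, uploaded twice by different people"
assert toy_self_encrypt(song) == toy_self_encrypt(song)         # deterministic
assert toy_self_encrypt(song) != toy_self_encrypt(song + b"!")  # content-dependent
```

No password or random value goes in, which is why two uploaders with the same content end up with the same chunks.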

OK so what is a chunk? Unless you have the datamap, a chunk is a meaningless blob of bits, mostly around 1MB in size with a name that’s also its hash. This means we can check if it’s valid – does the name match the hash – but we can’t tell anything else about it. We can see it but we can’t read it, or know where it came from.
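The "does the name match the hash" check can be sketched in a few lines of Python. The hash function (SHA3-256 here) and the helper names `chunk_name` and `is_valid` are assumptions for illustration, not the network's actual API.

```python
import hashlib


def chunk_name(chunk: bytes) -> str:
    """A chunk's name is the hash of its bytes (SHA3-256 assumed here)."""
    return hashlib.sha3_256(chunk).hexdigest()


def is_valid(name: str, chunk: bytes) -> bool:
    """Valid means only that the name matches the hash; it says nothing about meaning."""
    return chunk_name(chunk) == name


blob = bytes(range(256))  # an opaque blob, which is all a chunk looks like from outside
name = chunk_name(blob)
assert is_valid(name, blob)
assert not is_valid(name, blob + b"\x00")  # any tampering breaks the match
```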

Datamap

Right, so what’s a datamap? The datamap is a simple file that contains the unencrypted name of the content and the names of all the encrypted chunks that make it up, so we know where to find them (chunk name == Xor address). If it’s stored unencrypted on the network then anyone can use it to recreate the content. If it’s encrypted or stored on our private client then only we can do that. We’ll come back to encrypting the datamap in a second.
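As a rough Python sketch of the lookup side: a datamap can be thought of as an ordered list of chunk names, and whoever holds it can fetch the chunks and put them back in order. The `network` dict, `put_chunks` and `fetch` are hypothetical stand-ins, SHA3-256 is an assumed hash, and the decryption step is left out to keep the example short.

```python
import hashlib

network: dict[str, bytes] = {}  # stand-in for the network: Xor address -> chunk


def put_chunks(chunks: list[bytes]) -> list[str]:
    """Store each chunk under its hash and return the ordered names (a toy datamap)."""
    names = []
    for chunk in chunks:
        name = hashlib.sha3_256(chunk).hexdigest()
        network[name] = chunk
        names.append(name)
    return names


def fetch(datamap: list[str]) -> bytes:
    """Anyone holding the datamap can look the chunks up and reassemble them."""
    return b"".join(network[name] for name in datamap)


datamap = put_chunks([b"chunk-one|", b"chunk-two|", b"chunk-three"])
assert fetch(datamap) == b"chunk-one|chunk-two|chunk-three"
```

Without the datamap there is nothing in `network` that says which chunks belong together or in what order.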

Metadata

And the last thing we need to mention is the metadata, information about the content. This optionally includes its size, its name, the file type and potentially date created, accessed etc. But wait a second, Safe doesn’t do time! True, but that needn’t be a limitation.

The reason we don’t include metadata with the content is it would ruin deduplication. Let’s say someone uploaded the Sex Pistols song GodSaveTheQueen.mp3, and someone else uploaded exactly the same MP3 but called it GSTQ.mp3. If the name was part of the content the chunks would be completely different so there’d be no deduplication. This means we store the metadata separately from the chunks. We can store it in a datamap on the network or on our client, which allows us to arrange these apparently meaningless blobs to our hearts’ content, name and label them as we wish – including time created and time accessed – and organise them into our own directory structures.
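The deduplication argument can be checked with a small Python sketch. For brevity each file is treated as a single chunk, and `chunk_store`, `upload` and the fake MP3 bytes are made up for the example; SHA3-256 is an assumed hash.

```python
import hashlib

chunk_store: dict[str, bytes] = {}  # stand-in for the network's chunk storage


def upload(content: bytes, filename: str) -> dict:
    """Store content under its hash; the filename stays in the uploader's metadata."""
    name = hashlib.sha3_256(content).hexdigest()
    chunk_store[name] = content  # same content -> same key -> stored once
    return {"filename": filename, "chunks": [name]}


mp3 = b"\xff\xfb fake mp3 bytes"
first = upload(mp3, "GodSaveTheQueen.mp3")
second = upload(mp3, "GSTQ.mp3")

assert first["chunks"] == second["chunks"]  # both records point at the same chunk
assert len(chunk_store) == 1                # only one copy is stored

# Had the name been mixed into the content, the hashes would no longer match.
with_name_a = hashlib.sha3_256(b"GodSaveTheQueen.mp3" + mp3).hexdigest()
with_name_b = hashlib.sha3_256(b"GSTQ.mp3" + mp3).hexdigest()
assert with_name_a != with_name_b
```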

Directories can also be content, encrypted, chunked and stored as files with their own data map (which is why small files which don’t go through self-encryption are unreadable – all content is stored in a directory, but that’s one for another day).

Public and private data

The way Safe works is that data that is valid must be stored. This means we can’t delete chunks. But remember files are content plus a datamap.

Content is just meaningless blobs without a datamap, and those blobs are as secure and unknowable as is possible with current technology. To make GodSaveTheQueen.mp3 publicly available we upload it, publish its datamap on the network unencrypted and link to it. Chances are, with a well-known song like that, the chunks will already be there, but the original uploader, who named it GSTQ.mp3, chose to encrypt the datamap or keep it on their client, so their copy stayed private.

So that is the basic difference between public and private data.

If we encrypt the data map with a BLS key, this also allows us to create key shares that we can then send to other people, meaning we have shared private data. BLS gives us this magic for free. This means public/private and shared data are all client-side actions. The network stores data forever and clients use the (root) data map and encryption to make data public, private or shared private.
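To give a feel for what handing out key shares means, here is a small Python sketch. It is plain Shamir secret sharing over a prime field, swapped in as a stand-in because it needs nothing beyond the standard library; it is not BLS and not the network's code. The field size, threshold and function names are all assumptions for illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used as the field for this toy example


def split(secret: int, threshold: int, count: int) -> list[tuple[int, int]]:
    """Split `secret` into `count` shares; any `threshold` of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, count + 1)]


def combine(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)  # stands in for the key protecting a data map
shares = split(key, threshold=2, count=3)
assert combine(shares[:2]) == key
assert combine([shares[0], shares[2]]) == key
```

Any two of the three shares rebuild the key, while a single share on its own does not determine it.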


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian; :germany: German; :spain: Spanish; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

52 Likes

Blind squirrel got the acorn, I mean Gold!

19 Likes

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

I’ve noticed that the Uniswap DEX that was set up for eMAID hasn’t been used in a few days. :racehorse:

14 Likes

Wish I could get something other than bronze for once :wink:

16 Likes

A lot of parts in motion as usual super ants!

I don’t know if a typo in the update, but it reads ‘caracache’, but appears to actually be called ‘cacache’.

While I didn’t notice if Persy used async, cacache does, which seems very nice. The cacache DB also looks simpler than the Persy - and if the best part is no part, then that may be a plus too, depending on requirements.

As usual, thanx for the update! Cheers.

18 Likes

Well done to all the team another well thought out and informative update :slight_smile:

9 Likes

So chunks are like grains of sand, each one different, each once being part of some rock that was ground up by the sea and carried by the tides to make its origins unknowable. AES is quantum-proof, and self-encryption extends the protection further which is good, but the data map is an obvious weak point for private data. I wonder how that’s going to be protected.

5 Likes

What do you see as the weakness? The data map is encrypted and the keys are accessible only to the owner and anyone they share them with. So it’s a bit like a wallet.

4 Likes

Just thinking ahead to the quantum era when all current asymmetric cryptography will be vulnerable

7 Likes

I would think that the uploader, when uploading, sets the metadata according to the info they supply for the file, i.e. in the datamap <----- this is like a universal meta store for everyone

Then anybody, including the uploader, can store a set of metadata in their own directory structure, allowing renaming, last accessed (not affected by others) etc etc. <------ this is the personal meta store

5 Likes

I think we have a neat fix for that also. I will explain at high level with some points (to show work in progress)

  • AES is quantum-resistant for at least the foreseeable future
  • We use self encrypt for higher levels of security than that for chunks
  • We could default to a double symmetric enc mechanism for the root data map
  • As we use a base asymm key that is never exposed to generate more keys specific to events, i.e. a key per register, website etc., we can use the same for your root data map.
  • So AES(chachapoly(Hash(base secret key + "your root")))

This gets us past trusting a single algo by using 2 of them with a very strong derivable key. I like the simplicity of that, but still feel it’s worth checking for even better.
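The key derivation part of the points above, Hash(base secret key + "your root"), can be sketched in Python as follows. The choice of SHA3-256, the `derive_key` helper and the all-zero placeholder secret are assumptions for illustration. The AES and chachapoly layers are not shown, since they need a crypto library outside the standard library.

```python
import hashlib


def derive_key(base_secret: bytes, label: str) -> bytes:
    """Hash(base secret key + label): a 32-byte key specific to one purpose."""
    return hashlib.sha3_256(base_secret + label.encode()).digest()


base = bytes(32)  # placeholder only; a real base secret would be random and never exposed
root_key = derive_key(base, "your root")
site_key = derive_key(base, "website")

assert len(root_key) == 32                        # usable as a 256-bit symmetric key
assert root_key == derive_key(base, "your root")  # re-derivable on demand
assert root_key != site_key                       # one key per purpose
```

Because the key can be re-derived from the base secret and a label, nothing extra has to be stored to get back to the root data map.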

This is what I was writing up yesterday while travelling (to a funeral). I see this as the way to separate content from personal views (metadata) and allow folk to use timestamps if they wish on their personal data. Also worth noting with BLS encryption we can share keys easily, but the above does not do that for quantum resistance. So some work here yet.

11 Likes

Just to be clear, and from the response I am not sure it was seen. The idea is for any files, public or private. The original meta data is supplied by uploader and stored with the datamap for private/public since private data can be shared with the data map. Then each user when they add it to their directories will have another meta data set in the directories allowing each user to have last accessed date, permissions, etc for each place it exists in their directories.

This allows a universally known set of metadata stored with the file/datamap, plus local metadata, and of course the user can reset their local copy back to the global one. An advantage here is that a public file whose global metadata has the filename video001.mp4 can be stored by the user as “Queen at Wembley Stadium.mp4”

6 Likes

Getting close to tags (cc @joshuef ) and possibly RDF (cc @happybeing ). Interesting angle.

I suspect this applies just to public or privately shared data, as fully private data does not matter so much: the original uploader can do what they want and nobody knows.

Unless we go tags/rdf where the content can apply to many tags/dirs/links etc.

6 Likes

Any idea when the Safe Browser will be able to view files that are stored on the community / public test-networks? To me that allows the non-technical to glimpse beyond the wall of the CLI and see actual hosted sites load in front of their eyes.

17 Likes

I love that you chose a picture with a hot-air balloon, @DeusNexus - reminds me of the logo for Smalltalk (see the Smalltalk page on Wikipedia).

4 Likes

As far as I’m aware, things like the browser would be a post-launch concern. The intention for initial launch would just be the CLI.

5 Likes

Thx 4 the update Maidsafe devs

:clap: :clap: :clap: Keep hacking super ants and never give up, we’re so close…

8 Likes

I think the limit is 3 KB

2 Likes

Thank you for the heavy work team MaidSafe! I add the translations in the first post :dragon:


Privacy. Security. Freedom

9 Likes