MaidSafe Dev Update :safe: 27th April 2015

Hi Everyone,

Last week was an incredibly busy and productive time and we are expecting more of the same this coming week. We spent much of last week working on Crust (the SAFE Network’s transport layer) and today we are looking to implement a local Crust network in the office. All going well, we will roll out routing and vault networks later this week. In addition, the installers are now functionally complete and are in testing.

Those of you diligently following our progress will remember that our aim for the current sprint, which finishes at the end of this week, is to implement a TCP network running on Digital Ocean droplets, with self-authentication and self-encryption (only clients will be able to connect to this network initially; vaults will be able to connect once UTP/UDT has been added to Crust). We are also hoping to enable the Put and Get of simple chunks by implementing a few interfaces directly from routing.

During last week, several of the core libraries went beyond version 0.1.0, and those of you with a curious disposition will have noticed the ferocious amount of activity on [David’s Git repositories](https://github.com/dirvine). As per last week’s update, these will be migrated to the company repositories once the current sprint ends.

Next week we will enter another planning phase, figuring out which parts of the network we turn our attention to next; these will be the focus of the next sprint. One part of the system that will be getting some attention is the transport layer (Crust), where we will be looking to implement additional protocols which, as mentioned above, will allow vaults outside a single local network to be added to the test network.

So, now that we are just over 50% of the way through the sprint, I’m pleased to confirm that we are still on track to achieve our pretty aggressive objectives for this iteration. Everyone at MaidSafe continues to do a great job under considerable pressure, and by remaining laser-focussed on our goals we will continue to progress the roll-out of the network at an ever-increasing velocity. It’s going to be an exciting ride!

That’s it from me; click here to read the dev update transcript. Now, some additional thoughts from David…

From my viewpoint the change in organisation has been nothing less than revolutionary: targets are defined, planned and then simply executed, with the overall structure constantly under scrutiny. This change is already increasing our development pace to something we thought was not possible. That sounds exaggerated, but it’s fast, iterative and very focussed on deliverables.

To that end, here is this week’s library/application to focus on: self_encryption. This, as you know, is a key component of the network and, as you will see from the readme, it’s ready for use now. We have a few small things to add (mmap for large files, threading for speed, and compression), but none of this changes the API. The library includes a very simple example app with instructions for playing with the code. So please feel free to dive in and improve your coding skills and our core code :slight_smile:

So what can you do with this?

1: Encrypt files and safely store chunks on any cloud platform
2: Use in a small program to de-duplicate your filesystem and encrypt all files
3: Add in a self-authentication system (from maidsafe_client perhaps) and store session data in a key value store

So where can you store chunks?

Any key-value store will do: anything from a hard disk to a cloud provider, an actual DHT, Redis, an SQL database, or anything else you want. All you need to do is implement the trait for getting and putting chunks. This is all in the documentation, which is here.
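As a rough illustration of what implementing that trait looks like, here is a minimal sketch. The `ChunkStore` trait, the `MemoryStore` type and the method signatures below are hypothetical stand-ins, not the actual trait defined by self_encryption; the point is simply that a backend only needs to store and fetch chunks by name, so swapping the in-memory map for a directory on disk, Redis or a cloud bucket means implementing the same two methods.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the storage trait: anything that can
/// store and fetch a chunk by its (content-derived) name will do.
trait ChunkStore {
    fn put(&mut self, name: Vec<u8>, content: Vec<u8>);
    fn get(&self, name: &[u8]) -> Option<Vec<u8>>;
}

/// Simplest possible backend: an in-memory map. A disk directory,
/// Redis, an SQL table or a cloud bucket would implement the same
/// two methods and nothing else would need to change.
struct MemoryStore {
    chunks: HashMap<Vec<u8>, Vec<u8>>,
}

impl ChunkStore for MemoryStore {
    fn put(&mut self, name: Vec<u8>, content: Vec<u8>) {
        self.chunks.insert(name, content);
    }
    fn get(&self, name: &[u8]) -> Option<Vec<u8>> {
        self.chunks.get(name).cloned()
    }
}

fn main() {
    let mut store = MemoryStore { chunks: HashMap::new() };
    store.put(b"chunk-name".to_vec(), b"chunk-content".to_vec());
    assert_eq!(store.get(b"chunk-name"), Some(b"chunk-content".to_vec()));
    println!("round trip ok");
}
```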

Of course, if you have any questions or need any help, please don’t hesitate to ask.

36 Likes

Again, I’m nothing less than amazed at the skill and focus here. As I was reading I kept getting confused, because I think I have a few things mixed up. When Nick mentioned Crust as the data transport layer I thought what he meant was Crux (connected reliable UDP exchange). So is Crux contained in Crust? Or in routing? I was also under the impression that Crust was a Rust library (probably because they have themed library names, such as sodium etc.). So is Crust strictly a MaidSafe library? If anyone can help me straighten that out real quick that would be awesome. Thank you. :smile:

2 Likes

Crust will replace Crux and sit underneath routing in the SAFE Network stack. The revised diagram should help explain how it all fits together.

Crust is a Rust library that MaidSafe created, but it will be highly useful for other P2P projects coded in Rust that are looking for a transport layer with NAT traversal. I hope this info helps!

9 Likes

Thank you for the clarification @nicklambert. This makes sense to me now :smile:

2 Likes

OK, a couple of questions. 1. It says the installers are implemented. Does that mean we have installers we can use now? If so, can you please direct me to them so that I can install MaidSafe? I’m currently using Fedora 21 and Windows 7. Either/or works.

Is this theoretical or practical yet? Meaning, is there such a program in existence, and does this mean that such a program could, say, sort through my music archive and get rid of all the odd duplicate songs that crop up over time? What exactly is this and what does it mean? Please clarify. :smile:

Sounds like they are merely testing installers and some network connections in-house right now, and the stuff in David’s repositories is probably something you’d have to build yourself, but I don’t know that for certain.

The installers are functionally complete, but not yet tested. Upon successful completion of testing we will roll these out. I appreciate you have been patient waiting for these; hopefully there is not too much longer to wait.

This program exists now, but is something better suited to developers used to running examples. What this library will do (quoting from the link): ‘This library will provide convergent encryption on file based data and produce a DataMap type and several chunks of data. Each chunk is max 1Mb in size and has a name. This name is the Sha512 of the content, this allows the chunks to be confirmed. If size and hash checks are utilised, a high degree of certainty in the validity of the data can be expected.’

So it will chunk up and encrypt files producing a data map, but will not deduplicate files, such as songs. I hope this gives you a better understanding of where we’re at.
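To make the “size and hash checks” from that description concrete, here is a small illustrative sketch of how a chunk can be confirmed: recompute the SHA-512 of the content and compare it with the chunk’s name, and compare the length with what the data map recorded. The `chunk_is_valid` helper is hypothetical and the `sha2` crate is assumed purely for illustration; the library’s own checking code differs in detail.

```rust
// Assumes the `sha2` crate (not part of self_encryption itself).
use sha2::{Digest, Sha512};

/// A chunk is valid if its content hashes back to its name and its
/// length matches what the data map recorded for it.
fn chunk_is_valid(name: &[u8], expected_len: usize, content: &[u8]) -> bool {
    content.len() == expected_len && Sha512::digest(content).as_slice() == name
}

fn main() {
    let content = b"some chunk contents";
    let name = Sha512::digest(content).to_vec(); // the chunk's name is its SHA-512

    // Genuine chunk passes; tampered content of the same length fails the hash check.
    assert!(chunk_is_valid(&name, content.len(), content));
    assert!(!chunk_is_valid(&name, content.len(), b"tampered contents!!"));
    println!("hash and size checks behave as expected");
}
```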

2 Likes

@nicklambert thanks for the update. This is the most exciting news yet! :slight_smile:

Is that API layer diagram from the docs somewhere, or is it new? I’d like to include it in the “How to build an App on SAFE Network?” FAQ.

Thanks.

It’s new, Mark; slightly amended from this blog post to incorporate recent changes. It would be great to include this in the FAQ; it should be especially informative for visual learners.

Thanks @nicklambert, I’ve added the diagram and link to your blog.

I’m still trying to understand what is meant by the API.

So far I understand that it includes:

  • the RESTful API (POST/PUT/GET/DELETE key-value store), though I am not sure how this might provide more extensive NoSQL features (such as in the Redis API, which handles collections etc.)
  • the NFS API (retrieve a list of file descriptors via the Launcher, then manipulate files using the standard file system library)
  • access to a virtual drive (provided by Drive)

I’ve summarised this in the FAQ mentioned, but I’m wondering if there is more to the API than this, for beta, launch, or planned later (see my questions here). Any corrections or additional info you can provide would be helpful, though not urgent :slight_smile:

Thanks for the update! Been sleeping so well since I invested in Maid! :relieved:

OK, judging from your description there and the library description (which, by the way, makes me think we need a new set of descriptions and documentation written in plain English for the rest of us), it sounds like this “de-duplication” library doesn’t actually de-duplicate files at all, but rather converts files into data chunks, encrypts them, and THEN de-duplicates those encrypted chunks. Which is something entirely different from de-duplicating the actual data in a file system. (Which again is why I think we need better documentation: so that we communicate more effectively about what things actually do.)

Many folks miss this part. The chunk store in effect de-duplicates in real time, because identical files will always produce identical chunks; de-duplication is a natural side effect of such a convergent encryption scheme. You can consider it real time as the data is encrypted: basically, the store won’t grow in size if you encrypt data that has already been encrypted elsewhere. In the network this ‘elsewhere’ is pretty large, so for normal data that’s seen by many folks there is an increasing chance it’s already been encrypted and stored.
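Here is a tiny sketch of why the store does not grow, under the simplifying assumption that identical plaintext chunks always map to the same content-derived name, which is exactly what the convergent scheme gives you. The `put` helper is hypothetical and the `sha2` crate is assumed for the hashing; this is an illustration of the side effect, not the vault code.

```rust
// Illustrative only: the de-duplication side effect of content-derived names.
// Assumes the `sha2` crate.
use sha2::{Digest, Sha512};
use std::collections::HashMap;

fn put(store: &mut HashMap<Vec<u8>, Vec<u8>>, content: &[u8]) {
    // The name is derived from the content, so the same content always
    // lands under the same key: storing it "again" costs nothing.
    let name = Sha512::digest(content).to_vec();
    store.entry(name).or_insert_with(|| content.to_vec());
}

fn main() {
    let mut store = HashMap::new();
    let chunk = b"a chunk shared by two different users' identical files";

    put(&mut store, chunk);       // first user stores the file
    put(&mut store, chunk);       // second user stores the same file
    put(&mut store, b"new data"); // genuinely new content

    // Two distinct chunks are held, not three: the duplicate was free.
    assert_eq!(store.len(), 2);
    println!("store holds {} chunks", store.len());
}
```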

6 Likes

Sure, let me check with @Viv and we’ll answer as much as we can right now. Thanks Mark.

1 Like

If I get this correctly, the de-duplication process checks for very similar small chunks coming from different original files.
Is it possible for these chunks to combine into a corrupted file during the merging (GET?) process?
Do you measure how “duplicate” two chunks are? Is it always 0% (not a duplicate) or 100% (a full duplicate), or can it sometimes be a 98–99% match?

The chunks need to be an exact match for the algorithm to de-duplicate; even minor differences in chunk contents will produce a completely different hash. It is possible for chunks to become corrupted for a variety of reasons, but the data managers look after and check the integrity of the data, recreating chunks where required.

5 Likes

It’s always an exact match, because the de-duplication is based on the hash of the chunk content (I believe :-)), which is unique to each chunk but the same for identical (encrypted) chunks.

2 Likes

Yes, the SHA-512 of the chunk contents. These 512 bits also serve to determine the chunk’s position in XOR space. In this way the different managers know where to save it on a PUT (or know that it already exists) and where to look for it on a GET.
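For anyone curious what “position in XOR space” means in practice, here is a small sketch: two 512-bit names are compared by XOR-ing them byte by byte, and the smaller the result (read as a big-endian number), the “closer” the two names are. The `xor_distance` helper and node names are illustrative assumptions, not the routing library’s actual code, and the `sha2` crate is assumed for the hashing.

```rust
// Toy illustration of XOR distance between 512-bit names.
// Assumes the `sha2` crate; not taken from the routing library itself.
use sha2::{Digest, Sha512};

/// XOR two 64-byte names; interpreting the result as a big-endian
/// integer gives the "distance" between them in XOR space.
fn xor_distance(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let chunk_name = Sha512::digest(b"some chunk").to_vec();
    let node_a = Sha512::digest(b"node a").to_vec();
    let node_b = Sha512::digest(b"node b").to_vec();

    // Comparing the distances lexicographically (equivalent to numeric
    // comparison for equal-length big-endian values) tells us which node
    // is closer to the chunk, i.e. who should hold and check it.
    let closer = if xor_distance(&chunk_name, &node_a) < xor_distance(&chunk_name, &node_b) {
        "node a"
    } else {
        "node b"
    };
    println!("{} is closer to the chunk in XOR space", closer);
}
```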

2 Likes

So when is TestNet 3 launching? I posted here about 3 weeks ago asking this same question and the answer was “3 weeks to a month”.

Fast forward 3 weeks and here we are, back at the same point yet again. It seems more like it’s going to take you somewhere close to 3 months before you even hit TestNet 3.

Here is what I said, and we are still developing in the open, with every move there for all to see.

1 Like