At a very basic level, I mentioned that data will be encrypted, sliced into a million pieces, and
backed up across the planet with a RAID-type algorithm.
Technically, calling it RAID is incorrect, isn't it? But it's the only way, TO AN ENGINEER!!!, I could explain it in this case. Oh man.
RAID-0 : consists of striping, but no mirroring or parity.
RAID-1 : consists of data mirroring, without parity or striping.
RAID-2 : consists of bit-level striping with dedicated Hamming-code parity.
RAID-3 : consists of byte-level striping with dedicated parity.
RAID-4 : consists of block-level striping with dedicated parity.
RAID-5 : consists of block-level striping with distributed parity.
RAID-6 : consists of block-level striping with double distributed parity.
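To make the parity idea in the list above concrete, here is a toy sketch (purely illustrative, not how the SAFE network actually stores data) of RAID-5-style striping: data blocks plus one XOR parity block, so any single lost member of the stripe can be rebuilt from the survivors.

```python
def make_stripe(blocks):
    """Given equal-length data blocks, return blocks plus one XOR parity block."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return blocks + [parity]

def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XOR-ing all the surviving blocks."""
    size = len(next(b for i, b in enumerate(stripe) if i != lost_index))
    out = bytes(size)
    for i, b in enumerate(stripe):
        if i != lost_index:
            out = bytes(x ^ y for x, y in zip(out, b))
    return out

stripe = make_stripe([b"aaaa", b"bbbb", b"cccc"])
assert rebuild(stripe, 1) == b"bbbb"  # lost block recovered from the others
```

The same XOR trick underlies RAID-3 through RAID-5; the levels differ mainly in striping granularity and where the parity lives.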
The network has a parameter that controls how many times a chunk is duplicated. Say, for example, that its value is 8: this means each chunk is stored on 8 machines. As soon as one of those machines goes down, another copy of the chunk is immediately created on a different machine to maintain the permanent 8 copies.
So a simultaneous drop of all 8 machines would be needed to lose the chunk. That is very unlikely, because the 8 machines are geographically dispersed at random all over the world.
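The repair behavior described above can be sketched as a small loop (all names here are invented for illustration, not SAFE's actual mechanism): drop dead holders from a chunk's holder set, then top it back up to the replication factor from the pool of live nodes.

```python
import random

REPLICATION_FACTOR = 8  # assumed value from the example above

def repair(chunk_holders, live_nodes):
    """Return a new holder set for one chunk: remove dead holders,
    then copy the chunk to fresh live nodes until we are back at
    REPLICATION_FACTOR (or run out of candidates)."""
    holders = {n for n in chunk_holders if n in live_nodes}
    candidates = [n for n in live_nodes if n not in holders]
    while len(holders) < REPLICATION_FACTOR and candidates:
        holders.add(candidates.pop(random.randrange(len(candidates))))
    return holders

live = set(range(100))
holders = set(range(8))
live.discard(3)                 # node 3 goes down
holders = repair(holders, live)
assert len(holders) == 8 and 3 not in holders
```

The key property is that the chunk is only lost if all 8 holders vanish within one repair cycle, before the loop can run.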
Maybe we could name this system RAID-Infinite (Redundant Array of Independent Disks - Infinite).
My only outstanding question here is how quickly the overarching network recognizes the drop of a host that held a specific chunk, so it can determine that the chunk needs to be copied elsewhere to maintain the replication factor of 8 (thinking in terms of Apache Cassandra, lol). If it can detect the loss within, say, 1-5 minutes, that should be fast enough to prevent chunk loss, but if it takes 30 minutes to an hour to realize a node went down with that chunk, I think we will be in for some problems.
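One common way to bound that detection window is a heartbeat timeout. Here is a toy failure detector to make the timing question concrete (timeout value and names are invented, not SAFE's actual protocol): a node is declared dead once no heartbeat has been seen for a fixed interval, which caps how long a chunk can sit under-replicated before repair kicks in.

```python
DETECTION_TIMEOUT = 60.0  # assumed: seconds of silence before declaring a node dead

def dead_nodes(last_heartbeat, now):
    """last_heartbeat maps node id -> timestamp of its most recent heartbeat.
    Returns the set of nodes whose silence exceeds DETECTION_TIMEOUT."""
    return {n for n, t in last_heartbeat.items() if now - t > DETECTION_TIMEOUT}

beats = {"a": 100.0, "b": 130.0, "c": 155.0}
assert dead_nodes(beats, 165.0) == {"a"}  # only "a" has been silent > 60s
```

With a 60-second timeout, worst-case exposure is roughly the timeout plus the time to re-copy the chunk, comfortably inside the 1-5 minute window suggested above.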
We can’t assume that 8 random personal computers around the globe storing data are going to be crazy stable. Certainly some network die-hards will be, but I think 50% or more of the hosting power will come from people interested in testing it but not fully invested in helping the ecosystem (meaning they will cut off after some time, once they get bored or aren’t getting rich off the few maidsafe coins they’ve collected, heh).
Replying to your post takes milliseconds from the point of hitting reply to it appearing on your computer.
My reply is channeled to a URL.
I assume over the safe network, checking the addressed bit of data also takes milliseconds.
Edit: What I am looking forward to finding out, once vaults run at home, is the amount of CPU time
a vault uses to do the above.
Gotcha, and that section talks to other nodes that are not geographically within the same location, so we are pretty safe there. At 8-22 nodes, I would still expect consensus to take 5 seconds or so, not sub-1-second.
I suppose the technique of how it knows comes into question. Does it check all hosts every time a GET is made by a client? That amounts to client polling, which imo isn’t good enough, because vaults could go down long before a client ever re-requests the data. Or does the network stay chatty, where in the background a section validates that all chunks are still at a replication factor (RF) of 8 for availability on the network? If this has all been answered elsewhere, feel free to link me over to it.

Then even more questions come into play when you have vaults with 100,000+ chunks of data: validating that it’s all present in the background seems like scale could become an issue, unless there is some “hash” check that vaults share to confirm all the expected data is present, compared against the neighbor section rather than each individual chunk (maybe in this case an elder holds the “trusted” hash that other vaults have to match).
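The “one hash instead of per-chunk checks” idea above can be sketched like this (hypothetical, not SAFE’s real protocol): each vault computes a single digest over the sorted list of chunk hashes it holds, so two vaults can compare one value to confirm they store the same chunk set, instead of exchanging 100,000+ individual checks.

```python
import hashlib

def chunk_set_digest(chunk_hashes):
    """Digest the whole set of chunk hashes a vault holds.
    Sorting first makes the result order-independent."""
    h = hashlib.sha256()
    for ch in sorted(chunk_hashes):
        h.update(ch)
    return h.hexdigest()

vault_a = [b"chunk1", b"chunk2", b"chunk3"]
vault_b = [b"chunk3", b"chunk1", b"chunk2"]  # same chunks, different order
assert chunk_set_digest(vault_a) == chunk_set_digest(vault_b)
assert chunk_set_digest(vault_a) != chunk_set_digest([b"chunk1", b"chunk2"])
```

A real system would likely use a tree of such digests (a Merkle tree) so a mismatch can be narrowed down to the offending chunks without re-hashing everything, but the single-digest version shows why the background check can be cheap.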
I wish all the low-level, interesting tech details of the SAFE network were well documented in a glossary of sorts, with hyperlinks to how every little piece works, not scattered across forum posts.