What about a catastrophic event that wipes out millions of nodes?

That suggests some kind of ‘pointer’ functionality in the client that this might rely on :D
Standardisation to ensure that everyone’s using it in the same way is still vital for pre-release in my opinion.

Go here and check out the section on MutableData.
https://safe-network-explained.github.io/architecture

“The content of mutable data may point to other mutable data, allowing the creation of chains of mutable data [6] that can be used for many purposes such as version control and branching, verifiable history and data recovery.”

I think you will find that what is described is that the network has the inherent capability to do exactly what you want… It might just take an app developer to implement it in the manner you desire.

So if MaidSafe comes up with a standard for version control, involving pointers to previous versions, I guess it wouldn’t take much to adapt that standard with file splitting in mind when that feature starts being tested (​:
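
To make the chaining idea concrete, here is a minimal, hypothetical sketch of such a version chain; the names (`XorName`, `VersionRecord`) and the in-memory “network” are illustrative stand-ins, not the real SAFE API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the "chain of mutable data" idea quoted above:
// each version record points back to its predecessor by network address.
type XorName = u64; // stand-in for a real 256-bit XOR address

struct VersionRecord {
    content: String,           // stand-in for the address of this version's data
    previous: Option<XorName>, // pointer to the prior version, None for the first
}

fn main() {
    // Fake "network": address -> record.
    let mut net: HashMap<XorName, VersionRecord> = HashMap::new();
    net.insert(1, VersionRecord { content: "v1".into(), previous: None });
    net.insert(2, VersionRecord { content: "v2".into(), previous: Some(1) });
    net.insert(3, VersionRecord { content: "v3".into(), previous: Some(2) });

    // Walk the chain backwards from the latest version to recover the history.
    let mut cursor = Some(3);
    while let Some(addr) = cursor {
        let rec = &net[&addr];
        println!("version at {} -> {}", addr, rec.content);
        cursor = rec.previous;
    }
}
```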

One thing I don’t think we’ve talked about in this thread, but which also bolsters your desire for “pre-chunking”, is the ability to use multiple threads on a single file. Pre-chunking large files into 4 or 8 pieces, then self-encrypting each piece in parallel, sure would speed things up a bit on typical commodity desktop hardware, especially if you write an app with a Threadripper or GPU acceleration in mind.

Good point. Perhaps a tree-like structure of chunk dependency would also have the benefit of working well with parallel processing. One thread for the first (trunk) chunks, then more threads as the later (branch) chunks can be encrypted without taking other branches into account.
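
For illustration, a minimal sketch of the parallel idea using only std threads; `encrypt_chunk` here is just a placeholder transform standing in for real self-encryption:

```rust
use std::thread;

// Stand-in for self-encrypting one pre-chunked piece; a real app would call
// whatever self-encryption routine it uses instead of this dummy transform.
fn encrypt_chunk(piece: Vec<u8>) -> Vec<u8> {
    piece.iter().map(|b| b.wrapping_add(1)).collect()
}

fn main() {
    let file = vec![0u8; 8 * 1024 * 1024]; // pretend 8 MB file
    let pieces: Vec<Vec<u8>> = file.chunks(2 * 1024 * 1024).map(|c| c.to_vec()).collect();

    // One thread per pre-chunked piece: the pieces are independent, so they
    // can be encrypted in parallel on commodity multi-core hardware.
    let handles: Vec<_> = pieces
        .into_iter()
        .map(|p| thread::spawn(move || encrypt_chunk(p)))
        .collect();

    let encrypted: Vec<Vec<u8>> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("encrypted {} pieces in parallel", encrypted.len());
}
```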

Related forum threads:

1 Like

Yes. When a catastrophic event wipes out millions of nodes, I will worry about my 9.52% of lost video files. Or I will worry about the millions of lives lost in the same event. EDIT: True, if it’s a computer virus, maybe no lives are lost.

This may be true. You could do RAID on top of SAFE:

  • Split your file into 1 MB blocks.
  • Group them into groups of a few blocks.
  • Make one or more parity blocks for each group.
  • Store all blocks as separate documents on the SAFE Network.

You can restore your file as long as no group loses more blocks than it has parity blocks.
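
A minimal sketch of that parity idea, assuming a single XOR parity block per group (so each group can survive the loss of one block); block and group sizes are arbitrary here:

```rust
// One XOR parity block per group lets you rebuild any single missing block
// in that group. A real app would store each block (and the parity) as its
// own object on the network.

fn parity(group: &[Vec<u8>]) -> Vec<u8> {
    let len = group.iter().map(|b| b.len()).max().unwrap_or(0);
    let mut p = vec![0u8; len];
    for block in group {
        for (i, byte) in block.iter().enumerate() {
            p[i] ^= byte;
        }
    }
    p
}

// Rebuild the one missing block by XOR-ing the parity with the survivors.
fn rebuild(survivors: &[Vec<u8>], parity_block: &[u8]) -> Vec<u8> {
    let mut missing = parity_block.to_vec();
    for block in survivors {
        for (i, byte) in block.iter().enumerate() {
            missing[i] ^= byte;
        }
    }
    missing
}

fn main() {
    let group = vec![vec![1u8, 2, 3], vec![4u8, 5, 6], vec![7u8, 8, 9]];
    let p = parity(&group);

    // Pretend the middle block was lost; recover it from the others plus parity.
    let recovered = rebuild(&[group[0].clone(), group[2].clone()], &p);
    assert_eq!(recovered, group[1]);
    println!("recovered block: {:?}", recovered);
}
```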

So the solution is one of these:

  • Somebody could write an app for this. Tools already mentioned.
  • The SAFE Network could store chunks like this. Do we know why it does not?

The main reason would be that there is no RFC yet; granted, our RFC process has been lacking of late, but that is about to change (new hires). The other thing is that replication/error correction/RAID/Reed-Solomon etc. factors always have a limit. So we can always lose up to the max of any algorithm, and folk can still say, then why not double up: re-replicate each chunk several times, increase the replication count, add a mix of replication and another scheme, etc.

So it is a very interesting topic with an easy resolution: like group size, the replication factor (or any similar scheme) is just a figure we choose. The limit will always be breached given enough of an argument, and it’s an easy argument: just lose one more piece than the algorithm can handle. The trick is really two things.

  1. Are the pieces lost forever (will the nodes come back online)?
  2. Are there copies available (archive nodes)?

The list goes on. For the world’s data such as Wikipedia, ancient texts, etc., there are really nice things in design these days, like the laser-etched disks that SpaceX put in the Tesla. Archive nodes handling this data would definitely be feasible. Then, if huge swathes of data can be made tertiary like that, the replication factor of more live data can be increased.

I think the story is a bigger one and likely to change over time. Increasing the replication count is easy if we offload data, which I fully believe will happen. I can see these kinds of new long-term storage becoming widespread, where nearly every computer can hold old immutable data in huge quantities in ways that withstand the horrors humans may do to the planet. Then SAFE becomes a mechanism for agreeing on and arranging current data that will eventually go tertiary like the rest.

I am babbling a wee bit, but hopefully this expands the area a little to encompass possible future-proofing of data storage, creation and agreement over time.

8 Likes

Hmhmm, if one were to take RAID software and then do the ‘standard’ self-encryption on top of it… then data could be recovered even if parts of it are lost, and it might use less storage space all in all than raising the replication count :thinking: (and you don’t really need to request all the parts if everything goes well)

Just thinking out loud…

Yes, that is what I mean, though: RAID 2, 3, 4, 5, 6, 7, 8, etc. or replication count 2, 3, 4, 5, 6, 7, 8, etc. The trade-offs are very similar, and there was even a French chap who did a PhD on it that concluded replication was simpler and just as effective in real-world situations. It is a bit vim/emacs; a debate that could rage through the ages though :wink: The bottom line is that there will be a limit to whatever choice is made, and those limits will only allow folk to say “what about limit+1? it breaks, so why not increase the limit?” Either RAID or replication will do that; it’s just a matter of increasing forever, until all nodes hold all data, if that makes sense.
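
For a rough feel of the trade-off being debated, a tiny back-of-the-envelope comparison; the parameters are illustrative, not anything SAFE actually uses:

```rust
// Plain replication (r copies) versus a k-data + m-parity group scheme.
fn main() {
    // Replication: storage factor = r, tolerates r - 1 lost copies of a chunk.
    for r in [2u32, 4, 8] {
        println!("replication r={r}: {}x storage, survives {} losses per chunk", r, r - 1);
    }

    // Erasure-coded group: storage factor = (k + m) / k, tolerates m losses per group.
    for (k, m) in [(4u32, 2u32), (8, 3), (10, 4)] {
        let factor = (k + m) as f64 / k as f64;
        println!("group k={k}, m={m}: {factor:.2}x storage, survives {m} losses per group");
    }
}
```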

4 Likes

Yes. When a catastrophic event wipes out millions of nodes, I will worry about my 9.52% of lost video files. Or I will worry about the millions of lives lost in the same event. EDIT: True, if it’s a computer virus, maybe no lives are lost.

Sure. But if the choice is between millions of lives and files being lost, and the same millions of lives but no files being lost, why would we prefer the former?

Somebody could write an app for this. Tools already mentioned.

Yes, and multiple apps will be written with no unambiguous standard, which will create file duplication on the network. Maybe when the network is closer to release, a splitting standard will be defined, and then anyone using a different method will just have to pay more to upload. Without a standard, anyone uploading non-unique files using a splitting app/plugin will have to pay more.

2 Likes

I’m not entirely French but also vote for replication. I’ve never really liked dealing with anything other than RAID 0 or 1, although RAID 1 doesn’t really offer much data security since it doubles your troubles just as fast. :sweat_smile: One of the things that attracted me to SAFE was what I perceive as its ability to do away with all the LUKS, RAID, LVM, snapshot, rsync-hardlink, ZFS nonsense.

The point where storage capacity and related technology increase faster than the generation rate of human information is an interesting scenario. Until we get to this holonomic inflection point where each node has the capacity to store all information contained in the whole network, I’ve often wondered how difficult it would be to offer the user manual control over the number of non-sacrificial replications per chunk. (Does the network still deal in sacrificial vs. non-sacrificial copies?) IMHO the only things blockchains do well are immutability and redundancy. SAFE wins on immutable generic data support, so it would be interesting to look at the cost/benefit of allowing individuals to specify the redundancy setting either on a per-file or per-account basis… potentially allowing them to store a file across all existing network vaults to achieve the maximum redundancy. (Ex. a 1 MB file per vault for ~10 billion vaults at a cost of ? $ per chunk = ? $)

I see why you might not want this functionality in the network layer, and I understand the design decisions behind fixing the count globally, so it makes for the possibility of an interesting app or utility that would automate the more manual approach to increasing the non-sacrificial count (i.e. pre-encrypting the file in multiple different ways or saving the same data in other formats). From a psychological perspective it would be reassuring to know that for a little extra safecoin one can ensure that 64 or 128 non-sacrificial copies of a birth certificate or deed would always be maintained, even if the algorithm tests show that one doesn’t really “need” more than the network’s default 4 or 8 non-sacrificials to ensure safeness. Humans will be human.
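
As a purely hypothetical illustration of that manual approach, such a utility could prepend a distinct marker to each copy before (self-)encryption, so the copies map to different chunks and are not deduplicated away; the marker scheme and copy count below are made up:

```rust
// Hypothetical "extra copies" helper: each salted blob would then be
// self-encrypted and uploaded separately, giving N independent copies.
fn make_independent_copies(file: &[u8], copies: u8) -> Vec<Vec<u8>> {
    (0..copies)
        .map(|i| {
            let mut salted = vec![i]; // distinct first byte per copy
            salted.extend_from_slice(file);
            salted
        })
        .collect()
}

fn main() {
    let deed = b"very important birth certificate";
    let copies = make_independent_copies(deed, 4);
    println!("prepared {} independent uploads", copies.len());
}
```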

EDIT:

I get your point that having it in the app layer will lead to “needless” duplication that reduces network efficiency, but this will only be the case if people use such an app to re-upload a lot of common public data… I’m not sure there is an incentive for that. Private unique data is unique by definition, so not really a factor. To a certain degree, isn’t manual duplication of public data via an app-level utility essentially a free-market vote on the data’s relative importance? If you do try an implementation in the network layer, then it seems like it would lead to a variable redundancy setting for each chunk… which may allow for some interesting optimizations related to upload popularity and caching… but probably not feasible from a simplicity standpoint or worth the bother… maybe it is… don’t be afraid to brainstorm. I’d say brainstorming always saves dev time, as long as we give reasons why they should not waste time on something. :grin:

3 Likes

My point isn’t about it being in the app layer, just that it should be somewhere and on by default.

Many file uploads will be duplicates, such as backups of an operating system. Users won’t want to sort their files into public and private; they’ll just upload everything. I think most uploads will be unique, but why not save the users a bit of money by letting them upload their OS for free? :)

This has been a goal of the network from the start. Not to let them upload as such, but to have immutable copies of OSes that are security-audited and updated properly.

Taking it further though, microkernel OSes where on login you “boot” into a secured OS from any device are interesting. There is a lot to it, but a secured, decentralised, non-owned network that has immutability built in is a great way to skip any need for virus checks etc., at least for OS-related files. So we can squash the attack vector a bit and remove a swathe of attacks on people. I won’t go into it all again, but there is a stonkingly good project that has many of the bits already in place when we launch. Secure boot against data that is received and hash-checked etc. is great; not the full story by any means, but a great start (the kernel you boot from can be attacked so it reads bad hashes as OK, and all that stuff). In any case, we provide a new mechanism that is well worth investigating here.
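
As a rough sketch of the hash-checked boot idea (not MaidSafe’s actual mechanism), verification could look something like this, using the `sha2` crate (`sha2 = "0.10"` in Cargo.toml); in practice the expected digest would come from immutable, network-stored metadata rather than being computed locally as in this demo:

```rust
use sha2::{Digest, Sha256};

// Check a received boot image against a published digest before using it.
fn verify_image(image: &[u8], expected_hex: &str) -> bool {
    let digest = Sha256::digest(image);
    hex_encode(&digest) == expected_hex
}

fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    let image = b"pretend kernel image";
    // Stand-in for the digest published alongside the image on the network.
    let expected = hex_encode(&Sha256::digest(image));
    println!("image ok: {}", verify_image(image, &expected));
}
```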

9 Likes

Yes, but we could define a target. Something like: “We accept that one of a trillion 1 GB files (that is, any of their 1024 chunks each) would get lost with a 50% chance over 250 years.” The target itself is not as important as being able to reason about probabilities of data loss under different circumstances.
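
As one way to reason from such a target, assuming chunk losses are independent (a big simplification), the implied per-chunk loss probability works out to roughly 7×10⁻¹⁶ over 250 years:

```rust
// Rough arithmetic from the stated target to a per-chunk figure.
fn main() {
    let files: f64 = 1e12;        // a trillion 1 GB files
    let chunks_per_file: f64 = 1024.0;
    let total_chunks = files * chunks_per_file;

    // Target: P(any chunk lost over 250 years) = 0.5.
    // With per-chunk survival s, s^total = 0.5, so per-chunk loss ≈ ln(2) / total.
    let per_chunk_loss_250y = std::f64::consts::LN_2 / total_chunks;
    let per_chunk_loss_per_year = per_chunk_loss_250y / 250.0;

    println!("per-chunk loss over 250 years ≈ {per_chunk_loss_250y:.2e}");
    println!("per-chunk loss per year       ≈ {per_chunk_loss_per_year:.2e}");
}
```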

Could you at least name it? This got me very curious.

1 Like

Sorry, I did not mean one was in existence; I meant this was a fantastic project waiting to happen. Sorry for the confusion.

2 Likes

It’s as if SAFE is a viral hologram. It starts off as a viral crystal. It’s an Indra’s Net.

There might be an easy way to do it following the same procedures used for diskless network booting. Basic description here:

You could use rolling hashes to split the chunks (IPFS does that). Say you add a byte at the front of a file: currently all subsequent chunks would also change, but with rolling hashes just one chunk would change => more deduplication.
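
A toy sketch of content-defined chunking with a rolling hash; real systems (IPFS, restic and friends) use Rabin fingerprints or buzhash rather than this plain rolling sum, but it shows why inserting a byte only disturbs nearby chunk boundaries:

```rust
// Cut wherever a rolling hash over the last WINDOW bytes hits a boundary
// condition, so boundaries depend on content rather than absolute offsets.
const WINDOW: usize = 16;

fn chunk_boundaries(data: &[u8]) -> Vec<usize> {
    let mut boundaries = Vec::new();
    let mut rolling: u64 = 0;
    for (i, &b) in data.iter().enumerate() {
        rolling = rolling.wrapping_add(b as u64);
        if i >= WINDOW {
            rolling = rolling.wrapping_sub(data[i - WINDOW] as u64);
        }
        // Boundary roughly once every 64 bytes on random-looking data.
        if i >= WINDOW && rolling % 64 == 0 {
            boundaries.push(i + 1);
        }
    }
    boundaries.push(data.len());
    boundaries
}

fn main() {
    // Pseudo-random test data from a small LCG, so window sums are well spread.
    let mut state: u64 = 1;
    let original: Vec<u8> = (0..4096)
        .map(|_| {
            state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
            (state >> 33) as u8
        })
        .collect();

    let mut shifted = vec![42u8]; // insert one byte at the front
    shifted.extend_from_slice(&original);

    let a = chunk_boundaries(&original);
    let b = chunk_boundaries(&shifted);
    println!("first boundaries, original: {:?}", a.iter().take(5).collect::<Vec<_>>());
    println!("first boundaries, shifted:  {:?}", b.iter().take(5).collect::<Vec<_>>());
    // With fixed-size chunks every later chunk would change after the insert;
    // here later boundaries fall on the same content, just offset by one byte.
}
```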

1 Like

This does sound like a promising idea.

But I mean more along the lines of preserving files from existing OSes, such as the millions of Windows 7 installs that people keep around on their disks. OSes are just an example; I’m sure there are many cases where users might want to upload content that is mostly duplicated.