If a file is pre-chunked (split by the upload service into multiple unencrypted files), then a missing chunk means that some of the file is lost. In a large video file, this might mean a few seconds go blank, which could be irritating for the viewer.
If the network’s chunking system is used without any pre-chunking, then any missing chunk would mean the whole file becomes completely unreadable (unless I’ve got this wrong?).
For large files, I’d assume that a ‘master chunk’ to refer to all other chunks would be the best bet. A ‘sea anemone’ layout instead of an ‘eel’ layout. But of course I am not one of the experts, nor have I spent years theorising about and building this :^)
Try calculating with 8 copies of each chunk, which is what they're actually implementing. I did this before: it came out to something like 2% of the network's data being lost at 50% network destruction, which seemed OK!
Also keep in mind that popular data gets replicated a lot more. And I agree that people may have to split their data, because the network can't read a file unless all of its chunks are gathered. But if we split a video into, say, 1,000 pieces of 30 seconds each, we can still make do without seeing one 30-second period of the video. To the network, though, as long as one chunk is unavailable the whole file is unreadable.
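The trade-off above can be put into rough numbers. This is only a sketch under the simplified 8-copy model from this thread, and the chunk counts (a large video as ~30,000 × 1 MB chunks, or 1,000 independent 30-second segments of ~30 chunks each) are made-up illustrative assumptions:

```python
# Simplified model: 8 independent copies per chunk, 50% of nodes destroyed.
p_chunk_lost = 0.5 ** 8          # all 8 copies destroyed: 0.390625%
p_chunk_ok = 1 - p_chunk_lost

# Unsplit: one 30,000-chunk file is unreadable if ANY chunk is lost.
chunks = 30_000
p_whole_file_lost = 1 - p_chunk_ok ** chunks

# Pre-chunked: 1,000 independent segments of ~30 chunks each; losing a
# segment blanks 30 seconds of video but the rest still plays.
segments, chunks_per_segment = 1_000, 30
p_segment_lost = 1 - p_chunk_ok ** chunks_per_segment
expected_blank_segments = segments * p_segment_lost

print(f"whole unsplit file lost: {p_whole_file_lost:.2%}")
print(f"expected blank 30s segments: {expected_blank_segments:.1f} of {segments}")
```

Under these (made-up) sizes the unsplit file is almost certainly gone, while the pre-chunked version loses scattered segments but mostly survives.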
Alternatively, store multiple copies of the same data if it's really important. Like your BTC private keys.
I think the idea is you store the data in your LAN SAFE network, then also store it in the public SAFE network. You can read the data quickly off of your local network, but rely on the resilience of the public SAFE network if your office suffers a natural disaster or something. It’s effectively like you just cache all of your organization’s data on your LAN.
As I understand it, it will be more complex than 8, and more of a “there’s these main chunks, but also these backup chunks, and these other kinds of chunks” kind of thing which makes the calculations a lot more complex. But assuming 8 for simplicity…
Under the simplified system, 50% network destruction raised to the power of 8 copies gives 0.390625% chunk destruction, or 99.609375% chunk survival, which means every file consisting of 6 chunks has a 2.3% chance of being wiped. Pretty good deal, but bigger files get a worse deal.
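These figures can be checked, and the "bigger files get a worse deal" point made concrete, under the same simplified 8-copy model (the file sizes are just examples):

```python
p_dest, copies = 0.5, 8
p_chunk_lost = p_dest ** copies        # 0.390625% chance all 8 copies die
p_chunk_ok = 1 - p_chunk_lost          # 99.609375% chunk survival

def p_file_lost(chunks):
    # A file is unreadable if any one of its chunks loses all copies.
    return 1 - p_chunk_ok ** chunks

for n in (6, 100, 1_000):
    print(f"{n:>5} chunks: {p_file_lost(n):.2%} chance the file is unreadable")
```

A 6-chunk file comes out at the quoted ~2.3%, but a 1,000-chunk file (roughly 1 GB at 1 MB chunks) is almost certainly lost under this model.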
Storing multiple copies of a video file would cost twice as much as pre-chunking the video file, and still leave a much higher risk of the file becoming completely unreadable.
I think people won’t split their data because that’s hassle. It will be every program that talks to the network that automatically splits the data. I must be missing something vital, because I can’t imagine why any program wouldn’t pre-chunk the data by default, thus making the network’s ability to store large files obsolete.
Have you considered this might work in reverse as well? If the global network went down for whatever reason but your local network could stay intact (say local power generation and additional protection) could you republish data back to the global network from your local network?
Also, would local LAN SAFE networks perhaps solve the problem gamers have of needing rapid information processing? If it's local, that means fewer and shorter hops. (A different problem, but related to your solution here.)
Yes, this is true. Some of my posts above were based on misunderstanding/ignorance, and thinking that SAFE was one thing, when it is in fact something else. I’ve had some more time to educate myself since then, but I still support and am thankful for MaidSafe’s choice of GPL. I hope they stick to their guns. It just seems to me that SAFE as an autonomous network is most powerful/resilient as a communication tool when all nodes are running the same code set (i.e. KISS), and that GPL will make that most likely to occur. I know there are various perspectives…
SAFE@Home: A little random brainstorming
The idea of being able to set up 4 to 8 (or 32?) nodes on a private SAFE cloud on my home LAN that would only contain redundant copies of my own data ready to be served at high speed (e.g. IP over InfiniBand) and low latency is rather attractive. "Backing up" my redundant home network with the scale of the planetary network (a SAFE PLAN?) makes me feel even safer in case my house is flooded or there is a fire or a lightning strike. Although I'm not sure how well you could really set up a local SAFE network like that now. I've read that security improvements have led to the procedure that nodal data is now discarded when a node goes offline/churns, to then be filled back up again when it rejoins. Consider this:
I can see the robustness, security, and simplicity this offers to the network, but it would seem that using the same policy for small-scale local SAFE on home networks, for just my files on my LAN or your special files on your LAN and similar "micro cloud" use cases, becomes challenging to accommodate. Maybe it doesn't need to accommodate those uses, which is fine, but doesn't SAFE's power also come from being scale-agnostic beyond some minimum size needed to maintain consensus? It's also likely that the MaidSafe team has already accounted for these use cases and I just haven't read enough to find it, so my apologies if this consideration seems foolish. Anyhow, none of it is an issue if you just plan to restart your freshly reconstructed LAN by pulling data down from the global/planetary SAFE network, essentially treating the local network as a high-performance but volatile cache. It's just that under this scenario one might be without local access to data for an extended period of time if your internet connection was down or your ISP or meshnet was having long-term connectivity problems.
In regards to the main question posed in this thread, I would say that the microcosm of the local SAFE@home or SAFE cache mentioned above, and its use of a planetary/global SAFE network as backup so that data survives a localised flood/fire/lightning strike, is analogous to a case where humanity's use of a global SAFE network survives a flare, EMP etc. using backups in an interplanetary SAFE network on the Moon, Mars, Europa or somewhere else "off-site". Since this isn't feasible for the near to mid term until Elon meets his goals, it would seem that an ideal solution to the problem of global reset may be found by solving/considering the local problem under the constraint that off-site backup is not an option. I'm not saying I have a solution for this, but rather that the microscale perspective may be a good way to attack the macroscopic issue. Mr. Irving mentioned that the archive nodes are likely the means by which the system could knit itself back together to survive a global reboot, and I don't doubt him. However, it might also be good to have other more localised redundancies (which appear to have been included during earlier versions of the network design?) built in as well, to ensure a backup plan for the backup plan. Perhaps this is exactly what he means by reintegrating the data from isolated peers via data chains. Very fascinating, and eager to learn more.
TEOTWAWKI vs. Intel vs. Pop Culture
Also, I don’t think one really needs to concoct dire edge-case scenarios to consider these issues. Perhaps some unknown CPU hardware exploit takes out 90% of all nodes, say those running on any Intel chip manufactured in the past 25 years (I know, ultra low probability and just a spectre of the conspiracy theorists’ imagination…). Instead, you could also consider things from a marketing/user perspective. Let’s just say some famous celebrity on Twitter asks the question, “What happens to my selfies and cat videos in the SAFE network if the entire internet and/or all nodes have no electricity or battery backup for a day?” (Although I am willing to admit that an unscripted pop star discussing network battery backup requirements may be less likely than a global power outage.) Imagine MaidSafe marketing having the ability to respond to that pop star’s question with something like “Nothing.”, or, “As soon as you are able to get your internet access back, the SAFE network and all your data will be there waiting for you.”, rather than, “In that scenario you have bigger problems to worry about.”, which may be the most true statement, but it is also the least cheerful and awe-inspiring. Most human endeavors can get away with that type of reasoning, but SAFE has set a higher standard for itself, has it not?
Solving less dire “dark minute” or “dark day” black swan scenarios may well yield a solution to more severe but very low probability crises as well. However, I don’t think any of us expect MaidSafe to be focussing on these edge cases at this point, nor would I want them to take their focus off the current tasks at hand. Theorising on topics like these is what they have us forum users for, right?
I found a few other old forum threads that are related to this discussion and might be what the OP was referring to:
I assume you mean making 3MB files out of the large file.
I say this because the client already chunks the file into 1MB before uploading, so I assume you mean making 3MB files (making the minimum 3 chunks for any file).
The reason would be convenience for most people. But the idea of a modified client that did this automatically for you is interesting and could work. The problem would be getting enough people to adopt it for public files (e.g. vids) that the pre-chunked format becomes widely accepted.
For private files you could maybe use an APP to do this for you. Maybe use quickpar and set the output file size to 3 MB.
If it's optional then an APP could provide this functionality. The reason for suggesting an APP is that it reduces the updating required whenever the client is updated. Also, whenever you can do something satisfactorily in an APP, that is preferable.
Also the reason I mentioned using the quickpar libraries/module in the APP is that “par” files can be created which will recover lost sections of a file.
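For anyone unfamiliar with how "par"-style recovery works: real PAR2 files use Reed–Solomon coding, but a minimal XOR parity sketch (a toy, not quickpar's actual algorithm) shows the principle that one extra block can reconstruct any single lost block:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"chunk-A!", b"chunk-B!", b"chunk-C!"]   # equal-length data blocks
parity = xor_blocks(data)                        # one recovery block

# Simulate losing data[1]: XOR of the survivors plus parity restores it.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

Reed–Solomon generalises this: with k recovery blocks you can lose any k blocks, which is why par files suit the "some chunks may vanish" scenario discussed here.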
Sorry for going on a tangent with regard to your original post.
I can see how one might want to pre-chunk files prior to sending them on the network for improved resiliency. It is similar to the situation where you could pre-encrypt your files with Blowfish prior to placing them on the network to give a feeling of improved security. In both cases you might gain some benefit but as @neo stated you also increase the chance for user error and frustration. Regardless, I don’t think the main goal for chunking the data to 1MB was for the purpose of resiliency but rather for obfuscation of the data as it flows through the network. I could be wrong and others here know more about this.
While pre-chunking would achieve similar resiliency using less redundancy, it may be a moot point. Dynamically modifying the degree of chunk redundancy to 4,8,16, or 32+ copies achieves the same or better resiliency for any given file size, spreads things out more geographically, and may also offer other read performance or decreased latency benefits, albeit the cost of increased storage space. It also makes a lot of sense not to do any sort of pre-chunking at the network level in order to KISS. Perhaps this points to a demand for an app or desktop tools that automates/manages pre-chunking, pre-encryption, and pre-multiplication of files in order to cater to the uncommon demands of unique users. The users can then pay more/less safecoin for improved/reduced consideration of chunks depending on their needs. An app like this might also tie in well with @neo’s idea for private temp data.
While looking at your probability figures another question came to mind, which I think @foreverjoyful might have been alluding to:
If all redundant copies of a single chunk are lost for a file, then all sequential chunks following that lost chunk also become unreadable due to the self-encryption series… right?
Very likely, but we must consider what is lost: it will mean the redundant copies are lost, any archive nodes holding it lost it, any client holding it, any cache, etc. It is best to think there will be a point of loss, but ask what it is. With storage tech coming soon, we will be able to have incredible redundancy (I feel) while maintaining security at the same time. So the definition of "lost" should perhaps be explored, to define what it means today in terms of lost from where, and what can be done to make this infeasible, sans global destruction etc. For instance, data chains will allow peers who fail to restart and republish data, so "lost" here on the running network (ignoring cache, archive, clients etc.) has to mean peers going offline and never coming back online (ever). So it does get pretty deep, and is not as simple as offline == deleted chunk.
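To illustrate the failure mode the question asks about, here is a toy chain cipher. This is not MaidSafe's actual self-encryption (the key-derivation scheme below is purely hypothetical, and the real design keeps chunk hashes in a data map); it just models a scheme where each chunk's key depends on the previous plaintext chunk, so one lost chunk makes everything after it unreadable:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: repeated SHA-256 of the key (illustration only)."""
    out = b""
    while len(out) < n:
        key = hashlib.sha256(key).digest()
        out += key
    return out[:n]

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

def encrypt_chain(chunks):
    """Each chunk is keyed off the hash of the previous plaintext chunk."""
    prev_hash, out = b"\x00" * 32, []
    for c in chunks:
        out.append(xor(c, keystream(prev_hash, len(c))))
        prev_hash = hashlib.sha256(c).digest()
    return out

def decrypt_chain(cipher_chunks):
    """Decryption must proceed in order: chunk i's key needs chunk i-1."""
    prev_hash, out = b"\x00" * 32, []
    for cc in cipher_chunks:
        c = xor(cc, keystream(prev_hash, len(cc)))
        out.append(c)
        prev_hash = hashlib.sha256(c).digest()
    return out

plain = [b"first chunk", b"second chunk", b"third chunk"]
assert decrypt_chain(encrypt_chain(plain)) == plain
# If chunk 1 is lost, chunk 2's key (the hash of chunk 1's plaintext) is
# unrecoverable, so every later chunk in the chain is unreadable too.
```

Whether SAFE's real self-encryption cascades like this depends on what the data map stores, so treat this only as a model of the worry being raised.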
Perhaps Average Joe Laptop or Plain Jane Cellphone wouldn’t pay for Faraday cages and underground bunkers, but consider how much governments and major corporations might value their data and want to invest in protecting it. Ultimately, since one can’t tell one type of data apart from another, protecting one part of the network is protecting it all, so investing in such security measures would be worthwhile for any high rollers invested in the SAFE network.
No worries about the deviation, some important points have been raised. My OP didn’t even mention pre-chunking anyway.
It is similar to the situation where you could pre-encrypt your files with Blowfish prior to placing them on the network to give a feeling of improved security.
I would say it’s not similar, because pre-encryption shouldn’t be productive if the network is coded correctly, whereas pre-chunking would still increase resilience.
Dynamically modifying the degree of chunk redundancy to 4,8,16, or 32+ copies achieves the same or better resiliency for any given file size, spreads things out more geographically, and may also offer other read performance or decreased latency benefits, albeit the cost of increased storage space.
Not really true for any given file size, but my point isn’t about the number of copies. I have no doubt that the number of copies will be configured sensibly.
It’s not a moot point, it’s a way of increasing resiliency with no downsides as long as it’s implemented properly. If it’s not implemented by MaidSafe, then it will still be implemented but in a non-standard and potentially troublesome way.
It also makes a lot of sense not to do any sort of pre-chunking at the network level in order to KISS.
I am not sure about this. The network chunks anyway, so why not change the way it chunks to make it a hybrid of my pre-chunking idea plus the existing system, instead of having both done by different entities?
Either way, any upload program that tries to upload a >3MB (or possibly >5MB) file should pre-chunk by default in a standardised way anyway.
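A default pre-chunker like that could be quite small. This is only a sketch under assumed names and thresholds (the 3 MB figure comes from the discussion above; `pre_chunk` and the `.partNNNN` naming are made up), and a real client would also need a manifest or "master chunk" to reassemble the pieces in order:

```python
import os

CHUNK_BYTES = 3 * 1024 * 1024   # illustrative 3 MB threshold from this thread

def pre_chunk(path, out_dir, chunk_bytes=CHUNK_BYTES):
    """Split a large file into fixed-size pieces before upload.

    Files at or under the threshold are left alone; larger files become
    name.part0000, name.part0001, ... in out_dir.
    """
    if os.path.getsize(path) <= chunk_bytes:
        return [path]
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with open(path, "rb") as f:
        i = 0
        while True:
            piece = f.read(chunk_bytes)
            if not piece:
                break
            part = os.path.join(out_dir, f"{os.path.basename(path)}.part{i:04d}")
            with open(part, "wb") as out:
                out.write(piece)
            parts.append(part)
            i += 1
    return parts
```

Each piece would then be uploaded (and network-chunked) independently, so losing one piece loses only that span of the file rather than the whole thing.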