Original folder size, 100 × 9 MB random files (made with rdfc.exe):
900 MB (943,759,360 bytes) folder
7-Zip compressed at maximum setting:
91.8 MB (96,337,920 bytes) .7z
Total stored across all 11 nodes of run-baby-fleming after safe files put:
201 MB (211,402,752 bytes), with 3-4x duplication
If the Safe Network keeps 3 copies, then (without deduplication) 900 MB compressed into 67 MB!!!
If the Safe Network keeps 4 copies, then (without deduplication) 900 MB compressed into 50 MB!!!
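Those per-copy figures are just the total node storage divided by the assumed replica count; a quick shell sanity check, using the byte count reported above:

```bash
# Sanity check on the per-copy maths, using the reported node-store total.
total=211402752   # size of all 11 node stores after the put, in bytes
echo "3 copies -> $(( total / 3 / 1024 / 1024 )) MB per copy"   # 67 MB
echo "4 copies -> $(( total / 4 / 1024 / 1024 )) MB per copy"   # 50 MB
```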
I guess there must be some deduplication going on; maybe rdfc.exe doesn't produce truly random data, since it's built for speed as a random file generator! Or indeed the Safe Network holds the crown of compression!
I am now “uploading” 6000 × 10 MB files into the 11-node local baby-fleming! Are we still at the 10 GB limit per node?
No memory leak so far; the CPU is at 16 threads × almost 100% the whole time! safe files put . is taking 50% of the threads and the other 50% is used by the 11 nodes!
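For anyone who wants to reproduce a run like this on Linux, a rough sketch, using dd and /dev/urandom in place of rdfc.exe and assuming a baby-fleming network is already running with the safe CLI on the PATH:

```bash
# Generate 6000 x 10 MB random files, then upload the folder as above.
mkdir -p testdata && cd testdata
for i in $(seq 1 6000); do
  dd if=/dev/urandom of="file_${i}.bin" bs=1M count=10 status=none
done
safe files put .
```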
Is baby-fleming the new RAID-like local storage for securing your files?
And a question that just came up: can nodes go offline and recover? Can the whole local network go offline and then be restarted from where it left off? @maidsafe
I am also interested to know whether you are going to optimise the code for a local-storage scenario.
Right now I see low SSD usage while running a local baby-fleming.
There is a use case for launching some nodes (say 11) on your local machine and storing your data in those nodes; as the need for more “uploaded” storage emerges, additional nodes would start up.
I see a use case where, with your files deduplicated and stored 3-4x across several nodes, you have a very nice local backend for your system: safe from data corruption and loss, encrypted, and compressed (better than 7-zip, as this thread claims).
There is also a use case for a local network where you have three Raspberry Pis and run a network between them, so if one Pi fails the others hold a copy of the data. It would be cool to have a networked RAID-like system with all the pros I mentioned earlier!
Plus you can invite friends and share things with them, or even a whole community, or host a website on your own Safe network and give access to anyone just by having them run the safe client/browser/other apps with your config.
Getting excited seeing your results, @SmoothOperatorGR!!!
I always love to see my whole PC being put to use; that's why I bought all those cores in the first place.
I'd be curious to know whether downloading the random data gives back exactly what was uploaded (i.e. all chunks have successfully uploaded). The compression ratio seems a bit too good for random data.
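A sketch of one way to check that round trip, assuming the data has been fetched back into a ./downloaded folder (the fetch command itself varies between CLI versions, so only the comparison is shown):

```bash
# Hash every file in both trees and compare; a missing or corrupted chunk
# shows up as an absent or differing checksum.
( cd ./original   && find . -type f -exec sha256sum {} + | sort -k2 ) > original.sums
( cd ./downloaded && find . -type f -exec sha256sum {} + | sort -k2 ) > downloaded.sums
diff original.sums downloaded.sums && echo "all files match"
```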
The key here, as I said, is that the random generator is a highly efficient one, and it's a program from 2014, so I guess it's not fully random and there must be deduplication.
rdfc.exe is garbage. Its age is no excuse; the program is just made in the stupidest way possible.
You can never compress truly random data, no matter what compression algorithm you use (except, of course, when the compression algorithm already knows about that particular data sequence).
It uses the standard rand() LCG, which produces only 15 bits of data, but it writes the data in 32-bit chunks, so the other 17 bits are filled with zeroes.
When over half of the bits in a file are zeroes, it can't be called random.
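A quick way to see the effect being described, using bash's own 15-bit $RANDOM as a stand-in for rand() (an analogy, not rdfc.exe itself):

```bash
# Each 15-bit value gets padded out to a 32-bit word, so more than half of
# every word is zero bits and the "random" file compresses noticeably,
# which truly random bytes would not.
for i in $(seq 1 65536); do
  printf '%08x' "$RANDOM"
done | xxd -r -p > fake_random.bin         # 256 KiB of padded 15-bit values

gzip -k fake_random.bin
ls -l fake_random.bin fake_random.bin.gz   # the .gz is clearly smaller
```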
@SmoothOperatorGR I agree there's no point in using random data. Real-life, common data (movies, photos, audio, documents) would be most informative in this test, IMO.
If you do want random files for some reason, it's easy enough with Bash.
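A minimal sketch of such a script (hypothetically named makerandom.sh), assuming 10 MB files from /dev/urandom and the naming scheme discussed in the replies below:

```bash
#!/usr/bin/env bash
# Sketch only: create <count> 10 MB random files, each named after the
# script plus a random suffix, so reruns add files instead of overwriting.
count="${1:?usage: $0 <number-of-files>}"
for i in $(seq 1 "$count"); do
  suffix=$(tr -dc 'a-z0-9' < /dev/urandom | head -c 8)
  dd if=/dev/urandom of="$(basename "$0").${suffix}" bs=1M count=10 status=none
done
```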
Both synthetic and real-world tests are useful.
For example, since random data is not compressible, it can be used to quickly check whether any data has been lost, just by comparing the amount of used space.
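For instance (the paths below are placeholders for your own test folder and node stores):

```bash
# Random data cannot shrink, so stored size should be roughly
# copies x original size; a big shortfall would suggest lost data.
orig=$(du -sb ./random-testdata | cut -f1)
stored=$(du -sb ./node-stores | cut -f1)
echo "original: $orig bytes, stored: $stored bytes"
echo "implied copies: $(echo "scale=2; $stored / $orig" | bc)"
```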
As for /dev/urandom, its algorithm (ChaCha20) produces even better results than rnd64 (PCG32), so for Linux users it's a good choice for most use cases.
How do I use this? What arguments do I need to provide to the script when running it?
OK, it needs the number of random files to generate.
But could you make it so I can name the files whatever I want, rather than after the script itself, and number them with a count instead of a random string?
I can see now why you did it this way…
Because if you rerun the script, you don't care whether files are already there: each new file gets a random name, so the new random files are added to the existing ones without hassle.
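For example, with the sketch above saved under the hypothetical name makerandom.sh:

```bash
./makerandom.sh 100   # creates 100 files like makerandom.sh.k3f9x2qa
./makerandom.sh 50    # rerunning simply adds 50 more alongside the existing ones
```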