Bravo MaidSafe! You got better compression than 7-zip on its maximum setting!

Original folder size with 100 x 9MB random files (made with rdfc.exe):
900 MB (943.759.360 bytes) folder

7-zip compressed with maximum setting:
91,8 MB (96.337.920 bytes) .7z

Safe Network files put size across all 11 nodes of run-baby-fleming:
201 MB (211.402.752 bytes) with 3-4x duplication

If the Safe Network keeps 3 copies, then (without deduplication) 900MB was compressed into ~67MB!!!

If the Safe Network keeps 4 copies, then (without deduplication) 900MB was compressed into ~50MB!!!

I guess there must be some deduplication; maybe rdfc.exe doesn't produce truly random data, as it's an efficient random file generator! Or indeed the Safe Network holds the crown of compression!
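
For reference, the arithmetic behind those figures is just the total stored across all nodes divided by the assumed replication factor, e.g. with bc:

# per-copy size = total stored on all 11 nodes / replication factor (assumed 3 or 4)
echo "scale=1; 201 / 3" | bc    # ~67 MB effective size if 3 copies are kept
echo "scale=1; 201 / 4" | bc    # ~50 MB effective size if 4 copies are kept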

8 Likes

I am now “uploading” 6000 x 10MB files to the 11-node local baby-fleming! Are we still at the 10GB limit per node?

No memory leak so far, and the CPU is at 16 threads x almost 100% all the time! safe files put . is taking 50% of the threads and the other 50% is used by the 11 nodes!

4 Likes

They cracked “middle out” dontchaknow.

4 Likes

Using all cores! That's a nice feeling!

5 Likes

Is baby-fleming the new RAID-style local storage for securing your files?

And something that just came up: can nodes go offline and recover? Can the whole local network go offline and then be restarted from where it left off? @maidsafe

I am also interested to know whether you are going to optimise the code for a local storage scenario.

Right now I see low SSD usage while running a local baby-fleming.

There is a use case of launching some nodes (11) on your local machine and storing your data in those nodes; as the requirements for “uploaded” storage grow, more nodes would start up.

I see a use case: with your files deduplicated and stored 3-4x across several nodes, you get a very nice local backend for your system, where you are safe from data corruption and loss, and everything is encrypted and compressed (better than 7-zip, as this thread claims).

There is also a use case for a local network where you have 3 Raspberry Pis and run a network between them, so if one Pi fails the others hold a copy of the data. It would be cool to have a networked RAID-like system with all the pros I mentioned earlier!

Plus you can also invite friends and share things with them, or even a community, or host a website on your own Safe network and give access to anyone just by running the Safe client/browser/other apps with your config.

6 Likes

That’s the plan, ultimately. Right now the BLS elder keys are gone on reboot, but we are working on that (stable set)

13 Likes

OK, new data.

With rdfc.exe I created 6000 x 10MB random files.

Random file folder size: 58,5 GB (62.914.600.960 bytes)

Baby-fleming 11-node storage size: 11,9 GB (12.849.020.928 bytes)

Still waiting for the 7-zip size with max compression.

From the data it seems that rdfc.exe doesn't produce fully random data, as the Safe Network is either deduplicating it or using godly compression!

7-zip is taking long and I have to go. No biggie, we know what the network does!

8 Likes

Getting excited seeing your results @SmoothOperatorGR !!!
I always love to see my whole PC being put to work; that's why I bought this many cores in the first place :laughing:

6 Likes

Very interesting experiment!

The compression in use is brotli (see self_encryption/Cargo.toml L24).

I’d be curious to know if downloading the random data gives the same as the uploaded data (ie all chunks have successfully uploaded). The compression ratio seems a bit too good for random data.
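
One quick way to verify that, if the files are fetched back into a second directory (the directory names below are just placeholders): hash both sets and diff the results.

# compare the original files against the downloaded copies (paths are hypothetical)
(cd original_files && sha256sum * | sort) > /tmp/original.sha256
(cd downloaded_files && sha256sum * | sort) > /tmp/downloaded.sha256
diff /tmp/original.sha256 /tmp/downloaded.sha256 && echo "all files match"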

5 Likes

The key here, as I said, is that the random generator is a highly efficient one and it's a program from 2014, so I guess it's not fully random and there must be deduplication.

And yes, all data verified.

2 Likes
  1. rdfc.exe is garbage. Its age is no excuse; this program is just made in the stupidest way possible.
  2. You can never compress truly random data, no matter what compression algorithm you use (except, of course, when the compression algorithm knows about that particular data sequence).
2 Likes

Maybe you know of a PowerShell script or program to create random files?

Blunt and brutal as always @Vort :slight_smile:

I don’t know anything about it but why is it garbage?

1 Like

Looks like I'm now obligated to do that search :sweat_smile:

It uses the standard rand LCG, which produces only 15 bits of data, but it writes data out in 32-bit chunks. The other 17 bits get filled with zeroes.
When over half of a file's bits are forced to zero, it can't be called random.
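
You can check this yourself on a sample rdfc.exe output file (data.bin here is just a placeholder name, and this assumes a Linux/WSL shell) by counting how many bytes are exactly zero:

# count NUL bytes in a sample file (data.bin is a hypothetical name)
total=$(stat -c %s data.bin)
zeros=$(tr -cd '\0' < data.bin | wc -c)
echo "$zeros of $total bytes are zero"
# a 15-bit PRNG written as 32-bit words leaves at least 2 of every 4 bytes zero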

2 Likes

Now that I think of it, there's no point in using fully random files except for the purpose of filling the network.

Better to upload audio, video and images for realistic results as far as compression is concerned.

5 Likes

That's why it deduplicates all the zeros! Cool nonetheless!

5 Likes

I found a good quality program for it:
https://github.com/Tinram/RND64
Windows binary is here:
https://github.com/Tinram/RND64/releases/download/v0.41/rnd64.exe

Example command:
rnd64 -a 2m data.bin
Creates a 2 megabyte file named data.bin.
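
If you want a whole folder of them like in the tests above, a simple shell loop works (I'm assuming the -a size argument accepts other sizes such as 10m; the count and file names are just examples):

# make 100 x 10MB random files with rnd64 (count, size and names are examples)
mkdir -p random_files
for i in $(seq 1 100); do
  rnd64 -a 10m "random_files/file_$i.bin"
done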

4 Likes

@SmoothOperatorGR I agree there's no point in using random data. Real-life, common data: movies, photos, audio and documents would be most informative in this test IMO.

If you do want random files for some reason, it's easy enough with Bash:

#!/bin/bash
# rfm -- rand file maker
counter=0
filesize=9000000   # ~9 MB per file
while [ "$1" -gt "$counter" -o "$1" -eq 0 ]
do
  # random 9-character hex suffix, appended to the script's own name ($0)
  filename=$(cat /dev/urandom | tr -cd 'a-f0-9' | head -c 9)
  head -c "${filesize}" </dev/urandom >"${0}${filename}"
  counter=$((counter + 1))
done

exit 0
2 Likes

Both synthetic and real-world tests are useful.
For example, since random data is not compressible, it can be used to quickly check whether any data was lost, just by comparing the amount of used space.

As for /dev/urandom, its algorithm (ChaCha20) produces results even better than rnd64 (PCG32). So for Linux users it’s a good choice for most use cases.
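
If you just need a single random file that way on Linux, a one-liner is enough (the file name and size are arbitrary):

# create a 10MB (10485760 bytes) file of random data from /dev/urandom
head -c 10485760 /dev/urandom > data.bin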

4 Likes

How do I use this? What arguments do I need to provide when running the script?

OK, it needs the number of random files to generate.

But could you make it so I can name the files whatever I want, rather than after the script itself, and number them with a count instead of a random string?

I can understand now why you did it this way:
if you rerun the script, you don't care whether files are already there; each new file just gets a random name, so the new random files are added alongside the existing ones without hassle.
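
Something like this is what I have in mind (an untested sketch; the prefix argument and sequential numbering are my changes to the script above):

#!/bin/bash
# rfm2 -- rand file maker with a user-chosen prefix and sequential numbering
# usage: ./rfm2 <count> <prefix>     e.g. ./rfm2 100 random_
count="$1"
prefix="${2:-random_}"
filesize=9000000   # ~9 MB per file
for ((i = 1; i <= count; i++))
do
  head -c "${filesize}" </dev/urandom >"${prefix}${i}.bin"
done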

1 Like