Mojette Transform and SAFE

In the vim/emacs style war between erasure code and replication, @dirvine wisely (as always) chose replication for us. Now we have SAFE with its replication infrastructure, but what if this is not a war of erasure code OR replication? Can we have collaboration between erasure code AND replication?

What happens if we add just one more step to the self-encryption of the SAFE network?


Let’s say we Mojette transform a file. We will get a bigger file.
A bigger file → more chunks → more nodes needed to store it, but at the same time more nodes to serve it.

Where is the gain, you would say!

Well, the chunks of the Mojette transform have an interesting property: they are correlated, and you only need n out of m chunks of the Mojette transformed file to reconstruct the original file.
The pool of nodes holding the Mojette transformed file is bigger than the pool holding the original file.
The bigger the pool, the faster you will receive the file.
Extra Chunks Racing
Let the m chunks of the Mojette transform race against each other; as soon as we have n chunks we are done and can reconstruct the original file. It will always be the fastest n chunks out of m that reach you.
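
A minimal sketch of the racing idea (std-only Rust; the chunk counts and the simulated delays are made up, and a sleep stands in for fetching a chunk from the network):

    use std::sync::mpsc;
    use std::thread;
    use std::time::Duration;

    /// Toy "chunk fetch": worker `id` returns its chunk id after a simulated
    /// network delay. In reality this would be a GET on the SAFE network.
    fn fetch_chunk(id: usize, delay_ms: u64) -> usize {
        thread::sleep(Duration::from_millis(delay_ms));
        id
    }

    fn main() {
        let m = 12; // total Mojette chunks stored
        let n = 8;  // any n of them are enough to rebuild the file
        let (tx, rx) = mpsc::channel();

        // Request all m chunks in parallel; the made-up delays stand in for
        // the different speeds of the nodes holding each chunk.
        for id in 0..m {
            let tx = tx.clone();
            let delay = ((id * 37) % 100) as u64 + 10;
            thread::spawn(move || {
                let _ = tx.send(fetch_chunk(id, delay)); // losers of the race are ignored
            });
        }

        // Keep only the first n arrivals: the fastest n of m win the race.
        let winners: Vec<usize> = rx.iter().take(n).collect();
        println!("reconstruct the file from chunks {:?}", winners);
    }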

Gain

  1. it may speed up the SAFE Network,
    if the overhead of Mojette transform encoding and decoding + the extra traffic congestion is less than the potential gain from the extra chunk racing
  2. it makes the SAFE Network even more resilient:
    On SAFE an attacker “just needs” to control one chunk; now it must control m-n+1 chunks (e.g. 5 chunks with a (12,8) code) to have the same effect!

Cost
More space is needed and more traffic is generated.

There’s a lot more to take into account, but it would take a bigger napkin :wink:

Also, farmers may want to look at RozoFS (a Mojette transform solution) http://rozosystems.com/ for a scale-out NAS architecture.

4 Likes

Is this not Sia’s method of storage?

Is it not controlling ALL four copies of that one chunk? And controlling those 4 copies would require an extremely large portion of the SAFE network to be controlled in order to do that.

The larger the file, the more work the PC has to do to decode it once it receives the required chunks. This time is not inconsequential. It can take my fast PC many, many seconds just to verify a 4GB file with a multi-threaded verifier; how long for reconstruction while checking? And the time increases for larger files.

This will occur because you request all m chunks even though you only need n chunks. Critical point for mobiles.

Mobile phones/tablets
Can this work effectively on them for 4 GB files? Or is the reconstruction going to take too long?

What of my IoT device that has 0.080 GHz (80MHz), 512MB memory, 8GB storage and two cores, and is building a number of 4GB files over a month? It has to download each file many times a week since it cannot keep them all in its storage, and other IoT devices use those files too. Will it even be possible? Consider that it takes a rough estimated minimum of 10 seconds on an 8-thread machine at 3GHz with a fast SSD just to check such a file (not rebuild it); now take a 2-core IoT device at 0.08 GHz with a less capable instruction set too (i.e. more instructions to do the same job). How long would it take? Scaling by raw threads × clock alone, 10 s × (8 × 3GHz)/(2 × 0.08GHz) is roughly 1500 seconds even before allowing for the weaker instruction set, so more than half an hour just to check it; what time to rebuild while checking it, 2, 3 or 6 hours? And that is after all the required blocks have arrived. At least with SAFE I can start reading the file after the first chunk of the file arrives.

That time really makes it not worth it. There may also be too many data points for the limited storage in the IoT device.

[EDIT: (Can someone verify this?) With SAFE you can access just a portion of the file without downloading the whole thing, whereas here you need to download the whole file to rebuild it.]

your thoughts?


Sia:
blockchain (to store contracts) + Reed-Solomon? (for erasure coding)

Here it’s:
SAFE + Mojette transform (for erasure coding)

Yes, ALL four copies and maybe more (I’m thinking of nodes with a lower rank that also have a copy). As you say, controlling 4 copies would require an extremely large portion of the SAFE network to be controlled. Now imagine what it would require for 4*(m-n+1) copies… xkcd:538 :wink:

Yes, I know, and I chose the Mojette transform because it is less computationally expensive than other erasure codes. This paper, Performance of the Mojette erasure code for fault-tolerant distributed hot data storage, may give you some hope.

I did not perform any kind of measurement (just early thoughts), so if it does not work for mobile then a quick solution is to just use plain SAFE.

Thanks @neo

2 Likes

The other thought is that current SAFE (well, I am almost certain) allows people to access just the part of the file they want. For instance, to watch a movie I only need to download the first few chunks (as a buffer) to start, and further chunks are requested as needed to continue watching. This is how a lot of tablets/mobiles will use SAFE: to watch/listen to content. If I had to completely download the 4-8GB of parts first and build the file before watching, I’d probably give up because of both the time to download and the time to rebuild. And my tablet might not have the 8+GB free to do this.

The advantage of speed is lost. Even though you only need n of the m chunks, delivered in an almost parallel fashion, having all n chunks shoved down your “limited” connection bandwidth does not compare with only requiring a few chunks to start listening to/watching my media.

This is very important in my consideration on the usability of the network.

Yes, I do understand what the Mojette transformation would add to the system; after all, Usenet has had its own adaptation of this idea for years. But is it better all around? That is an important consideration for people adopting the network.

Here is a thought.

  • What if the SAFE core remained the same.
  • You have an APP that automates access to the files you want the benefit of the Mojette transformation on.
  • The APP is only used for those files you deem it necessary for.
  • The APP stores the file as m files of 1 chunk each. Or vary it with a few of the m chunks per file.
  • The APP reads these files and outputs the required file on your device.
  • Then you have even more redundancy.
  • Those who listen to/watch media can use media that was stored ordinarily on the core SAFE network.
  • Those who want fast access to a complete important file can use the APP to store and retrieve it. In most cases the important files are not as lengthy as media files and have little cost burden to store.
  • Win-win. We keep core SAFE as is with all its benefits, and for those few files requiring the extra security and maybe speed you can use an APP to store/retrieve them (rough sketch below).
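
Roughly how such an APP might sit above an unchanged core - a sketch only, with an in-memory HashMap standing in for the SAFE network and plain splitting standing in for the Mojette encode (the names app_put/app_get and the SafeStore alias are made up):

    use std::collections::HashMap;

    /// Hypothetical stand-in for whatever PUT/GET interface the core exposes:
    /// here a plain in-memory map plays the role of the SAFE network.
    type SafeStore = HashMap<String, Vec<u8>>;

    /// Store a file as m separately addressable pieces (one "chunk file" each).
    /// A real APP would run the Mojette encode here; this sketch just splits.
    fn app_put(store: &mut SafeStore, name: &str, data: &[u8], m: usize) {
        let piece_len = (data.len() + m - 1) / m;
        for (i, piece) in data.chunks(piece_len).enumerate() {
            store.insert(format!("{name}.part{i}"), piece.to_vec());
        }
    }

    /// Fetch the pieces back and rebuild the file on the device.
    /// With a real erasure code any n of the m pieces would be enough.
    fn app_get(store: &SafeStore, name: &str, m: usize) -> Vec<u8> {
        let mut out = Vec::new();
        for i in 0..m {
            if let Some(piece) = store.get(&format!("{name}.part{i}")) {
                out.extend_from_slice(piece);
            }
        }
        out
    }

    fn main() {
        let mut store = SafeStore::new();
        app_put(&mut store, "report.pdf", b"important but small file", 4);
        assert_eq!(app_get(&store, "report.pdf", 4), b"important but small file".to_vec());
    }
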
2 Likes

It absolutely makes sense to me! :smile:
Encoding and decoding could be done at the edge.
And you still have the Mojette transform and SAFE.

A part of a file can itself be a file…

1 Like

I don’t get it. I didn’t notice a free downloads section. What’s the benefit of visiting the site?

You need a window of three consecutive chunks to decrypt the middle one, for files that are bigger than 3 MB (if I’m not mistaken :smiley: @brian_s).

2 Likes

I just finished reading the paper and it seems my fears are realised. While it is better than other implementations, its rebuild time still increases at a greater rate than the increase in size.

Figure 2 is possibly the best example: a comparison of the CPU cycles required to “process” a file.

(6,4) encoding
For 4K the time is 2.353 times the base copy rate.
For 8K the time is 2.702 times the base copy rate.

(12,8) encoding
For 4K the time is 3.116 times the base copy rate.
For 8K the time is 5.297 times the base copy rate.

This does not bode well for large files and times. The multiplying factor for the processing time of the Mojette transformation over straight copying is not constant but grows as the size of the file increases: doubling the block from 4K to 8K grows the (6,4) factor by about 15% but the (12,8) factor by about 70%.

Not good if you are after a fast network.

There is a definite interest in exploring this area more - at least from my side. Roughly, I would have envisioned applying error-correcting codes to the chunk after the self-encryption process has run its course. How does a Mojette transform relate/compare to more traditional error-correcting codes? (If you have some answer ready to that - I’m not cross-examining you, but you seem already informed on the subject.)

A second reason I prefer such “redundancy measures” (ECC or Mojette transform or others) after the encryption process is a vague concern that redundancy in the data prior to encryption might weaken the encryption itself. Do you know of any relations or literature on encryption and the Mojette transformation?

1 Like

Basically the data is transformed. Only the “error correcting” data is stored.

From n blocks of data, m blocks are created, where m > n and all m are what you might refer to as error-correcting blocks.

This allows any n of the m blocks to reconstruct the original n blocks.

The downside is the time required to reconstruct the original data, and the larger the data, the more that time is multiplied: Time for 1000MB >> 1000 × Time for 1MB.
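
To make the “any n of the m” idea concrete, here is a minimal sketch (plain Rust; this is not Mojette, just the simplest possible (n+1, n) parity code, where the one extra block is the XOR of the n data blocks):

    /// Toy (n+1, n) erasure code: store n data blocks plus one XOR parity block.
    /// Any single missing block can be rebuilt by XOR-ing the surviving n blocks.
    fn encode(data: &[Vec<u8>]) -> Vec<Vec<u8>> {
        let mut parity = vec![0u8; data[0].len()];
        for block in data {
            for (p, b) in parity.iter_mut().zip(block) {
                *p ^= *b;
            }
        }
        let mut stored = data.to_vec();
        stored.push(parity); // m = n + 1 blocks go to storage
        stored
    }

    /// Rebuild one lost block from any n surviving blocks (data and/or parity).
    fn rebuild(survivors: &[&[u8]]) -> Vec<u8> {
        let mut lost = vec![0u8; survivors[0].len()];
        for block in survivors {
            for (l, b) in lost.iter_mut().zip(*block) {
                *l ^= *b;
            }
        }
        lost
    }

    fn main() {
        let data = vec![b"abcd".to_vec(), b"efgh".to_vec(), b"ijkl".to_vec()]; // n = 3
        let stored = encode(&data);                                            // m = 4
        // Pretend block 1 ("efgh") was lost: any n = 3 of the m = 4 blocks recover it.
        let survivors = [&stored[0][..], &stored[2][..], &stored[3][..]];
        assert_eq!(rebuild(&survivors), b"efgh".to_vec());
    }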

Mojette does not provide encryption. AFAIK it is a reversible transformation, with no need for, or knowledge of, any keys.

I would suggest that Mojette transformations be done above the storage layer and not in the core of SAFE, because in the core this adds complexity and processing overheads, creating a barrier for small devices and perhaps multiplying the CPU load of SAFE.

2 Likes

If I’m not mistaken too, the encryption key is already in the data map. No need to download the previous or the next chunk to decrypt the chunk you want.

1 Like

There is a link to the GitHub repo on the site: https://github.com/rozofs/rozofs

1 Like

Thanks. I checked all the menu items, didn’t notice it.

1 Like

I think you just need the hashes of three consecutive chunks, and they are in the data map as mentioned by @Ghaunt.

The way I read it, you only need the hash of the previous chunk (or of the last one, for the first chunk). I don’t see anything about needing 3 consecutive hashes.

EDIT: I read it in reverse. The next hash is needed, not the previous one.
EDIT: For the picture at the bottom, I read that it needs the previous one and the next one to make the hash. Different meaning.

1 Like

For chunk n, KeyGen(#n-2, #n-1, #n) (3 consecutive hashes) gives you the AES-256 encrypted chunk n.

But the encrypted chunk n is itself XORed with the hash of chunk n+1.

So #n-2, #n-1, #n and #n+1 (oops, 4 hashes!)

Please correct me if I am wrong, but that’s how I read it.

1 Like

Yeah, I misread. I now read 3 consecutive chunks’ hashes (FALSE: previous, middle, next). But not four. It’s an iteration example; you have to follow the arrows that point to KEY GEN and there are 3 of them.

EDIT: not easy to read. It needs C-2, C-1 and C to encrypt C.

Yes, but I must find it again… :anguished: :wink:

Sorry, I’m a little slow at understanding English, people. But where do you see that the encrypted chunk n is itself XORed with the hash of chunk n+1?

EDIT: I think I understand. You combined both presentations together. No wonder it’s hard to understand.

1 Like
    // Excerpt from self_encryption: encrypt the compressed chunk with the derived
    // key (.1) and IV (.2), then XOR the result with the derived pad (.0).
    Ok(compressed) => {
        let encrypted = encrypt(&compressed, &pad_key_and_iv.1, &pad_key_and_iv.2);
        Ok(xor(&encrypted, &pad_key_and_iv.0))
    }

The encrypt function uses 3 chunks: 0, 1 and 2.
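
A minimal sketch of how I read that flow (std-only Rust; toy_hash and toy_encrypt below are made-up stand-ins for the real hash and AES used by self_encryption, purely to show the pad/key/IV split and the final XOR from the snippet above):

    /// Made-up stand-in for the real hash used by self_encryption (NOT cryptographic).
    fn toy_hash(data: &[u8]) -> Vec<u8> {
        let mut h = vec![0u8; 32];
        for (i, b) in data.iter().enumerate() {
            h[i % 32] = h[i % 32].wrapping_add(*b);
        }
        h
    }

    /// Split the hashes of three neighbouring chunks into (pad, key, iv),
    /// mirroring the pad_key_and_iv tuple quoted above.
    fn get_pad_key_and_iv(h0: &[u8], h1: &[u8], h2: &[u8]) -> (Vec<u8>, Vec<u8>, Vec<u8>) {
        (h0.to_vec(), h1.to_vec(), h2.to_vec())
    }

    /// Made-up stand-in for AES-256: here just a repeating key/IV XOR.
    fn toy_encrypt(data: &[u8], key: &[u8], iv: &[u8]) -> Vec<u8> {
        data.iter()
            .enumerate()
            .map(|(i, b)| *b ^ key[i % key.len()] ^ iv[i % iv.len()])
            .collect()
    }

    /// XOR the ciphertext with the pad, as in the quoted snippet.
    fn xor(data: &[u8], pad: &[u8]) -> Vec<u8> {
        data.iter().enumerate().map(|(i, b)| *b ^ pad[i % pad.len()]).collect()
    }

    fn main() {
        let chunks: [&[u8]; 3] = [b"chunk zero", b"chunk one", b"chunk two"];
        // Key material for a chunk comes from the hashes of 3 chunks (0, 1 and 2 here).
        let (pad, key, iv) = get_pad_key_and_iv(
            &toy_hash(chunks[0]),
            &toy_hash(chunks[1]),
            &toy_hash(chunks[2]),
        );
        let encrypted = toy_encrypt(chunks[2], &key, &iv);
        let stored = xor(&encrypted, &pad);
        println!("chunk 2 is stored as {} obfuscated bytes", stored.len());
    }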

1 Like