How is Data Stored and Retrieved?


#1

When a file is uploaded, the SAFE Network shatters it into 1Mb chunks, makes 4 copies of each chunk, and encrypts them before placing them in vaults throughout the Network. Because of this process, no vault can read any chunk stored with it. Even if a vault somehow managed to decrypt a 1Mb chunk, it would hold only a fragment of the original file and still not know who it belongs to.

Example #1: A user uploads a 10Mb file.
The file is split into 10 chunks (1Mb each), and 4 copies are made of each chunk. This means there are 40 chunk copies spread across 40 vaults. When the user requests the file, all 40 vaults are called, but for each 1Mb chunk only the fastest of the 4 vaults holding it is used to complete the retrieval. The speed at which the user can retrieve the complete file is therefore limited by the fastest copy of the slowest 1Mb chunk to arrive at their location.
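The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `storage_plan` and its parameters are made up for this post and are not part of any SAFE Network API. It assumes the default 1Mb chunk size and 4 copies per chunk described above.

```python
import math

CHUNK_SIZE_MB = 1  # default chunk size described in the post
COPIES = 4         # copies kept of each chunk


def storage_plan(file_size_mb, chunk_size_mb=CHUNK_SIZE_MB, copies=COPIES):
    """Return (number of chunks, number of stored chunk copies).

    Hypothetical helper for illustration only; a final partial
    chunk is counted as one chunk.
    """
    chunks = math.ceil(file_size_mb / chunk_size_mb)
    return chunks, chunks * copies


print(storage_plan(10))  # (10, 40)
```

The same function can be called with any file size to see how many chunk copies end up in vaults.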

Example #2: A user uploads a 1Gb file.
The file is split into 1000 chunks (1Mb each), and 4 copies are made of each chunk. This means there are 4000 chunk copies spread across 4000 vaults. When the user requests the file, all 4000 vaults are called, but for each 1Mb chunk only the fastest of the 4 vaults holding it is used to complete the retrieval. Again, the speed at which the user can retrieve the complete file is limited only by the fastest copy of the slowest 1Mb chunk to arrive at their location.
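The "fastest copy of the slowest chunk" rule can be written as a max-of-min. The sketch below assumes all chunks are requested in parallel and that the arrival time of every copy is known; the function name and the timings are invented for illustration and are not measurements from the Network.

```python
def retrieval_time(copy_times_per_chunk):
    """Time until the whole file has arrived.

    copy_times_per_chunk holds one list per chunk, with one arrival
    time (in seconds) per vault holding a copy of that chunk.
    """
    # For each chunk, only the fastest copy matters.
    fastest_per_chunk = [min(times) for times in copy_times_per_chunk]
    # The file is complete once the slowest of those has arrived.
    return max(fastest_per_chunk)


# Hypothetical arrival times for a 3-chunk file with 4 copies each.
times = [
    [0.8, 1.2, 0.5, 2.0],
    [1.1, 0.9, 3.0, 1.4],
    [0.4, 0.6, 0.7, 0.3],
]
print(retrieval_time(times))  # 0.9
```

Here the second chunk decides the result: its fastest copy takes 0.9 seconds, which is slower than the fastest copies of the other two chunks.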

Instead of calling a whole 10Mb file from only 4 vaults, you call 40 chunk copies (1Mb each) from 40 vaults. This makes a BIG difference in retrieval speed.

Q: What happens in the unlikely event that all 4 vaults which share the same 1Mb chunk are down?

A: The Network solves this problem by duplicating the chunks to a new vault whenever a vault goes offline.
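A toy model of that answer, for anyone who likes to see it as code. Everything here is an assumption made for illustration: the `placement` dictionary, the function name, and the rule for picking a replacement vault (first available by name) are not how the real Network selects vaults.

```python
COPIES = 4  # target number of copies per chunk


def handle_vault_offline(placement, online_vaults, lost_vault):
    """Restore each affected chunk to COPIES holders after a vault drops.

    placement maps a chunk id to the set of vault ids holding a copy.
    The replacement choice below is a placeholder, not the real rule.
    """
    online = set(online_vaults) - {lost_vault}
    for chunk_id, holders in placement.items():
        if lost_vault not in holders:
            continue
        holders.discard(lost_vault)
        # Online vaults that do not already hold this chunk.
        candidates = sorted(online - holders)
        while len(holders) < COPIES and candidates:
            holders.add(candidates.pop(0))
    return placement


placement = {
    "c1": {"v1", "v2", "v3", "v4"},
    "c2": {"v2", "v5", "v6", "v7"},
}
online = {"v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"}
handle_vault_offline(placement, online, "v2")
print(sorted(placement["c1"]))  # ['v1', 'v3', 'v4', 'v5']
print(sorted(placement["c2"]))  # ['v1', 'v5', 'v6', 'v7']
```

As long as at least one holder of a chunk is still online when another goes down, the chunk is back to 4 copies before the next failure.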


#2

This is great!
The only missing step I see is the encryption/obfuscation.


#3

True, but I wrote this for the general public. MaidSafe Wiki docs can go over the specifics on encryption and obfuscation.


#4

I say add it anyway… The FAQs are not just for the general public. @david can you wikify this as well? Great post…


#5

Doesn’t need to give details, but general public should be reminded/told that nothing goes out unencrypted.


#6

I added a link to the docs explaining the encryption process.


#7

Or how the reduplication of encrypted data works.

Thank you.


#9

Some files are smaller than 1Mb. Even smaller than 4Kb. How do these files get handled?


#10

AFAIK, files less than 1Mb should be encrypted and obfuscated as normal, without the shattering process. The API breaks down files larger than 1Mb to improve retrievability from the network.

Smaller files do lack the “jigsaw puzzle” layer. That may be something the devs can look into once we get underway with Testnet2 and Testnet3.

I recall a discussion on the main dev list. The client can be adjusted to break down files into smaller chunk sizes such as 1Kb. But the default is 1Mb.
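A rough sketch of the splitting behaviour described above, assuming "1Mb" means 1,000,000 bytes and that a file no larger than the chunk size is kept as a single piece. The function name and the `chunk_size` parameter are hypothetical and are not taken from the actual client API; encryption and obfuscation are left out.

```python
DEFAULT_CHUNK_SIZE = 1_000_000  # assumed byte value for the 1Mb default


def split_into_chunks(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Split data into chunk_size pieces; small files stay whole."""
    if len(data) <= chunk_size:
        return [data]  # no shattering for files at or below one chunk
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


small = b"x" * 4096
large = b"x" * (3 * DEFAULT_CHUNK_SIZE + 10)

print(len(split_into_chunks(small)))        # 1
print(len(split_into_chunks(large)))        # 4
print(len(split_into_chunks(small, 1024)))  # 4
```

The last call shows the effect of a smaller chunk size: the same 4Kb file that stays whole under the default is cut into four pieces.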


#11

excellent. thanx 4 the answer.
yes! more than enough people have sensitive info in files of <1Mb… :slight_smile:
Testnet3 is on the way… i can’t hardly wait :slight_smile: