Churn, number of copies, and perhaps an idea

MartinZ · May 11, 2015, 9:54am

I’ve been studying some of the technical aspects of MaidSafe and am intrigued by the so-called google attack possibility. I know a lot has been written about this, so forgive me if something similar has already been answered on this forum.

But from what I understand, the number of copies MaidSafe stores of any single chunk of information is fixed at around 4. Regardless of the likelihood of the google attack, what if this parameter could be chosen by the user depending on the data’s perceived value?

To keep it simple: if I store very sensitive stuff and cannot rely on local copies (say I move around a lot, or my entire business and apps rely on MaidSafe), I could choose to allocate, say, 10 or 20 times more of my own disk space than I myself use, expecting a similar level of redundancy in return (so it’s 10:1 or 20:1 as opposed to 4:1). So technically my donated storage/usage ratio becomes the insurance rate for my data. There could be sensible limits on this, but in general the user would get to balance the level of safety against the storage they have to donate, and also against the speed of transactions, as higher redundancy would probably slow things down. There could be recommended settings and sweet spots for various applications and uses (or maybe even fixed-ratio “insurance groups” like normal, safe, very safe, insanely safe etc.), but in general, leaving this for the public to tweak could increase the perceived value/safety and deter attackers, as there would be no fixed or observable redundancy threshold at which the network fails.
Again, since I’m fairly new to this place, forgive me if this has perhaps been discussed.

Btw, to introduce myself a little more, we are a small tech company involved in neural network currency market prediction and blockchain stuff (we’ve built an alternative wallet for BitShares), and we are now looking for an alliance with a decentralized DB vehicle to host one of our biggest projects. Hence my question is of a somewhat business nature: if I underwrite my clients’ data safety with my business, I would like to fully understand the worst-case scenarios and how they are addressed.

Best
Martin

polpolrene · May 11, 2015, 11:46am

Hello Martin, welcome to the forum. I really don’t know what you mean by insurance rate of data. When you PUT data to the network you pay in SAFEcoin. When you provide space you earn (Farm) SAFEcoin when the data is requested. So when you need more space than you share, you need to pay in SAFEcoin. When you share as much as you take, you probably don’t need to pay. When you provide more space than you use, you make SAFEcoin.

Once you’ve got your data online, there is no Google attack or anything like that possible. Each chunk is stored on at least 4 machines that are online (1MB per Chunk, so a video is thousands of Chunks). When one of them goes offline, another copy is made within seconds. You don’t need to keep paying over time to keep your data online. You pay once. The people who store the Chunks (highly encrypted) have no clue what’s in there. They hope you request the data so they can Farm SAFEcoin.

Hope this helps.

davidpbrown · May 11, 2015, 12:00pm

I wonder if the OP is worried about one big host going down fast. If a fraction of the network disappears, for example if the US internet cable got severed, would the distribution of only four copies be sufficient to guarantee the data is always available? If you are living in Iran and the RoW (rest of the world) disappears, would you be OK, or would only those confident of being in the larger fraction of the network be safe? I guess it’s a question that the testnet will answer more clearly.

MartinZ · May 11, 2015, 12:23pm

Thank you for your explanation.
What I meant by “data insurance” is that by donating more space than I’m using, I could choose to increase my data redundancy instead of obtaining safecoin. Or, using the current model (if I understand it correctly), buy higher redundancy with safecoin previously earned by providing more space than I use.

So suppose I have 10MB of super sensitive data and arbitrarily think 4 copies on the network are not enough. I therefore provide 100MB to the network and in return expect 10 copies to be stored, not 4. That way, even if 1/4 of all the nodes are malicious and are taken down at once (the so-called “google attack”), my data is still safe.
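To put rough numbers on the 4-versus-10 comparison, here is a minimal sketch. It assumes each copy lands on an independently and uniformly chosen node and that the attacker removes a fixed fraction of all nodes at the same instant, before any copy can be re-replicated. Neither assumption is taken from the MaidSafe design, and the function name is made up for illustration.

```python
def chunk_loss_probability(attacker_fraction: float, copies: int) -> float:
    """Chance that every copy of one chunk sits on an attacker-controlled node.

    Assumes independent, uniformly random placement of copies (a simplification,
    not the real network's placement rule).
    """
    if not 0.0 <= attacker_fraction <= 1.0:
        raise ValueError("attacker_fraction must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return attacker_fraction ** copies


if __name__ == "__main__":
    # A quarter of all nodes vanish at once, as in the scenario above.
    for copies in (4, 10):
        p = chunk_loss_probability(0.25, copies)
        print(f"{copies} copies: {p:.2e}")
```

Under these assumptions a single chunk with 4 copies is lost with probability of about 0.39%, while 10 copies brings that down to roughly one in a million.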

Forgive my ignorance, as I am pretty sure you guys have had time to figure this stuff out. Still, I need to understand it for myself when building larger structures on a particular data storage model.

Seneca · May 11, 2015, 12:47pm

This is indeed the current model, so that should clear things up.

Basically you want to pay for example double the amount of SafeCoins to get double the amount of redundancy in the network. I’m a proponent of this.

MartinZ · May 11, 2015, 12:55pm

Yes, if it could be achieved through that, that’s fine. It’s more the idea of being able to adjust redundancy on the basis of one’s preferences that I see as a potential selling point.

Melvin · May 11, 2015, 1:13pm

I’ve never heard of this before, where can I find this?

Seneca · May 11, 2015, 1:18pm

I was referring to paying SafeCoin to be able to PUT rather than directly exchanging Proof of Resource. I’m sorry, that was confusing.

polpolrene · May 11, 2015, 1:18pm

Four copies of each Chunk is the minimum that is stored, and that only counts Vaults that are online right now. A lot of people will only Farm coins when they’re online, so I heard David say before that it’s more likely that 10 to 16 copies exist, though not all of them are online. The Chunks are stored almost at random (although there is math to it). So if you have 10MB of super-important data (your personal file on the network is also very important, otherwise, no account!), that is 10 Chunks, duplicated and held on at least 30 computers all around the globe. If one goes offline, that Chunk will be stored at another place very quickly.

Seneca · May 11, 2015, 1:27pm

I’m not sure that this is still true now that vaults are not persistent anymore.

Eh? I think 10 MB is divided into 10 chunks of 1 MB, and each chunk is held in at least 3 vaults at all times (the exception being massive network failure), so there are 30 “guaranteed” online copies at all times, usually 40.

The thing is, if the network loses all 4 copies of one particular chunk, the other 9 chunks may be useless because part of the data set is missing. In this regard, the bigger the data set is, the more chunks there are, the higher the chance that you lose all copies of one particular chunk. Simply because there are more chunks.
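The effect of file size can be sketched the same way. Assuming each chunk is lost independently with the same probability (an illustrative simplification, not a measured property of the network), a file survives only if every one of its chunks survives. The function name below is hypothetical.

```python
def file_loss_probability(chunk_loss: float, chunks: int) -> float:
    """Chance that at least one chunk of a file has lost all of its copies.

    Assumes chunks fail independently with the same probability `chunk_loss`.
    """
    if not 0.0 <= chunk_loss <= 1.0:
        raise ValueError("chunk_loss must be between 0 and 1")
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    # The file is intact only if all chunks are intact.
    return 1.0 - (1.0 - chunk_loss) ** chunks


if __name__ == "__main__":
    # Per-chunk loss chance if all 4 copies sit on a removed quarter of the nodes.
    per_chunk = 0.25 ** 4
    for chunks in (10, 1_000, 100_000):
        print(f"{chunks} chunks: {file_loss_probability(per_chunk, chunks):.4f}")
```

The per-chunk figure stays the same, but the chance of losing at least one chunk climbs towards 1 as the number of chunks grows, which is the point being made above.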

If people are willing to pay proportionally more for additional redundancy, why not let them?

polpolrene · May 11, 2015, 2:11pm

Indeed, I’m way too tired, need some extra sleep.

happybeing · May 11, 2015, 4:02pm

I don’t think this is necessary, because the network is designed to be robust and not lose data any more than any cloud service does, less in fact. If you bombed an AWS data centre, do you think they’d lose some data? I expect they would. So, are you being paranoid? What would you consider secure among conventional clouds like Amazon?

Anyway, suppose you do need extra security: you can get it without engineering confusing “security levels” into the UX. Just tweak your data (modify the first byte of a file) so the network doesn’t de-duplicate it and PUT it again. You just doubled your security. Repeat and you have triple security, etc.

This works because the network chunks, encrypts, stores/deduplicates. So if the first byte of a file is different, every chunk is different as far as the network is concerned, and so will be stored anew, even though for you, you just stored the same data, twice, thrice etc.

MartinZ · May 11, 2015, 8:53pm

Haha, yeah perhaps it may sound a bit paranoid, but I’m trying to imagine some edge case scenarios. From a strategic point of view, if MaidSafe becomes successful, meaning en masse adoption, there may be an incredible incentive to disrupt it via all potential vulnerabilities. And a stealth digital attack is way more manageable than say a physical bombing, which we rarely witness in civilized countries (fortunately).

As to doubling information, this is sort of what I am getting at, but as opposed to doing this manually and replacing bits to prevent deduplication, I would imagine that automating this process depending on the perceived value of data could serve as an additional safeguard against potential attackers, and also boost confidence in the system as a whole.

This is a fascinating subject and, like I expressed in my first post here, I’m pretty stoked to see MaidSafe taking shape. We are really looking forward to giving this technology a spin (we’re setting it up as we speak). And in any case, thanks for shedding light on how things work, always a pleasure to talk to clever people.

BenMS · May 11, 2015, 10:33pm

One suggestion is treating the SAFE network as a single RAID disk. Build a client that uses several accounts to store the same data multiple times. You just have to put salt at regular intervals in your binary data (and extract it upon retrieval) to ensure that all chunks are hashed to different names.
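The salting idea can be sketched as a toy. This is not the SAFE client API: SHA-256 of the raw chunk stands in for whatever naming the network really uses (self-encryption is ignored entirely), and `SALT_SIZE` and all function names are made up for illustration. Each fixed-size chunk starts with the salt, so two copies made with different salts share no chunk names, and the salt is stripped again on retrieval.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as mentioned in this thread
SALT_SIZE = 16            # hypothetical salt length


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def salted_copy(data: bytes, salt: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Prefix the salt to every block so each stored chunk begins with it."""
    payload = chunk_size - len(salt)
    if payload <= 0:
        raise ValueError("salt must be shorter than the chunk size")
    return b"".join(salt + block for block in split_chunks(data, payload))


def strip_salt(salted: bytes, salt_size: int = SALT_SIZE,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Drop the salt from the front of every chunk to recover the original data."""
    return b"".join(chunk[salt_size:] for chunk in split_chunks(salted, chunk_size))


def chunk_names(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Toy content-derived names: the SHA-256 hex digest of each chunk."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data, chunk_size)]


if __name__ == "__main__":
    original = bytes(range(256)) * 40
    copy_a = salted_copy(original, b"A" * SALT_SIZE, chunk_size=1024)
    copy_b = salted_copy(original, b"B" * SALT_SIZE, chunk_size=1024)
    shared = set(chunk_names(copy_a, 1024)) & set(chunk_names(copy_b, 1024))
    print("shared chunk names:", len(shared))
    print("round trip ok:", strip_salt(copy_a, SALT_SIZE, 1024) == original)
```

A client built along these lines would store each salted copy separately and would have to remember which salt belongs to which copy.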

davidpbrown · May 11, 2015, 10:52pm

I’m not sure of the need for separate accounts to do that… you could just zip the principal along with a different extra oddity each time and upload those multiple copies to be held, but if the network is seen to be robust that shouldn’t be needed.

Is the filename part of the hash? Are duplicate files with different names treated as different? I expect not.

Caleb_Allen · May 11, 2015, 10:52pm

Could I get some clarification on what this “Google” attack would entail?

MartinZ · May 11, 2015, 11:22pm

To my understanding, such an attack is based on creating a large number of malicious nodes run by a single entity wanting to disrupt the network. Hence the otherwise unjustifiable name “google attack”, but it could be “facebook attack” or any other IT giant wanting to disrupt competition. Since “google” is in control of all the nodes it created, it can take them down at once, preventing the network from recovering in time. If the number of malicious nodes is high enough percentage-wise (say >30-40%), it can result in data loss and corruption, and the trustworthiness of the network is compromised. This is especially easy to do early on, when the number of nodes is relatively small.

Regarding the name, I remember this being brought up in several discussions on this forum, where it was referred to as “google attack”. But it may not be a legitimate term, so forgive me if that’s the case.

BenMS · May 11, 2015, 11:24pm

No, only the file content is considered. The file system itself is stored and structured through the drive or NFS API.
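As a toy illustration of naming by content only (not the actual drive or NFS API; the class and its fields are hypothetical, and SHA-256 is a stand-in for the network's real naming): data is keyed by a hash of its bytes, while filenames live in a separate directory mapping, so the same bytes under two names are stored once.

```python
import hashlib


class ToyStore:
    """Toy content-addressed store: data is named by its hash, not its filename."""

    def __init__(self) -> None:
        self.chunks = {}     # content hash -> bytes
        self.directory = {}  # filename -> content hash

    def put(self, filename: str, content: bytes) -> str:
        name = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same name, so it is kept only once.
        self.chunks.setdefault(name, content)
        self.directory[filename] = name
        return name

    def get(self, filename: str) -> bytes:
        return self.chunks[self.directory[filename]]


if __name__ == "__main__":
    store = ToyStore()
    first = store.put("report.txt", b"same bytes")
    second = store.put("copy-of-report.txt", b"same bytes")
    print("same name:", first == second)
    print("stored objects:", len(store.chunks))
```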

MrAnderson · May 12, 2015, 12:24am

It seems the safe network will immediately be more redundant and secure than any cloud option out there. It is worth mentioning other “back up” methods besides internet storage. External hard drive in a protected safe (the physical one), encrypted data in the block chain, someone with a photographic memory.

Caleb_Allen · May 12, 2015, 4:05pm

Ah, I see. Thanks for the explanation.
