PARSEC and 99% Fault Tolerance

Huh? Everything you have said so far defines a unique machine as a VM with a unique IP. If IP is not important, why am I not allowed more than one vault in the same group with the same IP?

That makes no sense to me, because a vault doesn’t control the number of chunks it must store, and it stores many of them.

If all the copies of a chunk are on the same host, this implies you own 8 vaults in the same section, meaning you have the 1/3 blocking minority and probably also the 2/3 deciding majority of the section. That situation is worse than the risk of losing one chunk, and everything the team has done is aimed at preventing it.

This is what the current network does. In fact, this is the definition of a group: the 8 vaults storing a chunk.


I could set my vault to only have space for one chunk; if it passes ‘proof of resource’, nothing is stopping me from playing the game of minimal vaults.

That’s the point: by running 8 vaults instead of 1, you have a ‘chance’ of holding all 8 copies of a chunk. However, it’s impossible for 1 vault to hold all 8 copies :thinking:

Which must all be on different IPs (to hopefully mean the data is decentralised), so VMs are still bad for the network.


I’m saying that the IP address limit shouldn’t be the defining way of stopping lots of vaults on one physical host. I am saying that if having lots of vaults on one physical host makes sense for attackers, then it also makes sense for honest folk. If everyone runs 100s of vaults on a box, then an attacker doing the same gains no advantage.


I’m saying everyone shouldn’t need the overhead of running 100 Docker containers on their local machine for it to be deemed more ‘useful’ to the network than the simple physical hardware itself. The same goes for the attacker. The physical machine provides resources equal to the 100 containers (actually more, because you don’t have the overhead).


Sure, but if 100 dockers get hit for files and the machine with 50 dockers responds more quickly, then the latter is going to earn the host more and will grow in reputation more quickly. There is a physical resource limit underlying the virtual vaults which is always going to be a limiting factor, more so as the network matures (in both users and features).

We have to embrace this. You can’t stop people from disabling whatever code would prevent many vaults running on a single IP address. They could just as well be many physical boxes behind a proxy. It may result in some undesirable side effects and inefficiencies, but nature and humanity are full of such behaviours.

I am sure the software will mature to reward more efficient and useful behaviour over time. I think what we have is a pretty good foundation though.


The latency of a more heavily utilised machine will obviously be higher for the chunks it cannot serve in a timely fashion because it’s serving something else. But more vaults will always mean a higher chance of being first, because you hold more data for the network to request. And if the network favours a different IP for every chunk, then lots of tiny vaults on one physical machine plus a massive IP block is the most cost-effective way to run a farm. In my opinion it is also the most damaging to the network, because the more VMs (chunks) a machine is running, the more likely it holds all 8 copies. And you’ll always be first if you hold all 8 copies.


If you segment the world into 8 sections, using ping latency during ‘proof of resource’ testing when the node joins, you can assign the node to one of the 8 sections. Nodes in one section can only hold one copy of a chunk. Now VMs won’t have an advantage or cause any damage to the network.


You’re still making this scenario sound easier than it actually is. I could buy 100, or even 1000 lottery tickets, and technically I’d then have a greater “chance” of winning; it doesn’t change the fact that the actual probability is still practically zero, and that the money I spent buying 1000 tickets was wasted and could have been better spent elsewhere.

The same principle will apply to vaults and network sections.
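To put rough numbers on the ticket analogy (the 1-in-14-million odds here are a made-up figure for a typical national lottery, purely for illustration):

```python
# Odds for a hypothetical 1-in-14-million lottery draw (illustrative
# figure, in the same spirit as the analogy above).
ODDS = 1 / 14_000_000

one_ticket = ODDS                          # ~0.0000071% chance of winning
thousand_tickets = 1 - (1 - ODDS) ** 1000  # ~1000x better, still ~0.007%
```

A thousandfold improvement on a vanishingly small probability is still vanishingly small, which is the point being made about multiplying vaults.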


I’m curious, would you use that same argument if you had ever won the lottery?
Someone wins the lottery every week, and the way vault rewards are currently set up, everyone who tries to earn the most Safecoin is playing ‘a reverse lottery’ with people’s data, even if they don’t realise it. It may roll over for a few weeks, but the moment a large public data set loses a chunk and corrupts, all the Safecoin they earned will be worthless as the reputation of this network plummets to zero.

Maybe David does not agree with my arguments, but that does not make him right; genius people make mistakes all the time. I’m simply challenging the creators of this network to double-check they didn’t miss anything. It’s easy to get complacent when there is no product in the wild. You can add 1000 security features, but it only takes 1 oversight in production for MaidSafe’s reputation to be damaged forever.


That’s his way of saying: yes, statistically minimal chance for any individual, but as the number of individuals grows, the occurrence is bound to be more frequent.

The likelihood of this mentioned scenario is very simple to do a rough estimation of with some basic assumptions. I’m having a sandwich though, so won’t do it just now :slight_smile:


To summarise, I have 4 questions:

1.) If you have a 1 Gbps connection, a 10 TB HDD and 100 IP addresses, would it be better to have 1 vault or 100?

  • when the average storage per vault is 1 GB
  • when the average storage per vault is 100 GB

2.) And if you have a 1 Gbps connection, a 10 TB HDD and 1 IP address, would it be better to have 1 vault or 100?

  • when the average storage per vault is 1 GB
  • when the average storage per vault is 100 GB

Let’s say there is global adoption of SAFENetwork, and we are a few more people by then, so say 6 billion users. It has become a trend to jam-pack devices with vaults: 1000 per device.

Source 1:
So let’s say we have new chunks coming in at about 1 per minute per user: 100 million chunks per second. Every copy of a chunk has a 1/6mn chance of landing on a particular vault, so with 8 copies that is a (1/6mn)^8 chance of all of them landing on the same one. With 100m chunks per second, we have a probability of 100m*(1/6mn)^8 every second that a vault will end up with all 8 copies.
Over the course of one year, the probability is about 1.9E-39, which is a very small number. And you can see that it would take a long time, billions of years, for this to come anywhere near “likely”. Earth will be engulfed by the sun long (LONG) before it.

Source 2:
Random relocation of nodes. This will increase the likelihood a bit, but with the margins from above, I am not delving into it now :sweat_smile: (but this is fun, so maybe later :slight_smile: )

We can tweak a bit and say there’s one chunk per user per second and things like that, but the order of magnitude is so vast that it really doesn’t make much difference.

Ok, so let’s try slightly less optimistic numbers.
We have 600 million users, and they are all packing 10k vaults each onto a device.
1 chunk per user per second is being stored (let’s say this includes churn, for simplicity).
600mn*(1/60k)^8 per second gives a likelihood of about 1.1E-22 over one year. Still not very likely.
After 100 trillion years there’d be about a millionth of a percent’s chance; only after some 100 billion billion years would there actually be a 1% chance that it had happened.

Let’s find the number where we have at least a 1% likelihood within 100 years. That is going to be a little more uncomfortable.
100k users, each having 1k vaults, together storing 100k chunks per second, gives a 3% chance over 100 years. So that is a number that is not so nice. But I guess a network that small, for that long, wasn’t very important to begin with, so, meh.
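The scenarios above are easy to sanity-check with a short script. This is just my reading of the rough model being used: the chance that all 8 copies of a chunk land on the same device is taken as (vaults_per_device/users)^8, multiplied by the chunk rate and the timespan. The function and variable names are illustrative, not from any official calculation:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def p_all_copies_one_device(users, vaults_per_device, chunks_per_sec, years):
    """Rough estimate of the probability that, within the timespan,
    some device ends up holding all 8 copies of some chunk."""
    p_per_chunk = (vaults_per_device / users) ** 8
    return chunks_per_sec * p_per_chunk * years * SECONDS_PER_YEAR

# 6bn users, 1k vaults each, 100m chunks/s, 1 year   -> ~1.9e-39
p1 = p_all_copies_one_device(6e9, 1e3, 1e8, 1)
# 600m users, 10k vaults each, 600m chunks/s, 1 year -> ~1.1e-22
p2 = p_all_copies_one_device(6e8, 1e4, 6e8, 1)
# 100k users, 1k vaults each, 100k chunks/s, 100 yrs -> ~3%
p3 = p_all_copies_one_device(1e5, 1e3, 1e5, 100)
```

Nudging the chunk rate up or down an order of magnitude barely moves these figures relative to their size, which is why the conclusion is insensitive to the exact assumptions.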

Anyone worried?

  • Yes
  • No

0 voters


More of a philosophical question but what does 99% faulty look like in reality? I feel there are some epistemological questions to ponder here.

Additionally, why is it 99%, rather than 99.1%? 1/3 BFT is clear, 99% is not.

There are some nice off topic components to address in this topic:

Only up to the point that the CPU/RAM/network can support more vaults. Eventually the vaults get in each other’s way and stop being competitive/viable on the network. This is really interesting to consider, since it might start being almost like bitcoin mining, where ‘computation speed matters’. Typically we’ve intuitively considered bandwidth the primary bottleneck, but the ‘many small vaults’ concept may introduce others. More thought on an extremely large network with ‘one chunk per vault’ is warranted, and would have implications for the less extreme idea of ‘many VMs and many vaults per machine’.

The size of the network should be balanced, not pushing too small, not too large, but where’s the balance? How is it decided? How can it evolve and change in a useful way? Such a difficult and interesting question…

This is a question I’ve spent a lot of thinking time on and I think it’s possible to build a probabilistic model of the geographic distribution. But so far the overall answer seems to be a resounding ‘no’ for a lot of really practical (rather than theoretical) reasons.

It’s a good one to address since it’s essentially the main (only?) complaint Peter Todd has put forth about this network (and others) dealing with redundant decentralized storage.

Solving it without trust is a really interesting question.

Not if all 12 are in the same datacenter and all apply clever misdirection with latency adjustments. If you know one of those nodes is on the other side of the world, then sure this would work, but how do you convince anyone else that the knowledge is true? It can only be done on a probability basis.

I am super excited to see these results! I’ve done a lot of testing myself and am excited to gnaw away at these future holes in the fabric of consistency. Please post them on the forum, people love reading about these things even if they seem ‘trivial’ or don’t confirm the initial hypothesis.

It’s hard to imagine ipv6 not being the standard for this network…

Whoever does the segmenting sounds like an authority to me.

Only if you trust the ping. Which you can’t. It might be artificially delayed.

This is a good point. There will be an extremely high frequency of events on the network, so even a low probability means it will happen reasonably often. It’s not enough to hand-wave it away; these things need to be engineered (see @oetyng above). Maybe the probability is low enough, but what’s the cost when it inevitably does happen (maybe just by bad luck)? The cost shouldn’t be outright ignored.


How often it would happen depends on the size of the network.
If everyone were using many vaults per machine, then in the very beginning of the network, when it is very small (say less than 100k vaults), it could happen now and then. This is accounting for the high frequency of data traffic. So up to that size is where it is a real problem, with regards to costs and public adoption.
But as the network grows, the chances decrease so much, to such extremely low figures, that it becomes a practical impossibility, just as we consider it a practical impossibility to brute-force SHA256 (even though it is of course not impossible; you just need something along the lines of 4 billion galaxies, each with 4 billion Earths, each Earth with 4 billion people, each running a thousand times Google’s worth of servers, for roughly 37 times the age of the universe. So, yeah.).

In general, I think you can only ever consider a system’s reliability in terms of the probability of maintaining integrity over a given timespan. The same is done for the probability that Earth will be wiped out by meteors.

Actually, the risk-to-cost ratio of this scenario is so low that, by comparison, humanity faces extinction risks so much higher we could almost call them a certainty, and on such a close timeframe we could almost call it right now. That gives some perspective on the relevance (for an established network; an infant one is still vulnerable).


Let’s define this a little better: you have (minimum) 8 nodes in each of these 8 sections (I guess MaidSafe would have to do some initial bootstrapping to get this online). Then 8 random nodes of a section are chosen, and they each ping the new machine (maybe 4 times?). They vote on an average based on response times and write this to an MD. All 8 sections run this same process; once the last section has completed, a final vote assigns the node to the section with the smallest average.

  • Security: if the results of the 8 nodes’ pings are inconsistent and can’t be determined with any accuracy, the node either has a flaky connection or is trying to bypass the geo-protection of the network. The node is assigned to section 9. In this section the node can be used for caching or other activities where data already exists on 8 other nodes.
  • A node being pinged won’t know which section the node pinging them is from.
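A minimal sketch of that assignment step, assuming each section reports the ping samples its chosen nodes measured for the joining node (the consistency threshold and the ‘section 9’ id are my own illustrative choices, not part of any spec):

```python
import statistics

OVERFLOW_SECTION = 9   # 'section 9' for nodes with inconsistent pings
MAX_STDEV_MS = 50.0    # illustrative consistency threshold

def assign_section(ping_samples_ms):
    """ping_samples_ms: dict mapping section id (1-8) to the list of
    round-trip times (ms) measured by that section's chosen nodes.
    Returns the section the new node should join."""
    averages = {}
    for section, samples in ping_samples_ms.items():
        # Inconsistent results: flaky connection or latency games ->
        # relegate the node to the caching-only section.
        if statistics.stdev(samples) > MAX_STDEV_MS:
            return OVERFLOW_SECTION
        averages[section] = statistics.mean(samples)
    # Join the section that sees the lowest average latency.
    return min(averages, key=averages.get)
```

This captures only the maths of the vote; the actual scheme would need the MD write and the agreement between sections on top.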

Thanks for linking me to the Peter Todd post. I feel like we both have exactly the same concern about the way the rewards are currently set up and reducing redundancy.

The network at the beginning will at some point only have 100, 1k, 5k, or 25k vaults. The network will grow organically. (How many hardcore users of this forum are there? These vaults could be held by a small number of users.) If we lose data at the beginning of the network, it’s still damage to the network’s reputation.

Satoshi didn’t think people would join mining pools. We don’t know how many cores a machine will have in the future. Can I somehow run a vault on a GPU core, or an Intel Phi? With new technology being created to connect whole continents like Africa, bandwidth limits could become a thing of the past. If virtualising a machine into many vaults can reduce the redundancy of the network now, then in the future it could have even bigger consequences. It’s a real problem that should be solved, in my opinion.


If no fudging is occurring, then you are going to create sections whose nodes are geographically close to each other. That is not good for the network: outages, a country being cut off, a single data centre. It also increases the chance for a bad actor to get their nodes located in a single section.


Probably out of context in this discussion, and difficult or impossible to apply, but really short round-trip times (RTTs)/latencies are of course not possible from every part of the world. Maybe there is a way to use that.
If you can ping the same target from ‘source’ devices in different parts of the world, some should give shorter RTTs than others. If the RTT is never lower than a certain value (from any source), you could reject the target. And if an RTT is short enough that the target has to be in the ‘neighbourhood’ of the source device(s) that pinged it, you have a rough idea where the target is located. Do that for all 12 targets and check that they are not all in the same ‘neighbourhood’.
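As a toy illustration of that idea, assuming we already have RTT measurements from a few probe locations to each target (all names and thresholds here are made up for the sketch):

```python
# Toy locality check: a target's smallest RTT from any probe roughly
# bounds its distance to that probe. Thresholds are illustrative only.

NEAR_RTT_MS = 30.0         # below this, target is 'near' that probe
SUSPICIOUS_MIN_MS = 200.0  # no probe sees less than this -> reject target

def nearest_probe(rtts_ms):
    """rtts_ms: dict probe_location -> RTT in ms to the target.
    Returns (location, rtt) of the closest probe, or None if the
    target looks suspiciously far from everywhere."""
    loc, rtt = min(rtts_ms.items(), key=lambda kv: kv[1])
    if rtt > SUSPICIOUS_MIN_MS:
        return None
    return loc, rtt

def all_in_same_neighbourhood(targets):
    """targets: list of per-target RTT dicts. True if every target's
    nearest probe is the same location (bad for redundancy)."""
    nearest = [nearest_probe(t) for t in targets]
    if any(n is None for n in nearest):
        return False
    return len({loc for loc, _ in nearest}) == 1
```

In practice the hard part is exactly what the thread raises: the pings themselves can be delayed artificially, so this only ever gives a probabilistic signal.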

If a bad actor has all his nodes in the same geo-partitioned section, he can hold at most one copy of any given chunk. That is better for redundancy, not worse.