How is Farming Centralization Disincentivized?

I’ve been looking around for an answer to this question and haven’t found one yet.

There is a sigmoid curve that determines a higher reward for average-sized farming resources (data storage), and lower for smaller and bigger.

What is to prevent farming syndicates (centralized storage facilities / data centers) from simply breaking their setups into multiple Safe accounts that dynamically resize themselves to the optimum size on the sigmoid curve? It seems to me a trivial problem to overcome for large farming outfits.
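To make the concern concrete, here is a minimal sketch of the splitting strategy. The reward curve is purely hypothetical (the thread does not give the real formula), and `reward_multiplier`, `split_into_optimal_vaults`, and `total_reward` are illustrative names, not anything from the SAFE codebase. It also ignores the fact that a very large farmer's own vaults would shift the network average.

```python
import math

def reward_multiplier(vault_gb: float, average_gb: float) -> float:
    """Hypothetical curve: 1.0 at the network average, lower on either side.

    Assumption: this shape is invented for illustration only.
    """
    ratio = math.log2(vault_gb / average_gb)
    return 1.0 / (1.0 + ratio * ratio)

def split_into_optimal_vaults(total_gb: float, average_gb: float) -> list[float]:
    """Split a big pool of storage into equal vaults as close to average as possible."""
    count = max(1, round(total_gb / average_gb))
    return [total_gb / count] * count

def total_reward(vaults: list[float], average_gb: float) -> float:
    """Reward in arbitrary units: storage weighted by the per-vault multiplier."""
    return sum(size * reward_multiplier(size, average_gb) for size in vaults)

# A data center with 10,000 GB when the network average is 100 GB.
one_big_vault = [10_000.0]
many_average_vaults = split_into_optimal_vaults(10_000.0, 100.0)

print(total_reward(one_big_vault, 100.0))
print(total_reward(many_average_vaults, 100.0))
```

Under these assumptions the split setup earns the full multiplier on every vault, while the single huge vault earns far less, which is exactly why the curve alone cannot tell a syndicate apart from a crowd of average farmers.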

7 Likes

Technically, nothing would prevent them from doing so. I’m not sure it wouldn’t be a good thing.

But if it comes to the point where they feel they need to, they’ve already lost the dominance game.

Remember, the bulk of the network will be made up of people using essentially “spare” resources, i.e., over-capacity that they would likely have anyway and not be using, or at least not be fully using. Data centers would be having to put in place or divert resources that they had targeted to make income in and of themselves.

Yes, the data centers have huge capacities, but even they won’t be able to amass the numbers of nodes necessary to monopolize or control the network, especially once the network gets established, and till they see that happening, I doubt they’ll divert resources from their current profit models. We’re just marginal dreamers on the fringes, as far as they’re concerned. By the time they see it coming it will probably be way too late.

For them it would be relatively expensive compared to what they get out of it. That many nodes running on the network would actually give it pretty good stability and fast functioning. Yeah, they’d be getting a pretty big portion of safecoin, but what’s unfair about that? Look at the amount of resource they’d be contributing!

Others will be using the network just because of its features–as it builds out, anyway. Everybody would be getting safecoin, too, despite data centers getting a large portion of it. But again, the facility they give to the network will make the safecoin worth more, and more individuals will join, both for the use of the network and getting safecoin. Also, all that resource will drive down the farm rate, making their contribution even more expensive to them, while everybody else will have little if any increased overhead.

The point is that even with them putting all that resource into the network, they don’t actually get central control, just a larger part of participation, which means more resources for network functioning.

Centralization is about control. The network doesn’t actually disincentivize centralization. Hopefully it makes it impossible.

11 Likes

I really appreciate your response.

Yes, the data centers have huge capacities, but even they won’t be able
to amass the numbers of nodes necessary to monopolize or control the
network

Upon what evidence do you base this prediction? Economies of scale are the main factor incentivizing centralization in this context. So for a massive datacenter – say, owned by an authoritarian nation-state with exabytes or yottabytes of storage, putting aside a few hundred petabytes would be trivial, and in the early stages of Maidsafe could be multiples of the size of the Safe network. This makes the attack vector of “farm for a while, then pull the rug out, likely destroying a lot of data whose only 4 nodes are all in our datacenter” easy as pie.

For them it would be relatively expensive compared to what they get out
of it. That many nodes running on the network would actually give it
pretty good stability and fast functioning. Yeah, they’d be getting a
pretty big portion of safecoin, but what’s unfair about that? Look at
the amount of resource they’d be contributing!

Large actors threatened by a new system are willing to pay to attack it – their incentive would perforce lie outside Maidsafe.

Another attack vector: dominate the network, getting most of the safecoin, then crash its exchange value by selling all at once. Rinse and repeat.

Can you explain that more? Why would the datacenter (which, remember, just looks to the Safe network like a lot of average-sized farmers) have higher overhead than small farmers? The cost per resource in the big operation would be a lot lower, and their reward would be identical to the small farmers’, right?

The point is that even with them putting all that resource into the
network, they don’t actually get central control, just a larger part of
participation, which means more resources for network functioning.

I think the “pull the rug and destroy the data” attack vector described above could be defined as “central control” (fatally compromising network reliability), right?

Then what’s the point of the sigmoid curve?

Hopefully it makes it impossible.

Now you’ve lost me. If a malicious actor can shunt on 500% of the farmer resource instantly, the farming is 83% centralized, no?
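For anyone checking the 83% figure, it is just the attacker's share of the combined total. A quick sketch (the function name `attacker_share` is only for illustration):

```python
def attacker_share(added_fraction: float) -> float:
    """Share of total farming resource held by an actor who adds
    `added_fraction` times the existing network (5.0 means 500%)."""
    return added_fraction / (1.0 + added_fraction)

# 500% added on top of the existing 100% gives 5 / 6 of the total.
print(f"{attacker_share(5.0):.0%}")
```

So adding five times the existing resource leaves the newcomer with five sixths of the network, i.e. about 83%.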

3 Likes

Not sure where to wade in.

I definitely think that a large-scale attack of well-behaving vaults then a shutdown of a huge percentage would be a potential existential problem, especially early on. I was taking the point of “How is Farming Centralization Disincentivized?” as referring to “monopolizing farming” (as in bitcoin mining centralization). The attack vector is a different matter.

So let’s set aside the nuclear attack possibility for a moment. I think it’s definitely possible, though I don’t think it likely. Definitely possible.

I think I gave a reasonable argument regarding why masses of data center vault instances getting lots of safecoin is not so much a problem.

So, leaving aside “central control equals ability to do indiscriminate damage,” my point is that I believe the design of the network is such that, apart from a pull-the-plug type attack, even a group of actors running a hefty majority percentage of nodes doesn’t have a way of knowing enough about what individual users are doing (or even who they are) to be able to coordinate actions to do anything meaningful to steal safecoin, corrupt specific data, censor or otherwise prevent specific communications, etc.

Such nodes could be triggered to all start generally misbehaving at the same time, which would definitely make a mess and possibly take down the network but, again, that’s a different line of discussion. Even then, though, misbehavior in a meaningfully coordinated way so as to “take over” or some such, seems very unlikely from where I’m sitting.

My main point is that I think there is not a “control” lever from this vector, aside from general carnage, and even that should be superable if the network can get up to size before huge resources are leveled at it.

I know, I’m both a technical simpleton and an optimist. But that’s my input.

The point of the sigmoid curve is to target consumer devices which will be available to lots and lots of people, thus encouraging popular participation of individuals. Data center attacks could definitely cause problems, even fatal ones, especially before there is a really broad base of individuals running one or more nodes. The Red Army could also occupy Alaska very easily. I just don’t think either party will do it.

2 Likes

@GabrielDVine You’ve got it pretty right.

First, as @fergish mentioned, the sigmoid curve reward design is rather to encourage small vaults, increasing the number of vaults on the network. We’re always attempting to achieve and maintain critical mass.

The data follows a uniformly random distribution. This theoretically ensures that copies of the data are spread as far apart physically as possible.

This is implemented on two levels. The easiest level to understand is that data will be equally distributed by vault XOR address. The lower level is the actual vault XOR address itself:

In an XOR network, it’s desirable to have uniform distribution of nodes. However, using a 100% random algorithm to define node placement will likely cause localised uneven distribution.

A node’s placement will be defined by the name of its PMID key.

PMID Create Store - Maidsafe Archives
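As a rough illustration of what "placement by XOR address" means, here is a minimal sketch. It assumes a node's name is a SHA-512 hash of its public key and that a chunk is held by the four nodes closest to it in XOR distance; both details are my assumptions for the example, and `node_name`, `xor_distance`, and `close_group` are hypothetical helpers, not the real routing API.

```python
import hashlib

def node_name(public_key: bytes) -> int:
    """Assumption: derive a 512-bit name by hashing the node's key."""
    return int.from_bytes(hashlib.sha512(public_key).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """XOR metric: smaller result means 'closer' in the address space."""
    return a ^ b

def close_group(target: int, node_names: list[int], size: int = 4) -> list[int]:
    """Return the `size` nodes closest to `target` by XOR distance."""
    return sorted(node_names, key=lambda name: xor_distance(name, target))[:size]

# Fifty made-up nodes and one made-up chunk name.
nodes = [node_name(f"key-{i}".encode()) for i in range(50)]
chunk = node_name(b"some chunk of data")
holders = close_group(chunk, nodes)
print(len(holders))
```

Because the names come out of a hash, a farmer cannot pick where in the address space a vault lands, so physical neighbours end up scattered across XOR space.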

Then, with a critical number of vaults, it is hard for any one party to hold all four copies of any given piece of data on the network. Certainly not impossible (especially in the face of philosophical actors) but difficult.
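A back-of-the-envelope sketch of why vault count matters here. It assumes each of the four copies lands on an independently and uniformly chosen vault, which is a simplification of the real placement rules; `prob_all_copies_captured` is an illustrative name only.

```python
def prob_all_copies_captured(attacker_fraction: float, copies: int = 4) -> float:
    """Chance that every copy of one chunk sits in attacker-run vaults,
    assuming independent, uniformly random placement (a simplification)."""
    return attacker_fraction ** copies

for fraction in (0.1, 0.5, 5.0 / 6.0):
    print(f"{fraction:.2f} of vaults -> {prob_all_copies_captured(fraction):.4f}")
```

Under that assumption, an actor running half the vaults holds every copy of about 6% of chunks, and one running five sixths of them holds every copy of roughly 48%, so the "pull the rug" attack only gets cheap when the attacker's share of vaults is very large.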

Don’t fall into the thinking that users won’t use the network without running a vault - if only temporarily.

To do anything else other than just view the Network, users must necessarily use the currency of the Network - Safecoin.

Take bitcoin - it’s hard to get into (without companies) because there’s only one way to get bitcoins - by exchanging fiat for bitcoins. Alternatively, the Network provides another way to receive currency, by providing resources by running a vault.

Most of the attacks that you detail are reliant upon a low number of vaults. In your posts, it’s approached from a different angle, but isn’t it the number of vaults that dictates the distribution of the data that is stored on the network, and therefore the security of that data?

P.S. Hopes and positive thinking have never prevented an attack. Only practical safeguards.

EDIT:

I just call that good vault administration!

7 Likes

This is an interesting attack vector. Why not maintain IP blacklists of known datacenters for a period of time to keep moneyed interests from gaining critical mass? Once the network has grown organically to a size that isn’t easily threatened by a Google datacenter, the blacklists are pulled.

And disavow more than x vaults per IP?

1 Like

While I hate the idea of blacklists (since they usually target Tor), I’d support blocking Google and NSA addresses at start.

The plain truth is that bitcoin and Tor network nodes are public, which allows anyone to recognize when an attack is imminent. Tor uses directory authorities to block IP addresses acting suspiciously, but this allows other attacks to target the directory authorities themselves to disrupt the network. Because network nodes on SAFE are obfuscated, recognizing a sybil attack becomes much more difficult. That said, I think forcing an attack to require a flood of over 100% of the network size substantially reduces the probability that this attack will take place.

Edit: remove false statement

4 Likes

Could you elaborate on the “payment for nodes” part of this mitigation strategy? I do not know what you are referring to.


@ioptio, I know that you’ve read the birthday paradox/sybil attack section of the systemdocs, but that can only come into play when the Network has reached critical mass (what is it again? 10,000? 10 mil?)

So if we constrain ourselves to talking about the beginning stages of the Network, I do believe that Maidsafe (the company) would be wise to cultivate and maintain a blacklist that a vault may choose to (or to not) implement.

This (as I envision it) would prevent the inclusion of any blacklisted IP address in the DHT of an individual node. However, due to the nature of the Network, an address cannot be banned or blocked globally per se.

Any attempt at banning IP addresses globally can - and should - be seen as a threat to the autonimity (I still don’t think that’s a word) of the Network. Rather, curating a blacklist of malicious actors can be done by anyone on the Network, with any type of motivation. Therefore, if a feature was added into safe_vault to integrate such a blacklist (as would be the choice of the farmer) I would back it.
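To show what an opt-in filter like this might look like on a single vault, here is a minimal sketch that combines a curated range list with the "no more than x vaults per IP" idea from earlier in the thread. `PeerFilter` is a hypothetical class, not part of safe_vault, and the address ranges are reserved documentation ranges used as placeholders.

```python
import ipaddress
from collections import Counter

class PeerFilter:
    """Hypothetical, farmer-chosen filter applied locally by one vault."""

    def __init__(self, blocked_ranges: list[str], max_vaults_per_ip: int) -> None:
        self.blocked = [ipaddress.ip_network(r) for r in blocked_ranges]
        self.max_vaults_per_ip = max_vaults_per_ip
        self.accepted = Counter()  # accepted peers counted per IP address

    def accept(self, peer_ip: str) -> bool:
        addr = ipaddress.ip_address(peer_ip)
        # Reject anything inside a blacklisted range.
        if any(addr in network for network in self.blocked):
            return False
        # Reject once this address already has its quota of accepted vaults.
        if self.accepted[peer_ip] >= self.max_vaults_per_ip:
            return False
        self.accepted[peer_ip] += 1
        return True

# Placeholder blacklist: 192.0.2.0/24 is a documentation-only range.
peer_filter = PeerFilter(["192.0.2.0/24"], max_vaults_per_ip=2)
print(peer_filter.accept("192.0.2.17"))    # inside the blocked range
print(peer_filter.accept("198.51.100.5"))  # not blocked, under the cap
```

Since each farmer decides whether to load a list at all, this stays a local choice rather than a network-wide ban. As others note, it does nothing against an actor who proxies their nodes through unlisted addresses.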

Oops! Had been doing a lot of back-reading on the forum (old discussions and such) on debates around certain designs of the network and got node v. client mixed up in my head when I typed that. I edited the comment to remove the false statement. Thanks for pointing that out.

Regarding sybil attack… yes, the security in the network depends on a threshold of nodes. I thought it was 10k but don’t quote me on that. Either way, the design of the network is brilliant in that it requires a ton of resources from any attacker who wants to fake messages to other nodes… but I was thinking of a sybil attack in the sense that, instead of faking personas/messages, they turn off their nodes all at once, which requires <100% to do damage via network-wide data loss. So perhaps it’s not explicitly “sybil” but sybil-esque.

Again, I think this is very unlikely after a critical point in the network but it’s still something to consider especially when all the other decentralized networks we have as reference for sybil style attacks have mostly publicly known nodes and a far simpler task at recognizing such an attack.

3 Likes

Well, this is still an attack surface for a critic of the Network to point to; there should be some sort of informal mitigation strategy that we can at least point to and say “hey, we’re aware of this, and we did X to help the situation”.

Along with the curated blacklists - however independent thereof - would you see any value in establishing a site where farmers can self-report? For instance:

  • How many vaults they’re running
  • What persona they are farming on behalf of
  • How much space they’re contributing

And stuff like that. It can even be a script that runs alongside a vault.

Once again, something like this would not be canonical, as it would be voluntary and self-reporting, but the benefits would be massive.

Imagine being able to contact these farmers to ask them to perform in unofficial experiments, gather volunteered data, and share types of hardware tips and tricks.

Even more interesting being the ability to require a registration - but a registration performed by submitting their vault data - uptime, etc. that otherwise would not be collected by the Network. This would only be for committed enthusiasts of course, but beneficial to the Network as a whole by providing a place to come together and talk about the issues that are facing vaults at the time.

Also - it’s something we can point to and say: “Here are the farmers - count them yourself”

2 Likes

Nubits/peercoin uses a sort of voluntary voting system to regulate the network. In the case of nubits, it acts as a decentralized central bank to maintain the dollar peg, and it has worked successfully for over a year so far.

As it relates to this, there’s no reason Maidsafe could not allow a form of voting based on resources contributed. This would promote self-reporting and hold certain nodes accountable as custodians of the network. Have the threshold for submitting proposals lower than the threshold for adopting them, to prevent big players from colluding to pass their own, and have the Maidsafe developers use that to prioritize features and changes.

To simplify how this would be done, a parallel proxy safecoin would be granted to farmers along with any safecoin they farmed. These would be their voting tokens, spent in the normal way. Otherwise, if contribution statistics could be reliably and independently gathered, users could be given one vote per proposal, weighted in proportion to their contribution.

Honestly, without burdening the developers any further, a blacklist might be called for, but that wouldn’t stop a bad actor from proxying their node.

This is already done with vault rating. What’s there to change?

Nice! @AlKafir and I spoke about something like this a little while ago. I imagine that a FARM ONLY votecoin could be used to shape the future of many things. Sad thing is, these can be sold and concentrated. Giving the rich an unfair advantage.

It’s possible a vote manager could limit the exchange of these coins to only one use. This would make buying them useless as they would immediately lose their value once received.

Still kicking around the idea. 🙂

Well, the topic is disincentivizing centralization. While voting on a blacklist may be democratic, in the end you may just end up with centralization of vote power, which ends in a feedback loop.

My understanding of the network topology is hazy. As one poster mentioned, a Google datacenter could be made to look like a thousand medium nodes. If that’s the case, then short of a blacklist (and it would be hard to determine whether it was Google running all those nodes or a thousand people renting space on their servers), the most plausible way of approaching this would be to blacklist servers in general by using a network diagnostic technique to feel them out. This may have to be an additional network reporting mechanism that exploits some advantage servers have, such as enhanced RAID disk read speed or above-residential bandwidth, and uses that to kick them.

On that note, you could just cap the bandwidth of all nodes at average residential speeds, so a datacenter would be heavily limited in what it can do. But if its response is then to spawn a thousand more instances, that will do very little.

So basically it comes down to “in what way is a high end server different from a home computer that can be detected by the network and not be spoofed?”

If such a thing could be achieved, the odds of a malicious actor running a few million desktops in lieu of a datacenter in order to own the network tend towards 0.

I don’t think this is doable. I think centralisation in the sense that’s being talked about here won’t be a problem, and as I see it the algorithms of the network won’t try to stop such centralisation.

What the algorithms will do is prevent centralisation in XOR space. By incentivising large farmers to create multiple vaults rather than a single big one, it prevents their close group(s) from being overwhelmed when such a large farmer goes offline. The resulting churn will be spread over multiple close groups rather than being on the shoulders of one. It also makes sure that the large farmer takes on a proportionally larger share of the routing responsibilities that comes with any particular position in XOR space.

It is most optimal for the network to reward farmers exactly proportional to the resources they provide. Introducing diminishing returns for providing more is not only impractical, it’d hurt the ecosystem as a whole. The excessive centralisation risks in BTC mining are peculiar to blockchain mining. SAFE farming will see some centralisation for sure, but likely not in the degree that BTC mining does. I’m confident it can be “contained” without radical changes to the current plans.

7 Likes

Thank you for making this distinction between centralization in physical farms and centralization in XOR space. Seems to be a very important angle for judging the validity of farming design.

However, if this XOR centralization is what wants to be minimized, what is the point of the sigmoid curve in farming rewards?

After this thread, I’m convinced the sigmoid curve is a meaningless hand wave – irrelevant and ineffectual at reducing centralization of any type, while actually punishing the smallest farmers, i.e. regular individual users.

Perhaps it’s time to drop this counterproductive bandaid from the Safe network spec.

3 Likes

Uh, that’s what it does? The sigmoid curve (or something functionally equivalent) would prevent centralisation in XOR space. As you pointed out yourself in your original post, it doesn’t stop centralisation in physical space because you can simply run multiple vaults on the same machine.

That’s true, and probably not desirable. Still, I believe some factor should be in the farming reward algorithm to prevent huge vaults (centralisation in XOR space).

1 Like

You will find that it punishes both large and small for a similar reason, just different aspects of that reason.

  • Large - to discourage big players from running one vault and centralising XOR space
  • V.Small - to discourage big players from running millions of tiny vaults and grabbing a disproportionate amount of the chunks. Physical centralisation.

Obviously this affects home people with a ton of disk space spare who upload their data and turn their disks into vault(s). But they can simply run multiple vaults which uses more computing resources.

The downside is the very small vaults belonging to “phone” users etc. But remember that they will still receive rewards, just 20 or so percent less. For the mega farmer who sets up millions of tiny vaults, this is significant and should modify their behaviour, but the “phone” user still gets rewarded and is encouraged to add a larger SD card to their “phone”.

The average vault size will still be relatively small compared to the current large disk size, simply because the majority of “home” users, and hopefully most users overall, will be using only a portion of their medium-sized disk.

Imagine a billion phones (& IoT devices) farming with 16GB each (e.g. an SD card). That is about 1.6E19 bytes in total, which would ensure the average does not get too high 😉

* Phones in the world (obviously not all will be able to run SAFE): more than 6 billion (2013), per “List of countries by number of mobile phones in use”, Wikipedia.

2 Likes

Regardless of the Sigmoid Curve, these are my incentives as a farmer.

  • The chunk collection rate is the main reason I will not use a single large vault. If the chunk collection rate is 1MB per hour, then 1 vault collects 24 chunks… 2 vaults collect 48 chunks… and 3 vaults collect 72 chunks per day. Since my internet connection is not reliable, I need to fill my storage as fast as possible, which means creating as many vaults as possible.

  • The resources per vault is the main reason I cannot create too many vaults. I’m limited because each vault has to process: routing, consensus, caching, and GETS… all of which use bandwidth!

The Sigmoid Curve doesn’t matter to me. Once I find out the max number of vaults I can support, I’ll just allocate my total storage space among each vault. And if they end up being below the Average, I’ll add more storage.
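The arithmetic behind the collection-rate argument can be sketched like this. It assumes 1 MB chunks and one chunk per vault per hour, matching the example figures above; the function names and the 7,200 MB of storage are made up for illustration.

```python
def chunks_per_day(vaults: int, chunks_per_hour_per_vault: float = 1.0) -> float:
    """Chunks collected per day across all vaults at a fixed per-vault rate."""
    return vaults * chunks_per_hour_per_vault * 24

def days_to_fill(total_storage_mb: float, vaults: int,
                 chunk_mb: float = 1.0,
                 chunks_per_hour_per_vault: float = 1.0) -> float:
    """Days needed to fill the total storage when it is spread over `vaults`."""
    daily_mb = chunks_per_day(vaults, chunks_per_hour_per_vault) * chunk_mb
    return total_storage_mb / daily_mb

for count in (1, 2, 3):
    print(count, chunks_per_day(count), days_to_fill(7_200, count))
```

With these assumed numbers, the same 7,200 MB fills in 300 days with one vault but 100 days with three, which is the incentive to run as many vaults as bandwidth allows.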

7 Likes