Testnets with catastrophes and malice

There should, at some point, be testnets where clients/vaults are rigged to misbehave.

Like, every day at 4pm, 50% of the vaults could delete all their data at exactly the same time, to simulate a catastrophe. Or, 30% of the vaults could be malicious in some way, to simulate a government attack.
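To get a feel for what such a testnet would be measuring, here is a rough Python sketch of both scenarios. It is only a toy model under stated assumptions: the replica count of 4, the uniform random placement of chunks, and the function name `fraction_affected` are all made up for illustration and are not taken from the real network.

```python
import random


def fraction_affected(bad_fraction, needed_bad, replicas=4,
                      num_vaults=1000, num_chunks=10_000, seed=0):
    """Fraction of chunks with at least `needed_bad` replicas on bad vaults.

    Toy model (assumptions, not the real network): each chunk is stored on
    `replicas` distinct vaults chosen uniformly at random, and
    `bad_fraction` of all vaults misbehave at the same moment.
    """
    rng = random.Random(seed)
    bad_count = round(num_vaults * bad_fraction)
    bad = set(rng.sample(range(num_vaults), bad_count))
    affected = 0
    for _ in range(num_chunks):
        holders = rng.sample(range(num_vaults), replicas)
        if sum(h in bad for h in holders) >= needed_bad:
            affected += 1
    return affected / num_chunks


# Catastrophe: 50% of vaults wipe their data; a chunk is lost only if
# all 4 of its replicas sat on wiped vaults.
lost = fraction_affected(bad_fraction=0.5, needed_bad=4)

# Malice: 30% of vaults are hostile; a chunk counts as compromised if
# 3 of its 4 holders are hostile (an assumed majority threshold).
compromised = fraction_affected(bad_fraction=0.3, needed_bad=3)
```

With four independently placed replicas, the chance that all of them land on wiped vaults is about 0.5^4, so roughly 6% of chunks, which is the kind of figure a public testnet could confirm or contradict.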

Claims can be made about the resilience of the network, but these claims would be best tested in a public, interactive way.

See also a few of my other relevant posts:

(But please try to keep discussion specific to those topics in the relevant original topic and not here.)

If you’re making a list then I’d add tests for any common single points of failure:

  • Electric grid => tests for risk from a brownout in one location. If the US electric grid goes into rolling brownouts, what happens to the dynamics? Does the whole US stagnate while those nodes are recovering?
  • Operating systems => tests for risk of any OS update that forces reboots
  • Sunny days with no clouds => AWS or another single cloud host goes offline with too many nodes hosted there.
  • Governments being stupid => less likely short term
  • Hostile competitors or hackers with a money motive => How does the network cope with spam or volume? Does it always accept requests, or refuse them when busy?
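On the last point, the "refuse when busy" option can be sketched as a bounded queue. This is a hypothetical illustration only: `BoundedNode`, its `capacity` parameter, and its methods are invented names and say nothing about how real nodes handle load.

```python
from collections import deque


class BoundedNode:
    """Hypothetical node that holds at most `capacity` pending requests."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.refused = 0  # how many requests were turned away

    def offer(self, request):
        """Accept the request if there is room, otherwise refuse it."""
        if len(self.queue) >= self.capacity:
            self.refused += 1
            return False
        self.queue.append(request)
        return True

    def process(self, n):
        """Handle up to `n` queued requests; return how many were handled."""
        done = 0
        while self.queue and done < n:
            self.queue.popleft()
            done += 1
        return done
```

A spam test could then watch `refused` climb under load and check that honest requests get through again once the backlog drains.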

The more metrics and sense we have of the real limits, the better. If there’s a sense that the network is within normal parameters, with a known margin of capacity either way, that could become a useful long-term statement of stability.

Also, if it’s known that in case of attack the network will switch mode to making more copies of data, then that introduces another risk: whatever prompts that switch could be triggered deliberately.

Days since any data was lost = ++

There’s a chaos mode for nodes behind the “chaos” feature flag:

Yup. Chaos mode currently doesn’t do heaps and isn’t actively used in the pipeline yet (or wasn’t a few weeks ago). As things stabilise we’ll hopefully be able to start adding more of this into the regular test processes (or indeed testnets).

The basic implementation, as it stood, just dropped some connections at random.

It could be made as complicated as needed (and with the feature flag it is easy enough to include or not).
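For anyone wondering what "drop some connections at random" behind a flag might look like, here is a minimal Python sketch. It is not the actual chaos code: a plain boolean stands in for the “chaos” feature flag, and `ChaosDropper`, `drop_probability`, and `send` are hypothetical names.

```python
import random


class ChaosDropper:
    """Decides at random whether a send should be dropped.

    `enabled` stands in for the feature flag (an assumption for this
    sketch); when it is False nothing is ever dropped.
    """

    def __init__(self, enabled, drop_probability=0.1, seed=None):
        self.enabled = enabled
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)

    def should_drop(self):
        return self.enabled and self.rng.random() < self.drop_probability


def send(message, deliver, chaos):
    """Deliver `message` unless chaos mode drops it; return True if sent."""
    if chaos.should_drop():
        return False
    deliver(message)
    return True
```

Keeping the decision in one small object means more elaborate misbehaviour could be added later without touching the send path.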

——-

Nice start of a list of desired bits of chaos to simulate, @davidpbrown :+1:

I can see that chaos mode being useful for some things. As far as I can see, it isn’t a time bomb though, which will be important as synchronised-then-consistent chaos could be more catastrophic than randomly occurring chaos.

It can be configured.

If we want X to happen at Y time on Z platform, we could set that up. Or on specific network signals, I guess. It should not be difficult.
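As a rough idea of the "X at Y time on Z platform" shape, here is a small Python sketch. Everything in it is hypothetical: `ChaosRule`, `due_actions`, the action names, and the convention that `"any"` matches every platform are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ChaosRule:
    action: str    # X: what to do, e.g. "wipe_data"
    at: time       # Y: time of day, assumed to be UTC on every node
    platform: str  # Z: platform name, or "any" to match all platforms


def due_actions(rules, now, platform):
    """Return the actions whose hour and minute match `now` on this platform."""
    return [
        rule.action
        for rule in rules
        if rule.at.hour == now.hour
        and rule.at.minute == now.minute
        and rule.platform in ("any", platform)
    ]


rules = [
    ChaosRule("wipe_data", time(16, 0), "linux"),
    ChaosRule("drop_connections", time(16, 0), "any"),
]
```

Because every node checks the same clock time, the misbehaviour fires in the same minute everywhere, which is the synchronised "time bomb" behaviour rather than independent random chaos.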

But whatever we call such testing, it’s good to be identifying what strategies are useful :+1:
