[Doubt] system management of a decentralised system


#1

I have a doubt: how does a decentralised network, without any central overview or system management, stay up and running? :slight_smile: I know of (some of) the self-healing mechanisms built into SAFE, but to me it seems implausible that alpha version 0.1 will be able to run for the next 10 years. Then again, it worked for TCP/IP over the past 40 years :).

I think it would be a good thing to include in project ‘SAFEspace’ (or any other low-level client application) very clear and powerful vault management tools. We need to cater to a wide range of farmers, from hardcore techies to unknowing grannies who just have a desktop running because their grandson asked them to. :smile:

My point is: this is about as cool as challenges get :smile:. (Anticipating) the health of the network is crucial once people trust it with their safecoins, personal DNA sequences/medical records, and social networks ;-). It can’t go offline, ever!


#2

Actually, as an immediate afterthought: it’s probably best to have a wide range of health-monitoring tools, created by many people, to cater to the varied needs of people with different backgrounds and wishes. Of course, that is not to say that SAFEspace or other apps are excused from providing an interface to monitor the health of their own vaults.


#3

Agreed, although much of the work is to make it so. I think the testnet testing will be crucial; if it is relatively bug-free, then the SLA should be 100%. There are issues we consider, such as a worldwide electricity outage and how the network recovers from that (not simple), as all close groups are huge distances apart and reconfiguration happens very fast as nodes come back on.

It should be a critical measure of the network, though: start from a fully working system, trigger a power-off in an instant, and then random power-ups, to check that the assumptions are correct. Initially nodes will be responsible for a huge data range but hold little data; as more nodes connect they bring more data, but with a closer range between them (so less responsibility per node). Huge data movements will occur, though.
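A toy sketch of that shrinking responsibility (not MaidSafe code — the group size `K`, the tiny 32-bit address space, and the chunk counts are all assumptions for illustration): a chunk lives on the `K` nodes whose IDs are XOR-closest to its address, so the more nodes there are, the fewer chunks any one node is responsible for.

```python
# Toy model (assumptions: K=4 replication, 32-bit address space instead of 512-bit).
import hashlib
import random

K = 4                      # assumed replication group size
SPACE = 2 ** 32            # toy address space

def node_id(seed: str) -> int:
    # Derive a pseudo-random node ID from a seed string.
    return int.from_bytes(hashlib.sha256(seed.encode()).digest()[:4], "big")

def close_group(chunk_addr: int, nodes: list[int]) -> list[int]:
    # The K nodes XOR-closest to the chunk's address hold it.
    return sorted(nodes, key=lambda n: n ^ chunk_addr)[:K]

def chunks_held(node: int, nodes: list[int], chunk_addrs: list[int]) -> int:
    return sum(node in close_group(a, nodes) for a in chunk_addrs)

random.seed(1)
chunks = [random.randrange(SPACE) for _ in range(2000)]
for population in (8, 64, 512):
    nodes = [node_id(f"node-{i}") for i in range(population)]
    load = chunks_held(nodes[0], nodes, chunks)
    print(f"{population:4d} nodes: node 0 holds {load} of {len(chunks)} chunks")
```

The same node's load drops sharply as the population grows, which is exactly why mass power-offs and power-ups cause the huge data movements described above: every change in population redraws every close group.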

Another aspect is huge segmentation. For safecoin, for instance, spending on the small side of a split should stop, as the large side will carry the weight of consensus on rejoin. There are some cool parts there, though: the holder of any coin will be on one half of the segment, except when he gets on a plane or changes provider, of course (an edge case). (NB: bitcoin has this problem too, but no answer that I know of yet.)

This is only one aspect, though; there are also the upgrade aspects, which are more tricky. We cannot repeat the Skype outage, where all the Windows servers updated and rebooted with a fault, bringing the network down in a flood of data (avalanche failure). So upgrades are handled randomly, but that means we have to watch API-breaking changes and occasionally have nodes able to act as old nodes on old data and messages, etc.
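One way to picture the randomised upgrades (a sketch only — the week-long window and the hash-based jitter are my assumptions, not how the network necessarily does it): each node derives a deterministic random delay from its own ID and the release tag, so no central coordinator exists, yet the fleet never reboots at once.

```python
# Sketch: per-node deterministic jitter spreads upgrades over a window,
# avoiding a Skype-style avalanche where everything reboots together.
import hashlib

ROLLOUT_WINDOW_SECS = 7 * 24 * 3600   # assumed: spread upgrades over a week

def upgrade_delay(node_id: str, release_tag: str) -> int:
    # Hash of (node, release) gives a stable, uniformly spread delay.
    digest = hashlib.sha256(f"{node_id}:{release_tag}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % ROLLOUT_WINDOW_SECS

delays = [upgrade_delay(f"node-{i}", "v0.2.0") for i in range(1000)]

# How many nodes restart in the busiest single hour of the window?
worst_hour = max(sum(d // 3600 == h for d in delays)
                 for h in range(ROLLOUT_WINDOW_SECS // 3600))
print(f"busiest hour restarts {worst_hour} of {len(delays)} nodes")
```

Because the delay is a function of the node ID, every node agrees on the schedule without any coordination, which fits a network with no central system management.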

There are a few other areas like this; quite a lot to consider. The testnets will be crucial for finding such flaws, especially as they will be small and therefore weak networks.

Soon everyone will have the mushy head syndrome of a maidsafe developer :smile: we occasionally walk into door frames missing the opening completely and some of these things are why :smiley:


#4

It might be interesting for the network to automatically identify major outages and enter a stunned state when one happens. In those cases the huge data movements could be delayed, allowing more time for nodes to come back up before the data is re-adjusted.

Although a worldwide power outage is unlikely (if it did happen, the integrity of computer networks would probably be the least of our concerns), city-wide blackouts are not unheard of.


#5

Yes, we looked at a 10-minute backoff period. You know me, I hate magic numbers (my enemy). A really good mechanism will come when we can get somewhere near an idea of network population (based on the distance of the nodes in your routing table); then we can do some very cool things. One of these would be recognising that the network has somehow shrunk (great for segmentation, hacking detection, etc.). We have not had a network large enough to get a decent value for population; when we do, the network’s anti-hacking measures etc. will be transformed.
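The basic idea behind such an estimate, as a toy sketch (assumptions: IDs uniform in the address space, a 32-bit toy space, and my choice of `k = 8`): if N node IDs are spread uniformly, the XOR distance to your k-th closest neighbour is about k × SPACE / N, so inverting that distance gives a rough population figure from nothing but your own routing table.

```python
# Sketch: estimate network population from XOR distances to your
# closest neighbours, assuming node IDs are uniform in the address space.
import random

BITS = 32
SPACE = 2 ** BITS

def estimate_population(my_id: int, routing_table: list[int], k: int = 8) -> float:
    dists = sorted(n ^ my_id for n in routing_table)
    kth = dists[k - 1]
    # For uniform IDs, the k-th nearest sits at distance ~ k * SPACE / N.
    return k * SPACE / kth

random.seed(7)
true_n = 10_000
ids = [random.randrange(SPACE) for _ in range(true_n)]
me = ids[0]
est = estimate_population(me, ids[1:], k=8)
print(f"true population {true_n}, estimate {est:.0f}")
```

The estimate is noisy for small k (averaging over several vantage points would tighten it), but a sudden sustained drop in the estimate is exactly the kind of signal that could flag segmentation or an attack without any magic-number backoff.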


#6

This is an interesting idea, but I was thinking it could also open the network to attacks. If you can manage to stun the network locally, it might be easier to spread the ‘stun’ and attack the network. This is rather vague reasoning, but it mainly rests on the idea that ‘the network is strongest in a dynamic, flowing state’, where it can react and adjust.

Maybe it should not so much be a ‘stunned’ state as a ‘self-defence’ state: still reactive, but prioritising healing over serving. While I write this, I also feel we might regret this day when the network takes control of the world and tries to enslave the human race :wink:


#7

A good feature of the network is that it is organised in XOR space, not in a geographical layout. So a regional breakdown (think hurricane Sandy, or the 2003 blackout) might mean a big readjustment for the network, but it could rearrange itself quite quickly. Maybe; pure guessing here!
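To illustrate why geography doesn’t map to XOR space (a toy sketch — the hash-of-address scheme and the 32-bit space are just stand-ins for however node IDs are really assigned): two machines on the same street end up at unrelated network addresses, so a regional outage removes nodes scattered all over XOR space rather than one contiguous block of it.

```python
# Sketch: neighbouring machines get unrelated XOR-space addresses,
# because node IDs come from a hash, not from location.
import hashlib

def xor_addr(machine: str) -> int:
    # Stand-in for real node ID assignment.
    return int.from_bytes(hashlib.sha256(machine.encode()).digest()[:4], "big")

a = xor_addr("10.0.0.1")   # hypothetical machine on one street...
b = xor_addr("10.0.0.2")   # ...and its next-door neighbour
print(f"{a:#010x} vs {b:#010x}, XOR distance {a ^ b:#010x}")
```

That scattering is what makes a geographic failure survivable: each close group loses at most a member or two, instead of any group losing everyone at once.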

A more dangerous attack would be a coordinated attack in XOR space, where somehow an attacker was able to trace PMIDs to local machines and control enough of them to disrupt the network. Let it be clear that I don’t see how an attacker could trace these PMIDs, but that doesn’t mean it’s not possible.