Step-by-step: the road to Fleming, 5: Network upgrades

Liking the fast upgrade approach, it is so @maidsafe. We take the thing that brought down Skype, consider it carefully and craft that weakness into a strength by designing network nodes for such “catastrophic” events. It feels just right, natural and very powerful as an initial mechanism for upgrades. Kudos routing team :smiley:

18 Likes

But how do you want to achieve that? The “proposed solution” seems to be pretty centralized (signed binaries; other peers drop the connection if you’re not using a “certified” version; …).

Could this be a platform-independent wasm file that gets executed via e.g. wasmtime?
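Something like this, perhaps: a minimal sketch assuming the wasmtime crate’s Rust API, where the upgrade.wasm payload and its exported run function are purely hypothetical:

```rust
// Minimal sketch, assuming the wasmtime crate; "upgrade.wasm" and its
// exported `run` entry point are hypothetical stand-ins for whatever
// platform-independent payload the node would actually ship.
use wasmtime::{Engine, Instance, Module, Store};

fn run_upgrade() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Compile the portable payload on whatever host this node runs on.
    let module = Module::from_file(&engine, "upgrade.wasm")?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    // Call the module's (hypothetical) entry point.
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;
    Ok(())
}
```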

4 Likes

Just some initial thoughts before the wizards come, except @dirvine, who is already here. :wink::grinning:
In my mind I see something like 5% of a section being allowed to upgrade, then the section verifies the upgrade, then continues to upgrade the next 5% and so on. The nodes with the least responsibility would be upgraded first. Once half of a section has been upgraded it would speed up and upgrade around 30% of the remaining nodes at a time. It’s important that nodes keep a copy of the older software version until the new one is verified to work; on a critical error they would fall back to the older version and then retry the upgrade.
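A rough sketch of that schedule, where the per-node upgrade step is a hypothetical callback (real verification and fallback would of course go through the section’s consensus, not a local function):

```rust
// Hedged sketch of the staged rollout above: 5% of the section at a
// time until half the nodes are upgraded, then 30% of the remainder.
fn staged_upgrade(section_size: usize, mut upgrade_one: impl FnMut(usize) -> bool) {
    let mut upgraded = 0;
    while upgraded < section_size {
        let remaining = section_size - upgraded;
        let target = if upgraded * 2 < section_size {
            section_size * 5 / 100 // careful 5% batches first
        } else {
            remaining * 30 / 100 // speed up once half are done
        };
        let batch = target.max(1).min(remaining);
        for node in upgraded..upgraded + batch {
            // A `false` return models a critical error: the node falls
            // back to the old version it kept on disk, and we retry later.
            if !upgrade_one(node) {
                return;
            }
        }
        upgraded += batch;
    }
}
```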

1 Like

But how do you want to achieve that? The “proposed solution” seems to be pretty centralized (signed binaries; other peers drop the connection if you’re not using a “certified” version; …).

The idea is to start with a solution that works technically, without worrying about the ideals, so we can deliver Fleming, where upgrades are not the focus: the focus of this one is resilient routing with Parsec + node ageing to enable vaults from home.

With any mechanism to allow upgrades, we can then go ahead and upgrade to a better upgrade mechanism once we’ve put the appropriate amount of effort into designing one that meets our standards.

So really, this post is more about having a high level idea of the challenges we’ll be facing and proposing a “good enough for a start” solution :smile:

19 Likes

Yeah, I think for an initial mechanism that would be sufficient. Post-Fleming, I think this should include some kind of P2P voting where peers can vote on whether to accept an upgrade or not.

3 Likes

There was recently a discussion about the same issue here:

https://safenetforum.org/t/step-by-step-the-road-to-fleming-1-the-big-questions-safe-fleming-and-beyond/27560/7?u=mendrit

This approach makes sense, and I couldn’t agree more.

For the finalized upgrades approach though, I think it’ll have to be the “slow” option. In some situations, slow change is good. But for the short-term Fleming target, as you so well put it, the fast approach is optimal.

2 Likes

On-the-fly network upgrades seem like a really challenging task to me. How will you ensure decentralization? Who decides what to upgrade, and when, in the network? Doesn’t that require some form of centralized authority? If not, how do you prevent attackers from upgrading the network with malicious intent?

1 Like

Initially it is centralised, as really only we are coding and providing the binaries. We also sign them for security to make sure it is us. That is not decentralised, and there is a lot we can do, like reproducible builds; stuff like musl really helps there (easy change), but that needs a lot more teams/indie devs working on core. I think that will happen naturally, especially with dev rewards.

For now we are pushing for speed, so the network launches, gets devs working on it and more. So initially the network upgrade check is: is the binary signed by MaidSafe? If so, then upgrade.
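A minimal sketch of that check, assuming an Ed25519 signature over the binary and the ed25519-dalek crate’s 1.x API (the release key distribution is hand-waved here):

```rust
use std::convert::TryFrom;

// PublicKey/Signature/Verifier as in ed25519-dalek 1.x; the actual
// MaidSafe release key and signature format are assumptions here.
use ed25519_dalek::{PublicKey, Signature, Verifier};

/// "Is the binary signed by MaidSafe? If so, then upgrade."
fn upgrade_allowed(binary: &[u8], sig_bytes: &[u8], maidsafe_pk: &PublicKey) -> bool {
    match Signature::try_from(sig_bytes) {
        Ok(sig) => maidsafe_pk.verify(binary, &sig).is_ok(),
        Err(_) => false,
    }
}
```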

Later it should not do that, but should allow decentralised devs to submit work and for that to be agreed by farmers upgrading (like Bitcoin etc.). Even better, though, would be network health checks on any upgrades. That will require a wee bit of AI (I think) and some move towards formal verification. Use of bulletproofs and SNARKs for proving correct execution of events and suchlike will also become prominent.

So yes, initially centralised dev (it is), and work towards more indie devs, but with formal proof of the upgrade, hopefully made by the network, as the ultimate goal, possibly with intermediary steps in the middle like we see in other projects.

19 Likes

My guess is that it will be difficult to remove an upgrade function once it has been introduced. I think it’s better to do upgrades of the network up to and including the beta versions, and after that the production version should be set in stone. Otherwise big companies, and even governments and lots of individuals, would think of the network as shaky, as it would then be susceptible to changes into who knows what. Nothing other than true decentralization and a fixed standard will work in the long run.

1 Like

I see what you did there😊

2 Likes

For every single decentralised project. We are not alone.

2 Likes

You are alone in the sense of the scope of your project. Bitcoin minus its politburo of control is a decentralized system, but the SAFE network is a much bigger project. Ethereum is only semi-decentralized IMO, since it needs some authorities to change its specifications to deal with performance issues etc. Not good.

4 Likes

I mean no decentralised project of any size is currently decentralised in its upgrade process. We have identified that as an issue, with potential solutions, but right now we are pushing ahead with stage 1 of that. So when I say we are not alone, I do mean all decentralised projects need to solve many issues, upgrades being one of them. Centralised data structures are another (like the blockchain), as well as a few other niggles. However, upgrades are vital, as they alone define the future of any codebase, and this is where I believe we are not alone in having to solve that. I do feel we are alone in seeing it as a major issue that needs to be solved properly, and to me that means without human involvement in the decision process of what we upgrade to, not only how we upgrade (say a security fix).

This is stage 1. Reproducible builds etc. could be stage 2 (still not decentralised though) and for me stage 3 means the network evaluates any upgrade and it alone decides on acceptance of that upgrade.

Some projects try governance, but that always IMO leads to centralised control, so I feel we must go a different path, and that path has to be more automatic. That is not simple: say the network can completely evaluate an upgrade and accept it as an improvement. Then it also needs to decide on a new feature. Will people like that feature? Do they want it? And so on. These are questions the network actually can answer IMO, but the complexity there will be higher than a human dev can calculate; that is where I think we do have to lean on some AI (neuroevolution or an SGD type thing).

So regardless of the scope of a project, if it is to be truly decentralised then I think it ends up in the same place: trust humans and trust them not to collude and control, or don’t. However, the problem to be solved is the same, regardless of scope.

16 Likes

Wow. That’s pretty ambitious. I like it! And it occurs to me that MaidSafe can keep full control over the SAFE network during the entire beta phase to ensure stability, which could be several years as the network grows and increasingly becomes used for real projects and with real farmers.

5 Likes

Thanks for that :+1: I think for us to declare SAFE a success it’s the usual: usage, user uptake, saving data in apparent perpetuity (we need infinity to prove perpetual :slight_smile: ) etc., but most importantly when we, MaidSafe, are not required; that really has to be a big priority after we prove the rest of the goals of truly decentralised networking. That last part could take a while, but I sincerely hope not. During that phase, though, openness at least will help a lot, and that will mean indie devs quickly being able to openly debate all code going into the system.

8 Likes

And will BFU (ordinary users) act like political parties? 95% will not understand, so they will trust somebody else who understands it well? That is delegated voting then.

After the beta release, not only open source developers will be interested in the details and able to come up with proposals for upgrades, but also big tech companies like Google and Huawei! Because with 5G and IoT, the tech companies will need some commonly agreed upon platform to run their solutions on, and the SAFE network is a prime candidate for that.

2 Likes

DeepMind surely seems super bored; maybe they would be interested in a different game :wink:

Another nice read :clap::clap::clap:

:stuck_out_tongue:

1 Like

The question I would be asking is how many times you are going to be changing any of the actual protocols. Bug fixes to a protocol do not usually require any change to the protocol version, since it must have been working well enough previously. If it’s like TCP/IP then 4 bits would be enough, but if you expect regular changes then maybe 16 bits would be best, as you really do not want the version number wrapping back to zero if at all possible. That will bite you hard one day. Well, maybe not you, but your children or later programmers.

Most definitely, and even to the point of knowing whether the peer can accept the later version of that protocol. Maybe in the handshake you send the highest & lowest version you can handle, and the two peers talk in the highest that each recognises.
A later version can then remove code for lower versions that are no longer in the network.
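A minimal sketch of that negotiation, assuming each peer advertises an inclusive supported range in the handshake:

```rust
// Each peer sends the range of protocol versions it can handle.
struct VersionRange {
    min: u16,
    max: u16,
}

// Both sides then talk in the highest version each recognises, or
// refuse the connection if the ranges do not overlap at all.
fn negotiate(ours: &VersionRange, theirs: &VersionRange) -> Option<u16> {
    let low = ours.min.max(theirs.min);
    let high = ours.max.min(theirs.max);
    if low <= high { Some(high) } else { None }
}
```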

Obviously this may sometimes leave an elder or farmer unable to function if they “never” upgrade and some features of the new protocol are essential. I guess at that stage the upgraded nodes would not accept packets/messages that many versions old.

You will potentially have a collapse of the network on each upgrade, and the probability of any one section failing is way too high.

You need to support the previous version of the protocols no matter the method of upgrade, and I’d suggest supporting more than one version of the protocols if the changes are more frequent. This is extremely important since restarts of nodes can come after a significant time period (e.g. a large block of the internet is segmented by cable cuts or a government). You definitely need to support the previous version and any versions less than 6 or 12 months old (excepting a seriously faulty one).

Ah, you recognise it too. Restarts can also be caused by things other than an upgrade, and can come after a long period of time too, as happens with a cut-off block of the internet.


Here is a suggestion.

Since you are storing the state of the node in case a restart of the node s/w is needed, and you want a seamless cutover, do what is done in the power industry when spinning down a generator and replacing it with another: you spin both generators up, synchronise them, then remove the generator you want spun down as the voltage/current crosses the zero line.

Translated into node terms, this is (a rough code sketch follows the list):

  • The current node is running.
  • An upgrade-available message propagates through the network with details of location, checksum, authentication etc.
  • The current node initiates a download of that software, which includes an install script.
    • The installation uses a version-specific directory so as not to interfere with the running node.
    • The state is kept in another directory so that it does not live in the node s/w directory.
  • The current node verifies the new version using the details in the update messages.
  • The current node starts the new version in a special idle state.
    • The new node is not communicating, but initialises itself ready to start.
    • It reads the current state so its state matches the current node’s state.
  • Once the current node receives a signal from the new node that it has synchronised, it waits for a suitable moment.
  • At the moment the current node determines that it can hand over operations to the new node, it signals the new node to take over.
    • The current node does no more communicating with the other nodes on the network.
    • The new node now does the communications with the network.
  • At this point you could get creative and have the old node watch the new node to see whether it continues to function.
    • If the new node dies or shows unexpected behaviour, then the now-previous node could kill -9 the new node and resume operations.
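A hedged sketch of that cutover, where the directory layout, the flag files and the --idle flag are all hypothetical stand-ins for the network’s own IPC and state handling:

```rust
// Hot-standby handover: run the new version idle, let it synchronise
// from the shared state, then hand over at a suitable moment.
use std::{fs, path::Path, process::Command, thread, time::Duration};

fn main() -> std::io::Result<()> {
    let new_binary = Path::new("/opt/node/v2/node"); // version-specific dir (hypothetical)
    let state_dir = Path::new("/var/lib/node/state"); // state lives outside the s/w dirs
    let ready_flag = state_dir.join("v2.synced"); // new node touches this once in sync
    let take_over = state_dir.join("v2.go"); // current node touches this to hand over

    // Start the new version in its special idle state: it reads the
    // shared state but does not talk to the network yet.
    let mut new_node = Command::new(new_binary)
        .arg("--idle") // hypothetical flag
        .arg("--state-dir")
        .arg(state_dir)
        .spawn()?;

    // Wait for the signal that the new node's state matches ours.
    while !ready_flag.exists() {
        thread::sleep(Duration::from_millis(200));
    }

    // Hand over: from here the current node stops communicating with
    // the other nodes, and the new node does the talking.
    fs::File::create(&take_over)?;

    // Watchdog: if the new node dies early, withdraw the handover and
    // resume operations ourselves (the "kill -9 and resume" step).
    thread::sleep(Duration::from_secs(30));
    if let Ok(Some(status)) = new_node.try_wait() {
        eprintln!("new node exited early ({status}); resuming old version");
        fs::remove_file(&take_over).ok();
        // ... resume serving the network with the old version ...
    }
    Ok(())
}
```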

EDITS: fix my (engineer) bad grammer & speeling, still not perfect but hopefully readable.

17 Likes