Step-by-step: the road to Fleming, 5: Network upgrades

The question I would be asking is how many times you are going to be changing any of the actual protocols. Bug fixes to a protocol do not usually require a change to the protocol version, since it must have been working well enough previously. If it's like TCP/IP then 4 bits would be enough, but if you expect regular changes then maybe 16 bits would be best, as you really do not want the version number wrapping back to zero if at all possible. That will bite you hard one day. Well, maybe not you, but your children or later programmers.

Most definitely, even to the point of knowing whether the peer can accept the later version of that protocol. Maybe in the handshake you send the highest and lowest version you can handle, and the two peers then talk in the highest version that each recognises.
A later version can then remove the code for lower versions that are no longer in the network.
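A minimal sketch of that handshake rule, assuming each peer advertises the lowest and highest protocol version it supports (the struct and function names are hypothetical):

```rust
/// Version range a peer advertises during the handshake (hypothetical names).
#[derive(Clone, Copy, Debug)]
struct SupportedRange {
    lowest: u16,
    highest: u16,
}

/// Pick the highest protocol version both peers recognise,
/// or None if the ranges do not overlap (the peers cannot talk).
fn negotiate(ours: SupportedRange, theirs: SupportedRange) -> Option<u16> {
    let low = ours.lowest.max(theirs.lowest);
    let high = ours.highest.min(theirs.highest);
    if low <= high {
        Some(high) // talk in the highest version each recognises
    } else {
        None // no common version: refuse the connection
    }
}

fn main() {
    let ours = SupportedRange { lowest: 3, highest: 5 };
    let theirs = SupportedRange { lowest: 4, highest: 7 };
    assert_eq!(negotiate(ours, theirs), Some(5));
}
```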

Obviously this may sometimes leave an elder or farmer unable to function if they “never” upgrade and some features of the new protocol are essential. I guess at that stage the upgraded nodes would simply not accept packets/messages that are that many versions old.

You will potentially have a collapse of the network on each upgrade, and the probability of one (any one) section failing is way too high.

You need to support the previous version of the protocols no matter the method of upgrade, and I’d suggest supporting more than one previous version if changes are frequent. This is extremely important, since node restarts can happen after a significant period of time (e.g. a large block of the internet is segmented by cable cuts or a government). You definitely need to support the previous version and any version less than 6 or 12 months old (excepting a seriously faulty one).
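A rough sketch of such an acceptance policy, assuming a hypothetical table of release dates, a 12-month window, and a blacklist for seriously faulty versions:

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Roughly the 12-month acceptance window suggested above.
const MAX_AGE: Duration = Duration::from_secs(365 * 24 * 60 * 60);

/// Is a peer's protocol version still acceptable? The current and previous
/// versions always are; older ones only if they were released within the
/// window and are not blacklisted as seriously faulty. All data illustrative.
fn is_acceptable(
    peer_version: u16,
    current_version: u16,
    release_dates: &HashMap<u16, SystemTime>,
    blacklist: &[u16],
) -> bool {
    if blacklist.contains(&peer_version) {
        return false;
    }
    if peer_version == current_version || peer_version == current_version.saturating_sub(1) {
        return true;
    }
    match release_dates.get(&peer_version) {
        Some(released) => SystemTime::now()
            .duration_since(*released)
            .map(|age| age <= MAX_AGE)
            .unwrap_or(true), // release date in the future: treat as fresh
        None => false, // unknown version
    }
}

fn main() {
    let releases = HashMap::from([(7u16, SystemTime::now())]);
    assert!(is_acceptable(7, 9, &releases, &[]));
}
```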

Ah, you recognise it too. Restarts can also result from things other than an upgrade, and can happen after a long period of time, as with a cut-off block of the internet.


Here is a suggestion.

Since you are storing the state of the node in case a restart of the node software is needed, and you want a seamless cutover, do what is done in the power industry when spinning down a generator and replacing it with another: you have both generators spun up, synchronise them, then remove the generator you want spun down as the voltage/current crosses the zero line.

Translated into terms of the nodes, this is (a rough code sketch of the handover signalling follows the list):

  • The current node is running.
  • An “upgrade available” message propagates through the network with details of location, checksum, authentication, etc.
  • The current node initiates a download of that software, which includes an install script.
    • the installation uses a version-specific directory so as not to interfere with the running node
    • the state is kept in another directory so it does not live in the node software directory
  • The current node verifies the new version using the details in the update message.
  • The current node starts the new version in a special idle state.
    • the new node is not communicating, but initialises itself ready to start
    • it is reading the current state so that its state matches the current node’s state
  • Once the current node receives a signal from the new node that it has synchronised, it waits for a suitable moment.
  • At the moment the current node determines that it can hand over operations to the new node, it signals the new node to take over.
    • the current node does no more communicating with the other nodes on the network
    • the new node now does the communications with the network
  • At this point you could get creative and have the old node watch the new node to see if it continues to function.
    • If the new node dies or exhibits unexpected behaviour, the now-previous node could kill -9 the new node and resume operations.
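A rough code sketch of that signalling between the two node processes; the message and role names are hypothetical and the transport (e.g. a local socket between the processes) is left out:

```rust
/// Control messages exchanged (or decided) during the handover.
enum Handover {
    /// New node -> old node: my state now matches yours.
    Synchronised,
    /// Old node -> new node: take over all network communication.
    TakeOver,
    /// Old node's own watchdog decision: the new node misbehaved, resume.
    RolledBack,
}

/// What a node process is currently doing.
enum NodeRole {
    /// Talking to the rest of the network.
    Active,
    /// Started in the special idle state: reading shared state, not communicating.
    Idle,
    /// Former active node, now only watching the new one.
    Watching,
}

fn on_event(role: NodeRole, event: Handover) -> NodeRole {
    match (role, event) {
        // Old node hears the new node is in sync: at a suitable moment it
        // hands over and drops to watching.
        (NodeRole::Active, Handover::Synchronised) => NodeRole::Watching,
        // New node is told to take over: start talking to the network.
        (NodeRole::Idle, Handover::TakeOver) => NodeRole::Active,
        // Watchdog decided the new node misbehaves: kill -9 it and resume.
        (NodeRole::Watching, Handover::RolledBack) => NodeRole::Active,
        // Anything else is ignored.
        (other, _) => other,
    }
}

fn main() {
    // Old node's view of a successful handover:
    let old = on_event(NodeRole::Active, Handover::Synchronised); // now Watching
    // New node's view:
    let new = on_event(NodeRole::Idle, Handover::TakeOver); // now Active
    let _ = (old, new);
}
```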

EDITS: fix my (engineer) bad grammer & speeling, still not perfect but hopefully readable.

17 Likes

Some thoughts. Sorry in advance if they are already answered (partly) in the replies above, which I read rather quickly.

  • You can provide all the necessary tools/techniques to make the upgrade process as decentralized as possible, theoretically. Naturally, you want to have 1 version of the software active/running for a specific ‘operation instance’, otherwise the chances that it won’t work are high. That has to come from 1 ‘source’ (== centralized). The next update doesn’t have to come from that same source. But I’m guessing a lot of vault owners just want the thing to work and will trust a certain party (Maidsafe in the beginning) to take care of the really necessary updates automatically. Of course you should give a vault owner the possibility to decide which updates they want or don’t want. The default update procedure delivered to most of the vault owners has a great impact, I think. I’m thinking of organ donation when you die as an example: opt-out versus opt-in. The default can change over time, but the initial choice has an impact.
  • At least the meaning of the value of the version byte number has to be ‘centralized’?
    Edit: if you want that decentralized, is e.g. an array of at least 2 hashes (current and proposed version), identifying the versions, a better idea?
  • What about multiple unrelated update proposals at once? Think of something like SegWit and the 1 MB -> 2 MB block-size change in Bitcoin a while back. E.g. version byte = 20 for the current version, 21 for only update A, 22 for only update B and 23 for both updates. What if, after a certain time, proposal A made it and proposal B didn’t (and when is that moment)? Do the vaults with version byte = 23 have to go back down to 21, or are they kicked out of the network? (A sketch of encoding such independent proposals as feature flags follows this list.)
  • The moment an update becomes active: is it at vault, section or complete network level? Or is there a choice between (some of) these 3 options, depending on the impact of the update?
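One hedged sketch of that scenario, using a base version plus a feature bitmask so an abandoned proposal can simply be dropped without renumbering (all names and values here are made up, not how the network actually encodes versions):

```rust
/// Independent proposals encoded as bits on top of a base version,
/// instead of consecutive numbers like 21/22/23.
const FEATURE_A: u16 = 0b01;
const FEATURE_B: u16 = 0b10;

struct Version {
    base: u8,      // e.g. 20
    features: u16, // accepted proposals on top of the base
}

impl Version {
    /// If proposal B never gets accepted by the network, a vault that
    /// enabled both simply stops advertising B; no renumbering needed.
    fn drop_feature(&mut self, feature: u16) {
        self.features &= !feature;
    }

    fn supports(&self, feature: u16) -> bool {
        self.features & feature != 0
    }
}

fn main() {
    let mut v = Version { base: 20, features: FEATURE_A | FEATURE_B };
    v.drop_feature(FEATURE_B); // proposal B didn't make it
    assert!(v.supports(FEATURE_A));
    assert!(!v.supports(FEATURE_B));
    let _ = v.base;
}
```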
1 Like

I was thinking about exactly this just a few days ago. I started with the idea that we can’t rely on any central authority, so all of the checks must be done independently.

I imagined something like this: when a new version comes out, vaults start a second instance with the new code and start running tests on it from one of the large and ever-growing independent test-case repositories on the network. (Vaults hosted by friends could share the results and thus make the process faster, but it’s just an idea.)

After the new vault has passed the tests, it would switch to shadowing the old version of the vault for some time: it would receive the same messages, use the same random seeds*, and it would have to return the same results, except where it’s specified (how?) that it needs to return something else, such as the version number and whatever new behaviour was just implemented.

After some time, when the vault decides that the new code is sufficiently reliable, it would just flip the switch and start putting the output from the new code on the wire, simple as that. We could even keep the old code around for some time, running in the background parallel to the new code, just in case.

* I’m not sure how it’s done right now, but I suggest each thing that needs randomness should have its own separately seeded random generator, or else the “shadowing” I’m describing won’t be possible.
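A toy sketch of the shadowing plus per-component seeding, assuming the `rand` crate (0.8-style API); the handler functions are hypothetical stand-ins for real message handling:

```rust
// Both versions handle the same messages with identically seeded,
// per-component RNGs; only the old version's reply goes on the wire
// until the outputs have matched for long enough.
use rand::{rngs::StdRng, Rng, SeedableRng};

fn handle_old(msg: &str, rng: &mut StdRng) -> String {
    format!("reply:{}:{}", msg, rng.gen::<u32>())
}

fn handle_new(msg: &str, rng: &mut StdRng) -> String {
    // Identical behaviour in this toy; a real upgrade may only differ where
    // the spec explicitly allows it (e.g. the reported version number).
    format!("reply:{}:{}", msg, rng.gen::<u32>())
}

fn main() {
    let seed = 42u64; // each component gets its own, shared seed
    let mut rng_old = StdRng::seed_from_u64(seed);
    let mut rng_new = StdRng::seed_from_u64(seed);

    let mut matches = 0;
    for msg in ["put", "get", "split"] {
        let wire = handle_old(msg, &mut rng_old); // what peers actually see
        let shadow = handle_new(msg, &mut rng_new); // only compared, never sent
        if wire == shadow {
            matches += 1;
        }
    }
    // After enough matches the vault could flip the switch to the new code.
    println!("{matches}/3 matching replies");
}
```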

This belongs to the type of thing that may end up being a huge pain in the ass down the road when somebody discovers an (obviously unexpected) way to exploit it. Make it 4 bytes and sleep in peace.

Even better, make it major/minor/patch (2+1+1?) version bytes, as usual, with full API-level compatibility between patch-level versions and downward compatibility between minor versions. I mean, why reinvent the wheel?

Only major versions could introduce breaking features (“hard forks”). There are benefits to marking them in an unambiguous way. The updater could automatically apply patch-level and minor versions (performing rigorous testing for full compatibility), but it would have to ask the user for permission for major versions.
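A sketch of that 2+1+1 packing and the compatibility rules described above; illustrative only, not how the network actually encodes versions:

```rust
/// Major in the high 16 bits, then minor, then patch.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Version {
    major: u16,
    minor: u8,
    patch: u8,
}

impl Version {
    fn pack(self) -> u32 {
        ((self.major as u32) << 16) | ((self.minor as u32) << 8) | self.patch as u32
    }

    fn unpack(v: u32) -> Self {
        Version {
            major: (v >> 16) as u16,
            minor: ((v >> 8) & 0xff) as u8,
            patch: (v & 0xff) as u8,
        }
    }

    /// Patch differences don't matter and higher minor versions stay
    /// backwards compatible, so only a different major breaks the protocol.
    fn compatible_with(self, peer: Version) -> bool {
        self.major == peer.major
    }

    /// Would an updater be allowed to apply this automatically?
    fn auto_updatable_to(self, next: Version) -> bool {
        self.major == next.major // major bumps need the user's permission
    }
}

fn main() {
    let v = Version { major: 1, minor: 4, patch: 2 };
    assert_eq!(Version::unpack(v.pack()), v);
    assert!(v.compatible_with(Version { major: 1, minor: 5, patch: 0 }));
    assert!(!v.auto_updatable_to(Version { major: 2, minor: 0, patch: 0 }));
}
```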

4 Likes

This could use the sandboxing ability of wasm. We would have a “root runner” that offers basic functionality to a signed wasm executable, which would contain the actual code for the vault. The runner could then do a seamless switchover from the old executable to the new one, and maybe even do the evaluation (running it in a sandboxed test network).
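A heavily hedged sketch of such a root runner, assuming the `wasmtime` and `anyhow` crates (1.x-style wasmtime API) and a hypothetical exported `run` function; signature verification, host imports and the actual switchover are omitted:

```rust
use wasmtime::{Engine, Instance, Module, Store};

/// Load a (pre-verified) vault module in a wasm sandbox and run it.
fn run_vault(path: &str) -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, path)?;
    let mut store = Store::new(&engine, ());
    // No host imports offered in this sketch; a real runner would expose
    // its "basic functionality" (networking, storage) here.
    let instance = Instance::new(&mut store, &module, &[])?;
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;
    Ok(())
}

fn main() -> anyhow::Result<()> {
    // Old and new executables live side by side; the runner decides which
    // one currently owns the network connection.
    run_vault("vault_current.wasm")
}
```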

That’s racist! Use at least the non-racist form: ni :b: :b: les

You use the conditional tense, so why take the risk?

Exactly! Only 3 bytes more, but a standard and cleaner solution.

5 Likes

Maybe I’m not understanding something, but how can such an increasing version numbering work in a decentralized way?
See Git and Mercurial versioning, for example: Why does Git use SHA-1 as version numbers? - Stack Overflow.

2 Likes

Decentralization doesn’t mean heterogeneity. Unless there’s a common protocol (thus the simple linear version numbering), the network can’t function any more than a group of people can have a meaningful conversation without speaking the same language and using the same terminology.

But you highlight a valid problem. How can multiple groups develop new features (not apps, core changes) in parallel with each other? One adds feature A, the other adds feature B, and they both roll out their new version with the next available version number (that is, the same number for both). Predictably chaotic.

I doubt hash-based versioning would be the solution. I may well be wrong on this, but multitudes of commit-level forks fighting for acceptance continuously isn’t a pretty picture. I prefer the rare update with long, uneventful periods before and after.

Decentralization or not, we’ll need coordination between the different future groups of core devs or else we’ll have several, if not many, incompatible forks, each with its hard core community of believers just like bitcoin, bitcoin cash, bitcoin fake, and so on. At least nobody will be able to pretend they are the real David Irvine…

8 Likes

I also assume some level of coordination is required, and multiple branches like those possible in Git are not really applicable here. So using a hash as version number to get a unique identifier is probably overkill. But maybe it helps to have something more than 1 byte (with 2 bytes you already have ~64k versions to pick from instead of 256), so the chances are smaller that 2 parallel updates end up with the same version number.
It is not explicitly mentioned in the OP, but the assumption here is that there can be gaps between the version numbers of consecutive accepted updates.

This is one of the biggest downsides of automatic updates, and it’s not just a theoretical vulnerability. This happened recently to users of Asus motherboards - malware was signed with Asus’ key and delivered to users through the automatic update mechanism.

I generally think that automatic updates should be an opt-in feature (if at all available). For users who don’t opt in, a notification of an available update should be fine. It means that new version releases will roll out more slowly, but I don’t think that’s a bad thing. Notifications for security updates can be made to look different (more urgent) than update notifications for features/improvements releases.

4 Likes

Agreed

BTW I opt out of ASUS auto updates every time I install an ASUS motherboard (or any other brand). Thanks for the warning.

1 Like

They should not exist at all, in my opinion. Automatic updates allow a path for a government to pressure a development team into sabotaging the network. The bitcoin update process seems fine. Post an update, have a big argument about whether to implement it, and finally some percentage of people choose to implement it or not. It’s messy, but it minimizes the possibility of somebody taking down the network by threatening the dev team(s). It also protects the dev team from such threats, since it will be obvious that they won’t be effective.

5 Likes

I was surprised to find that IPFS is now decentralized. When I first learned about IPFS, users had to choose specific nodes to store their data on. That’s semi-centralized, IMO. However, the pay-for version of IPFS still seems to work like that: users have to choose which node to store their data on and pay every month forever, like a cloud service, so that still sucks.

What people use most is the simple and free IPFS version. Similarly, the SAFE Network could maybe have a simple and fixed protocol as a foundation, and then add Safecoin and auto upgrades as layers on top of that changeless foundation with a solid specification.

Without a simple-as-possible foundation, the risk is that the SAFE Network becomes unmanageable bloatware with ever-growing complexity.

One solution could be this: the last update resides safely in the update centre on the network, protected from changes. When a new update is ready for release, it is saved in the update centre but is not yet available. Then a built-in script extracts all the code that differs from the previous version and posts it publicly. The community then validates this new code and votes yes or no on whether to accept or reject it. If accepted, the update is released and anybody can safely choose whether or not to update automatically.

3 Likes

Auto upgrades pose a small but real security/stability risk. Also, I don’t see a need for the SAFE network to be free (to perform PUT operations on). Affordable, sure. IMO it doesn’t need to be free-as-in-beer, but definitely free-as-in-speech. Also, I don’t expect the underlying network protocols to change so rapidly/drastically as to necessitate frequent updates. A lot of hard and time-consuming work goes into the underlying code/design. If it takes the majority of the vault node operators 1 week to click “Yes” to an update prompt (or even 1 month), I believe that is not a huge hindrance to development. Quality of updates over quantity of updates and all.

That seems a tad complicated. It shouldn’t matter what version/flavor vault operators choose to run. The software needs to be resilient against old versions, malicious versions, etc. Forks might happen in the future. That’s not necessarily a bad thing, but forks ought to be able to exist/work at the same time, even if they don’t talk to each other.

I would be happy having updates delivered with all my other system updates (I’m a Linux user). I get that it is a bit more painful for Windows users (and Mac users who install software outside of the App Store). With Linux distros, updates are released continuously at random times and added to the update queue. Periodically, when I find it convenient, I’ll hit the button to let all the updates apply (or uncheck some, if I have a reason to delay any particular updates).

2 Likes

You could be correct. IPFS doesn’t guarantee free permanent storage. For that, Filecoin is needed, or people will have to run their own IPFS nodes. So truly permanent storage on IPFS is not free yet, I admit, but I wonder how long it will take for IPFS to become truly free storage, considering its huge popularity. So how could the SAFE network compete with the possibility that IPFS becomes truly free?

1 Like

I’m not sure the upgrade process needs to be this complicated or automatic.

Can vaults not just have a list of one or more repositories, then alert the vault owner in some way to initiate the update? A repository could just be a SAFE URL, and the file names could simply be versioned by date. The vault would then check the repos once a day or so and alert the owner to (and maybe download) the update.
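A small sketch of that polling idea; the repository URLs are made up and `fetch_latest` is a placeholder for whatever would actually fetch an index over SAFE:

```rust
use std::{thread, time::Duration};

// Hypothetical repository locations, for illustration only.
const REPOS: &[&str] = &[
    "safe://vault-updates/official",
    "safe://vault-updates/community-mirror",
];

/// Placeholder: would download the repo index and return the newest
/// date-versioned file name, e.g. "vault-2019-04-30.tar.gz".
fn fetch_latest(repo: &str) -> Option<String> {
    let _ = repo;
    None
}

fn main() {
    loop {
        for repo in REPOS {
            if let Some(name) = fetch_latest(repo) {
                // Alert the owner (and optionally pre-download the file).
                println!("Update {name} available from {repo}");
            }
        }
        thread::sleep(Duration::from_secs(24 * 60 * 60)); // once a day
    }
}
```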

Tbh, I suppose I am just thinking about how this works in the Linux world, using yum or apt. Storing the repositories on the SAFE Network would add a layer of security, but the concept seems pretty similar.

Perhaps efforts would be better spent leveraging these existing systems? Adding SAFE support to yum and apt could potentially be straightforward (just a different transfer protocol, really) and it would work in the same standard way.

Obviously, the need to restart the vault seamlessly remains, but vaults are likely to upgrade at different times around the world anyway.

1 Like

An improvement to automatic updates could be to enforce a notification for a ‘to be installed’ update, plus an enforced delay (e.g. 2 weeks) before the automatic install, provided no manual refusal happens before that time.
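A tiny sketch of that rule, with illustrative state handling:

```rust
use std::time::{Duration, SystemTime};

const DELAY: Duration = Duration::from_secs(14 * 24 * 60 * 60); // 2 weeks

struct PendingUpdate {
    notified_at: SystemTime,
    refused: bool,
}

/// Auto-install only after the enforced delay, and only if the owner
/// has not refused the update in the meantime.
fn should_auto_install(update: &PendingUpdate, now: SystemTime) -> bool {
    if update.refused {
        return false; // the owner said no before the delay expired
    }
    now.duration_since(update.notified_at)
        .map(|elapsed| elapsed >= DELAY)
        .unwrap_or(false)
}

fn main() {
    let update = PendingUpdate { notified_at: SystemTime::now(), refused: false };
    // Freshly notified, so nothing happens yet.
    assert!(!should_auto_install(&update, SystemTime::now()));
}
```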

Thank you very much for all the feedback, this is definitely bringing a lot of good things to light :slight_smile:

One thing though: the very first version of the upgrade mechanism is primarily there to support fast development of new features until we reach feature complete.

This has very different constraints from the final version of the network. As you suggest, after this phase, once things have become more stable, breaking changes and updates will be less frequent. It will also become sensible to support backward compatibility for longer.

When we reach that point, all the considerations you are highlighting will take center stage. But during the initial development phase, we want feedback as early as possible, and to reduce as much as possible the cost of making changes for each TestNet version.

16 Likes

@Jean-Philippe, that makes sense

4 Likes

My first thoughts may be too simplistic or obvious, or may duplicate and rehash things already mentioned, but I’ll risk it: the portion of code (tests) that defines “what” the network must do in the most critical and valuable areas, such as protecting data and access to data forever, must be set in stone. “How” the “what” should be realized most efficiently, adapting over time to a changing environment, could then perhaps be left up to any developer and to the network, which would accept or reject upgrades of the “how”. Even in the case of a fork in the “how” code, it would be good if the two (or more) different versions were still forced to (co)operate on the same data, so that forks in the code never result in splits of the body of data stored or accessible through each network version. The reason for setting core parts in stone is that no AI will be able to judge which upgrades will be in the best interest of human beings under future circumstances that we cannot predict today. I have no idea what portion of Safenet should be considered holy immutable ground, but at least some of it should be fixed, to avoid the need to transfer data from one Safenet to another, or to make Safenet snapshots or backups in order to recover from a series of future upgrades that turn out to be malicious or just disastrous.
The US constitution and the Supreme Court may be a helpful analogy to the above fixed tests and upgrades. Is it possible to isolate a minimal portion of Safenet and code a constitution for it that could stand the test of time and survive all imaginable exploits by generalized AI? Or alternatively, is it possible to replace a Safenet Supreme Court with a truly distributed alternative that could never be manipulated into the destruction of Safenet?