Step-by-step: the road to Fleming, 1: The Big Questions: SAFE Fleming and Beyond

Great! I like this different view of the project and where it’s going.
Can’t wait for the next installment and to see how the whole series rolls out.

16 Likes

I think there is a typo here. It should be:

Now we’ve given some context to the questions that we need to take account of on the Road to Beta. Whilst they may not be implemented today, each needs to be considered now in order to ensure that the system fits together in the most efficient way after Fleming is out in the wild.

Somebody correct me if I am wrong.

4 Likes

You are both right :smiley: Some of these things will not be in Fleming, but they do have to be considered for Fleming. For instance, we had two hangouts today and yesterday on some of the proposed solutions for these, with the intention of taking them further post-Fleming. So yes, in terms of implementation these are maybe Road to Beta, but hopefully we can answer them all beforehand.

24 Likes

I hope you don’t plan for automatic updates? They are a big NO for me, simply for security reasons.

Voluntary network shutdown for an upgrade could be another reason to restart a network but no, I don’t want Maidsafe implementing a kill switch.

If the network is really decentralized, then Maidsafe should have no powers to force updates or to stop the network. This limits the possible modifications to compatible ones, but this doesn’t block the evolution of the network as illustrated by the bitcoin network.

7 Likes

Hello :slight_smile:

Whether peers upgrade automatically or manually, we will probably want a mechanism to provide updates. This is one aspect we were looking at.

We are looking more at the way the network can recover when heavily disrupted, as it could be during a mass upgrade (depending on the process).

1 Like

In my considered opinion the network has to be able to function with at least the previous version and the current (new) version running together. Ideally it would work even if the nodes span a few versions.

Attempting to run all nodes on one single version, especially after releasing a new version, is folly and dooms the network to massive failure.

  • it would only take one mistake of a certain kind for the network to never recover in any suitable way. People would have to restart nodes with a different version in an attempt to recover.
  • Common security practice is to NEVER accept an update without first verifying its worth, security and viability, no matter how good the automatic checking systems are. Some people may accept it and trust the “system”, but anyone who has had anything to do with network updates or security will not.
  • The logistics of trying to run only one version at any time are overwhelming and amount to a network restart at a coordinated time. <---- This violates one of the fundamentals concerning Time: it would require the nodes/protocols to respect actual clock time in order to coordinate the restart, and that cannot work in the real world.
  • etc

The best situation is to have upgrades tolerant of various versions. So the question should be “How many versions back can we support?”

That requires updates to be written in a tolerant way, specifying for each upgrade which versions will no longer be supported. And of course the tolerated window should cover at least many months of upgrades.
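
To make that concrete, here is a minimal sketch (in Rust; the constants and plain integer versioning are purely illustrative assumptions, not actual SAFE network code) of what a tolerated-versions check could look like:

```rust
// Minimal sketch of a version-tolerance check. The constants and the
// plain integer versioning are illustrative assumptions only.
const MIN_SUPPORTED: u32 = 14; // oldest protocol version still tolerated
const CURRENT: u32 = 17;       // version this node runs

/// A peer may participate if it is no older than the tolerated window.
/// Newer peers are accepted too, on the assumption that they tolerate us.
fn is_compatible(peer_version: u32) -> bool {
    peer_version >= MIN_SUPPORTED
}

fn main() {
    println!("this node runs v{CURRENT}");
    for v in [12, 14, 17, 18] {
        println!("peer v{v}: compatible = {}", is_compatible(v));
    }
}
```

Each upgrade would then simply bump `MIN_SUPPORTED` when older versions fall out of the window, rather than forcing everyone onto one version at once.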

It is also possible for the network to be segmented for a long period due to a country being cut off, and the chance of some data being inaccessible during that time is high. So the network needs to be able to recover that data once the nodes in that country return. (IIRC @tfa was one of those who showed this probability is very high.) If the nodes are not tolerant of older versions (at least 6-12 months), then that data is likely to be lost, because the returning nodes can no longer be part of the network, and permanent loss of that data/those files occurs, breaking perpetuity.

16 Likes

Do you think versioning could work in the same way micro-services are updated, using solutions such as canary deployments?

100% agree with Neo.

Watching some other decentralized networks: when they are going to update to a non-compatible version, they first release one or two versions beforehand to make the transition compatible step by step. They always check whether the majority has the latest version, so they can hold back each release until the current one has proven stable and roughly 90% of the network is ready for the non-compatible update.
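
A sketch of that adoption gate, just to illustrate the idea (the threshold and names are my own assumptions):

```rust
// Sketch of the "hold the breaking release until ~90% adoption" gate
// described above; the threshold and function names are assumptions.
const ADOPTION_THRESHOLD: f64 = 0.90;

/// True once enough of the network runs the latest compatible version
/// for a non-compatible release to be reasonably safe to publish.
fn ready_for_breaking_release(nodes_on_latest: u64, total_nodes: u64) -> bool {
    total_nodes > 0 && nodes_on_latest as f64 / total_nodes as f64 >= ADOPTION_THRESHOLD
}

fn main() {
    println!("{}", ready_for_breaking_release(850, 1000)); // false
    println!("{}", ready_for_breaking_release(920, 1000)); // true
}
```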

2 Likes

That of course excludes the one-single-version concept for the whole network at any particular time, since a subset is running the new version while the others are still running the old version on the network.

It is, though, a method that could be used, and David has suggested something similar as a potential method.

It is great to see these challenges being openly considered.

There is the old adage: “Be conservative in what you send and liberal in what you accept.” This may well be relevant to the topic of upgrades. I agree with the consensus expressed in this thread that single-event migrations are neither feasible nor desirable in a decentralised network.

One possible option would be to consider a sliding window of inter-operable versions, such that any peer running a particular version is permitted to fully inter-operate with any other node running a version within the permitted window. Extending this concept with a wider sliding window that allows nodes running older versions (up to a point) to continue to be part of the network without losing their identity, but only permits them to be a receiver and not a transmitter, may facilitate a more graceful upgrade process with less opportunity for the network to become disjointed.

Using the concept of a sliding window would afford the potential that a distant part of the network (in terms of peer hop connectivity) could be running a version that is incompatible with the local (pioneer) neighbourhood - but be bridged by a majority of nodes running versions whose sliding windows overlap both the pioneer and legacy versions.
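
To illustrate the two-tier window (a rough Rust sketch; the window sizes, names and integer versioning are my assumptions, not anything from the codebase):

```rust
// Rough sketch of the two-tier sliding window described above.
#[derive(Debug)]
enum PeerMode {
    Full,        // inside the inner window: full send/receive inter-operation
    ReceiveOnly, // inside the wider window: keeps its identity, may not transmit
    Rejected,    // too old to remain part of the network
}

const CURRENT: u32 = 20;
const FULL_WINDOW: u32 = 2; // CURRENT-2 ..= CURRENT inter-operate fully
const WIDE_WINDOW: u32 = 5; // down to CURRENT-5 may still receive

fn classify(peer_version: u32) -> PeerMode {
    if peer_version + FULL_WINDOW >= CURRENT {
        PeerMode::Full
    } else if peer_version + WIDE_WINDOW >= CURRENT {
        PeerMode::ReceiveOnly
    } else {
        PeerMode::Rejected
    }
}

fn main() {
    for v in [20, 18, 16, 14] {
        println!("peer v{v}: {:?}", classify(v));
    }
}
```

The bridging scenario then falls out naturally: two peers that cannot inter-operate directly can still exchange data through intermediaries whose windows overlap both of theirs.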

What will be the motivation for a node to upgrade? Will this be a stick or a carrot or both? It sounds like @Jean-Philippe is suggesting a potential stick of losing identity - so what might be the carrot?

How will upgrade releases be controlled? Can anyone initiate the injection of an upgrade? In a truly decentralised network the answer probably should be “yes”. Will consensus be used to agree that an upgrade be accepted into the network and for it to be rolled out? Again, this brings my thoughts back to the need for incentives: both for the proposer node (of an accepted upgrade) and for pioneer nodes whose administrators put in the effort to assess and validate an upgrade’s worth, stability, security etc. As @tfa stated: automatic upgrades are a big NO - and since I suspect that this may well be the position of a large proportion of administrators of non-consumer-operated nodes, my thoughts come back to the incentives for testing and validation.

Will there be the concept of roll-back? I can see the desirability of this for more than one reason:

  • a proposed upgrade does not gain consensus
  • a previously accepted upgrade is found to be less beneficial (stable, performant, secure etc.) than the previous version after it has reached consensus (but who can have the authority to make this decision to roll back? - or should it just roll forwards?)

I guess the network will distribute its own upgrade code. If the network were to test and validate the code automatically and, by consensus, upgrade automatically, then it would have the potential to autonomously evolve - that is, if nodes were to propose upgrades without human intervention. (Perhaps the incentive for submitting and testing upgrades shouldn’t be financial!) This may even lead to the network evolving in a way that purposefully excludes groups of nodes and culls certain content in a manner that we cannot conceive of at this time.

10 Likes

PS. On the topic of node ageing and (trusted) elders - what is being considered to guard against the espionage concept of sleepers?
rfcs0045 leaves things a little open - even its mention of the potential to use blacklists raises questions about whether this could be used as an attack vector.

Consideration of the Tesla Model would be useful here:

  1. Roll out the update to a small control group first. In Tesla’s case this is car owners who work for them. Analyze the feedback, make any necessary changes after a short period of time, then re-release to the control group if necessary.
  2. Then expand the rollout to a slightly larger group, analyze the feedback and then, finally, send it to the entire network.
  3. Updates should be voluntary for a certain period of time. After that time has expired the client will be required to update or be “orphaned”.

Not sure if this “controlled” rollout would be feasible in the Safe Network, but maybe a variation of it could be implemented.
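
Just to illustrate what such a staged-rollout gate could look like (a Rust sketch; the cohort percentages and the hashing scheme are assumptions, not an actual SAFE network mechanism):

```rust
// Illustrative staged-rollout gate in the spirit of the Tesla model above.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cumulative rollout stages: control group, wider group, everyone.
const STAGES: [u64; 3] = [1, 10, 100];

/// Deterministically map a node id to a bucket in 0..100 so that the
/// same node always lands in the same cohort.
fn bucket(node_id: &str) -> u64 {
    let mut h = DefaultHasher::new();
    node_id.hash(&mut h);
    h.finish() % 100
}

/// A node is offered the update once the rollout reaches its bucket.
fn update_offered(node_id: &str, stage: usize) -> bool {
    bucket(node_id) < STAGES[stage]
}

fn main() {
    for stage in 0..STAGES.len() {
        let offered = (0..1000)
            .filter(|i| update_offered(&format!("node-{i}"), stage))
            .count();
        println!("stage {stage}: {offered} of 1000 nodes offered the update");
    }
}
```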

1 Like

The punishment of sleeper nodes is covered in the Datachain RFC.

Routing must punish nodes ASAP on failure to transmit a Link NodeBlock on a churn event. Links will validate on majority, but routing will require to maintain security of the chain by ensuring all nodes participate effectively. These messages should be high priority.

1 Like

Thx - though we might be talking at cross purposes, as the type of sleeper I was thinking about is more akin to the discussion in the section on Archive nodes in the Datachain RFC.

These more reliable nodes will have a vote weight higher than a less capable node within a group. A majority of group members who agree on votes will still be required, though, regardless of these highly weighted nodes. This is to prevent attacks where nodes lasting for long periods in a group collude via some out-of-band method, such as publishing IDs on a website and soliciting other nodes in the group to collude and attack that group.

If nodes can get in and gain a higher level of trust over time then they have the potential to be more disruptive in the future if they go rogue (esp. with collusion).
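
To make the quoted rule concrete, here is a rough sketch of age-weighted voting that still demands a simple majority of members (field names are illustrative, not from the RFC):

```rust
// Votes are weighted by age, but a simple majority of group members is
// still required, so a handful of heavy elders cannot carry a vote alone.
struct Member {
    age: u64, // vote weight grows with node age
    voted_yes: bool,
}

fn quorum_reached(group: &[Member]) -> bool {
    let total_weight: u64 = group.iter().map(|m| m.age).sum();
    let yes_weight: u64 = group.iter().filter(|m| m.voted_yes).map(|m| m.age).sum();
    let yes_count = group.iter().filter(|m| m.voted_yes).count();
    // Both the weighted tally AND the head count must exceed half.
    yes_weight * 2 > total_weight && yes_count * 2 > group.len()
}

fn main() {
    // Two old elders vote yes, three young adults vote no: weight says
    // yes, head count says no, so the vote does not pass.
    let group = [
        Member { age: 90, voted_yes: true },
        Member { age: 80, voted_yes: true },
        Member { age: 10, voted_yes: false },
        Member { age: 10, voted_yes: false },
        Member { age: 10, voted_yes: false },
    ];
    println!("quorum: {}", quorum_reached(&group));
}
```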

Routing must punish nodes ASAP on failure to transmit a Link NodeBlock on a churn event. Links will validate on majority, but routing will require to maintain security of the chain by ensuring all nodes participate effectively. These messages should be high priority.

Whilst it makes reference to “Routing must punish …” it doesn’t state the form of the punishment. Can you point me to where the consequences are documented?

P.S. The hypertext reference from the word NodeBlock in the quoted section of rfcs/0029-data-chains.md is broken.

1 Like

Will it be possible for the network to provide the update while the nodes are updating?

In case I’m not asking that well… I hope that the updates will be available on the network… Could the network sort of roll over from <50% nodes updated to >50% nodes updated, all on the fly, while providing/hosting the update?

If updates happen automatically, would it likely be staggered? Something like 20% of the nodes at a time?

Well, this is inherent in the concept of Node Ageing. Only those who have demonstrated their good behaviour over time reach the status of elders and participate in the consensus. Of course an elder has a greater capacity to harm, but it is also much less likely that it will.
In the end, preventing a sufficient number of evil elders from colluding in the same section is what will secure the network.

This part is under development and the punishment, as far as I know, has yet to be defined (removal, age reduction, warning,…)

2 Likes

Thanks a lot for all the feedback, I’m really happy to see it :slight_smile:

Definitely a lot of good points, and we are keeping many of them in mind during our investigations.
With this overview post I did not go into much detail on each aspect.

One thing I tried to do is to sketch the challenges rather than discuss the solutions we are considering, as we are still in the process, but also because each topic likely warrants its own post. It is also great to see all the directions this conversation is taking, bringing new thoughts to our team.

13 Likes

I’m a really non-technical, non-code-literate sort of person, but I read a bit about the subject and I have tried NixOS lately. I don’t know if this will have any bearing on the issues you are discussing, but I have a feeling it may be worth reading as you consider possible update mechanisms for the SAFE network.
https://nixos.org/nix/about.html

2 Likes

An interesting philosophical question would be to consider the following two positions:

A. in a decentralized network you trust no one
B. you have to trust someone to use their updates

Sounds paradoxical? The premise of updates seems to suggest there has to be some central figure of trust - let’s say MaidSafe. But then, if we start to introduce this sort of requirement on whom to trust, some of the niceties of a decentralized network start to break down. If we stick to the principle and don’t do that, then it means anyone can try to advertise updates, and the consensus algorithm needs to be able to figure out which update is ‘real’.

Another way of solving this paradox might be to build updates as a mechanism outside the autonomous Network itself - e.g., you have to manually download the update program from somewhere, make sure it’s legit, and then run it across your client apps. And of course this doesn’t sound very nice or safe.
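
For illustration, a minimal form of that manual check (a Rust sketch; the file name, the `sha2` crate choice and the placeholder checksum are my assumptions - a real scheme would verify a publisher signature rather than a bare hash):

```rust
// Verify a manually downloaded update against a checksum published
// out of band, before running it.
use sha2::{Digest, Sha256};
use std::fs;

fn verify_download(path: &str, expected_hex: &str) -> std::io::Result<bool> {
    let bytes = fs::read(path)?;
    let digest = Sha256::digest(&bytes);
    let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
    Ok(hex == expected_hex.to_lowercase())
}

fn main() -> std::io::Result<()> {
    // The expected checksum must come from a channel you already trust
    // (e.g. a signed release note), never from the download site itself.
    let ok = verify_download("safe-update.bin", "placeholder-checksum")?;
    println!("update verified: {ok}");
    Ok(())
}
```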

2 Likes

OR we have to trust the network to tell us if the update is worth it. The update may come from someone we don’t know and therefore can’t trust, but if the network can run the update in parallel, benchmark it and test it, then perhaps we can trust the network’s decision. That is all easier said than done though :slightly_smiling_face:

3 Likes