Launch of a community safe network


#182

@bzee, it would, but the alpha-2 version of Vault won’t do UPnP requests by itself. So personally I specified a static port in the crust config:

$ cat safe_vault.crust.config 
{
  "hard_coded_contacts": [
    "95.216.189.149:5483",
    "116.202.19.254:5483",
    "95.216.175.157:5483",
    "116.203.47.164:5483",
    "116.202.22.75:5483",
    "95.216.170.146:5483",
    "159.69.82.188:5483",
    "116.202.22.137:5483",
    "116.203.25.212:5483"
  ],
  "whitelisted_node_ips": null,
  "whitelisted_client_ips": null,
  "tcp_acceptor_port": 12345,
  "force_acceptor_port_in_ext_ep": true,
  "service_discovery_port": null,
  "bootstrap_cache_name": null,
  "network_name": "tfa",
  "dev": {
    "disable_external_reachability_requirement": false
  }
}

And then I used pyigd to set up a port forwarding rule: igd add -e 12345 in my case. The router then passes all incoming traffic with destination port 12345 to my home machine running the vault :slight_smile:
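For anyone scripting the same setup, here is a minimal sketch (my own helper, not part of Vault or Crust) that reads the crust config shown above and reports which external port needs a forwarding rule on the router. The filename matches the one in the post; adjust it if your binary uses a different config name.

```python
import json

# Sketch: read the crust config and return the static TCP acceptor port
# that the router must forward to this machine.
def acceptor_port(path="safe_vault.crust.config"):
    with open(path) as f:
        cfg = json.load(f)
    port = cfg.get("tcp_acceptor_port")
    if port is None:
        raise ValueError("no static tcp_acceptor_port set in the config")
    return port
```

With the config above, `acceptor_port()` returns 12345, the same port passed to `igd add -e`.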


#183

I’ll gladly take the log files :slight_smile: but as a disclaimer, I might not be able to pinpoint the issue accurately.

My suspicion would be some out-of-sync churn consensus handling without PARSEC, but I’m not sure just yet.

With logs, at least we’d know if these were post-merge or …

Well, considering the node is in an unexpected state, it may help to restart it, but again, that’d be just a guess. Without knowing the rest of the state of the system, we can’t tell if this node is an exception or if the problem is larger.


#184

Thanks Viv for the explanation; that sounds like a plausible account of what could have happened. If I close the vault, will I be able to find the log files in the directory of the binaries? Only look at the logs, Viv, if you think they will be helpful or interesting. As this is an older test-net than what you guys are working on now with PARSEC and other things, I don’t know how relevant the logs are.

Just so you don’t spend a long time figuring out the problem if it isn’t of any particular interest or importance. :slight_smile:


#185

In theory, there is a way to get the logs of a Docker vault (but inferring its health from them may not be obvious).

The procedure is not simple and I need some time to define and test it (at worst, I will do it this weekend).


#186

Thanks! It works on Odroid (HC2) as well :sunglasses:


#187

Could it be a problem of a disconnection and reconnection of a node? In that case we would have both node-loss and node-added events for the same node.

No, merges never took place in this network, because the version of both sections is 1 (the section version is the number in parentheses when prefixes are displayed in the galaxy).


#188

Hmm, you should be able to, as long as they’re also logging to disk. Maybe extract them before restarting the node; in case the log files aren’t timestamped, they would get overwritten.
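To guard against that overwrite, a small sketch along these lines can stash the log aside with a timestamp before restarting (the `Node.log` filename is my assumption; use whatever file your vault actually writes next to the binary):

```python
import shutil
import time
from pathlib import Path

# Sketch: copy the vault's log file to a timestamped name so a freshly
# restarted vault can't clobber it.
def stash_log(log_path="Node.log"):
    src = Path(log_path)
    dst = src.with_name(f"{src.stem}.{int(time.time())}{src.suffix}")
    shutil.copy2(src, dst)  # copy2 preserves the original mtime as well
    return dst
```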

Nw :wink: It wouldn’t really be relevant but might help confirm a suspicion.

Not in this case, as this is specifically about the NodeLoss triggering this node realising it’s not responsible for a chunk. That sequence isn’t expected regardless of the order, but of course NodeLoss before NodeAdded shouldn’t be a valid event sequence passed from routing anyway. A standard NodeAdded & NodeLoss of the same node is just normal churn.
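The invalid sequence being described can be sketched with a simple checker (my own illustration, not routing’s actual code): a NodeLoss for node X arriving before any NodeAdded for X is flagged, while NodeAdded followed later by NodeLoss is treated as normal churn.

```python
# Illustration only: flag NodeLoss events that arrive for a node we
# never saw a NodeAdded for. Events are (kind, node_id) tuples in
# arrival order.
def invalid_losses(events):
    seen_added = set()
    invalid = []
    for kind, node in events:
        if kind == "NodeAdded":
            seen_added.add(node)
        elif kind == "NodeLoss":
            if node in seen_added:
                seen_added.discard(node)  # normal churn
            else:
                invalid.append(node)      # loss before add: unexpected
    return invalid
```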

:+1:


#189

Are you sure of that?

This reminds me of an old problem in test network 16. A correction is mentioned at the end of the topic, but I don’t know what has been done about it.

I don’t store the node ids of the current community network; I just display them in the web app without saving them to files or a database. So I can’t prove it, but I have seen nodes disappear and reappear at the same place in the galaxy.


#190

Thanks, I hadn’t looked into switching the port yet. I just tried by changing the port to 5484 but running the node still seems to block any other device from connecting to the alpha network. I just get the ‘Could not connect to the SAFE Network’ message (e.g. in the SAFE browser).


#191

If you’d like me to help you debug connectivity issues, DM me :wink:


#192

One VPS is very unstable and I had to restart it several times in the last two weeks; sometimes it seemed to me that it was reappearing at the same address.
In fact, if I’m not mistaken (which is possible), I even remember seeing a message from the Vault that said so.


#193

Yep, in terms of not being a valid/expected event sequence: seeing a NodeLoss event for a given node X before we see the corresponding NodeAdded for X. I can see the confusion when restarts are taken into account; then, yes, we could see a NodeLoss followed by a NodeAdded, but that first NodeLoss should still have had a NodeAdded before it, if I’m making any sense :expressionless: . The linked issue and the PR from the comment introduced Peer::valid, which exists in master and in the alpha-2 branch too. It’s been deprecated in the fleming branch now, though, because of the Chain of course. That state also ensured NodeLost wasn’t triggered for peers we didn’t have consensus on as a valid peer.

It could be a reconnect after connection loss, like what Andreas mentioned in the linked thread, or a bug of course; it’s just not intentional. We don’t have node restart features with identity caches across sessions yet; ideally node ageing would bring that in, to reclaim age (well, a portion of it)…


#194

Do you happen to have the logs for the two runs, by any chance? I’m wondering if we print the full identity anywhere in the logs, though.


#195

@bzee I discussed with Spandan and double checked the code and turns out that Vault actually does IGD/UPnP automatically :slight_smile:


#196

Yes, it’s just that NAT traversal is disabled by default, as alpha-2 was on Droplets, but the direct-listener/tcp-acceptor itself will try to get an IGD-mapped port if UPnP is enabled and the router is fine (some routers misbehave or don’t support it).

Somewhere here


#197

Here comes an update. :slight_smile: Last night I checked the log as I was about to restart the system and send the logs to you. :slight_smile: When I checked the log in the command window, it seemed like it may have recovered and might be alive again. So I let it run through the night to report my findings before shutting down and restarting the vault. If you want, I can shut it down and send the logs, but I didn’t want to do that last night because I thought I should let you know before taking action.

When I checked today, there were many “lost node” events, but otherwise it looks similar to the log I saw last night. How do I see which name represents my vault in the log?


#198

This might be the same issue I encountered and reported to Spandan and Povilas - when Crust bootstraps, it first finds the vault on the LAN and tries to use it to connect to the network, but if you are trying to connect to the Alpha 2 network, the expected network name doesn’t match the name the vault reports, which makes it error out and stop trying. I believe it could still work if it kept trying with the bootstrap cache and hardcoded contacts, but the Crust version in use doesn’t do that.
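The fix being described, keeping on trying instead of erroring out, can be sketched roughly like this (names and structures are mine for illustration, not Crust’s actual API):

```python
# Sketch of the bootstrap behaviour discussed above: try each candidate
# peer in order (LAN-discovered first, then bootstrap cache, then
# hard-coded contacts). A peer reporting the wrong network name is
# skipped rather than aborting the whole bootstrap.
def bootstrap(peers, expected_network):
    for peer in peers:
        if peer["network_name"] == expected_network:
            return peer["addr"]
        # wrong network (e.g. a local community vault found on the LAN):
        # skip it and keep trying the remaining contacts
    return None  # no reachable peer on the expected network
```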

Spandan or Povilas would know better, but I think they already fixed that in latest versions of Crust.


#199

You cannot miss it, because it appears at the beginning of almost all messages. For example, in your last screenshot it is 3a879f.
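If you want to pull that name out of a log programmatically, a small sketch like this works; the exact layout (a six-hex-digit prefix followed by `..` at the start of the message) is an assumption based on the log lines quoted in this thread.

```python
import re

# Sketch: extract the vault's name prefix (e.g. "3a879f" from a line
# starting "3a879f.. ...") from a single log line. Returns None if the
# line doesn't start with such a prefix.
def node_prefix(line):
    m = re.match(r"\s*([0-9a-f]{6})\.\.", line)
    return m.group(1) if m else None
```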


#200

Thanks, that gave me peace of mind. :slight_smile:


#201

I now think that a series of messages like "Moved out of close group of b4777e.. in a NodeLost event" is completely normal, because such a series always appears 10 or 20 ms after a message like "Dropped b4777e.. from the routing table".

It simply means that the referenced node (here b4777e) is not responsible for data chunks anymore, because it has been dropped.

So, IMO everything is normal.
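That pairing can be checked mechanically; the sketch below (my own helper, with the message wording taken from the quoted lines and timestamps ignored) flags any "Moved out of close group" message whose node was never seen in a preceding "Dropped … from the routing table" message.

```python
import re

# Sketch: given log lines in order, return the node prefixes that were
# "moved out of close group" without a preceding "Dropped" line, which
# would be the abnormal case.
def unexplained_moves(lines):
    dropped = set()
    unexplained = []
    for line in lines:
        m = re.search(r"Dropped (\w+)\.\. from the routing table", line)
        if m:
            dropped.add(m.group(1))
            continue
        m = re.search(r"Moved out of close group of (\w+)\.\.", line)
        if m and m.group(1) not in dropped:
            unexplained.append(m.group(1))
    return unexplained
```

An empty result over a full log would support the conclusion above that these message series are normal.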