[Offline] Fleming Testnet v6.2 Release - Node Support

Create keys is hanging for me :confused:

Trying this then:

$ $HOME/.safe/node/sn_node --public-addr 82.41.99.238:3047 --local-addr 192.168.0.15:3047 --hard-coded-contacts=[\"159.65.27.95:56248\",\"209.97.176.93:57677\",\"209.97.176.123:55911\",\"46.101.6.111:40836\",\"209.97.140.34:59824\",\"209.97.176.138:59795\",\"206.189.125.78:60401\",\"209.97.176.127:55047\",\"209.97.176.134:53102\",\"206.189.123.106:43685\",\"209.97.138.128:59206\",\"209.97.138.80:34447\",\"209.97.141.179:39630\",\"139.59.178.75:57133\",\"209.97.133.247:48201\",\"46.101.95.158:12000\",\"209.97.143.117:42097\",\"206.189.127.108:51563\",\"209.97.133.206:38243\",\"209.97.176.145:57380\",\"209.97.176.100:54768\",\"209.97.131.130:35410\",\"209.97.176.91:33956\",\"209.97.133.65:40289\",\"209.97.176.97:60599\",\"209.97.176.98:39100\",\"167.99.198.192:56527\",\"209.97.131.245:49877\",\"209.97.129.240:48304\",\"209.97.141.103:52713\",\"159.65.94.13:57349\",\"209.97.176.99:50115\",\"178.128.40.158:42845\"]

gives something new by way of error detail in the terminal:

[sn_node] ERROR 2021-06-30T13:22:39.623148681+01:00 [src/routing/core/bootstrap/join.rs:422] Failed to send message Routing { msg: RoutingMsg { id: 4960c326.., src: Node { public_key: PublicKey(CompressedEdwardsY: [30, 193, 189, 69, 135, 11, 40, 244, 46, 146, 180, 191, 70, 208, 130, 36, 240, 253, 146, 122, 172, 51, 187, 163, 186, 191, 181, 231, 231, 182, 102, 5]), EdwardsPoint{
	X: FieldElement51([1339626378285353, 1422257739952121, 1773412462988668, 854700734940154, 4948749485183]),
	Y: FieldElement51([1737646033162753, 1329396263222078, 2186462723071741, 911451935725592, 1563157389875352]),
	Z: FieldElement51([1819641264889322, 1946439798226173, 1997465299890218, 1178809842119306, 1667648634849677]),
	T: FieldElement51([1558138978118476, 378888274307812, 1693945987817623, 882624393770689, 1206782704585459])
}), signature: ed25519::Signature([151, 109, 105, 198, 242, 196, 151, 65, 66, 193, 164, 38, 111, 70, 21, 34, 102, 215, 196, 127, 1, 13, 167, 184, 6, 239, 21, 64, 209, 255, 237, 127, 0, 121, 74, 179, 189, 128, 215, 132, 104, 185, 175, 227, 36, 82, 221, 252, 118, 181, 38, 190, 12, 34, 234, 24, 167, 21, 186, 130, 136, 152, 253, 6]) }, dst: DirectAndUnrouted, variant: JoinRequest(JoinRequest { section_key: PublicKey(02b8..ecf7), resource_proof_response: None }) }, dst_info: DstInfo { dst: 0b1002(00001011).., dst_section_pk: PublicKey(02b8..ecf7) } } to [209.97.176.99:50115, 209.97.176.134:53102, 206.189.125.78:60401, 209.97.138.80:34447, 46.101.95.158:12000, 209.97.140.34:59824, 209.97.176.100:54768]

and then it’s just

Encountered a timeout while trying to join the network. Retrying after 3 minutes.

but the log file isn't looking any more useful - none of the above was written to the log?

$ tail -f $HOME/.safe/node/local-node/sn_node_rCURRENT.log
[sn_node] INFO 2021-06-30T13:16:18.507154556+01:00 [src/node/bin/sn_node.rs:116] 

Running safe_network v0.4.0
===========================
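If that extra detail is only being emitted on stderr rather than through the file logger, one way to keep a copy alongside the log (just a shell-redirection sketch - terminal.log is a name I made up):

$ $HOME/.safe/node/sn_node -vv 2>&1 | tee -a $HOME/.safe/node/terminal.log   # merge stderr into stdout and mirror everything to a file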

One trivial difference I note: the process detail suggests the --hard-coded-contacts flag does not need an equals sign - I expect that is nothing, though.

$ ps aux | grep 'node'
safe 6533 1.0 0.1 54168 20148 pts/4 Sl 13:16 0:00 /home/safe/.safe/node/sn_node -vv --hard-coded-contacts ["159.65.27.95:56248","209.97.176.93:57677","209.97.176.123:55911","46.101.6.111:40836","209.97.140.34:59824","209.97.176.138:59795","206.189.125.78:60401","209.97.176.127:55047","209.97.176.134:53102","206.189.123.106:43685","209.97.138.128:59206","209.97.138.80:34447","209.97.141.179:39630","139.59.178.75:57133","209.97.133.247:48201","46.101.95.158:12000","209.97.143.117:42097","206.189.127.108:51563","209.97.133.206:38243","209.97.176.145:57380","209.97.176.100:54768","209.97.131.130:35410","209.97.176.91:33956","209.97.133.65:40289","209.97.176.97:60599","209.97.176.98:39100","167.99.198.192:56527","209.97.131.245:49877","209.97.129.240:48304","209.97.141.103:52713","159.65.94.13:57349","209.97.176.99:50115","178.128.40.158:42845"] --root-dir /home/safe/.safe/node/local-node --log-dir /home/safe/.safe/node/local-node
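For what it's worth, both spellings should parse identically, assuming sn_node uses a structopt/clap-style argument parser (usual for Rust CLIs); single quotes around the value just save the backslash-escaping:

$ sn_node --hard-coded-contacts='["159.65.27.95:56248"]'   # with equals sign (full list elided for brevity)
$ sn_node --hard-coded-contacts '["159.65.27.95:56248"]'   # space-separated, as shown in the ps output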

2 Likes

Safe keys balance request is hanging too - related, I guess?

On the other hand, my node joined the network, received a single chunk, then ended with this message:
Failed to read incoming message on bi-stream for peer 209.97.140.34:59824 with error: TimedOut

3 Likes

Yay… I don’t know what it means but I’ve got

Node PID: 6749, prefix: Prefix(1), name: f2f2cf.., connection info:
"82.41.99.238:3047"

oddly, in the terminal and not in the log file though…
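Since --public-addr was passed as 82.41.99.238:3047, a quick sanity check that the IP part really is your current public address (a sketch - ifconfig.me is just one of several such services):

$ curl -s ifconfig.me   # should print 82.41.99.238 if --public-addr is correct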

3 Likes

I had this recently on my cloud instance.

I’ve updated vdash (v0.6.3) and it shows both PUTs and GETs so long as you start the node with sufficient logging as follows:

RUST_LOG=info,quinn=off safe node join
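For reference, RUST_LOG takes comma-separated target=level directives (standard env_logger-style filtering), so individual crates can be raised or silenced independently - these variants are just illustrative:

$ RUST_LOG=info,quinn=off safe node join    # info as the default level, quinn silenced
$ RUST_LOG=debug,quinn=off safe node join   # more verbose overall, still no quinn noise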

To update and start vdash:

cargo install vdash
vdash ~/.safe/node/local-node/sn_node_rCURRENT.log

Zoom the timelines in and out using 'i' and 'o'. To quit, press 'q'.

Example:

8 Likes

In parallel, I see nothing obviously happening for
safe keys balance
or an upload… and the node terminal and log are not busy - perhaps, as there are no keys, there are no uploads, and so no activity is to be expected…
Will leave what I take to be my node up for a while yet and see what occurs…

1 Like

and so keys create just failed with:

$ safe keys create --test-coins --for-cli
[qp2p::api] ERROR 2021-06-30T13:45:29.996830035 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/api.rs:266] Failed to bootstrap to the network: Connection lost
Error: Failed to connect: ConnectionError: Failed to connect to the SAFE Network: QuicP2p(BootstrapFailure)

and the other two, for balance and upload, failed just now with the same.
Is the network down now then, or was it just a long wait before the error?

Noting too many threads!…

1 Like

I see you are using advanced commands and not those from the OP. Perhaps you don't have the correct contacts (I see you pass some directly on the command line when trying the node? I am trying to follow your steps, but it's not clear). Can you try:

- rm -rf $HOME/.safe
- curl -so- https://sn-api.s3.amazonaws.com/install.sh | bash
- safe node killall
- safe networks add fleming-testnet https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config
- safe networks switch fleming-testnet

Then try to use the CLI and ignore the node for now?
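It may also be worth confirming the switch actually took before retrying - a sketch, assuming your CLI version supports listing networks (check safe networks --help if not):

$ safe networks   # fleming-testnet should be listed and marked as the current network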

After the initial burst of 16 PUTs (200M, which seems odd) and 8 GETs, my cloud node has shown no activity for over 90 minutes.

Also, my recursive put on the same instance has been stalled for about 30 minutes. The last pair of messages in the log (of many similar message pairs):

[safe_network::client::connections::messaging] WARN 2021-06-30T13:45:04.027291757 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/safe_network-0.4.0/src/client/connections/messaging.rs:58] Disconnected from elder 209.97.133.206:38243. Attempting to reconnect
[safe_network::client::connections::messaging] WARN 2021-06-30T13:46:04.028908251 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/safe_network-0.4.0/src/client/connections/messaging.rs:65] Could not reconnect to 209.97.133.206:38243, error: Connection(TimedOut)

1 Like

I did the usual simple safe node join first but that didn't seem to work; so I just adopted what had been suggested previously for that.

There's a bit of spot-the-difference between each OP; one difference I spot above is that this one no longer has --hard-coded-contacts. I had taken those from the copy of addresses in the ps output… and have just checked it's the same list as the current node_connection_info.config.
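One way to double-check against the published list, using only curl and the URL from the post above (piping through json.tool assumes the config is plain JSON, which the quoted-array syntax suggests):

$ curl -s https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config | python3 -m json.tool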

So, trying again now - both safe keys create --test-coins --for-cli and safe files put --recursive ./hello/ look hung… but then is the network up?
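To avoid the commands hanging indefinitely while probing whether the network is up, coreutils timeout can bound the wait (a sketch - the 120 seconds is arbitrary):

$ timeout 120 safe keys create --test-coins --for-cli || echo "gave up or failed after 2 minutes"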

3 Likes

Think we are down:

administrator@Test-Node:~/.safe$ safe keys create --test-coins --for-cli
[qp2p::api] ERROR 2021-06-30T15:23:42.068549617 [/home/administrator/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/api.rs:266] Failed to bootstrap to the network: Connection lost
Error: Failed to connect: ConnectionError: Failed to connect to the SAFE Network: QuicP2p(BootstrapFailure)
administrator@Test-Node:~/.safe$

4 Likes

Great work team, getting this up and running fast!

For me, I can't install a node; I get:

Error: No asset found in latest release for the target platform x86_64-unknown-linux-musl

System:

3.7 GiB RAM
Intel® Celeron(R) CPU J3455 @ 1.50GHz × 4
Ubuntu 20.04.2 LTS
64-bit

This has never happened before. Onward!

1 Like

If you just tried, there has been another release, and perhaps the latest node is not there yet.

(On which note: the now "latest" node won't be compatible. We need to add some specificity to the install command so we can keep moving in the background while preventing this.)
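In the meantime, one way to see which assets a given release actually shipped (a sketch - I'm assuming the node binaries are published as GitHub release assets under maidsafe/safe_network; adjust the repo if they live elsewhere):

$ curl -s https://api.github.com/repos/maidsafe/safe_network/releases/latest | grep browser_download_url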

6 Likes

Hi guys. An update from the team. We've been monitoring the nodes running on our droplets, and it looks like we've hit the OOM issue again. We have downloaded the logs and are taking a look at the issue. We'll keep you updated. Thanks for all your help :sparkles:

20 Likes

Hi Lionel, as a non-tech person, I have a question about this:

Was it to be expected that we’d hit the OOM issue again? And if so, is it improving?

1 Like

No, it’s a loop somewhere, possibly a looping message. We detected a message repeated over 10,000 times so it will be a simple fix.

18 Likes

Any chance of a T v6.3 when it is fixed? It would be nice to see if a testnet can stay up while you are beavering away on v7.

13 Likes

Observation regarding chunks, PUTs and log messages…

$ ls -l chunks/immutable/ | wc -l
287
$ grep "Writing chunk succeeded" *.log | wc -l
350

For my cloud node the chunk store shows a capacity of 212M with 287 chunks present, which seems reasonable. To count PUTs, vdash looks in the logs for "Writing chunk succeeded", and grep shows this occurs 350 times across all the rotated logs.

So there are more “success” messages than chunks stored, and vdash will therefore overcount PUTs. Is this expected, and is there a better way to count PUTs?
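Two small notes on the counting itself: ls -l | wc -l counts the 'total' header line too, so 287 lines is really 286 files; and if each success line prints the chunk's full address, de-duplicating on it would separate re-writes of the same chunk from distinct chunks. A sketch - the 64-hex-char pattern is a guess at the log format, so adjust if addresses are printed truncated:

$ ls -1 chunks/immutable/ | wc -l   # -1 avoids the 'total' line that ls -l adds
$ grep -h "Writing chunk succeeded" *.log | grep -oE '[0-9a-f]{64}' | sort -u | wc -l   # distinct chunk addresses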

4 Likes

There was only one successful join in the logs, right?

1 Like

Our "plan" is several smallish testnets as we get to T7; the idea is that the diehards help us. So some small iterations are incoming.

11 Likes