[Offline] Fleming Testnet v6.2 Release - Node Support

Create keys is hanging for me :confused:

Trying this then:

$ $HOME/.safe/node/sn_node --public-addr 82.41.99.238:3047 --local-addr 192.168.0.15:3047 --hard-coded-contacts=[\"159.65.27.95:56248\",\"209.97.176.93:57677\",\"209.97.176.123:55911\",\"46.101.6.111:40836\",\"209.97.140.34:59824\",\"209.97.176.138:59795\",\"206.189.125.78:60401\",\"209.97.176.127:55047\",\"209.97.176.134:53102\",\"206.189.123.106:43685\",\"209.97.138.128:59206\",\"209.97.138.80:34447\",\"209.97.141.179:39630\",\"139.59.178.75:57133\",\"209.97.133.247:48201\",\"46.101.95.158:12000\",\"209.97.143.117:42097\",\"206.189.127.108:51563\",\"209.97.133.206:38243\",\"209.97.176.145:57380\",\"209.97.176.100:54768\",\"209.97.131.130:35410\",\"209.97.176.91:33956\",\"209.97.133.65:40289\",\"209.97.176.97:60599\",\"209.97.176.98:39100\",\"167.99.198.192:56527\",\"209.97.131.245:49877\",\"209.97.129.240:48304\",\"209.97.141.103:52713\",\"159.65.94.13:57349\",\"209.97.176.99:50115\",\"178.128.40.158:42845\"]

gives something new by way of error detail in the terminal:

[sn_node] ERROR 2021-06-30T13:22:39.623148681+01:00 [src/routing/core/bootstrap/join.rs:422] Failed to send message Routing { msg: RoutingMsg { id: 4960c326.., src: Node { public_key: PublicKey(CompressedEdwardsY: [30, 193, 189, 69, 135, 11, 40, 244, 46, 146, 180, 191, 70, 208, 130, 36, 240, 253, 146, 122, 172, 51, 187, 163, 186, 191, 181, 231, 231, 182, 102, 5]), EdwardsPoint{
	X: FieldElement51([1339626378285353, 1422257739952121, 1773412462988668, 854700734940154, 4948749485183]),
	Y: FieldElement51([1737646033162753, 1329396263222078, 2186462723071741, 911451935725592, 1563157389875352]),
	Z: FieldElement51([1819641264889322, 1946439798226173, 1997465299890218, 1178809842119306, 1667648634849677]),
	T: FieldElement51([1558138978118476, 378888274307812, 1693945987817623, 882624393770689, 1206782704585459])
}), signature: ed25519::Signature([151, 109, 105, 198, 242, 196, 151, 65, 66, 193, 164, 38, 111, 70, 21, 34, 102, 215, 196, 127, 1, 13, 167, 184, 6, 239, 21, 64, 209, 255, 237, 127, 0, 121, 74, 179, 189, 128, 215, 132, 104, 185, 175, 227, 36, 82, 221, 252, 118, 181, 38, 190, 12, 34, 234, 24, 167, 21, 186, 130, 136, 152, 253, 6]) }, dst: DirectAndUnrouted, variant: JoinRequest(JoinRequest { section_key: PublicKey(02b8..ecf7), resource_proof_response: None }) }, dst_info: DstInfo { dst: 0b1002(00001011).., dst_section_pk: PublicKey(02b8..ecf7) } } to [209.97.176.99:50115, 209.97.176.134:53102, 206.189.125.78:60401, 209.97.138.80:34447, 46.101.95.158:12000, 209.97.140.34:59824, 209.97.176.100:54768]

and then it’s just

Encountered a timeout while trying to join the network. Retrying after 3 minutes.

but the log file isn't looking any more useful - none of the above was written to the log?

$ tail -f $HOME/.safe/node/local-node/sn_node_rCURRENT.log
[sn_node] INFO 2021-06-30T13:16:18.507154556+01:00 [src/node/bin/sn_node.rs:116] 

Running safe_network v0.4.0
===========================
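If that extra detail is only being emitted on stderr rather than through the file logger, one way to keep a copy alongside the log (just a shell-redirection sketch - terminal.log is a name I made up):

$ $HOME/.safe/node/sn_node -vv 2>&1 | tee -a $HOME/.safe/node/terminal.log   # merge stderr into stdout and mirror everything to a file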

One trivial difference I note: the process detail suggests the --hard-coded-contacts flag does not need an equals sign - I expect that is nothing, though.

$ ps aux | grep 'node'
safe 6533 1.0 0.1 54168 20148 pts/4 Sl 13:16 0:00 /home/safe/.safe/node/sn_node -vv --hard-coded-contacts ["159.65.27.95:56248","209.97.176.93:57677","209.97.176.123:55911","46.101.6.111:40836","209.97.140.34:59824","209.97.176.138:59795","206.189.125.78:60401","209.97.176.127:55047","209.97.176.134:53102","206.189.123.106:43685","209.97.138.128:59206","209.97.138.80:34447","209.97.141.179:39630","139.59.178.75:57133","209.97.133.247:48201","46.101.95.158:12000","209.97.143.117:42097","206.189.127.108:51563","209.97.133.206:38243","209.97.176.145:57380","209.97.176.100:54768","209.97.131.130:35410","209.97.176.91:33956","209.97.133.65:40289","209.97.176.97:60599","209.97.176.98:39100","167.99.198.192:56527","209.97.131.245:49877","209.97.129.240:48304","209.97.141.103:52713","159.65.94.13:57349","209.97.176.99:50115","178.128.40.158:42845"] --root-dir /home/safe/.safe/node/local-node --log-dir /home/safe/.safe/node/local-node
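For what it's worth, both spellings should parse identically, assuming sn_node uses a structopt/clap-style argument parser (usual for Rust CLIs); single quotes around the value just save the backslash-escaping:

$ sn_node --hard-coded-contacts='["159.65.27.95:56248"]'   # with equals sign (full list elided for brevity)
$ sn_node --hard-coded-contacts '["159.65.27.95:56248"]'   # space-separated, as shown in the ps output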

2 Likes

Safe keys balance request is hanging too - related, I guess?

On the other hand, my node joined the network, received a single chunk, then ended with this message:
Failed to read incoming message on bi-stream for peer 209.97.140.34:59824 with error: TimedOut

3 Likes

Yay… I don’t know what it means but I’ve got

Node PID: 6749, prefix: Prefix(1), name: f2f2cf.., connection info:
"82.41.99.238:3047"

oddly, in the terminal and not in the log file though…
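Since --public-addr was passed as 82.41.99.238:3047, a quick sanity check that the IP part really is your current public address (a sketch - ifconfig.me is just one of several such services):

$ curl -s ifconfig.me   # should print 82.41.99.238 if --public-addr is correct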

3 Likes

I had this recently on my cloud instance.

I’ve updated vdash (v0.6.3) and it shows both PUTs and GETs so long as you start the node with sufficient logging as follows:

RUST_LOG=info,quinn=off safe node join
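For reference, RUST_LOG takes comma-separated target=level directives (standard env_logger-style filtering), so individual crates can be raised or silenced independently - these variants are just illustrative:

$ RUST_LOG=info,quinn=off safe node join    # info as the default level, quinn silenced
$ RUST_LOG=debug,quinn=off safe node join   # more verbose overall, still no quinn noise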

To update and start vdash:

cargo install vdash
vdash ~/.safe/node/local-node/sn_node_rCURRENT.log

Zoom the timelines in and out using 'i' and 'o'. To quit, press 'q'.

Example:

8 Likes

In parallel, I see nothing obviously happening for
safe keys balance
or an upload… and the node terminal and log are not busy - perhaps, as there are no keys, there are no uploads, and so no activity is to be expected…
Will leave what I take to be my node up for a while yet and see what occurs…

1 Like

and so keys create just failed with:

$ safe keys create --test-coins --for-cli
[qp2p::api] ERROR 2021-06-30T13:45:29.996830035 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/api.rs:266] Failed to bootstrap to the network: Connection lost
Error: Failed to connect: ConnectionError: Failed to connect to the SAFE Network: QuicP2p(BootstrapFailure)

and the other two, for balance and upload, failed just now with the same.
Is the network down now then, or was it just a long wait before the error?

Noting too many threads!…

1 Like

I see you are using advanced commands and not those from the OP. Perhaps you don't have the correct contacts (I see you pass some directly on the command line when trying the node? I am trying to follow your steps, but it's not clear). Can you try:

- rm -rf $HOME/.safe
- curl -so- https://sn-api.s3.amazonaws.com/install.sh | bash
- safe node killall
- safe networks add fleming-testnet https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config
- safe networks switch fleming-testnet

Then try to use the CLI and ignore the node for now?
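It may also be worth confirming the switch actually took before retrying - a sketch, assuming your CLI version supports listing networks (check safe networks --help if not):

$ safe networks   # fleming-testnet should be listed and marked as the current network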

After the initial burst of 16 PUTs (200M, which seems odd) and 8 GETs, my cloud node has shown no activity for over 90 minutes.

Also, my recursive put on the same instance has been stalled for about 30 minutes. The last pair of messages in the log (of many similar message pairs):

[safe_network::client::connections::messaging] WARN 2021-06-30T13:45:04.027291757 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/safe_network-0.4.0/src/client/connections/messaging.rs:58] Disconnected from elder 209.97.133.206:38243. Attempting to reconnect
[safe_network::client::connections::messaging] WARN 2021-06-30T13:46:04.028908251 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/safe_network-0.4.0/src/client/connections/messaging.rs:65] Could not reconnect to 209.97.133.206:38243, error: Connection(TimedOut)

1 Like

I did the usual simple safe node join first but that didn't seem to work; so I just adopted what had been suggested previously for that.

There's a bit of spot-the-difference between each OP; one difference I spot above is that this one no longer has --hard-coded-contacts. I had taken those from the copy of addresses in the ps output… and have just checked it's the same list as the current node_connection_info.config.
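One way to double-check against the published list, using only curl and the URL from the post above (piping through json.tool assumes the config is plain JSON, which the quoted-array syntax suggests):

$ curl -s https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config | python3 -m json.tool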

So, trying again now - both safe keys create --test-coins --for-cli and safe files put --recursive ./hello/ look hung… but then is the network up?
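To avoid the commands hanging indefinitely while probing whether the network is up, coreutils timeout can bound the wait (a sketch - the 120 seconds is arbitrary):

$ timeout 120 safe keys create --test-coins --for-cli || echo "gave up or failed after 2 minutes"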

3 Likes

Think we are down:

administrator@Test-Node:~/.safe$ safe keys create --test-coins --for-cli
[qp2p::api] ERROR 2021-06-30T15:23:42.068549617 [/home/administrator/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/api.rs:266] Failed to bootstrap to the network: Connection lost
Error: Failed to connect: ConnectionError: Failed to connect to the SAFE Network: QuicP2p(BootstrapFailure)
administrator@Test-Node:~/.safe$

4 Likes

Great work team, getting this up and running fast!

For me, I can't install a node; I get:

Error: No asset found in latest release for the target platform x86_64-unknown-linux-musl

System:

3.7 GiB RAM
Intel® Celeron(R) CPU J3455 @ 1.50GHz × 4
Ubuntu 20.04.2 LTS
64-bit

This has never happened before. Onward!

1 Like

If you just tried, there has been another release, and perhaps the latest node is not there yet.

(On which note: the now "latest" node won't be compatible. We need to add some specificity to the install command so we can keep moving in the background while preventing this.)
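In the meantime, one way to see which assets a given release actually shipped (a sketch - I'm assuming the node binaries are published as GitHub release assets under maidsafe/safe_network; adjust the repo if they live elsewhere):

$ curl -s https://api.github.com/repos/maidsafe/safe_network/releases/latest | grep browser_download_url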

6 Likes

Hi guys. An update from the team. We've been monitoring the nodes running on our droplets, and it looks like we've hit the OOM issue again. We have downloaded the logs and are taking a look at the issue. We'll keep you updated. Thanks for all your help :sparkles:

20 Likes

Hi Lionel, as a non-tech person, I have a question about this:

Was it to be expected that we’d hit the OOM issue again? And if so, is it improving?

1 Like

No, it’s a loop somewhere, possibly a looping message. We detected a message repeated over 10,000 times so it will be a simple fix.

18 Likes

Any chance of a T v6.3 when it is fixed? It would be nice to see if a testnet can stay up while you are beavering away on v7.

13 Likes

Observation regarding chunks, PUTs and log messages…

$ ls -l chunks/immutable/ | wc -l
287
$ grep "Writing chunk succeeded" *.log | wc -l
350

For my cloud node the chunk store shows a capacity of 212M with 287 chunks present, which seems reasonable. To count PUTs, vdash looks in the logs for "Writing chunk succeeded", and grep shows this occurs 350 times across all the rotated logs.

So there are more “success” messages than chunks stored, and vdash will therefore overcount PUTs. Is this expected, and is there a better way to count PUTs?
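Two small notes on the counting itself: ls -l | wc -l counts the 'total' header line too, so 287 lines is really 286 files; and if each success line prints the chunk's full address, de-duplicating on it would separate re-writes of the same chunk from distinct chunks. A sketch - the 64-hex-char pattern is a guess at the log format, so adjust if addresses are printed truncated:

$ ls -1 chunks/immutable/ | wc -l   # -1 avoids the 'total' line that ls -l adds
$ grep -h "Writing chunk succeeded" *.log | grep -oE '[0-9a-f]{64}' | sort -u | wc -l   # distinct chunk addresses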

4 Likes

There was only one successful join in the logs, right?

1 Like

Our "plan" is several smallish testnets as we get to T7; the idea is that the diehards help us. So some small iterations are incoming.

11 Likes