Royalties2 [Testnet 13/11/23] [Offline]

@Shu did you look at the gossip / royalties data?
I don’t see it in your charts (or I’m just missing it for some reason).
I wonder how such activity correlates with other data.

2 Likes

No, I have to catch up on exactly where the metrics are on that front (the royalties & gossip area) and what the messages look like in the logs before I can build the dashboard panels against them. I am behind on the code base :smiling_face:.

I may sift through the existing logs on disk a bit later today and see what I can uncover, and if so re-parse the data and maybe re-create that timeline.

5 Likes

I have had 20 nodes running for ~45 mins and just started 10 more

safe@SAFEnodes-southside:~$ du -h safehome/node/*/record_store
116M	safehome/node/12D3KooW9u2VJAXe5xzgREjyjrQy59WiRhZqfj9Sw92hB1R8c5ND/record_store
846M	safehome/node/12D3KooWAjyXCP6TKpiKhGYd8mu9KoPWXXFQqeLBi7F2zaX7LX4v/record_store
855M	safehome/node/12D3KooWAnMWs3UpR1bzRhhjNmtnLLRymXY23Rh4tEAawW696Awz/record_store
404M	safehome/node/12D3KooWBKpMerNWCDfCHejvoNPQcWcyEDzCHwqxce6HU4UWxhZ3/record_store
849M	safehome/node/12D3KooWD3ny2BUdRbMsHemqKQFHsiqNQA31eEByXBcjDrYiHGrp/record_store
31M	safehome/node/12D3KooWDt9vMxHgMMqinjCa8LXWchmkjFpsAuCGbhZjN7sxziUw/record_store
852M	safehome/node/12D3KooWEYqSBUWfN5XrbXW4PxA1tUKjnLuPVmo2GYdvC1Yc18ew/record_store
852M	safehome/node/12D3KooWEqUtvkEFJTmvqMZurcLi4V7G5Q32oBXcneJQ2ZA9FVev/record_store
4.0K	safehome/node/12D3KooWFMzXf78qUDUUWNAj8T5rfiZZJRJ4Me3UoMBSBa6FmvsK/record_store
301M	safehome/node/12D3KooWFZXiKzjQfkJDuegNhSTVjUHntZJj7z2GWxubdNsrzvfT/record_store
856M	safehome/node/12D3KooWGDcfu3negZVmRJmH9Q2vm44QqjPLW7xDyZgT4mcCcv9D/record_store
193M	safehome/node/12D3KooWHLC13FpcEEkF14k9q29DrH3UaSb4tfx6mPfhVrq7fzta/record_store
853M	safehome/node/12D3KooWHjBtzy6Hy1dMb2nfvis1nmh9NR5hr13fRnFzqhWXNvba/record_store
89M	safehome/node/12D3KooWHtfg9Le7wktLZCs9dScByA433KBfNr3Yt1VWZgdhH6VB/record_store
853M	safehome/node/12D3KooWJ2HwMVaZycacNowv6SVMseDi3sz1utAgMsPqy9fpuS1h/record_store
849M	safehome/node/12D3KooWJ9Fm1z9AiSehzGuYwcZRC1ZxnDg3UhWdiSGhSCjeou9q/record_store
779M	safehome/node/12D3KooWK9N7QkguNVywi6Qixqa3T321fcPjK4BKJmz8qLrEwFJZ/record_store
841M	safehome/node/12D3KooWK9pZudFeEXBs9K5jCzqo8btnpUjytZxVaQ7AhBxwU1ej/record_store
849M	safehome/node/12D3KooWKcjgqbbzinLspNz7zPNry7PN7Kyhi7VEBhqkzR98ZRqk/record_store
851M	safehome/node/12D3KooWLSgU6f1n1e2s73NDmDCmmfGBzZfV1L9cX7FZ18C5iVAf/record_store
845M	safehome/node/12D3KooWLfKBD1GhgAe6EAr28rKEYDqdGGVoTSg7yDN6iymFrHVe/record_store
849M	safehome/node/12D3KooWMABeWx2WyT3Xz7bxmhZenegCT1cxBtbHDjFDNrTYyta9/record_store
851M	safehome/node/12D3KooWMwtAfDyVatuvXqN9pHSJpgBojvhQ2rCPaZKMTAEP1zvc/record_store
848M	safehome/node/12D3KooWN2oHnc313RvLaDqmz7gwG7Zk7p6DteA7QKfd3Hq6zKfg/record_store
104M	safehome/node/12D3KooWQ4GpgMrDtpsC9Z2QExTDNY4JFBCA6sVRuUWJ1zPzdf6v/record_store
852M	safehome/node/12D3KooWQQwi4Wzno9yXm8TTmaL9koq78XK2YU2HHqGzJcto7FLN/record_store
855M	safehome/node/12D3KooWQYmbWvZ9r6ZuBYpJZs5sxQLBiYbTJ6SqXX9KynJFcmUa/record_store
855M	safehome/node/12D3KooWQqtZT5uuAcPp7nxqVSy6TL42Ff3maUnHnpDnKKBAn7M6/record_store
849M	safehome/node/12D3KooWQzTCxQ5sjwYYoQFgUTyvRqJt7gobZLUiCaBmbrFFYRgo/record_store
852M	safehome/node/12D3KooWRctLzdRVDeL2CWSU3keQvJpBEYWvw1XM81btbGdmEUSw/record_store

The original nodes are all full or very close to it and the new ones are filling up fast - but zero earnings…
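
For a quick view of which stores are closest to full, counting the records in each record_store also works - a rough sketch, using the same safehome layout as the du listing above:

# count stored records per node, largest first (same paths as the du output above)
for d in safehome/node/*/record_store; do
    printf '%6d  %s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -rn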

So perhaps we need more nodes, not fewer?
Total memory use for 30 nodes is only slightly less than I was seeing for 50.

I think it is a death spiral. As more nodes die the replication backlog grows, everything gets slower, more nodes die,…
In the log of one of my dead nodes there is a line saying it was over 11k records behind.
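
If anyone wants to check their own nodes for the same thing, something like this should surface it - a rough sketch only, since both the log path and the exact wording of the backlog line are guesses to adjust for your setup and safenode version:

# scan node logs for replication-backlog mentions and show the most recent ones
# (the "records behind" pattern and the logs/ path are assumptions -- adjust both)
grep -rhi "records behind" safehome/node/*/logs/ 2>/dev/null | tail -n 20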

4 Likes

Me too, a problem with the faucet… (Windows)

Failed to get tokens from faucet, server responded with: Failed to send tokens: Transfer Error Failed to send tokens due to The transfer was not successfully registered in the network: CouldNotSendMoney(“Network Error Could not get enough peers (5) to satisfy the request, found 1.”).

2 Likes
[2023-11-13T20:59:32.678238Z ERROR sn_networking] SwarmCmd channel is full. Await capacity to send: SwarmCmd::SendRequest

What does this mean? A full queue somewhere on my side, or on the other side?
I see a lot of those in the log.
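
To see whether the rate is steady or climbing I'm counting them per minute - a rough sketch, where the cut width assumes the bracketed timestamp format in the line above and the path needs adjusting to wherever your safenode log actually lives:

# tally the "SwarmCmd channel is full" errors per minute
# (cut -c1-17 keeps the "[2023-11-13T20:59" prefix, i.e. one bucket per minute)
grep "SwarmCmd channel is full" safenode.log | cut -c1-17 | sort | uniq -c | tail -n 20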

1 Like

Can anyone spare some SNT?

b072baca351ab6c7923183e47aee9dceee079ff131f2907b677958cd9ca43e435fdcfcbbeb62c8c950ee0812c6b379b5

2 Likes

nope, sorry :frowning:

willie@gagarin:~$ safe wallet send 20 b072baca351ab6c7923183e47aee9dceee079ff131f2907b677958cd9ca43e435fdcfcbbeb62c8c950ee0812c6b379b5
Logging to directory: "/home/willie/.local/share/safe/client/logs/log_2023-11-13_21-11-52"
Built with git version: de52073 / main / de52073
Instantiating a SAFE client...

🔗 Connected to the Network                                                                                                                           Failed to send NanoTokens(20000000000) to MainPubkey(PublicKey(1072..c6af)) due to Transfers(CouldNotSendMoney("The transfer was not successfully registered in the network: CouldNotSendMoney(\"Network Error Could not retrieve the record after storing it: fd6ea8fe2116e105f612c0ac5473c3ce78201c3b4fac814525d9e1c9add2ad06(01a8201bdddabc30932ee02db081f72329699d4d65ece33748ebfd4ef410dcd5).\")")).
Error: 
   0: Transfer Error Failed to send tokens due to The transfer was not successfully registered in the network: CouldNotSendMoney("Network Error Could not retrieve the record after storing it: fd6ea8fe2116e105f612c0ac5473c3ce78201c3b4fac814525d9e1c9add2ad06(01a8201bdddabc30932ee02db081f72329699d4d65ece33748ebfd4ef410dcd5).").
   1: Failed to send tokens due to The transfer was not successfully registered in the network: CouldNotSendMoney("Network Error Could not retrieve the record after storing it: fd6ea8fe2116e105f612c0ac5473c3ce78201c3b4fac814525d9e1c9add2ad06(01a8201bdddabc30932ee02db081f72329699d4d65ece33748ebfd4ef410dcd5).")

Location:
   /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/convert/mod.rs:716

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Contrary to what I said further up, more uploads are NOT what we need right now. All my nodes are very rapidly running out of space, as (nearly) every one has stored its maximum number of chunks.

3 Likes

I failed to send twice as well…sorry :person_shrugging:

2 Likes

Thanks for trying…

Houston…We have a problem…

2 Likes

Is it possible the network is prioritising chunk replication right now?
I think we urgently need more nodes - or less replication - or the ability to store more chunks on each node - or more beer

At least one of the above…

1 Like

I have 100 nodes started on a 4GB VPS

Memory usage is actually quite low. Load average is high but I have seen much higher.
If I start vdash - and leave it alone, because otherwise it gets killed by the OOM killer - it will eventually start to show me the status of some nodes.

I suspect - and I could well be wrong - that the nodes shown as INACTIVE are actually full. I doubt @happybeing ever thought vdash would be used in such a hectic environment, so this is no criticism.

I have definitely got plenty of network traffic, and a (rough) count of the number of chunks stored shows a steady increase.

safe@SAFEnodes-southside:~$ ls -l  safehome/node/*/record_store/ |wc -l
108662
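
That figure is a little inflated, mind: ls -l over a set of directories also prints a directory-name header, a “total” line and a blank separator for each one, so a few hundred of those lines aren't chunks. Counting regular files directly is a bit more honest:

# count only the record files themselves, skipping ls -l's per-directory headers
find safehome/node/*/record_store/ -type f | wc -l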

I am going to leave well alone and read a book for the next 15 mins and then see what it shows.

1 Like

Some time later, vdash is showing a somewhat more complete picture.

It has still only handled fewer than half the nodes so far.

@happybeing Is the default behaviour to show “Stopped” for a node that has not been interrogated yet?

I’d have to check the code. You can of course. IIRC “Stopped” results from a particular logfile message.

“Inactive” means it hasn’t seen an update to the logfile for 20s (I think). Maybe that’s too short for a heavily loaded system, but as soon as the logfile updates that should be cleared.
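
If you want to sanity-check that outside vdash, listing the logfiles that haven't been touched for a while should roughly line up with what it marks as Inactive. A rough sketch only - the glob is a guess, so point it at wherever your node logs actually live, and -mmin +1 is coarser than the ~20s window:

# list node logfiles not modified in the last minute (path is a guess, adjust to your setup)
find ~/.local/share/safe/node/*/logs -name '*.log' -mmin +1 2>/dev/null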

I’ve not done any testing with large numbers of nodes so you are pushing into unknown vdash territory. Good luck :wink:

4 Likes

This is the Burden of a HammerLord :slight_smile:

To boldly poke where none have poked before.

3 Likes

Upgraded to 0.85.14 - No change unfortunately. It’s still looping on verifying the last 15 chunks

1 Like

As I suspected. You see every problem as a nail.

2 Likes

No, some are Torx countersinks and all I have are Allen keys.

Actually I prefer my Stillsons - they can be used for shifting tough nuts of various sizes but are heavy enough to use as a hammer when need be.


Only one node on this machine. I killed it when it reached 10GB of memory.
If anyone wants to look at the logs to see why this node went crazy, be my guest…
https://filetransfer.io/data-package/H9Bu3JKG

5 Likes

Few more observations:

  • The volume of logging activity and events pertaining to gossipsub messages was an order of magnitude higher than that of any other log message type on disk (both overall and per unit of time, i.e. frequency)
  • The GET requests started in a much later time frame (16:40 UTC onwards, around the same time records started being stored in the record_store directory), and continued to increase later in the timeline
  • The frequency of the other message types (peer info sent & received, etc.) remained roughly constant (flat-lined) throughout the life of the safenode pid
  • In addition, error messages tied to LIBP2P Identity Errors & LIBP2P Swarm Errors occurred at a roughly constant rate (spread throughout the timeline), and did not seem to have a rapid amplification effect

  • The gossipsub messages seemed to have increased rapidly in count per fixed time interval right from the start

Note: I am still back-filling the data from the log activity, hence the gap after 18:30 UTC, but there is a clear upward trend in the gossipsub activity (zoomed in above). I will re-update the images above once the back-fill is done.

I wonder what is causing a multiplier effect in the sheer amount of gossipsub activity occurring right from the start, whereas the other log message types end up plateauing (in count of messages per fixed unit of time).
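
For anyone who wants to reproduce a rough version of this against their own logs, bucketing the gossipsub-related lines per minute shows the shape quickly - a sketch only, since the grep pattern, the log path, and the cut width (which assumes the bracketed timestamp format seen earlier in the thread) are all assumptions to adjust:

# per-minute count of gossipsub-related log lines
# (pattern, path and timestamp width are assumptions -- adjust to your own logs)
grep -i "gossipsub" safenode.log | cut -c1-17 | sort | uniq -c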

Update: Back-fill complete. Re-updated the images above.

9 Likes