Well, it was bound to happen some time. After a run of successes, the RoyaltiesPaymentNet was cursed by high memory usage which killed off many nodes before they could even start, and left the rest pretty much zombified. The spooky thing is it all worked fine on our internal testnets (albeit with some slightly raised memory levels). Could poor RoyaltiesPaymentNet have been struck down by dark forces beyond our ken?
Or perhaps there is a logical explanation. Chief in our sights is GossipSub, the system by which nodes performing transactions propagate the fact to foundation nodes, which then take their share.
GossipSub is dealing with many more messages than anticipated. It’s not yet clear if that’s looping, or client top-ups resending royalty payments, or something else.
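One way to tell looping apart from genuine resends is to count how often each distinct payload turns up at a node. The following is a minimal sketch of that idea using only the Rust standard library. `DuplicateTracker` and its methods are hypothetical names for illustration and are not part of the Safe Network codebase or of libp2p.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical diagnostic: counts how many times each distinct payload is seen.
/// Payloads are keyed by a 64-bit hash, so rare collisions could inflate a count.
#[derive(Default)]
pub struct DuplicateTracker {
    seen: HashMap<u64, u32>,
}

impl DuplicateTracker {
    /// Records one received payload and returns how often it has been seen so far.
    pub fn record(&mut self, payload: &[u8]) -> u32 {
        let mut hasher = DefaultHasher::new();
        payload.hash(&mut hasher);
        let count = self.seen.entry(hasher.finish()).or_insert(0);
        *count += 1;
        *count
    }

    /// Number of distinct payloads that arrived more than once.
    pub fn repeated(&self) -> usize {
        self.seen.values().filter(|&&n| n > 1).count()
    }
}
```

A payload whose count keeps climbing on the same node would point towards looping, whereas a small, bounded number of repeats would look more like a client resending.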
One issue is that all nodes try to decode all transfers, causing a lot of unnecessary activity. Another is that libp2p has been allocating quite generously… We’ve some PRs in to help there and are hopeful this will yet come together!
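To illustrate the decoding point, here is a rough sketch of how a node might skip transfers that are not meant for it before doing any expensive work. It assumes a message layout with a cheap-to-read recipient tag in front of the body. That layout, and the names `Envelope`, `decode_transfer` and `handle`, are assumptions made for this example and not the actual wire format or API.

```rust
/// Hypothetical envelope: a recipient tag in front of an opaque body.
pub struct Envelope {
    pub recipient: [u8; 32],
    pub body: Vec<u8>,
}

/// Stand-in for the expensive decode step; here it only reads the body as UTF-8.
fn decode_transfer(body: &[u8]) -> Option<String> {
    String::from_utf8(body.to_vec()).ok()
}

/// Decodes only envelopes addressed to `me`; all others are skipped
/// without touching the body.
pub fn handle(me: &[u8; 32], msg: &Envelope) -> Option<String> {
    if &msg.recipient != me {
        return None;
    }
    decode_transfer(&msg.body)
}
```

The check is a fixed-size comparison, so the cost of ignoring an irrelevant message stays small however large the body is.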
There are some other fixes to go in too, including libp2p fixes, encrypted transfers, and replication on put changes, which should reduce load when we launch another testnet.
We’re grateful that the libp2p team is responsive and open to helping us. This week @dirvine contacted them about building in Sybil defences based on some recent research, and they’ve said they’re open to the idea.
@chriso worked on the node management side of things. Windows is always more difficult in this regard and he ran into some issues, but it’s mostly sorted now.
@joshuef investigated the high memory usage and looping messages in GossipSub which may have caused the testnet failure, and worked on other small fixes. He is also looking to implement pay one node, which should speed up the validation process and improve performance.
We’ve been experiencing a few payment failures in testing as we move to only paying one node. @anselme is digging into those, and working to make the issue easier to debug.
@qi_ma has been fixing some other internal tests that were failing.
Feel free to reply below with links to translations of this dev update and moderators will add them here:
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!