Over the past few years we’ve often sympathised with Sisyphus in the Greek legend. Sisyphus had to push a boulder up a hill for eternity only for it to go rolling back down again as soon as he neared the summit.
Not wishing to tempt the notoriously vengeful Greek gods, but we’re increasingly certain that this time we’ve finally cracked it, and we’re sitting pretty on the plateau. Why the confidence, oh ye who have stuck with us through thick and thin, oh ye who are maybe experiencing a faint feeling of déjà vu?
Well, because the bugs are getting smaller and we’re fixing them quicker. Because the whole team is fixing bugs together, rather than each having a specialisation. Because testnets are lasting longer and yielding results that we can understand and deal with. Because we can iterate on the fly with tangible improvements. Because we’re collaborating with like-minded people. And because the community are mucking in with their own fixes. We’ve moved from the theoretical to the practical, and boy does that feel good.
A whole heap of PRs went in this week, from the team, to the team and to some other projects. In summary:
- A fix to return a majority of nodes rather than all of them.
- A PR to just pay one node rather than all of them, while still replicating to the close group.
- Another on replication, this one to use `tokio::Interval` for forced replication instead of `Instant`, to deal with traffic spikes that were causing blockages.
- A change to the client to only verify chunks that reach majority for replication.
- A fix for a peer duplication issue in replication triggering.
- And another that experiments with expanding the replication range.
- Then there’s one that removes slow content_hash logging for large records - a probable memory leak.
- Again on logging, there’s one that adds SwarmCmd logging for performance profiling, another that adds logging of each node’s KBucketKey and another that fixes timing logs.
- Plus, there’s @bzee’s PR to `libp2p` to address an ever-growing address vector store - another memory leak candidate.
- Then there’s a fix for reward test failures by emitting NetworkEvent on GossipSub publishes.
- Plus there are several more waiting to land from the team’s individual branches.
Thanks to @southside for his helpful PR for a simple output improvement and to shuoer86 for some typo corrections. Everyone else, don’t be shy. If you spot something that could be tweaked or improved, drop in a PR or let us know on the forum.
@joshuef has been looking at store cost variations, and how clients are looping unnecessarily over increasing prices and paying for data that’s already stored (PRs 887/888). We’re tightening up the payments system by verifying who has the chunk and repaying them if necessary, rather than repaying all nodes. In turn, this reduces stress on the verification process, meaning less pointless activity and improved performance.
Related improvements include eliminating redundant content hashing, and only verifying chunks in a majority of the close group rather than all of them to avoid unnecessary work.
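To make the majority idea concrete, here is a minimal sketch, not the actual client code. The names `majority` and `verify_until_majority`, and the `has_chunk` callback standing in for a network query, are all made up for illustration. It assumes "majority" means strictly more than half of the close group, and it stops asking as soon as that threshold is reached.

```rust
/// Strict majority of a close group: more than half of its members.
/// (Assumption: the real threshold may be defined differently.)
fn majority(close_group_size: usize) -> usize {
    close_group_size / 2 + 1
}

/// Ask close-group nodes one by one whether they hold the chunk and stop
/// as soon as a majority has confirmed. `has_chunk` is a hypothetical
/// stand-in for the real network query.
/// Returns (majority reached?, number of nodes actually queried).
fn verify_until_majority<F>(close_group: &[&str], mut has_chunk: F) -> (bool, usize)
where
    F: FnMut(&str) -> bool,
{
    let needed = majority(close_group.len());
    let mut confirmed = 0;
    let mut queried = 0;
    for &node in close_group {
        queried += 1;
        if has_chunk(node) {
            confirmed += 1;
        }
        if confirmed >= needed {
            // Enough holders found: skip the remaining nodes.
            return (true, queried);
        }
    }
    (false, queried)
}
```

With a close group of five where one node is missing the chunk, the check can finish after four queries instead of insisting on all five answers.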
@bochaco has been working on documentation changes for new cli/rpc-client commands, and testing `testnet-deploy` to verify that CashNotes can be downloaded and deposited to a local wallet. He also finalised the process to pay Foundation nodes and prepared the latest testnet to put it through its paces.
@bzee has been looking at paying a single node for data storage. As discussed last week, this could be a nice cheap and dirty option for storage without redundancy, so long as it proves sufficiently reliable. He’s also hopefully fixed another memory leak around an ever-growing store in `libp2p` that holds identities of known nodes. The `libp2p` team are on that one now.
Meanwhile, @anselme revamped stored payments with CashNotes and Transfers.
@roland has been turning his attention to replication, delaying it from instant to checking every 10 seconds to avoid unwanted blockages elsewhere. In addition, Roland has been optimising how nodes record and store their close peers to avoid duplication.
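As a rough illustration of batching on a timer while avoiding duplicate peers, here is a dependency-free sketch. `ReplicationQueue`, `mark` and `on_tick` are hypothetical names, not the node's real types. In a real async node the tick would presumably come from a timer such as tokio's interval firing every 10 seconds; here the tick is just a method call so the example stays self-contained.

```rust
use std::collections::BTreeSet;

/// Hypothetical queue of peers that are due a replication pass.
/// A set is used so the same peer can never be queued twice.
#[derive(Default)]
struct ReplicationQueue {
    pending: BTreeSet<String>,
}

impl ReplicationQueue {
    /// Note that a peer needs replication.
    /// Returns false if the peer was already queued (duplicate ignored).
    fn mark(&mut self, peer: &str) -> bool {
        self.pending.insert(peer.to_string())
    }

    /// Called once per timer tick: hand back the deduplicated batch
    /// and leave the queue empty for the next period.
    fn on_tick(&mut self) -> Vec<String> {
        std::mem::take(&mut self.pending).into_iter().collect()
    }
}
```

The point of the design is that a burst of events between ticks only grows a small set, and the actual replication work happens once per period rather than once per event.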
Replicating registers has been another holdout issue. If a register changes during the replication process, different nodes can end up holding different versions, with problems arising due to the CRDT convergence not happening in time. @Qi_ma has put in a fix for this now, so that’s another one ticked off.
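For anyone wondering what "convergence" means here, the toy below shows the general CRDT property using a simple grow-only set. It is an assumption-laden sketch, not the network's register implementation, and `RegisterReplica`, `write` and `merge` are illustrative names only. Two replicas that saw different writes end up identical once each has merged the other's state, whatever the order.

```rust
use std::collections::BTreeSet;

/// Toy replica: a grow-only set of entries (a deliberately simplified
/// stand-in for a real register CRDT).
#[derive(Clone, Debug, Default, PartialEq)]
struct RegisterReplica {
    entries: BTreeSet<String>,
}

impl RegisterReplica {
    /// Apply a local write.
    fn write(&mut self, entry: &str) {
        self.entries.insert(entry.to_string());
    }

    /// Merge another replica's state into this one. Set union is
    /// commutative, associative and idempotent, so merge order and
    /// repeated merges do not change the result.
    fn merge(&mut self, other: &RegisterReplica) {
        self.entries.extend(other.entries.iter().cloned());
    }
}
```

If a replica is copied mid-update and the two copies then receive different writes, they hold different versions until a merge happens, which is the window the paragraph above describes.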
And @chriso continues to improve the testnet automation process, including an install command for the node manager.
Feel free to reply below with links to translations of this dev update and moderators will add them here:
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!