Yup, it may well eventually split. But it may also fall over at that point. Either way, this should let us test relative stability with 50GB nodes and uploads < 10MB. As it fills up, we can test joins in a sane fashion… and if we get there, splits too.
It’s not delusional to aim for the best.
It’s an error to paint me as thinking there won’t be any barriers, of course there will be barriers.
It’s defeatist to say we should accept such a big barrier - because some other projects have done so - when to do so conflicts with one of the most important aims of the project.
Here are the promised logs from the (1st) failed testnet.
Still investigating what happened and will share any findings.
Thank you David.
Are we any closer to an ELK solution that would allow comnet participants to post their logs?
File under useless info…
My IP appears nearly half a million times in these logs.
Nearly 700 times it appears in an ERROR msg.
15 times my node was marked as dysfunctional after a 70s timeout.
@chriso has been looking to secure our ELK setup (but don’t expect that to land before the NY).
I wonder, @joshuef @davidrusu, if this is an example of a node that cannot be connected to (no IGD), but we don’t test for connectivity and it joins anyway. If there are a few of these then there’s massive churn, membership is hammered and … goes off? If there are a few like this we have a constant join/kill/join/kill loop happening and the nodes are in a tizzy.
So we likely need to use the quinn connectivity test here for joining nodes, and then also blacklist them when they are killed (the key revocation I keep murmuring on about).
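For illustration only, here’s a minimal sketch of what such a dial-back check could look like with quinn. It assumes an already-configured `quinn::Endpoint` with a suitable client crypto config; the function name, server name and timeout are placeholders, not the actual sn_node code:

```rust
use std::{net::SocketAddr, time::Duration};

/// Hypothetical dial-back probe: before admitting a candidate, try to open a
/// QUIC connection *back* to the address it advertised. If the handshake does
/// not complete in time, treat the node as unreachable (e.g. behind a NAT with
/// no port forwarding) and reject the join request.
async fn is_reachable(
    endpoint: &quinn::Endpoint, // assumed to carry the client crypto config already
    candidate: SocketAddr,
    timeout: Duration,
) -> bool {
    // `connect` only starts the handshake; it can fail early on bad input.
    let connecting = match endpoint.connect(candidate, "candidate-node") {
        Ok(c) => c,
        Err(_) => return false,
    };
    // Complete the handshake within the timeout, or give up on the candidate.
    matches!(tokio::time::timeout(timeout, connecting).await, Ok(Ok(_)))
}
```

Something like this could run while handling the join request, before the candidate is added to membership.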
I’m setting up some resources on Hetzner (yes I know)
Should I add a 50GB volume to each instance?
If anyone else wants to join in, I can share an Ubuntu snapshot with the latest release?
I joined last night with --skip-auto-port-forwarding on both main and main2.
“There is no such thing as a failed experiment, only experiments with unexpected outcomes.”
— R. Buckminster Fuller
Balls - utter buckyballs
Yeah, I was working on it this week. I was trying to get a stack up on Kubernetes with SSL and authentication for both Elasticsearch and Kibana, and that was quite a bit more complex than I thought it would be.
I do believe, though, that this will be the best hosting option for us, as it can scale up if need be. Maybe we will have many more nodes participating in future tests.
I also think we’ll be able to do some cool stuff with Kibana for log filtering and other things. Maybe we’ll be able to report offline nodes and stuff like that.
However, I’m on holiday for a couple of weeks after today, so we won’t see anything significant with this until the new year.
Edit: any other surrounding utility applications that pop up should also be hostable on the same Kubernetes setup.
Yes, this was why I gave up earlier in the year.
Enjoy your holidays - well earned!
A quick update on the failed testnet. The most likely theory for what happened is that a bunch of nodes joined which are not accessible from the public internet (i.e. folks tried to join from behind a NAT).
This led to a bunch of nodes joining which were logically expected to hold data and perform Adult node duties, but which could not actually be reached for requests.
We’ve added some logic to detect this case and reject any join requests coming from behind a NAT. We may attempt to tackle NAT holepunching again at some point, but likely not anytime soon.
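As a rough illustration of one way the NAT case can be spotted (a sketch under assumptions, not the actual detection logic that landed): the node handling a join request can compare the address the candidate claims to listen on with the address its connection was actually observed from. A mismatch, or a private claimed address, is the classic NAT signature:

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical NAT check: compare the address a candidate *claims* to be
/// reachable at with the address its connection was actually observed from.
/// If the claimed IP is private or the two differ, connections back to the
/// claimed address will very likely fail, so the join should be rejected.
fn looks_natted(claimed: SocketAddr, observed: SocketAddr) -> bool {
    let claimed_private = match claimed.ip() {
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback(),
        IpAddr::V6(v6) => v6.is_loopback(),
    };
    claimed_private || claimed.ip() != observed.ip()
}
```

A dial-back probe like the quinn sketch earlier in the thread is the stronger signal, since a node can be un-NATed yet still firewalled; the address comparison is just a cheap first filter.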
Hopefully the next testnet that supports churn won’t fall over due to NATs!
Would this possibly mean a situation where one would get chunks, but not be able to deliver them on request?
Were you running your node at home behind a NAT?
Well I’m not sure what NAT means, but definitely from home from a laptop.
Maybe better continue this in the “Another testnet…” thread, as everything else relating to the situation is there.
I am pretty certain you were behind a NAT (router) in this case.
Yes, you would get some chunks from the nodes you connected to, but none of them could have connected back to you to give you any more chunks.
Just curious about the reasoning for avoiding NAT so hard, when the level of participation could be much higher if holepunching were available? Is it because people who can’t bypass NAT likely aren’t comfortable with a CLI, and we just aren’t at that stage or priority level yet?
It’s purely complexity. Not just code, though that is an issue, but the plethora of different routers and their behaviours, then double NAT, etc. It’s a hellscape really. So ensuring everything else works first avoids those external issues, which are hard to handle.
Also, and not copping out, but the QUIC specification has some hole-punching goals, and if that progresses we can get NAT traversal almost for free as we let those guys do the testing etc. for us. I am optimistic about this one.
Not copping out at all. I love freebies
and that’d be a win for everyone.
I see your points, thanks.