While joining more nodes, after "timeout" and "retrying after 30 seconds", nodes are shutting down

While joining more nodes than needed, I get the messages “timeout” and “retrying after 30 seconds”, but then the nodes shut down.

Expected behavior: the nodes retry joining after 30 seconds.

Can you let us know what code you’re using?

Also, more info about how many nodes etc. and your system would be grand. Cheers, @dreamerchris!

Hetzner, 2x Debian 11, 16 vCPU, 32 GB RAM, 200 GB.
I am using my own scripts to spin up 15 nodes on each server.

To launch the network:

#!/bin/bash

NODE_NUM=14
USER=$(whoami)

ACTIVE_IF=$( ( cd /sys/class/net || exit; echo *)|awk '{print $1;}')
LOCAL_IP=$(ifdata -pa "$ACTIVE_IF")
PUBLIC_IP=$(curl -s ifconfig.me)
SAFE_PORT=12000
sleep 1
mkdir -p $HOME/.safe/node/local_node0/
sleep 1
CURRENT_ROOT_DIR=$HOME/.safe/node/local_node0/
CURRENT_LOG_DIR=$HOME/.safe/node/local_node0/
CURRENT_NODE=0
echo -n "#!/bin/bash
RUST_LOG=sn_node=trace \
        $HOME/.safe/node/sn_node --first \
        --local-addr '$LOCAL_IP':$SAFE_PORT \
        --public-addr '$PUBLIC_IP':$SAFE_PORT \
        --skip-auto-port-forwarding \
        --root-dir '$CURRENT_ROOT_DIR' \
        --log-dir '$CURRENT_LOG_DIR' & disown" \
| tee $HOME/.safe/node/start-node$CURRENT_NODE.sh
sleep 1
chmod u+x $HOME/.safe/node/start-node$CURRENT_NODE.sh
echo ""
echo -n "[Unit]
Description=Safe Local Node $CURRENT_NODE
[Service]
User=$USER
ExecStart=$HOME/.safe/node/start-node$CURRENT_NODE.sh
Type=forking
[Install]
WantedBy=multi-user.target"\
| sudo tee /etc/systemd/system/sn_node$CURRENT_NODE.service
sleep 1
sudo systemctl start sn_node$CURRENT_NODE.service
sleep 1
safe networks add mynet
safe networks switch mynet

for CURRENT_NODE in  $(seq $NODE_NUM)
do
SAFE_PORT=$((12000+$CURRENT_NODE))
CURRENT_ROOT_DIR=$HOME/.safe/node/local_node$CURRENT_NODE/
CURRENT_LOG_DIR=$HOME/.safe/node/local_node$CURRENT_NODE/
sleep 1
mkdir -p "$CURRENT_ROOT_DIR"
sleep 1
echo -n "#!/bin/bash
RUST_LOG=sn_node=trace \
        $HOME/.safe/node/sn_node \
        --local-addr '$LOCAL_IP':$SAFE_PORT \
        --public-addr '$PUBLIC_IP':$SAFE_PORT \
        --skip-auto-port-forwarding \
        --root-dir '$CURRENT_ROOT_DIR' \
        --log-dir '$CURRENT_LOG_DIR' & disown" \
| tee $HOME/.safe/node/start-node$CURRENT_NODE.sh
sleep 1
chmod u+x $HOME/.safe/node/start-node$CURRENT_NODE.sh
echo ""
echo -n "[Unit]
Description=Safe Local Node $CURRENT_NODE
[Service]
User=$USER
ExecStart=$HOME/.safe/node/start-node$CURRENT_NODE.sh
Type=forking
[Install]
WantedBy=multi-user.target"\
| sudo tee /etc/systemd/system/sn_node$CURRENT_NODE.service
sleep 1
sudo systemctl start sn_node$CURRENT_NODE.service
sleep 1
done
echo ""
echo "copy the following to your testnet config!"
echo ""
cat $HOME/.safe/node/node_connection_info.config

echo ""
echo "End of multi sn node joiner script. Copy and paste the following to load vdash!"
echo ""
echo "$HOME/.cargo/bin/vdash $HOME/.safe/node/local_node*/sn_node.log"

And to load more nodes on the second server:

#!/bin/bash
SAFENET="dreamnet"
CONFIG_URL="https://nx23255.your-storageshare.de/s/F7e2QaDLNC2z94z/download/dreamnet.config"
NODE_NUM=15
USER=$(whoami)

safe networks add $SAFENET "$CONFIG_URL"
safe networks switch $SAFENET

ACTIVE_IF=$( ( cd /sys/class/net || exit; echo *)|awk '{print $1;}')
LOCAL_IP=$(ifdata -pa "$ACTIVE_IF")
PUBLIC_IP=$(curl -s ifconfig.me)

for CURRENT_NODE in  $(seq $NODE_NUM)
do
SAFE_PORT=$((12000+$CURRENT_NODE))
CURRENT_ROOT_DIR=$HOME/.safe/node/local_node$CURRENT_NODE/
CURRENT_LOG_DIR=$HOME/.safe/node/local_node$CURRENT_NODE/
mkdir -p "$CURRENT_ROOT_DIR"

echo -n "#!/bin/bash
RUST_LOG=sn_node=trace \
        $HOME/.safe/node/sn_node \
        --local-addr '$LOCAL_IP':$SAFE_PORT \
        --public-addr '$PUBLIC_IP':$SAFE_PORT \
        --skip-auto-port-forwarding \
        --root-dir '$CURRENT_ROOT_DIR' \
        --log-dir '$CURRENT_LOG_DIR' & disown" \
| tee $HOME/.safe/node/start-node$CURRENT_NODE.sh

chmod u+x $HOME/.safe/node/start-node$CURRENT_NODE.sh

echo -n "[Unit]
Description=Safe Local Node $CURRENT_NODE
[Service]
User=$USER
ExecStart=$HOME/.safe/node/start-node$CURRENT_NODE.sh
Type=forking
[Install]
WantedBy=multi-user.target"\
| sudo tee /etc/systemd/system/sn_node$CURRENT_NODE.service

sudo systemctl start sn_node$CURRENT_NODE.service

done
echo ""
echo "End of multi sn node joiner script. Starting vdash!"
echo ""
$HOME/.cargo/bin/vdash $HOME/.safe/node/local_node*/sn_node.log

And what version of the code are you using? A release, or main?

The release, with the given MaidSafe script.

@dreamerchris, to be able to help, I need to know exactly what code version you’re running.

Which MaidSafe script are we talking about? When was it run? What code would it have downloaded for you (what was the release version at the time)? Have you logged the sn_node -V version anywhere, for example?
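
One way to make that question easy to answer later is to record the binary's version each time nodes are launched. The following is a minimal sketch, not part of the scripts above: the function name record_node_version and the log file path are hypothetical, and it assumes only that the node binary prints its version with -V, as mentioned here.

```shell
#!/bin/bash
# Hypothetical helper: append a timestamped line with the node binary's
# version to a log file, so the exact build can be reported later.
record_node_version() {
    local node_bin="$1"      # path to the node binary, e.g. $HOME/.safe/node/sn_node
    local version_log="$2"   # file to append to (name chosen by the caller)
    local version
    # -V is the version flag mentioned in this thread; fall back to a
    # placeholder if the binary is missing or fails to run.
    version="$("$node_bin" -V 2>&1)" || version="unknown (could not run $node_bin -V)"
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$version" >> "$version_log"
}
```

It could be called once at the top of a launch script, for example as record_node_version "$HOME/.safe/node/sn_node" "$HOME/.safe/node/version.log", where version.log is an assumed file name.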

Without this I can’t know the state of the codebase you were running (e.g., this issue should have been fixed in what’s on main, but I don’t know whether you’re running code from before that, or whether the bug still exists in some form).

I will test again once I’m home, as it was a quick test and I kept no logs!
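
For a retest, it may help to see which nodes logged a given message. This is only a sketch under stated assumptions: the function name logs_matching is hypothetical, the log layout (local_node*/sn_node.log) is taken from the scripts above, and the search pattern is supplied by the caller because the exact wording of the node's log lines is not confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print the node log files under a node directory
# that contain a given pattern (case-insensitive).
logs_matching() {
    local pattern="$1"    # text to look for; the exact log wording is an assumption
    local node_dir="$2"   # e.g. $HOME/.safe/node
    local log
    for log in "$node_dir"/local_node*/sn_node.log; do
        [ -f "$log" ] || continue   # skip when the glob matched nothing
        if grep -qi -- "$pattern" "$log"; then
            printf '%s\n' "$log"
        fi
    done
}
```

For example, logs_matching "retrying" "$HOME/.safe/node" would list the logs that mention a retry. Whether a node's service is still up can be checked separately with systemctl is-active sn_node1.service, using the unit names created by the scripts above.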

OK, I have to report that it is fixed! All nodes that have not joined are retrying after 30 seconds!

I believe the version that was causing it had code that would randomly shut down nodes for testing purposes, and I guess you removed that bit of code in the latest release!

All good! Thanks for working through this with me!
