Wanted: a fire-up-and-forget app

From my experience mining Ethereum, the most horrible thing is having to check every day whether the miner is still running; updates and network problems can happen at any time of day.
I have a vision of a Safe node that is fire-up-and-forget: you launch it and it just runs. You might even forget that you run it; it just keeps going.

The reason for having such an app is to make it easy for people to run and maintain a node. Maybe the result is a network that is less dependent on profit, where people run a node simply to pay for using the Safe Network. If a large percentage of people ran nodes that keep running regardless of the underlying price or profit, that would increase the security and stability of the network.

A solution could be an app that monitors whether the Safe app process is running and sends an email, push notification or similar to your phone if something happens to the node or an update is available. I don't know which solution would be optimal. As health problems leave me with very limited energy, I might not be able to finish such an app, or it may take 6+ months.

If anyone finds it interesting and wants to give it a go, just go for it.


In Linux/bash it is a simple script that:

  • runs ps on the node program name;
  • if no node process is running, launches the node;
  • then sleeps for, say, 1 minute and checks again, in a loop.
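The three steps above as a minimal bash sketch. It uses pgrep -x (exact process name) rather than ps piped to grep, so the script never matches itself. The process name safenode and the start path are assumptions for illustration, not something confirmed in this thread; adjust both to your install.

```bash
#!/usr/bin/env bash
# Minimal node watchdog: check by process name, start the node if it is missing.
# NODE_NAME, NODE_CMD and the default paths are hypothetical; adjust to your setup.

NODE_NAME="${NODE_NAME:-safenode}"                 # assumed process name of the node
NODE_CMD="${NODE_CMD:-$HOME/.safe/node/safenode}"  # assumed start command
INTERVAL="${INTERVAL:-60}"                         # seconds between checks
LOG_FILE="${LOG_FILE:-$HOME/node-watchdog.log}"

# Succeeds if a process whose name is exactly NODE_NAME is running.
# -x matches the process name only, so this script does not match itself.
node_running() {
  pgrep -x "$NODE_NAME" > /dev/null
}

# Start the node in the background if it is not running, and log it.
ensure_running() {
  if node_running; then
    return 0
  fi
  echo "$(date) - $NODE_NAME not running, starting: $NODE_CMD" >> "$LOG_FILE"
  # NODE_CMD is left unquoted on purpose so it may include arguments.
  nohup $NODE_CMD >> "$LOG_FILE" 2>&1 &
}

case "${1:-}" in
  once) ensure_running ;;                                          # one check, e.g. from cron
  loop) while true; do ensure_running; sleep "$INTERVAL"; done ;;  # check forever
esac
```

Save it as node-watchdog.sh, make it executable, and start it with ./node-watchdog.sh loop from a shortcut or at login, or call ./node-watchdog.sh once from cron every minute. It runs as your normal user, so it needs no more privileges than the node itself.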

I like the addition that the app could start a node automatically if none is running. I'm thinking of a Python app that checks in a loop, every second, whether a Safe node process is running (that should not take much CPU), with an if statement that tries to start/restart the node and sends a message if the node is not running. What's on my mind is how to send push notifications to a phone over the local network or the internet, and whether that requires an app on iPhone/Android to receive the messages.
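One low-effort answer to the push-notification question is a hosted push service such as ntfy (ntfy.sh): the phone runs the ntfy app subscribed to a topic, and the watchdog publishes to that topic with a plain HTTP POST. So yes, it does need an app on the phone, just not one you have to write. A sketch, assuming curl is installed; the topic name is a hypothetical placeholder and, since anyone who knows a public topic name can read it, should be hard to guess.

```bash
#!/usr/bin/env bash
# Push a short status message to a phone via ntfy (https://ntfy.sh).
# Assumptions: curl is installed, the ntfy app on the phone is subscribed to
# NTFY_TOPIC, and the default NTFY_TOPIC below is a hypothetical placeholder.

NTFY_SERVER="${NTFY_SERVER:-https://ntfy.sh}"
NTFY_TOPIC="${NTFY_TOPIC:-my-safenode-alerts-change-me}"

# Print the URL messages are published to (trailing slash on the server is dropped).
ntfy_url() {
  printf '%s/%s\n' "${NTFY_SERVER%/}" "$NTFY_TOPIC"
}

# Send the message as the body of an HTTP POST; ntfy shows it as a notification.
notify_phone() {
  local message="$1"
  curl -fsS -H "Title: Safe node" -d "$message" "$(ntfy_url)" > /dev/null
}

# Example use with a process check (safenode is an assumed process name):
#   pgrep -x safenode > /dev/null || notify_phone "Node is not running on $(hostname)"
```

The same HTTP call can be made from Python, but a one-line curl keeps it in the shell alongside the process check.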

Thankful for all input, keep it coming. I will try to write a specification list later and do some research with ChatGPT and similar.


We should be able to run many nodes per computer here, so having them watch each other may be doable. Restarting a node will depend on privileges, though, and that is where the rubber hits the road. We don't really want safe node to run with high privileges.


Most scenarios that involve the loss of a node centre around connectivity. With dozens of nodes in one box, it's likely that a loss of connectivity would affect them all. I still think datacentre-style setups with multiple redundant connections and power supplies are going to have the advantage here. Maybe I'm thinking wrong…

Hence I am still drawn to local Wi-Fi mesh networks with multiple fibre backbone connections and multiple satellite gateways.


This could be implemented easily with a program on the node machine that posts a tweet/toot or uses a Discord bot to say "hey, I am online" every day or at a user-selected interval. Maybe it shouldn't even send "hey, I am online"; instead the Discord bot only sends a notification when the connection to the node is lost or the node itself crashes. Of course, if a node crashes and relaunches, that could also be a message to the bot.

I should think about how it would be engineered, though: if the node machine loses connectivity, for example, or the system becomes unresponsive, a program on that same system cannot report the downtime.

So: have a program on your Android phone/iPhone/laptop/desktop (or on a server, if you own one) that checks every [user-set interval] and sends a push notification/tweet/toot/Discord message. Sounds easy.
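A sketch of the "check from another device" idea, assuming a Linux machine (home server, laptop) that can reach the node box by name or IP. It only checks that the box answers a ping, which covers the "system offline or unresponsive" case that a program on the node machine cannot report; it says nothing about the node process itself. NODE_HOST and DISCORD_WEBHOOK_URL are placeholders; a Discord webhook accepts a JSON body with a content field. It notifies only when the state changes, so you hear about losses and recoveries rather than a daily "I am online".

```bash
#!/usr/bin/env bash
# Remote "is the node box reachable?" check, run from a different machine.
# Assumptions: Linux iputils ping, curl, bash 4+, and a Discord webhook URL (placeholder).

NODE_HOST="${NODE_HOST:-192.168.1.50}"          # hypothetical address of the node box
DISCORD_WEBHOOK_URL="${DISCORD_WEBHOOK_URL:-}"  # empty = just print messages
INTERVAL="${INTERVAL:-300}"                     # seconds between checks
LAST_STATE="unknown"

# Send a message to Discord, or print it if no webhook is configured.
notify() {
  if [ -n "$DISCORD_WEBHOOK_URL" ]; then
    # Minimal JSON body; assumes the message contains no double quotes.
    curl -fsS -H "Content-Type: application/json" \
      -d "{\"content\": \"$1\"}" "$DISCORD_WEBHOOK_URL" > /dev/null
  else
    echo "$1"
  fi
}

# Print "up" if the host answers one ping within 5 seconds, else "down".
host_state() {
  if ping -c 1 -W 5 "$NODE_HOST" > /dev/null 2>&1; then echo up; else echo down; fi
}

# Notify only when the state differs from the last one seen (so also on the first check).
report_state() {
  local state="$1"
  if [ "$state" != "$LAST_STATE" ]; then
    notify "Node box $NODE_HOST went ${state^^}"
    LAST_STATE="$state"
  fi
}

if [ "${1:-}" = "loop" ]; then
  while true; do report_state "$(host_state)"; sleep "$INTERVAL"; done
fi
```

Run it with ./remote-check.sh loop on the second machine; combined with a local watchdog on the node box, this covers both "node process died" and "whole box is gone".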


The bash script does it really simply and can be run as an app from a shortcut. It can just as easily check that many nodes are running. The same can be done on Windows. There is no need for code that people cannot easily change if they want to. A script is easily understood by anyone who knows command-line stuff, more so than a program in a programming language that could break with a Windows update.
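Checking "many nodes" can be as simple as counting processes. A sketch, assuming all nodes share one (hypothetical) process name and that you know how many you started. It reports a shortfall rather than restarting anything, because how each extra node is launched (ports, data directories) depends on your setup.

```bash
#!/usr/bin/env bash
# Count running node processes and warn if fewer than expected are up.
# NODE_NAME and EXPECTED_NODES are assumptions; set them to match your setup.

NODE_NAME="${NODE_NAME:-safenode}"
EXPECTED_NODES="${EXPECTED_NODES:-5}"

# Print how many processes are named exactly NODE_NAME.
# pgrep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps that harmless.
count_nodes() {
  pgrep -c -x "$NODE_NAME" || true
}

# Succeeds silently if at least EXPECTED_NODES are running; otherwise prints a warning.
check_nodes() {
  local running
  running=$(count_nodes)
  if [ "$running" -lt "$EXPECTED_NODES" ]; then
    echo "Only $running of $EXPECTED_NODES $NODE_NAME processes running"
    return 1
  fi
}

if [ "${1:-}" = "check" ]; then
  check_nodes
fi
```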

Also, doing this as a command-line script is simpler, since a Python script would just be trying to do the command-line operations anyway; it's another level of complexity.

And if you want to go further, as @SmoothOperatorGR suggests, the command-processor script can be modified to do that too. Command-line processor scripts are very often underestimated and can be set to run on startup, on login, or ad hoc.


Is there any desire to get this working with systemd? I know that’s tantamount to a curse word in many Linux circles, but it would simplify this problem for many people running on systemd distros.
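One way to lean on systemd without running the node as root is a per-user service: systemd starts it when you log in (or at boot, if lingering is enabled for your user) and restarts it when it fails. A sketch, assuming the node binary lives at a hypothetical path and needs no arguments; the unit name safenode.service is also an assumption.

```bash
#!/usr/bin/env bash
# Install the node as a systemd *user* service with automatic restart.
# NODE_BIN and the unit name are hypothetical; adjust before use.

NODE_BIN="${NODE_BIN:-$HOME/.safe/node/safenode}"
UNIT_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
UNIT_NAME="safenode.service"

# Print the unit file contents (NODE_BIN is expanded when this runs).
unit_file() {
  cat <<EOF
[Unit]
Description=Safe Network node (user service)

[Service]
ExecStart=$NODE_BIN
# Restart 10 s after the process exits with an error or is killed by a signal.
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
EOF
}

# Write the unit, reload the user manager, then enable and start the service.
install_unit() {
  mkdir -p "$UNIT_DIR"
  unit_file > "$UNIT_DIR/$UNIT_NAME"
  systemctl --user daemon-reload
  systemctl --user enable --now "$UNIT_NAME"
}

if [ "${1:-}" = "install" ]; then
  install_unit
  # Optional: keep user services running without an active login session.
  # loginctl enable-linger "$USER"
fi
```

After ./install-node-service.sh install, the node's output goes to the journal and can be followed with journalctl --user -u safenode.service -f.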

Also, is there even a need for a Windows binary at this point? Just make WSL2 a requirement for running a node on Windows. Then everything can stay Linux/Unix-based.


@neo 100% agree K.I.S.S.

From ChatGPT
Here's an example Bash script that checks whether a job is running and restarts it if it's not. It also sends an alert if the job has to be restarted MAX_FAILURES times within 5 minutes:

```bash
#!/bin/bash

# Set the number of restarts within ALERT_TIME that triggers an alert
MAX_FAILURES=3

# Set the time to wait after a restart before checking it stayed up (in seconds)
RETRY_TIME=10

# Set the time window for the alert check (in seconds)
ALERT_TIME=300

# Set the log file location for the job
LOG_FILE=/path/to/job/log/file.log

# Set the command to start the job
START_COMMAND=/path/to/start/command

# File that keeps the failure count and window start between runs
# (each cron run is a new process, so plain variables would reset to 0)
STATE_FILE=/path/to/job/watchdog.state

# Load the failure count and the start of the current alert window
FAILURE_COUNT=0
WINDOW_START=0
if [ -f "$STATE_FILE" ]
then
  read -r FAILURE_COUNT WINDOW_START < "$STATE_FILE"
fi

CURRENT_TIME=$(date +%s)

# Check if the job is running
if pgrep -f "$START_COMMAND" > /dev/null
then
  echo "Job is running"
else
  # Job is not running, so restart it
  echo "Job is not running, restarting..."

  # Start the job in the background so this check does not block on it
  # (left unquoted so START_COMMAND may include arguments)
  nohup $START_COMMAND >> "$LOG_FILE" 2>&1 &

  # Give it a moment, then confirm it actually stayed up
  sleep "$RETRY_TIME"
  if ! pgrep -f "$START_COMMAND" > /dev/null
  then
    echo "$(date) - Job failed to start" >> "$LOG_FILE"
  fi

  # Start a new window if the previous one has expired
  if [ $((CURRENT_TIME - WINDOW_START)) -ge "$ALERT_TIME" ]
  then
    FAILURE_COUNT=0
    WINDOW_START=$CURRENT_TIME
  fi

  # Increment the failure count
  FAILURE_COUNT=$((FAILURE_COUNT + 1))

  # Alert if the job has failed too often within the window
  if [ "$FAILURE_COUNT" -ge "$MAX_FAILURES" ]
  then
    echo "Job has failed $FAILURE_COUNT times in the last $((ALERT_TIME / 60)) minutes" | mail -s "Job Failure Alert" your@email.com

    # Reset so the next alert needs a fresh run of failures
    FAILURE_COUNT=0
    WINDOW_START=$CURRENT_TIME
  fi

  # Save the state for the next run
  echo "$FAILURE_COUNT $WINDOW_START" > "$STATE_FILE"
fi

# Write the current time to the log file
echo "$(date) - Job checked" >> "$LOG_FILE"
```

To use this script, replace the values for MAX_FAILURES, RETRY_TIME, ALERT_TIME, LOG_FILE, START_COMMAND, STATE_FILE and the alert email address with the appropriate values for your job. Then run it on a schedule, for example every minute from cron with a crontab line such as * * * * * /path/to/check_job.sh. The state file is what lets the failure count build up across those separate runs, and the alert assumes a working mail command on the machine.

This is straightforward and simple - all a Bash n00b would need to do is read a little on the pgrep -f flag to totally grok this.

Edit: Is “totally grok” a tautology?


As dirvine wrote, it might be good to have as few apps with admin privileges as possible. The Safe Network app should probably be able to reconnect by itself if, for example, the connection is lost. The main purpose is a monitor app with a UI that tells people with little technical knowledge when the Safe node app loses its connection, hits a critical error, has a status update or has an upgrade available. Once a Safe node is launched, people should not have to check whether it is running; the app should send updates about the node's status, for example to a phone. People could place a Raspberry Pi under the table, start a node and the monitor app, and then forget that they run a node: it should just keep running, report its status regularly, and tell them if something goes wrong.

But creating a script is good for people who want that, so that is also great.

Well, the script doesn't need any special privileges. It only needs to be able to run the node. ps reports your own tasks, or you can use pgrep.

It is actually a lot easier than ChatGPT wrote, but if you want bells and whistles, it's a decent attempt.

For a nice-looking UI, the app that reports node stats is the perfect place to put this tiny bit of code. It can have parameters for how often to check, when to declare there is a major problem, etc. Putting it in any other UI app seems silly, since the stats app already does so much, and if you are going for a UI rather than a set-and-forget app, then use the stats UI app.

I was answering the set-and-forget aspect, and a graphical interface is by its nature not set-and-forget, since it uses a window on the screen (even if minimised).


safe/node is not made for WSL2. I tried it, but there are networking issues that make safe/node fail in WSL2. I haven't tried WSL1.

WSL is nice for expert users and people who are used to Linux, but it is a pain for ordinary Windows users. Windows users (even advanced ones) are used to a different way of thinking. Even if Safe worked well on WSL, I would be strongly against going that way.

Anyway, on both Windows and Linux it is not unusual to want a program to run as a system service, with monitoring and a restart if needed. There are built-in mechanisms for that, and it is easy to also make them send alerts to remote devices.
I think it is too early now to prepare a fancy one-click setup script, but as soon as there is a stable testnet that runs for at least a week, I would be happy to help create a how-to or a script for that.
