Plot of Real Test Network Data

I like this idea, and it has me wondering what each node knows by default.

One obvious option is to see the current Routing Table size

Routing Table size:  27

but there must be more that’s possible. …thinking…
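If that line also ends up in Node.log (I'm assuming it does; it may only show on the console), the latest value could be pulled out with something like:

    # latest reported routing table size (line wording assumed from the quote above)
    grep -o 'Routing Table size: *[0-9]*' Node.log | tail -n 1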

1 Like

I’m running it right now on the current test network and it is not finding any IP addresses in the live Node.log.

Funny… I’m sure I’ve seen IP addresses in there in the past.

Most of the lines in Node.log begin like this

     INFO 17:11:45.787311970

That long number isn’t an IP, as far as I can tell, even though it has nine digits. If it were, I would expect to be able to find the IPs from the config file (minus the dots), but nothing turns up. Is that just an arbitrary message identifier? If the vault knows its group by their IPs, then where is that information kept?

It’s a timestamp :wink:
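Hours, minutes, seconds, then a nanosecond fraction. If you want to pull them out of the log (assuming the timestamp is the second whitespace-separated field, as in the sample line above):

    # print the last few log timestamps (field position assumed from the sample line)
    awk '{print $2}' Node.log | tail -n 3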

4 Likes

Oh well, for now I’ll have to revamp it to count the first four digits of PeerId names (a more useful metric of the number of vaults) instead of IPs.

The lines that say “added ‘blah’ to routing table” can be filtered for ‘blah’ and totalled. With network churn that should eventually encompass every other node. Then have a sliding time window to drop the ones that don’t show up again after some time.

Also, the lines that are marked “stats” are worth a look.
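A minimal sketch of that filter, assuming the log wording is literally “added <name> to routing table” (the full script further down in this topic does this more carefully):

    # count the distinct names mentioned in "added ... to routing table" lines
    grep -i 'added .* to routing table' Node.log |\
     grep -io 'added [0-9a-f]*' | awk '{print tolower($2)}' | sort -u | wc -l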

1 Like

This is what I have it doing now… a bit more useful than the zero flatline of before:

http://91.121.173.204/plot.svg

3 Likes

This is the code:

First the shell script:

#! /bin/sh
# plot the number of unique nodes added to the routing table over time

# Count the number of nodes that have been added in Node.log
# and add that number to a data file.

# Create an x-y data set ready for plotting: first the time of day...
printf '%s ' "$(date +%H:%M)" >> ~/bin/output.dat

# Note that if you are going to use UTC as your timezone then
# you will need to set it beforehand with:
# $ sudo cp /usr/share/zoneinfo/UTC /etc/localtime

# ...then the count of unique node names added so far.
# (Taking the name from the second field avoids also matching "dded" from
#  "Added", which would inflate the count by one.)
grep -o 'Added [0-9a-f]\{4\}.. to routing table' "$1" |\
 awk '{print substr($2, 1, 4)}' | sort | uniq | wc -l >> ~/bin/output.dat

# Generate the plot image from output.dat
~/bin/plot.gnu

# Copy the plot image to the web folder.
# Note that for this to work you will need to
# grant the user write privileges there.
cp -f ~/bin/plot.svg /var/www/html/

This is the plotter script that is called by the shell script:

#! /usr/bin/gnuplot
# (gnuplot version: 4.6 patchlevel 6)
#
# Plotting the data of output.dat

set key noautotitle

# svg
set terminal svg size 1024,768 fname 'Verdana, Helvetica, Arial, sans-serif'\
 fsize '10'
set output '~/bin/plot.svg'

# Axes label
set xlabel 'Time (UTC)'
set ylabel 'No. of Unique Nodes Added to Routing Table'

set title "SAFE TEST 3, 2016-05-17: Unique Nodes Added to Routing Table"\
 font "Arial,14"

# color definitions
set border linewidth 1.5
# set style line 1 lc rgb '#0060ad' lt 1 lw 2 pt 7 ps 1.5 # --- blue

set grid nopolar
set grid xtics nomxtics ytics nomytics noztics nomztics \
 nox2tics nomx2tics noy2tics nomy2tics nocbtics nomcbtics
set grid layerdefault   lt 0 linewidth 0.500,  lt 0 linewidth 0.500

set ytics 10
# I haven't set xtics because gnuplot takes care of it well enough.

set tics scale 1

set xdata time
set timefmt "%H:%M"
set format x "%H:%M"
set xrange ["18:00":"23:59"]
set yrange [50:250]

# set the grid lines
set style line 12 lc rgb 'blue' lt 1 lw 1.5
set style line 13 lc rgb 'blue' lt 1 lw 0.5
set grid xtics ytics mxtics mytics ls 12, ls 13

# set the frequency of the minor tics
set mxtics 6
set mytics 2

plot "~/bin/output.dat" using 1:2
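Note that the shell script invokes plot.gnu via its shebang line, so both scripts need the execute bit set (paths as above):

    chmod +x ~/bin/run ~/bin/plot.gnu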

This is what the output.dat file looks like after a few cycles:

18:21 64
18:22 64
18:23 64
18:24 64

This is the crontab:

    */1 * * * * /home/user/bin/run /home/user/safe_vault1/Node.log

Manual usage is (although the cron job normally takes care of it):

    $ ./run ../safe_vault1/Node.log

If you are running safe_vault as a service, as I describe here, then make the argument in the cron job (and in manual usage) point to /opt/safe_vault1/Node.log, or wherever you have installed it. That’s the only change you would need to make, since neither of the two scripts hard-wires that path.
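For example, with the paths used above, the cron line would then become:

    */1 * * * * /home/user/bin/run /opt/safe_vault1/Node.log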

4 Likes

Still getting gaps, due to something stopping and starting, and I just noticed that the time scale is a bit wonky, since there are not 80 minutes in an hour, for example.

EDIT: Both the gaps and the incorrect minutes are due to the x axis being formatted as integer rather than time. The time formatting is a bit arcane but I’ll have it sorted shortly.

I don’t know how you’re creating the .svg. I’m aware of d3 (though less capable with it) and have some sense that it can output .svg, which might be easier and more flexible. One step beyond would be for users to run d3 themselves and be passed the data, but that’s beyond me.

It’s working correctly now!

It’s alive!! Mwahaha! :smile:

3 Likes

hahaha you can see how new data points are added :slight_smile: that’s awesome!

I’ve updated the code posted above, to incorporate the latest pretty formatting. If anyone can’t view the plot or there’s any flaw then please let me know. I’m not sure what will happen once it hits midnight UTC.

If you go to the main page then you can view today’s plot as well as access a link to yesterday’s plot:

http://91.121.173.204/

1 Like

The vault was hung and I restarted the server a few minutes ago, which is why you see the spike.

Actually, I’m not sure it was hung; the connection was not responding, and I restarted the machine via the provider’s control panel. It might simply have been network issues.

But the restart certainly picks up new connections, so I might include that in a future iteration.

1 Like

This is great! Thanks for putting this up :thumbsup:.

Here’s something significant: Even though it has added 60 vaults in the last few minutes, the spike (in unique node names) is only 10 vaults or so, so the other 50 are already in the log.
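One way to check that directly is to diff snapshots of the unique names; a sketch, assuming the same “Added xxxx.. to routing table” wording and a previous snapshot saved as names_before.txt (file names are arbitrary):

    # unique 4-character name prefixes added so far
    grep -o 'Added [0-9a-f]\{4\}' Node.log | awk '{print $2}' | sort -u > names_now.txt
    # names present now that were not in the earlier snapshot
    comm -13 names_before.txt names_now.txt | wc -l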

There are only 6 hard-coded contacts (IP addresses) in the config, and three mapper servers.

I’m not sure if that can be used as a metric of network size.

A comment on the “Test 3” topic reminds me (duh) that tools other than the safe_vault log can be used to gather IP addresses of connecting machines. In the case of the comment it was client machines (launchers) but it should be possible to do that for vaults.

But today I have to catch up on work and leave this fun for a while.

Here’s a thought:

I’m all for Maidsafe shutting down the current test network once they’re finished testing.

However, in other contexts a vault operator might do the following:

  1. Use Wireshark or some other packet sniffer to collect the IPs of all the vaults that connect to his vault (there’s a rough sketch after this list).

  2. Add those IPs as hard-coded contacts to a customized vault config. This would be much more robust than having one or two seeds.

  3. Restart the vault using the customized config.

  4. Voila! The unkillable test network. :slight_smile:
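For step 1, a full packet capture isn’t strictly needed; something like ss (or netstat) on the vault machine would do. A rough sketch, assuming the process is named safe_vault and IPv4 peers, run as the vault’s user or root so the -p column is populated:

    # remote IPs of the vault's currently established TCP connections
    ss -tnp | grep safe_vault | awk '{print $5}' | cut -d: -f1 | sort -u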

This is functionality we have; it’s currently disabled in these tests. It’s called the bootstrap cache. Every vault will check its connections, and those which are “directly connectable” are stored in this cache, up to 1500 entries, in order of last seen.

7 Likes

Here’s a pattern I see in the plot:

It has a stepwise shape: it is flat for a period and then spikes up a little, with a flurry of activity in the console. The period is about 45-60 minutes.

My guess is that this periodic “waking up” is the vault finding out who on the routing table is still active, clearing out inactive entries, and acquiring some replacements via the still-active ones.

So a first attempt at the sliding window I mentioned earlier in the topic, as a way of getting closer to a list of active nodes (and not merely nodes that existed at some point), would be to do the following (a rough sketch in shell follows below):

  1. Collect the time when each node in the routing table was last heard from.

  2. Drop from the (plotting) data file those members that haven’t been heard from within the last hour.

  3. Force a rebuild (by restarting the vault) of the routing table on a period well under an hour: 10 minutes, say.

Since the boot cache is not yet active, my understanding is that a newly (re)started vault has only the config file to use to rebuild its routing table, so each restart is starting from exactly the same data. All the vaults in the test network are (presumably) initializing from the droplets in the default config file, which (again, presumably) have built routing tables much bigger than the typical community vault (I’m seeing 64 entries right now, but I would expect a droplet to have much more than that), and my vault will (presumably) learn a random selection of the entries in some droplets’ routing tables. So such a sliding window of ten minutes* should eventually collect all active nodes.

Note: Or a double-sliding window: ten minutes for each rebuild, but if a node doesn’t show up in n restarts then it is dropped.
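A rough sketch of that sliding window in shell. Assumptions: the same “Added xxxx.. to routing table” log wording; the vault is restarted often enough that Node.log only contains recent additions; ~/bin/seen.dat and the one-hour cut-off are arbitrary choices. The final count could then replace the wc -l line in the run script above.

#! /bin/sh
# Sliding-window count of recently seen node names.

LOG="$1"
SEEN=~/bin/seen.dat
NOW=$(date +%s)
touch "$SEEN"

# Merge: names in the current log get their "last seen" stamp refreshed,
# names only in the old file keep their previous stamp.
{
  grep -o 'Added [0-9a-f]\{4\}' "$LOG" | awk -v now="$NOW" '{print $2, now}'
  cat "$SEEN"
} | sort -k1,1 -k2,2nr | awk '!seen[$1]++' > "$SEEN.new"
mv "$SEEN.new" "$SEEN"

# Count only the names heard from within the last hour.
awk -v now="$NOW" '$2 > now - 3600' "$SEEN" | wc -l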

There are geographical trends in the current test net (EDIT: plot of this morning saved here.)

On today’s plot there is a huge rise in new nodes around 0700 UTC, probably due to community members on this side of the pond coming online.

EDIT: But that is evidence that the size of one’s routing table is a function of the number of vaults in the network, along with the fact that the routing table started off with 30-40 nodes and now, after nearly one day, has 60-70. This is consistent with the design goal of stronger routing in the current version of the software (nodes stay on the network longer, so the total number of active nodes increases over time as more members join or start additional vaults).

1 Like