Log viewer for SAFE nodes

I’ve been finding it difficult to really get my hands dirty with safe_vault because the main interface we have is the log files, which are great but hard to use, especially when there are several of them (as there always are).

So I made log_viewer_for_safe_vaults. The code is very much a prototype (read: messy).

Quick Start Instructions.

I’m still not totally sure how useful it will be. Features are very thin just now but will be expanded as needed, so if you have suggestions I’d love to hear them.

Here are some things I’ve found with it so far that would be extremely difficult to spot from the logs alone.

Aggregated log lines

Having all log lines in chronological order is extremely helpful, since it shows how events on different vaults connect much more clearly than reading the logs side by side.
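A rough command-line approximation of that aggregated view is sketched below in bash. It assumes (this is not confirmed by the viewer) that every vault log line carries an ISO-8601 timestamp in its second whitespace-separated field, in one fixed format and timezone, so that byte-wise string order matches time order. `merge_logs` is a hypothetical helper name.

```shell
# merge_logs FILE...: print the lines of several vault logs as one
# chronological stream, each prefixed with its source name (file name
# without ".log") and a tab.
# ASSUMPTION (hypothetical log layout): field 2 of every line is a
# fixed-format ISO-8601 timestamp, e.g. "INFO 2020-05-01T10:00:01.250 ...".
# Lines with fewer than two fields are skipped.
merge_logs() {
  local f
  for f in "$@"; do
    # Emit "timestamp<TAB>source<TAB>original line" so sort can key on field 1.
    awk -v src="$(basename "$f" .log)" 'NF >= 2 { print $2 "\t" src "\t" $0 }' "$f"
  done |
    # Stable, byte-wise sort on the timestamp key, then drop the key again.
    LC_ALL=C sort -s -t $'\t' -k1,1 | cut -f2-
}
```

Usage would be something like `merge_logs logs/*.log | less`. The stable sort (`-s`) keeps lines that share a timestamp in their original per-file order.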

Visual Patterns

Seeing the density and arrangement of events on the timeline is very helpful.

Groups of points in a vertical line show ‘simultaneous’ events.

Single long horizontal lines show possible stalling points.

Dense clusters separated by large areas of white space show periods of inactivity (sometimes intentional, sometimes not).

The vertical time marker shows which events happen before or after a given time, which is especially helpful when looking at lots of very closely spaced events.
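The long horizontal lines and white space described above are pauses between consecutive events, so they can also be listed numerically. A minimal bash/awk sketch, assuming a time-ordered log on stdin, timestamps shaped like `2020-05-01T10:00:02.500` in a known field, and a log that does not cross a month boundary (only the day of the month is used). `find_gaps` is a hypothetical name.

```shell
# find_gaps MIN_SECONDS [FIELD]: read a time-ordered log on stdin and report
# every pause longer than MIN_SECONDS between consecutive lines.
# ASSUMPTIONS: the timestamp is whitespace-separated field FIELD (default 2)
# and looks like 2020-05-01T10:00:02.500; the log stays within one month.
find_gaps() {
  awk -v min="$1" -v field="${2:-2}" '
    # Convert a timestamp to seconds: day of month plus time of day.
    function secs(ts,    t) {
      split(substr(ts, 12), t, ":")
      return substr(ts, 9, 2) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
    }
    NF >= field {
      now = secs($field)
      if (seen && now - prev > min)
        printf "%.3fs gap before: %s\n", now - prev, $0
      prev = now
      seen = 1
    }'
}
```

For example, `find_gaps 5 < logs/vault2.log` would list every pause of more than 5 s in one vault's log.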

Examples

From the example images above, here are some things that can be seen which are not easy to spot in the logs.

First image:

Vaults are seen to join at 20s intervals, one at a time, and each join event takes a little longer than the one before. This means that I probably don’t want to set a very rapid join rate or these would begin to overlap. This would be extremely hard to tell just by looking at logs.

The last vaults to join generate very little activity, suggesting elders + parsec may be the primary factor in workload during network start.

There’s a long gap of no activity. Vaults do not seem to be doing ‘idle work’; they really do seem to be at rest when nothing is happening. If I were seeing continuous log messages I would wonder why and could investigate.

At the end there’s a burst of activity that only affects elders. It makes me wonder about the joining process and status of the non-elder vaults. Again, this is very hard to see from the log files alone.

Second image (zoomed into the time where the blue vault at the bottom of the timeline is joining):

The activity during vault joining is not dominated by any one vault.

The join took about 2.5s (see the heading).

Messages seem to be mostly around 10-30 ms apart; there’s not a lot of parallel activity except just before the time marker.

There are a few darker dots (which indicate multiple very closely timed events) that may be of interest.


Most of all, the tool is just interesting to play with. As I play with it more I’m sure other useful things will come up. Hopefully this tool is useful to others, and if you feel anything is missing I’m keen to add more stuff to make it extra useful.

I’m thinking a filter to show/hide specific messages will be essential?

Maybe some kind of lines joining related events so the flow of, say, one chunk upload becomes clearer?

Maybe the need for new log messages will become more apparent?

Maybe include client logs as well as vault logs?

Who knows what else…

Ah, @mav that is great stuff :clap: I will be getting this set up locally!

:+1:

That would be great.

After some faffing, I am starting to get places with this.
My problem was choosing log files from different directories (as in the standard baby-fleming output), so I did this:

```shell
# Gather each baby-fleming vault's log into one directory, one file per vault
cd "/home/$USER/.vault/baby-fleming"
mkdir -p logs
for i in genesis 2 3 4 5 6 7 8; do
  cp "safe-vault-$i/safe_vault.log" "logs/vault$i.log"
done
```

Then I can go to baby-fleming/logs and drag all 8 logs onto the log viewer UI.
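If the number of vaults changes, the hard-coded list in that loop has to be edited each time. A possible variation is sketched below: it picks up whatever `safe-vault-*` directories exist. It assumes the directory layout used by the loop above (one `safe_vault.log` inside each `safe-vault-<name>` directory), and `collect_logs` is a made-up helper name.

```shell
# collect_logs [BASE_DIR]: copy every BASE_DIR/safe-vault-*/safe_vault.log
# into BASE_DIR/logs/, naming each copy after its vault
# (safe-vault-2 -> vault2.log, safe-vault-genesis -> vaultgenesis.log).
collect_logs() {
  local base=${1:-$HOME/.vault/baby-fleming} d name
  mkdir -p "$base/logs"
  for d in "$base"/safe-vault-*/; do
    # Skip the unexpanded glob (no matches) and directories without a log.
    [ -f "$d/safe_vault.log" ] || continue
    name=$(basename "$d")
    cp "$d/safe_vault.log" "$base/logs/vault${name#safe-vault-}.log"
  done
}
```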

I used a set of logs that ran for 22 hours, and I’m not seeing much down below except some coloured dots as yet. I’ll keep trying. Thanks @mav

Thanks for having a look!

Yeah, this will cause issues because of the amount of data. I’ve not done any optimisation and it’s a bit sluggish even with small logs. I’ll get to that soon, but for now I’m focused on making it useful first, and then on making it fast and able to handle large data sets. I’ll add a note about this to the readme.

For example, it plots every event on the timeline, when the overlapping points could probably be left out, which would improve performance a lot.
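One way to leave overlapping points out is to bucket events by pixel column: every event that would land on the same pixel of the same vault's row collapses into a single point plus a count, and that count could then drive how dark the dot is drawn. A rough sketch follows; the input format (`<vault> <seconds>` per line), the chart width, and `thin_points` itself are hypothetical, not how the viewer currently works.

```shell
# thin_points WIDTH START END: collapse events that would share a pixel
# column in the same vault's row into one "<vault> <pixel> <count>" line.
# Input lines (HYPOTHETICAL format): "<vault> <seconds>", e.g. "vault2 12.345".
# END must be greater than START; events outside [START, END] are dropped.
thin_points() {
  awk -v w="$1" -v start="$2" -v end="$3" '
    $2 < start || $2 > end { next }
    {
      # Map the time onto a pixel column in 0 .. WIDTH-1.
      px = int(($2 - start) / (end - start) * (w - 1))
      key = $1 " " px
      if (!(key in count)) order[n++] = key  # remember first-seen order
      count[key]++
    }
    END { for (i = 0; i < n; i++) print order[i], count[order[i]] }'
}
```

The number of points to draw is then bounded by vaults × pixel width rather than by the number of log lines.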

Some pre-processing CLI tools might also be handy, e.g. remove-before-time, remove-after-time and some pattern filtering. This can be done with grep etc., but dedicated tools could make it much easier.
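As a rough idea of what such pre-processing could look like, the hypothetical `trim_log` below combines remove-before-time, remove-after-time and pattern filtering in a single awk pass. It assumes the timestamp is the second whitespace-separated field and uses a fixed-width ISO-8601 format, so a plain string comparison orders times correctly.

```shell
# trim_log START END [REGEX]: keep only lines with START <= timestamp < END
# that also match REGEX (an awk extended regular expression), if one is given.
# ASSUMPTION: field 2 is a fixed-width ISO-8601 timestamp, so comparing the
# strings byte by byte (LC_ALL=C) compares the times.
trim_log() {
  LC_ALL=C awk -v start="$1" -v end="$2" -v re="${3:-}" '
    $2 >= start && $2 < end && (re == "" || $0 ~ re)'
}
```

For example: `trim_log 2020-05-01T10:00:00 2020-05-01T10:05:00 PUT < logs/vault2.log > trimmed.log`, then load the smaller file into the viewer.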

I also want to refactor the tool so it displays info progressively as it loads, rather than only after everything has been parsed.

Thanks for the feedback.

Sorry I haven’t spent more time on this today; I was distracted with family stuff and trying to get FaH working on an Ubuntu 20.04 box. Just Say No, kids.

FaH is wonderful, but fahcontrol still relies on python2.7 :frowning: and since they are super busy now, I don’t see this getting sorted soon.

Anyhow, I’ll try to look at it again tonight with smaller logs.

This is what I see when I initialise the vaults and then send 50 lots of 90 KB of random data. The vault initialisation phase is clearly visible, followed by the bulk PUT testing.

Great work @mav - very impressed. I look forward to seeing how this tool evolves.

Do we expect people to host multiple vaults? Or would it be better to consider what one vault is outputting and what more it could output?

I expect most of us will host only one vault, but it’s good to see how they interoperate.

Tiny PR just updating it for the current log format: fix: updates to work with nodes by joshuef · Pull Request #1 · iancoleman/log_viewer_for_safe_nodes · GitHub

Nice! Coincidentally I just started using this tool myself again today :slight_smile:

I renamed the repo to log_viewer_for_safe_nodes too.

I have also been including the authd log, which has a different format, so I changed it to detect the date at either index 1 or 2. I’ll hopefully push some changes in the next day or two, and hopefully some performance improvements too… depending on how my time pans out.
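For illustration, detecting the date at either position can be as simple as checking two fields in turn. This is only a sketch: it assumes dates start with `YYYY-MM-DD` and that "index 1 or 2" means the second or third whitespace-separated token (counting from 0). `extract_ts` is a hypothetical name, and the authd-style line used in the test is invented rather than the real authd format.

```shell
# extract_ts: for each line on stdin, print the token at index 1 or 2
# (awk fields $2 or $3) that starts with a YYYY-MM-DD date, or an empty
# line when neither does.
extract_ts() {
  awk '{
    for (i = 2; i <= 3; i++)
      if ($i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        print $i
        next
      }
    print ""
  }'
}
```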

Nice. I have a branch where I’m doing some basic filtering and things too. I’ll be getting that tidied up and PRing shortly.

We’re starting to use this for some detective work :female_detective:

The change to binary search is on my radar for tomorrow. I’m doing some long tests these days, so the tool becomes very laggy with those big logs. Hopefully fixing the search will help.
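For anyone curious about the binary search idea: because the merged lines are already sorted by time, the first line at or after a given time (for example the vertical time marker) can be found in O(log n) comparisons instead of scanning every line. Below is a bash sketch over an array of timestamps. It assumes fixed-width ISO-8601 strings so that string order is time order, and `first_at_or_after` is a hypothetical name, not a function in the viewer.

```shell
# first_at_or_after MARKER TS...: print the index of the first timestamp in
# the (already sorted) list TS... that is >= MARKER, or the list length if
# every timestamp is earlier. This is a lower-bound binary search.
first_at_or_after() {
  local marker=$1
  shift
  local -a ts=("$@")
  local lo=0 hi=${#ts[@]} mid
  while (( lo < hi )); do
    mid=$(( (lo + hi) / 2 ))
    if [[ ${ts[mid]} < $marker ]]; then
      lo=$(( mid + 1 ))  # answer lies strictly to the right of mid
    else
      hi=$mid            # mid may be the answer, so keep it in range
    fi
  done
  echo "$lo"
}
```

Everything before the returned index happened before the marker and everything from it onwards at or after it. A sorted array could be built with, e.g., `mapfile -t times < <(awk '{ print $2 }' merged.log)` and searched with `first_at_or_after 2020-05-01T10:00:02 "${times[@]}"`.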

Awesome.

Just fired in: Filtering for node events and make log line parts optional by joshuef · Pull Request #2 · iancoleman/log_viewer_for_safe_nodes · GitHub

Let me know how that’s looking (it adds basic text filtering and the ability to exclude parts of log messages for easier reading).
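As a rough command-line analogue of that kind of filtering, the sketch below keeps lines containing a substring and drops chosen fields for readability. `show_lines` and its field-number argument are hypothetical and unrelated to the PR's actual implementation; the log layout in the test (level, timestamp, bracketed source location, message) is assumed.

```shell
# show_lines PATTERN [DROP]: print lines containing PATTERN (a plain
# substring; an empty PATTERN matches everything), omitting the
# whitespace-separated fields whose numbers appear in the comma-separated
# list DROP (e.g. "3" or "1,3").
show_lines() {
  awk -v pat="$1" -v drop="${2:-}" '
    BEGIN { n = split(drop, d, ","); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
    pat == "" || index($0, pat) {
      out = ""
      for (i = 1; i <= NF; i++)
        if (!(i in skip)) out = out (out == "" ? "" : " ") $i
      print out
    }'
}
```

For example, `show_lines PUT 3 < logs/vault2.log` would show only PUT-related lines with the third field hidden.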

One thing to note is that I didn’t work out how or where to get drawChart to use the filtered allLines, so the chart doesn’t currently reflect the filtered log lines.

If you have some pointers on how/where to tackle that, I’ll get it straightened out asap.

The chart now shows only the points matching the filter, see db2d81d.

Also some other stuff.

Tiny PR so the chart for multi-section networks doesn’t take over my screen: Limit chart height. by joshuef · Pull Request #3 · iancoleman/log_viewer_for_safe_nodes · GitHub
