tl;dr just wanna make nice graphs and help provide some useful data.
Looking at upload speeds and performance and I have written a wee script to get some performance data.
The aim being to build a body of test results to inform the devs of immediate problems now and to monitor performance enhancements in the future.
When I think harder about this, I wonder if I should take a step back and ensure that the data captured is in an agreed JSON schema for sharing and analysis by the devs and the brainier members of the community.
Another much bigger point is that a time in seconds on its own is no use without context: filesize, iteration number, delay between iterations, software versions, PUT or GET, CPU type and frequency, RAM fitted, disk type (SSD, SATA), local or remote, connection type (ADSL, cloud etc.) and a host of other parameters that need to be captured to make each individual data point meaningful. Should I be capturing real, sys and user times instead of just the real time? Probably…
So which of these parameters should I capture? And although most of the data points will be numeric, would it make more sense to store them all as JSON strings?
Not being a data scientist, I'm going to be trying to learn quickly about data visualisation, probably using Python. However, if there is any point to this, I'd like to use an agreed schema - which may well already exist internally in the project, but I don't know about it yet.
Also, right now I collect the real time formatted in seconds only, using GNU time 1.7, NOT the bash builtin time command (see man time): /usr/bin/time -f "\t%e" safe files put
willie@gagarin:~/tmp/testnet-file-uploads-20210703_1345$ /usr/bin/time -V
GNU time 1.7
willie@gagarin:~/tmp/testnet-file-uploads-20210703_1345$ time -V
-V: command not found
real 0m0.458s
user 0m0.039s
sys 0m0.383s
See the difference? The problem here is that GNU time 1.7 is not POSIX-compliant, which will perhaps have implications for Windows users, but it does let me format the time in seconds only instead of as 0m0.458s. I suppose this can be worked around with regexps/sed but I'm lazy…
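For the record, the sed workaround isn't too bad. A rough sketch that converts the builtin's 0m0.458s style into plain seconds, assuming the real/user/sys layout shown above (sample text piped in so it runs anywhere):

```shell
# Convert bash's "real<TAB>0m0.458s" timing line into plain seconds.
# sed splits minutes and seconds; awk does the arithmetic.
SECS=$(printf 'real\t0m0.458s\n' \
  | sed -E 's/.*[[:space:]]([0-9]+)m([0-9.]+)s/\1 \2/' \
  | awk '{ printf "%.3f", $1 * 60 + $2 }')
echo "$SECS"
```

In a real script you would feed it the captured stderr of the time keyword instead of the printf sample.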
HELP NEEDED!!!
While I wait to get some feedback on the above post, it’s moot unless I can output the important data properly.
Anybody know how I can get the output of the time command here but suppress the output from safe (/usr/bin/time -f "\t%e" safe files $SAFE_COMMAND $DEST_DIR/$FILESIZE.dat)?
/usr/bin/time writes to stderr whereas safe writes to stdout, so a redirect of stdout to /dev/null works well here. Also check the -o/--output flag in the man page for /usr/bin/time.
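To make that concrete, here's a sketch. The commented-out line is the assumed real invocation (GNU time plus the safe CLI); the demo below fakes the two streams so it runs without a network:

```shell
# Real usage would be something like (assumes GNU time and the safe CLI):
#   ELAPSED=$( { /usr/bin/time -f "%e" safe files put "$FILE" > /dev/null; } 2>&1 )
# Demo: the stand-in writes chatter to stdout (discarded) and the timing
# to stderr (captured), mimicking safe + /usr/bin/time.
ELAPSED=$( { echo "safe CLI chatter"; echo "4.09" >&2; } 2>&1 > /dev/null )
echo "$ELAPSED"
```

The redirection order matters: 2>&1 first points stderr at the capture, then > /dev/null discards only stdout.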
If that works, then all the data in whatever delimited format will be interesting for spotting what is material to performance.
Yes - so far as it is not affected by other processes and users; I guess cloud servers might have other users sharing the real hardware etc…
All context might be useful - OS and kernel version - and whether or not the CPU maxes out might cause non-linear performance too?
Then what the file being uploaded actually is might be a factor… whether it's pure true random or some content that will be de-duplicated client side. Perhaps pure random versus pure non-random might be interesting: a file that is pure duplication of text we might expect to upload as fast as possible, and pure random as slow per MB as practical.
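A quick sketch of generating that pair of extremes (filenames and the 1 MiB size are just illustrative):

```shell
# 1 MiB of incompressible data vs 1 MiB of pure repetition.
dd if=/dev/urandom of=random.dat bs=1024 count=1024 2>/dev/null
yes "the quick brown fox" | head -c 1048576 > repeated.dat
ls -l random.dat repeated.dat
```

Comparing upload times for the two should show how much client-side deduplication/compression is worth.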
Tests of each CLI function would be most interesting too - the response to each type of request seems to be different. Is create keys significantly different to put, etc.? Perhaps the most useful thing from this type of performance data will be a sense of what is consistent and a measure of whatever performance is less consistent… the boundaries of performance are important too. Perhaps the network will naturally be more responsive at some times and less at others, relative to how many users are active… initially a lot of activity might push the limits of what is possible; so there might be differences over 24 hours and over 7 days.
tl;dr it's all interesting - so, the more the better.
So far I have this output - and despite help from @mav I'm still not displaying the time correctly just yet - soon, real soon now:
{"testrun":" Sun 4 Jul 00:49:20 UTC 2021", "safe_version": "sn_cli 0.31.1", "node_version":"safe_network 0.6.1", "safe_command":"put", "filesize":"5", "no-of-iterations": "5", "delay": "2" , "data":[{
"iteration": "1", "elapsed time": "
Command terminated by signal 2
178.01
"iteration": "2", "elapsed time": "
Command terminated by signal 2
8.56
"iteration": "3", "elapsed time": "
Command terminated by signal 2
5.32
"iteration": "4", "elapsed time": "
Command terminated by signal 2
6.42
"iteration": "5", "elapsed time": "
Command terminated by signal 2
4.09
"}
Indeed, and to get a body of data that will allow this means I need to make the data collection process extremely simple and as easy as possible for a wide range of users across a similarly wide range of kit. I can parse the relevant CPU info from cat /proc/cpuinfo
on Linux, but I don't have (easy) access to a Windows box, so I will need help on that as well.
This script is actually adapted from some work I did earlier which would see the user uploading some standard files of varying sizes and then timing the downloads. The intent there was to examine what effect deduplication would have on downloading these sets of common chunks. For now the script generates a (supposedly) random file of the desired size from /dev/urandom, which should be good enough for our purposes. I'm not researching nation-state class encryption, so I think we can forgive the not-absolutely-random nature of /dev/urandom.
Uncoupling what is client-side de-duplication time from what is network response might be useful too. Could do with more log file detail for noting timestamps of local CLI actions?
This is my proposed schema for now. I debated adding a field for random vs structured data but rejected it, as deciding what kind of structured data - text, image, audio - would be too involved for now.
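For illustration only, a complete record along those lines might look like this - the field names follow the output above, the hardware fields are my guess at the shape, and the values are placeholders, not a finalised schema:

```json
{
  "testrun": "Sun 4 Jul 00:49:20 UTC 2021",
  "safe_version": "sn_cli 0.31.1",
  "node_version": "safe_network 0.6.1",
  "safe_command": "put",
  "filesize": "5",
  "no-of-iterations": "5",
  "delay": "2",
  "cpu_model": "(from /proc/cpuinfo)",
  "cpu_cores": "(from /proc/cpuinfo)",
  "ram_total": "(from free)",
  "data": [
    { "iteration": "1", "elapsed time": "178.01" },
    { "iteration": "2", "elapsed time": "8.56" }
  ]
}
```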
It could be useful to print date -u in every iteration. In case of a performance drop or something unusual, I bet it will be nice to see whether it hit different clients at once or how it spread across the network.
If you pull that branch, your basic run is (with a network already set up; could be local or WAN; you just need the node_connection_info file) cargo bench, and there's some simple CLI output there, but more in-depth charts and things are generated in the target/criterion/<benchmark> folder.
I wasn't planning to get this into CI as yet, but am just proposing it for the codebase in case we can work up an expanded benchmark suite that we want to use in the future.
In every iteration? There is already a timestamp at the top of the test run. Tests can of course take several hours, perhaps days, to complete for a large number of iterations and/or for very large filesizes, but I don't know how much value this has for identifying bottlenecks/trends. Please convince me otherwise.
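If per-iteration timestamps do win the argument, a UTC ISO-8601 stamp is cheap to produce and sorts cleanly; a sketch (POSIX date with a + format string):

```shell
# One ISO-8601 UTC timestamp per iteration, e.g. 2021-07-04T00:49:20Z
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "$TS"
```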
As ever, PRs are very welcome indeed, especially to help identify hardware on Windows machines.
Hardware detection - well CPU model and no of cores for starters - is working now and the output should be valid JSON. Please give safenetwork-community/ upload-performance a try if you can run a local test network on linux.
EDIT: Tacking this on here so as to not totally monopolise the thread
The relatively simple hardware detection we can do covers processor model, core count and RAM. The free command gives us a good summary of memory usage, but what vital parameter(s) should we capture? Total and available - or simply just available?
willie@gagarin:~/projects/maidsafe/testnet-scripts/upload-performance$ free -h --mega
               total        used        free      shared  buff/cache   available
Mem:             15G        5.6G        1.9G        262M        8.5G        9.8G
Swap:            32G        708M         31G
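Whichever columns win, pulling them out is a one-liner. A sketch against the layout above (assumes procps-style free output; a sample line is used here so it runs anywhere, live usage would pipe free -m instead):

```shell
# Grab total and available (columns 2 and 7 of the Mem: row).
SAMPLE='Mem: 15936 5600 1900 262 8500 9800'
set -- $(printf '%s\n' "$SAMPLE" | awk '/^Mem:/ { print $2, $7 }')
MEM_TOTAL_MB=$1
MEM_AVAILABLE_MB=$2
echo "$MEM_TOTAL_MB $MEM_AVAILABLE_MB"
```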
Note this is captured before the test run commences. Is there any value in reading this at the conclusion of the test run - or even every iteration - or is this just getting silly now?
Or put the code in the loop but disable it by default?
EDIT: Hardware detection was not working properly. shellcheck.net insulted my pet and claimed my cat was useless. So I changed it as suggested and broke the hardware detection.
Reverted to the “deprecated” way of doing things:
CPU_MODEL=`cat /proc/cpuinfo| grep name|head -n1|cut -c 14-`
^-- [SC2006](https://github.com/koalaman/shellcheck/wiki/SC2006): Use $(...) notation instead of legacy backticked `...`.
^-- [SC2002](https://github.com/koalaman/shellcheck/wiki/SC2002): Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
CPU_MODEL=$(cat /proc/cpuinfo| grep name|head -n1|cut -c 14-)
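For what it's worth, a version that should satisfy both SC2006 and SC2002 without breaking the detection. The commented line is the live Linux usage; the demo runs against a sample /proc/cpuinfo line so it works off-box:

```shell
# Live usage (Linux):
#   CPU_MODEL=$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^[[:space:]]*//')
# Demo against a sample /proc/cpuinfo line:
SAMPLE='model name : Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz'
CPU_MODEL=$(printf '%s\n' "$SAMPLE" \
  | grep -m1 'model name' \
  | cut -d: -f2- \
  | sed 's/^[[:space:]]*//')
echo "$CPU_MODEL"
```

Matching on 'model name' and splitting on the colon is also a bit sturdier than cut -c 14-, which silently breaks if the field width ever changes.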