Profiling node performance

The network performance has felt somewhat slower than I would have expected (although testnet6 has been pretty impressive). So to find where the bottlenecks may be, I decided to run a profiler on a private safe network.

  1. Start a private safe network (using SafeInABox)
  2. Start a vault and record the performance profile:
$ perf record -F 99 ./safe_vault
  3. Upload a 7 MB file.
  4. View the result using perf report:
  13.00%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_mul
   9.48%  Node thread  safe_vault          [.] SHA512_Transform
   8.87%  Node thread  safe_vault          [.] SHA256_Transform
   8.51%  Node thread  safe_vault          [.] _$LT$std..vec..Vec$LT$T$GT$$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h55f3b8f473db3546
   6.50%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_sq
   5.16%  Node thread  safe_vault          [.] _$LT$u8$u20$as$u20$rustc_serialize..Decodable$GT$::decode::he30d85359a60fabe
   4.07%  Node thread  safe_vault          [.] _$LT$messages..RoutingMessage$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h7035b51be7c0c236
   3.46%  Node thread  safe_vault          [.] _$LT$rustc_serialize..writer..SizeChecker$u20$as$u20$rustc_serialize_crate..Encoder$GT$::emit_u8::h5643e95fac8714ec
   3.22%  Node thread  [unknown]           [k] 0xffffffff8182d565
   3.10%  Node thread  safe_vault          [.] _$LT$std..vec..Vec$LT$T$GT$$u20$as$u20$rustc_serialize..Decodable$GT$::decode::hca5a9fb63841af69
   2.67%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_cmov
   1.34%  Node thread  [unknown]           [k] 0xffffffff810a975d
   1.22%  Node thread  safe_vault          [.] _$LT$hash..sip..SipHasher$u20$as$u20$hash..Hasher$GT$::write::h44537f0d2d069bd3
   1.09%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_sub
   0.97%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_sq2
   0.67%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_add
   0.67%  Node thread  safe_vault          [.] slide
   0.67%  Node thread  [unknown]           [k] 0xffffffff81085ad1
   0.55%  Node thread  libc-2.23.so        [.] 0x000000000009eed6
   0.43%  Node thread  safe_vault          [.] _$LT$data..Data$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h7eec61c793e1f9ba
   0.36%  Node thread  safe_vault          [.] _$LT$authority..Authority$u20$as$u20$std..cmp..PartialOrd$GT$::lt::hec5f4c0023bc1ed8
   0.36%  Node thread  safe_vault          [.] cmov
   0.36%  Node thread  safe_vault          [.] routing::core::Core::handle_hop_message::hf95bcd68d577c27c
   0.30%  Node thread  safe_vault          [.] ._ladder_loop
   0.30%  Node thread  safe_vault          [.] _$LT$authority..Authority$u20$as$u20$std..cmp..PartialOrd$GT$::gt::h6f67a70edcc95a01
   0.30%  Node thread  safe_vault          [.] sodium_memzero
   0.30%  Node thread  [unknown]           [k] 0xffffffff81196658
   0.24%  Node thread  libc-2.23.so        [.] 0x000000000009eedb
   0.24%  Node thread  safe_vault          [.] _$LT$authority..Authority$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h8eb168b01b9e1607
   0.24%  Node thread  safe_vault          [.] _$LT$message_filter..MessageFilter$LT$Message$GT$$GT$::insert::h3909c146fb499331
   0.24%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_fe_tobytes
   0.24%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_ge_double_scalarmult_vartime
   0.24%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_ge_p1p1_to_p2
   0.24%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_sc_muladd
   0.18%  Node thread  libc-2.23.so        [.] 0x000000000008e8ae
   0.18%  Node thread  libc-2.23.so        [.] 0x000000000009eee1
   0.18%  Node thread  libc-2.23.so        [.] 0x000000000009efa0
   0.18%  Node thread  libc-2.23.so        [.] 0x000000000009efb1
   0.18%  Node thread  safe_vault          [.] crypto_core_curve25519_ref10_ge_p2_dbl
   0.18%  Node thread  safe_vault          [.] ge_select
   0.18%  Node thread  safe_vault          [.] rallocx
   0.18%  Node thread  safe_vault          [.] routing::core::Core::handle_event::hba15935371e0ae4c

Info about test conditions:

  • VirtualBox VM running Ubuntu 16.04 and a private instance of the safe network
  • VM is allocated 3 CPUs and 2048 MB RAM
  • Vaults have a group size of 3 and a quorum size of 2
  • Network consists of 6 vaults running in Docker containers on the VM
  • perf was run on a 7th vault directly on the VM, i.e. not in a Docker container

My initial thoughts:

  • It’s pleasing to see crypto operations consuming the most CPU. This makes intuitive sense. The serialization operations starting from the 4th row are the main red flag.
  • perf is probably not the right tool to use here. I’ve been playing with automatically inserting timing log messages at the start and end of every function. I think that will generate more useful output than perf. It also allows logging within a consistent network-wide time context, which facilitates profiling the impact of messaging and bandwidth. A rough sketch of the idea is shown after this list.
  • CPU is probably not the main bottleneck. It probably will be while the network is entirely in a VM using shared resources, but on a ‘live’ network this is probably not the case.
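
For context, here is a minimal sketch of the kind of timing instrumentation I have in mind: a guard that logs on entry and again (with the elapsed time) when it goes out of scope at the end of the function. The names are illustrative only, not actual vault code.

use std::time::Instant;

// Logs a start message when created and an end message (with elapsed time) when dropped.
struct ScopeTimer {
    label: &'static str,
    start: Instant,
}

impl ScopeTimer {
    fn new(label: &'static str) -> ScopeTimer {
        println!("start {}", label);
        ScopeTimer { label: label, start: Instant::now() }
    }
}

impl Drop for ScopeTimer {
    fn drop(&mut self) {
        println!("end {} ({:?})", self.label, self.start.elapsed());
    }
}

fn handle_put() {
    let _timer = ScopeTimer::new("data_manager.rs:handle_put");
    // ... actual function body ...
}

fn main() {
    handle_put();
}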

It would be great to hear your thoughts if you have experience with profiling or ideas about how to tackle this. Performance is important for user experience, but it’s no use optimizing the wrong things.

22 Likes

Having recently built a cluster of pine64s, it was time to do some minimum-viable-testing of the safe network.

This post outlines the result of ‘the simplest test that would work’ to see if the pines and software were behaving.

Parameters

File size: 243 MB (254388820 bytes) - already a compressed file
Vault version: 0.10.5
Launcher version: 0.7.1
DemoApp version: 0.5.1
Vaults: 28 (7 pine64s, quad core, 2GB RAM, 64GB microsd, 64 bit debian, gigabit ethernet)
Group size: 3
Quorum size: 2
All limits on upload / account size etc removed.
Routing table size (min/avg/max): 11 / 15.8 / 20
Upload device: Laptop using wifi connected at 300 Mbps

Uploading Results

Time taken: 18m 43s
Space consumed*: 731 MB (748448 1K blocks)
Load average** (1/5/15 minutes): 0.71 / 0.83 / 0.76
Average bandwidth: 5.2 Mbps (731 MB / 1123 s * 8 bitsPerByte)

*Space consumed is measured by summing the output of df on all machines before and after upload, so it includes space consumed by logging, i.e. not just the file chunks.

**Load average is measured at the end of the process on one of the pine64s, so the others will have different loads. For the download, previous processes will affect the 15-minute load since the download took less than 15 minutes.

Downloading Results

Time taken: 9m 42s
Load average (1/5/15 minutes): 0.36 / 0.39 / 0.50
Average bandwidth: 3.3 Mbps (243 MB / 582 s * 8 bitsPerByte)

Comparison with other methods

Rough indication of the ethernet and microSD capability using the same 243 MB file.

scp upload: 45s at 43.2 Mbps
scp download: 14s at 138.4 Mbps
cp locally: 1s

Observations

DemoApp sat at 99% uploaded for about 50 seconds, which is much more than 1% of total upload time.

Space consumed is just over triple the file size, which is expected given the triple-redundancy on the network.

CPU and network load would seem to indicate a lot of potential for parallelism.

I was surprised CPU load was not close to 100% on all cores. Since neither the network nor the CPU was saturated, it seems that the launcher and/or demo app is currently the main performance bottleneck. This is good, since software is much easier to optimize than hardware!

Future testing

As stated in the original post, I’m looking forward to doing some tests to pinpoint bottlenecks. I’m interested in suggestions for tests that may benefit the development of the safe network.

I know this can be done on a cloud service, but why turn down an opportunity to build a cool computer?!

13 Likes

Pretty cool setup there. Just today I got 3 Pine 64s but no SD cards or the other things I ordered, so it’s going to be another round of waiting for them to send all the missing bits.

I’d be really interested in not just the vault performance using them, but also your general impression of the PINE 64s as vaults.

Can you also include power usage if possible? Even a watt meter on the AC feeding the rig would give useful info.

2 Likes

300 Mbps is link speed. I think the scp test of 138.4 Mbps download gives a better indication of actual network speed.

I put a meter on the ATX PSU today and it read 12 W on boot, idled around 9 W, and showed 10 W while uploading a file (that’s power to all 7 pine64s). I haven’t maxed the rig out yet, but this is at least an early indicator of power consumption.

My impression of the pine generally as a vault is that they should be good, but my main concern is the lack of a SATA port. That probably doesn’t matter, since internet bandwidth will surely always be the bottleneck, so a USB HDD dock should work fine. I went with the pine64 over the rpi3 because the pine64 has gigabit ethernet, which for the kind of testing I’m doing is non-negotiable. Of the ten I got, two seem to have flaky ethernet, so the build quality still seems to be ‘early stages’.


I measured CPU load on all machines at the end of uploading a 1.4 GB file. It was less consistent than I would have thought.

$ salt '*' cmd.run 'cat /proc/loadavg'
pine64-55:
    0.81 0.81 0.70 1/122 7321
pine64-56:
    0.14 0.25 0.25 1/122 7329
pine64-54:
    0.08 0.14 0.15 1/125 7318
pine64-52:
    0.84 1.03 0.91 1/123 7315
pine64-53:
    0.21 0.27 0.23 1/121 7337
pine64-50:
    0.18 0.41 0.41 1/126 7488
pine64-51:
    0.97 1.22 1.15 1/125 7348

According to a comparison of df before and after upload, the disk usage is quite consistent (as expected).

           Before   After    Difference
pine64-50  1625784  1728824  103040
pine64-51  1623984  1732740  108756
pine64-52  1624580  1734168  109588
pine64-53  1624480  1740240  115760
pine64-54  1624520  1716048   91528
pine64-55  1624360  1765624  141264
pine64-56  1624268  1753796  129528

Anyhow… this is really just farting around. Hopefully I have time to do some more rigorous testing in the very near future.

3 Likes

I wanted a way to profile a network of vaults, so perf as in the original post wasn’t going to cut it.

I added log messages at the start and end of every function in the vault code. Parsing the timestamps of the messages gives an indication of the time consumed by each method.
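
The aggregation amounts to something like the sketch below. It assumes a simplified log format of "<timestamp_ns> start <method>" and "<timestamp_ns> end <method>" per line; the real logs and parsing differ in the details.

use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

// Sum the total duration and call count per method from log lines of the form
// "<timestamp_ns> start <method>" and "<timestamp_ns> end <method>".
fn main() {
    let reader = BufReader::new(File::open("vault.log").unwrap());
    let mut open: HashMap<String, u64> = HashMap::new();          // method -> start timestamp
    let mut totals: HashMap<String, (u64, u64)> = HashMap::new(); // method -> (total_ns, run count)

    for line in reader.lines() {
        let line = line.unwrap();
        let parts: Vec<&str> = line.split_whitespace().collect();
        if parts.len() != 3 { continue; }
        let ts: u64 = match parts[0].parse() { Ok(t) => t, Err(_) => continue };
        let method = parts[2].to_string();
        match parts[1] {
            "start" => { open.insert(method, ts); },
            "end" => {
                if let Some(start) = open.remove(&method) {
                    let entry = totals.entry(method).or_insert((0, 0));
                    entry.0 += ts - start;
                    entry.1 += 1;
                }
            },
            _ => {},
        }
    }

    for (method, &(total_ns, count)) in &totals {
        println!("{}  {:.6} s  {} runs", method, total_ns as f64 / 1e9, count);
    }
}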

Here’s the environment for the first experiment:

Vault 0.10.5
Group Size of 3
Quorum Size of 2
7 x Pine64
Quadcore
Debian aarch64
2GB RAM
Gigabit ethernet
64GB Kingston MicroSD
28 vaults total

Launcher 0.7.1
Ubuntu 16.04 x64

Demo App 0.5.1
Ubuntu 16.04 x64

10:52:45 - start file upload

Size: 686.8 MB
Name: ubuntu-16.04-server-amd64.iso
SHA256: b8b172cbdf04f5ff8adc8c2c1b4007ccf66f00fc6a324a6da6eba67de71746f6

11:52:00 - end file upload

and the results:


                                                 Method    Duration (s)  RunCount
                                       vault.rs:115:run  2552.60870983        33
                                 chunk_store.rs:103:put  1510.47436018      2046
          data_manager.rs:578:send_gets_for_needed_data  1173.55571553     20937
                maid_manager.rs:285:forward_put_request  967.770070733      1992
                             vault.rs:175:process_event  205.934719917     14799
                         data_manager.rs:340:handle_put  176.054347865      1992
                       maid_manager.rs:239:send_refresh  111.245907806      1992
                 maid_manager.rs:117:handle_put_success   80.609155668      1992
                          vault.rs:83:vault_with_config   38.724921621        61
                     maid_manager.rs:180:handle_refresh   33.742462612      1992
                       data_manager.rs:759:send_refresh   33.543854022      2046
                        data_manager.rs:200:needed_data   31.490390826     20904
          maid_manager.rs:251:handle_put_immutable_data   14.932050575      1965
                               vault.rs:264:on_response   12.512659774      1992
                                 chunk_store.rs:150:has    8.159158608      3957
                                vault.rs:205:on_request    7.215548172     12303
                          maid_manager.rs:94:handle_put    3.472981887      1992
                           chunk_store.rs:219:file_path    3.271853595      6255
                           chunk_store.rs:210:do_delete    2.397543187      2046
                                chunk_store.rs:227:drop    2.119100287        33
... only top 20 shown

The surprising part to me is that chunk_store.rs:103:put consumed the most time, i.e. writing the chunk to the microSD.

2046 chunks at 1 MB/chunk is almost exactly 3 x 686 (very satisfying!)

But the rate of 2046 MB / 1510 s = 1.35 MB/s is extremely low.

The next question I want to explore is the synchronous vs asynchronous characteristics of the network, i.e. where are the blockages that prevent full use of network / CPU / IO?

Any comments, questions or grievances are most welcome.

8 Likes

Due to the surprising result in the post above (writing files to disk takes the most time), I dug a bit deeper into the file-writing part of the vault.

I added more detailed logging in the file-writing section of chunk_store.rs.
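
The extra instrumentation is roughly of this shape (illustrative only; the real chunk_store.rs code and logging differ): each step of the put is timed separately, using the try!-style error handling of the current codebase.

use std::fs::File;
use std::io::{self, Write};
use std::path::Path;
use std::time::Instant;

// Time each step of writing a chunk separately, mirroring the extra log lines
// added around chunk_store.rs lines 114-120.
fn timed_put(path: &Path, value: &[u8]) -> io::Result<()> {
    let t = Instant::now();
    let mut file = try!(File::create(path));
    println!("File::create   {:?}", t.elapsed());

    let t = Instant::now();
    try!(file.write_all(value));
    println!("file.write_all {:?}", t.elapsed());

    let t = Instant::now();
    try!(file.sync_all());
    println!("file.sync_all  {:?}", t.elapsed());

    let t = Instant::now();
    let _metadata = try!(file.metadata());
    println!("file.metadata  {:?}", t.elapsed());

    Ok(())
}

fn main() {
    timed_put(Path::new("/tmp/chunk_test"), &[0u8; 1024]).unwrap();
}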

I uploaded the same file as the last post (a 686 MB file over the course of an hour).

Accumulating the time taken by all vaults at this part of the code highlights that sync_all is taking by far the most time:

                               Method        Time (s)
      chunk_store.rs:114:File::create    87.032883285
    chunk_store.rs:115:file.write_all    23.501575157
     chunk_store.rs:117:file.sync_all  1686.019508590
     chunk_store.rs:118:file.metadata    10.044445459
 chunk_store.rs:120:update_used_space     0.032858072

Rust docs for sync_all:

Attempts to sync all OS-internal metadata to disk.

This function will attempt to ensure that all in-core data reaches the filesystem before returning.

Any vault operator trying to earn safecoin would not wait, but would instead return a successful write and do the actual commit to disk asynchronously in the background. I reckon it would be better for the user experience and for the network as a whole to do file writing in the background and return success as soon as the data is received over the wire. I’m sure this will happen anyway, since the incentives encourage it. Even if disk IO is not the main bottleneck on the live network, it’s easy pickings for intrepid vault operators who want to minimize their response time.
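
To illustrate the idea, here is a minimal sketch (not the actual vault code): write the chunk, return success straight away, and leave the expensive sync_all to a background thread.

use std::fs::File;
use std::io::{self, Write};
use std::path::PathBuf;
use std::thread;

// Sketch: write the chunk and return success immediately, deferring the expensive
// sync_all() to a background thread. Error handling for the deferred sync is
// deliberately ignored here; a real implementation would need to deal with it.
fn put_chunk(path: PathBuf, data: Vec<u8>) -> io::Result<()> {
    let mut file = try!(File::create(&path));
    try!(file.write_all(&data));
    thread::spawn(move || {
        let _ = file.sync_all();
    });
    Ok(())
}

fn main() {
    put_chunk(PathBuf::from("/tmp/chunk_test"), vec![0u8; 1024 * 1024]).unwrap();
}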

There was significant variation in the amount of time it took to write files to disk. I’m not sure why such variation exists. The following table indicates it is not correlated with CPU use, nor with a particular microSD. As far as I can tell, there is no clear correlation with any variable.

   Machine     Vault    Writes   Time Spent  CPU Load (15m)
                                Writing (%)

         1         1       127         1.7            2.23
                   2        88         9.3               
                   3        25         8.4               
                   4       129         1.9               

         2         1        23         8.7            1.24
                   2        83         5.2               
                   3       125         1.8               
                   4        15         0.7               

         3         1        41         2.0            0.51
                   2        90         3.6               
                   3       106         9.4               
                   4        93         3.4               

         4         1        41         0.7            0.28
                   2        23         0.1               
                   3        93         1.0               
                   4        80         0.3               

         5         1        83         0.7            0.70
                   2        77         1.6               
                   3        80         0.7               
                   4        88         1.7               

         6         1        77         0.5            0.25
                   2        88         1.7               
                   3        93         1.2               
                   4       117         9.5               

         7         1        22         9.3            0.22
                   2        77         1.6               
                   3        41         1.0               
                   4        21         0.2               

In summary, waiting for files to be written to disk is taking a lot of time. The user experience and network performance could be improved by reporting a successful write as soon as the data is received, and performing the disk IO in the background.

13 Likes

Wonderful work. A lot of improvement can be done.

Maybe @anon86652309 or @Viv can give us some thoughts.

2 Likes

Great to see some real numbers. Good work. I wonder what the reason for the seeming randomness in write times is. It would be good to have that one explained.

2 Likes

Trying to isolate the primary source of slowness, I changed chunks from store-on-disk to store-in-memory. The change uses a HashMap and can be seen in chunk_store_ram.rs.
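
For reference, the shape of the change is roughly the following (a simplified sketch, not the actual chunk_store_ram.rs):

use std::collections::HashMap;

// Minimal in-memory chunk store: chunk name -> chunk bytes.
struct ChunkStoreRam {
    chunks: HashMap<Vec<u8>, Vec<u8>>,
    used_space: u64,
}

impl ChunkStoreRam {
    fn new() -> ChunkStoreRam {
        ChunkStoreRam { chunks: HashMap::new(), used_space: 0 }
    }

    fn put(&mut self, name: &[u8], value: Vec<u8>) {
        self.used_space += value.len() as u64;
        if let Some(old) = self.chunks.insert(name.to_vec(), value) {
            self.used_space -= old.len() as u64;
        }
    }

    fn get(&self, name: &[u8]) -> Option<&Vec<u8>> {
        self.chunks.get(name)
    }

    fn has(&self, name: &[u8]) -> bool {
        self.chunks.contains_key(name)
    }
}

fn main() {
    let mut store = ChunkStoreRam::new();
    store.put(b"chunk_name", vec![0u8; 1024 * 1024]);
    assert!(store.has(b"chunk_name"));
    println!("used space: {} bytes", store.used_space);
}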

The results were surprising - no difference in performance.

I measured the estimated-time-to-completion during the upload process. It shows a fast start, then a gradually worsening slow-down until completion.

From the graph, the estimated time to completion is about 20-25m at 10% uploaded, but the actual time to completion ends up being about 50m. What causes this slow-down? Surely a-chunk-is-a-chunk-is-a-chunk but it seems that later chunks take longer to save than earlier chunks.

This is a 686 MB file. The slowdown probably won’t be seen on public test/alpha networks where the file size is limited to 25 MB (not to mention the bandwidth bottleneck which my test network doesn’t have).

Here are some links to the raw data for RAM timing and Disk timing.

This was a very surprising result and has left me scratching my head. My next step is to build a visualization of the logs, similar to the Chrome debug tools, that shows network activity on a timeline.

  • Are the logs misbehaving somehow? I’m using defer from this stack overflow code sample to log messages at the end of each function. It may be causing grief.
  • Is this happening as a side-effect of some other bit of the code?
  • Are there gaps of time between messages between vaults?
  • Can I trust the relatively new pine64 boards to behave?

I think many of these doubts can be probed once a suitable visualization is available. Any other ideas on how to approach this would be most welcome.

6 Likes

That’s very strange. Are you sure that ALL your vaults use the in-memory chunk store?

Yes, I ran
$ salt '*' cmd.run 'ls /path/to/vault_X/safe_vault_chunk_store/'
several times, and only the lock file was in there, as expected. Also, I checked RAM usage and it was much higher than with the disk vaults.

The takeaway seems to be that disk IO is not the real bottleneck, even though the log messages seemed to report it that way. I am somewhat dubious of the defer method for logging…

Here are the top time-consuming methods for vaults running with store-in-memory.

chunk_store:put is no longer taking significant time.

                                                 Method   Duration (s)  RunCount
                                       vault.rs:115:run  2154.42219607        33
          data_manager.rs:579:send_gets_for_needed_data  1832.13618426     20869
                maid_manager.rs:285:forward_put_request  1460.65173148      1992
                         data_manager.rs:340:handle_put  495.183596764      1992
                       data_manager.rs:760:send_refresh  178.810385047      2046
                       maid_manager.rs:239:send_refresh  121.563561393      1992
                                 chunk_store.rs:105:put   73.137678166      2046
                          vault.rs:83:vault_with_config   50.763337318        59
...

Maybe you’re swapping to disk: if I understand correctly your physical machines have 2GB memory shared by 4 virtual machines. That’s probably not enough. In a way you’re still using disk.

Swap was not being used. I’m going to run it again and double check, but consider:

3 copies of 686 MB file = 2058 MB
Assume roughly equal distribution of chunks across all 7 devices
2058 / 7 = 294 MB of memory

The test file should easily fit in memory.

edit: I should also add that the vaults are not running in virtual machines; they run on the native OS using separate ports and chunk store directories, so each vault has access to as much or as little memory as it requires from the full 2 GB.

Appreciate the thoughts, swap is certainly a contender and I will be sure to double check it.

1 Like

Long post so I’ll put the conclusion first:

The routing event loop can be improved, which may improve upload speed.

Here’s the trail of evidence:

From previous tests, send_gets_for_needed_data is currently consuming the most time in the vault (assuming chunks are written to disk in the background).

The graph below shows there are intermittent delays. send_gets_for_needed_data completes in less than 100ms 90% of the time, but the other 10% can take up to 6s to complete.

send_gets_for_needed_data is called very often - 20,831 times. For context, this is more than 10 times the number of put calls (2046 puts). The high volume means it must be fast every time, otherwise those few slow calls really begin to add up.

Looking at the timing inside send_gets_for_needed_data shows where the problem is:

let src = Authority::ManagedNode(try!(self.routing_node.name()));
time spent: 5916.90330509
"all other lines of code in send_gets_for_needed_data combined"
time spent: 1.882290369

What’s taking so much time in routing_node.name? Here’s the line-by-line timing:

routing:node:name - create channel - 0.340465389
routing:node:name - send request   - 0.500385346
routing:node:name - receive result - 1794.10717215

Waiting for the result accounts for 99.9% of the total time executing name().

Why so much waiting for the name? It’s just a property of the node. It should be returned almost instantly.

name() returns the result asynchronously via action_sender, so it must wait for other items in the event queue to complete. This can be seen on the graph below.
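
To make the mechanism concrete, here is a toy illustration of the pattern (not the actual routing code): name() sends a request down the action channel and then blocks on a reply channel, so the answer only arrives once the single event loop has worked through everything queued ahead of the request.

use std::sync::mpsc;
use std::thread;

enum Event {
    CrustHopMessage(Vec<u8>),
    Name(mpsc::Sender<String>),
}

fn handle_hop_message(_msg: Vec<u8>) {
    // verify signature, route onwards, etc.
}

// Single-threaded event loop: events are handled strictly in arrival order.
fn event_loop(rx: mpsc::Receiver<Event>, node_name: String) {
    for event in rx {
        match event {
            // Every hop message queued ahead of the Name request is processed first.
            Event::CrustHopMessage(msg) => handle_hop_message(msg),
            // Only now does the caller get its answer.
            Event::Name(reply) => { let _ = reply.send(node_name.clone()); }
        }
    }
}

// Roughly what name() does: send a request, then block waiting for the reply.
fn name(action_tx: &mpsc::Sender<Event>) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    action_tx.send(Event::Name(reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || event_loop(rx, "a1b2c3".to_string()));
    // A backlog of network events sits in front of the name request.
    for _ in 0..100 {
        tx.send(Event::CrustHopMessage(vec![0u8; 32])).unwrap();
    }
    println!("name = {}", name(&tx));
}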

In the logs it’s also very clear. Here’s a particularly bad example of the backlog causing name to be slow to return:

21067720255750 start data_manager.rs:579:send_gets_for_needed_data
21067720267834 Name Request Happens
21067721697429 CrustEvent::Message::Hop
21067771355696 CrustEvent::Message::Hop
21067821400019 CrustEvent::Message::Hop
21067870456471 CrustEvent::Message::Hop
21067920365164 CrustEvent::Message::Hop
21067941593830 CrustEvent::Message::Hop
21067963335098 CrustEvent::Message::Hop
21067987394495 CrustEvent::Message::Hop
21068008237521 CrustEvent::Message::Hop
21068033712512 CrustEvent::Message::Hop
21068053956183 CrustEvent::Message::Hop
21068077750611 CrustEvent::Message::Hop
21068097937196 CrustEvent::Message::Hop
21068121544659 CrustEvent::Message::Hop
21068144560600 CrustEvent::Message::Hop
21068196893925 CrustEvent::Message::Hop
21068247295219 CrustEvent::Message::Hop
21068296141581 CrustEvent::Message::Hop
21068348779625 CrustEvent::Message::Hop
21068398292887 CrustEvent::Message::Hop
21068448006780 CrustEvent::Message::Hop
21068501791076 CrustEvent::Message::Hop
21068549567647 CrustEvent::Message::Hop
21068572497293 CrustEvent::Message::Hop
21068593907383 CrustEvent::Message::Hop
21068645624768 CrustEvent::Message::Hop
21068696443703 CrustEvent::Message::Hop
21068745756540 CrustEvent::Message::Hop
21068797673474 CrustEvent::Message::Hop
21068821118431 CrustEvent::Message::Hop
21068871433389 CrustEvent::Message::Hop
21068921130949 CrustEvent::Message::Hop
21068971024724 CrustEvent::Message::Hop
21069021942705 CrustEvent::Message::Hop
21069071882982 CrustEvent::Message::Hop
21069122009266 CrustEvent::Message::Hop
21069146865192 CrustEvent::Message::Hop
21069197306196 CrustEvent::Message::Hop
21072624102073 CrustEvent::Message::Hop
21072673830676 CrustEvent::Message::Hop
21072696150382 CrustEvent::Message::Hop
21072718986066 CrustEvent::Message::Hop
21072771431853 CrustEvent::Message::Hop
21072818312267 CrustEvent::Message::Hop
21072842778928 CrustEvent::Message::Hop
21072892935213 CrustEvent::Message::Hop
21072944337545 CrustEvent::Message::Hop
21072967923757 CrustEvent::Message::Hop
21072992335958 CrustEvent::Message::Hop
21073037820778 CrustEvent::Message::Hop
21073057693935 CrustEvent::Message::Hop
21073099556203 CrustEvent::Message::Hop
21073118999886 CrustEvent::Message::Hop
21073169036208 CrustEvent::Message::Hop
21073219543548 CrustEvent::Message::Hop
21073246460801 CrustEvent::Message::Hop
21073299304603 CrustEvent::Message::Hop
21073325347156 CrustEvent::Message::Hop
21073377902739 CrustEvent::Message::Hop
21073405462600 CrustEvent::Message::Hop
21073457020270 CrustEvent::Message::Hop
21073482864108 CrustEvent::Message::Hop
21073537913992 CrustEvent::Message::Hop
21073591831168 CrustEvent::Message::Hop
21073616487545 CrustEvent::Message::Hop
21073642275714 CrustEvent::Message::Hop
21073696074718 CrustEvent::Message::Hop
21073721220822 CrustEvent::Message::Hop
21073772763825 CrustEvent::Message::Hop
21073799069930 CrustEvent::Message::Hop
21073852988731 CrustEvent::Message::Hop
21073879074327 CrustEvent::Message::Hop
21073931569116 CrustEvent::Message::Hop
21073982336008 CrustEvent::Message::Hop
21074008557110 CrustEvent::Message::Hop
21074193074109 CrustEvent::Message::Hop
21074220315457 CrustEvent::Message::Hop
21074244276266 CrustEvent::Message::Hop
21074257859481 CrustEvent::Message::Hop
21074283286928 CrustEvent::Message::Hop
21074307013854 CrustEvent::Message::Hop
21074320526107 CrustEvent::Message::Hop
21074345425160 CrustEvent::Message::Hop
21074367746908 CrustEvent::Message::Hop
21074381849600 CrustEvent::Message::Hop
21074404708618 CrustEvent::Message::Hop
21074428633260 CrustEvent::Message::Hop
21074452144636 CrustEvent::Message::Hop
21074466254203 CrustEvent::Message::Hop
21074489858041 CrustEvent::Message::Hop
21074513806224 CrustEvent::Message::Hop
21074538099129 CrustEvent::Message::Hop
21074561772428 CrustEvent::Message::Hop
21074575123050 CrustEvent::Message::Hop
21074599396621 CrustEvent::Message::Hop
21074622817286 CrustEvent::Message::Hop
21074645575008 CrustEvent::Message::Hop
21074670535772 CrustEvent::Message::Hop
21074684696716 CrustEvent::Message::Hop
21074708283428 CrustEvent::Message::Hop
21074731759636 CrustEvent::Message::Hop
21074754820870 CrustEvent::Message::Hop
21074781077848 CrustEvent::Message::Hop
21074803612771 CrustEvent::Message::Hop
21074825461252 CrustEvent::Message::Hop
21074850483267 CrustEvent::Message::Hop
21074872613467 CrustEvent::Message::Hop
21074896753199 CrustEvent::Message::Hop
21074920094694 CrustEvent::Message::Hop
21074944880492 CrustEvent::Message::Hop
21074966802351 CrustEvent::Message::Hop
21074990332477 CrustEvent::Message::Hop
21075006702004 CrustEvent::Message::Hop
21075028687031 CrustEvent::Message::Hop
21075043091068 CrustEvent::Message::Hop
21075066726990 CrustEvent::Message::Hop
21075080974938 CrustEvent::Message::Hop
21075104259263 CrustEvent::Message::Hop
21075129753588 CrustEvent::Message::Hop
21075143761485 CrustEvent::Message::Hop
21075158159230 CrustEvent::Message::Hop
21075180201218 CrustEvent::Message::Hop
21075195514663 CrustEvent::Message::Hop
21075218302470 CrustEvent::Message::Hop
21075241674424 CrustEvent::Message::Hop
21075258707642 CrustEvent::Message::Hop
21075282958004 CrustEvent::Message::Hop
21075298858680 CrustEvent::Message::Hop
21075321388103 Action::Name   -----------> This is where name is evaluated
21075321630362 Name Response Happens
21075321686572 start data_manager.rs:200:needed_data
21075321647446 CrustEvent::Message::Hop
21075321712448 end data_manager.rs:200:needed_data
21075321753074 start data_manager.rs:243:print_stats
21075321766867 end data_manager.rs:243:print_stats
21075321779659 end data_manager.rs:579:send_gets_for_needed_data

Ideally, calling name() should not wait for an arbitrary number of Crust events to complete, especially if none of those events affect the value of the response.

How might this be resolved?

  • Prioritize local events over network events (a rough sketch of this idea follows the list).
  • Prioritize quick events over slow events (e.g. returning a property vs verifying a cryptographic signature).
  • Use more than one thread to manage the event loop.
  • Process independent events concurrently.
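
As a starting point for the first suggestion, here is a rough sketch of what prioritizing local actions over network events could look like with two channels. It is illustrative only, not the actual routing event loop, and it glosses over a real complication: plain std::sync::mpsc has no stable select, so while the loop is blocked waiting for a network event a newly arrived local action still has to wait.

use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};
use std::thread;

enum Action { Name(Sender<String>) }
enum CrustEvent { HopMessage(Vec<u8>) }

fn handle_action(action: Action, node_name: &str) {
    match action {
        Action::Name(reply) => { let _ = reply.send(node_name.to_string()); }
    }
}

fn handle_crust_event(_event: CrustEvent) {
    // verify signature, route onwards, etc.
}

// Drain all pending local actions before taking the next network event, so cheap
// property requests never queue behind a backlog of hop messages.
fn run_event_loop(actions: Receiver<Action>, crust_events: Receiver<CrustEvent>, node_name: String) {
    loop {
        loop {
            match actions.try_recv() {
                Ok(action) => handle_action(action, &node_name),
                Err(TryRecvError::Empty) => break,
                Err(TryRecvError::Disconnected) => return,
            }
        }
        match crust_events.recv() {
            Ok(event) => handle_crust_event(event),
            Err(_) => return,
        }
    }
}

fn main() {
    let (action_tx, action_rx) = channel();
    let (crust_tx, crust_rx) = channel();
    thread::spawn(move || run_event_loop(action_rx, crust_rx, "a1b2c3".to_string()));

    // Queue a backlog of hop messages, then ask for the name.
    for _ in 0..100 {
        crust_tx.send(CrustEvent::HopMessage(vec![0u8; 32])).unwrap();
    }
    let (reply_tx, reply_rx) = channel();
    action_tx.send(Action::Name(reply_tx)).unwrap();
    println!("name = {}", reply_rx.recv().unwrap());
}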

Message prioritization is complex, so I can’t say for sure if the suggestions above will work as intended. I plan to find out!

I’m going to try changing the event processing so name doesn’t take so long to return. There’s no avoiding the intervening Hop messages (which require signature verification), but we’ll see what happens. Stay tuned…

17 Likes

Awesome, thanks for the detailed analysis!

Use more than one thread to manage the event loop

… or do the vault stuff in the event thread instead of a separate one. That would be my preference; I don’t like that safe_vault and routing communicate via events, and I think that one event loop should be enough: nothing computationally expensive should happen in that thread anyway.

5 Likes

To me this is the most interesting thread on this forum at the moment. There is clearly an issue with performance of the vaults in the wild. I was wondering if this was deliberate as part of the test process, but the lack of response from anyone in the know when this issue has been brought up leads me to suspect otherwise. I think you are onto something. Good luck with your future experiments. Wish I had the technical nous to help out.

1 Like

from the TEST8 thread:

1 Like

You don’t say whether this is at the beginning of the testnet or later. That’s relevant because the behaviour of the testnet changed radically over the days and weeks of its existence, from quick launcher responsiveness to eventual non-responsiveness. Without that time dimension it is worthless.

The tests are being done on a private network, as I understand it, so it’s unrelated to the testnets.

2 Likes