Support Issues with Baby Fleming Version 1 (Vaults Phase 2a - single-section network)

Some stats from tweaking the gossip interval to see how it affects upload times for files of various sizes.

Size (kB)  Baby-fleming time (s)  Change A time (s)  Change B time (s)
700        11.271                 49.47              21.688
800        12.074                 53.454             16.894
900        14.047                 60.628             19.654
1000       14.595                 57.409             19.647
1100       13.292                 50.306             23.107
1200       85.711                 62.416             19.867
1300       23.079                 67.187             24.865
1400       32.758                 48.352             22.146
1500       killed for cpu         74.33              26.145
1600       -                      49.884             25.31
1700       -                      46.471             27.088
1800       -                      68.727             24.599
1900       -                      50.433             29.201
2000       -                      46.137             30.306
2200       -                      60.206             26.344
2400       -                      70.675             42.646
2600       -                      57.213             62.955
2800       -                      74.288             49.274
3000       -                      360.143            93.373
3500       -                      63.956             66.507
4000       -                      81.542             57.404
4500       -                      101.312            killed for cpu
5000       -                      65.263             -
6000       -                      81.199             -
7000       -                      182.018            -
8000       -                      162.767            -
9000       -                      229.971            -
10000      -                      killed for ram     -
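
To make the comparison easier to read, the timings can be turned into throughput (kB/s). The sketch below is only an illustration: it copies the 700 to 1400 kB rows from the table above (the sizes where all three runs completed), and the names `Row`, `throughput` and `slower_with_a` are made up for this example.

```rust
/// One row of the timings above (only sizes where all three runs completed).
struct Row {
    size_kb: u32,
    baseline_s: f64,
    change_a_s: f64,
    change_b_s: f64,
}

// Values copied from the table in this post, 700-1400 kB.
static ROWS: [Row; 8] = [
    Row { size_kb: 700, baseline_s: 11.271, change_a_s: 49.47, change_b_s: 21.688 },
    Row { size_kb: 800, baseline_s: 12.074, change_a_s: 53.454, change_b_s: 16.894 },
    Row { size_kb: 900, baseline_s: 14.047, change_a_s: 60.628, change_b_s: 19.654 },
    Row { size_kb: 1000, baseline_s: 14.595, change_a_s: 57.409, change_b_s: 19.647 },
    Row { size_kb: 1100, baseline_s: 13.292, change_a_s: 50.306, change_b_s: 23.107 },
    Row { size_kb: 1200, baseline_s: 85.711, change_a_s: 62.416, change_b_s: 19.867 },
    Row { size_kb: 1300, baseline_s: 23.079, change_a_s: 67.187, change_b_s: 24.865 },
    Row { size_kb: 1400, baseline_s: 32.758, change_a_s: 48.352, change_b_s: 22.146 },
];

/// Upload throughput in kB/s for a file of `size_kb` that took `seconds`.
fn throughput(size_kb: u32, seconds: f64) -> f64 {
    f64::from(size_kb) / seconds
}

/// Sizes (kB) at which Change A took longer than the unmodified network.
fn slower_with_a() -> Vec<u32> {
    ROWS.iter()
        .filter(|r| r.change_a_s > r.baseline_s)
        .map(|r| r.size_kb)
        .collect()
}

fn main() {
    for r in ROWS.iter() {
        println!(
            "{:>5} kB  baseline {:6.1} kB/s  A {:6.1} kB/s  B {:6.1} kB/s",
            r.size_kb,
            throughput(r.size_kb, r.baseline_s),
            throughput(r.size_kb, r.change_a_s),
            throughput(r.size_kb, r.change_b_s),
        );
    }
    println!("Change A slower than baseline at: {:?}", slower_with_a());
}
```

These are single runs per size, so the per-row numbers are noisy (see the 1200 kB and 3000 kB outliers) and should be read as rough indications only.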

Change A: slower gossip, with updates every 7s instead of every 1s.
src/parsec.rs:L63 (or set the ROUTING_GOSSIP_PERIOD environment variable)

-pub const GOSSIP_PERIOD: Duration = Duration::from_secs(1);
+pub const GOSSIP_PERIOD: Duration = Duration::from_secs(7);

This allowed larger files to complete (up to 9000 kB) but was slower at almost every size the unmodified network also completed; 1200 kB was the one exception. This suggests that maybe nodes are being flooded by other nodes' gossip so can't make up their mind about their own stuff…?
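
For anyone who prefers the environment-variable route over editing the constant, here is a rough sketch of how such an override is typically read. It is not the actual routing code: it assumes the variable holds a whole number of seconds (the format is not stated above), and `gossip_period_from` and `DEFAULT_GOSSIP_PERIOD` are hypothetical names used only for this example.

```rust
use std::env;
use std::time::Duration;

/// Default used when no valid override is present (1s, as in the original constant).
const DEFAULT_GOSSIP_PERIOD: Duration = Duration::from_secs(1);

/// Parse an optional override string into a gossip period.
/// ASSUMPTION: the value is a whole number of seconds; anything that
/// fails to parse falls back to the default.
fn gossip_period_from(raw: Option<&str>) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_GOSSIP_PERIOD)
}

fn main() {
    // Read the variable named in the post; a missing variable becomes `None`.
    let raw = env::var("ROUTING_GOSSIP_PERIOD").ok();
    let period = gossip_period_from(raw.as_deref());
    println!("gossip period: {:?}", period);
}
```

Under that assumption, starting the process with `ROUTING_GOSSIP_PERIOD=7` set would give the same 7s period as the source change above.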

Change B: gossip spread out randomly between 1s and 3s rather than always every 1s, to try to break up any lockstep behaviour that may be happening.
src/parsec.rs:L364

pub fn gossip_period(&self) -> Duration {
-    self.gossip_period
+    let t = rand::thread_rng().gen_range(0, 2000);
+    self.gossip_period + Duration::from_millis(t)
}

This allowed larger files to be uploaded with a much smaller performance hit than Change A, but the largest file that completed was smaller than with Change A (4000 kB vs 9000 kB).
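
The jitter idea behind Change B can be tried outside the vault code with a small standalone sketch. It uses a tiny xorshift generator as a stand-in for the `rand` crate so it has no dependencies; `XorShift64` and `jittered_period` are names invented for this example and are not part of parsec or routing.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Minimal xorshift64 generator, a stand-in for `rand` (not cryptographically secure).
struct XorShift64(u64);

impl XorShift64 {
    /// Seed from the system clock; the state must never be zero.
    fn from_clock() -> Self {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(1);
        XorShift64(nanos | 1)
    }

    /// Advance the state and return the next pseudo-random value.
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Base period plus a random delay in [0, max_jitter_ms) milliseconds.
/// With base = 1s and max_jitter_ms = 2000 this gives periods from 1s up to (not including) 3s.
fn jittered_period(base: Duration, max_jitter_ms: u64, rng: &mut XorShift64) -> Duration {
    if max_jitter_ms == 0 {
        return base; // no jitter requested; also avoids a modulo by zero
    }
    // The modulo introduces a tiny bias, which does not matter for spreading timers.
    base + Duration::from_millis(rng.next() % max_jitter_ms)
}

fn main() {
    let mut rng = XorShift64::from_clock();
    for _ in 0..5 {
        println!("{:?}", jittered_period(Duration::from_secs(1), 2000, &mut rng));
    }
}
```

Because each node draws its own delay, nodes that started gossiping at the same moment drift apart after a few rounds, which is the lockstep-breaking effect Change B was aiming for.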
