Hardware, Network, Communications Speeds, Lags, Bottlenecks


#1

Continuing the discussion from Wait, The Safe Network CAN'T Be Used As A VPN?:

Warren’s post raised some interesting questions that have been touched on in other threads in other categories. The topic is left as uncategorised because we don’t seem to have a suitable category for it.

From my understanding of the dynamics of the communications/processing, the overall speed of the system/network is determined by a few aspects, while the other areas are minuscule in comparison. The discussion has to rely in part on previous assurances from the dev team, while other areas rely on physics.

Please leave quantum physics/computing as theoretical, as no one will have them for the foreseeable future. And yes, these would make SAFE almost instantaneous, as they would most communications protocols/systems.

In my understanding, the areas that have the greatest effect on speed are:

Speed of light and resulting speed in wire/fibre

The time required to send data around the world will probably slow down the network the 2nd most. We know that there will be many hops for any chunk/SD to be sent from vaults to the requester. We also know that each hop is between close nodes in XOR space, which can mean any physical distance.

It would not be unreasonable to assume that any transfer will involve hops between continents. In a true XOR space system it is expected that, for an even population distribution, the average hop would be 1/4 of the way around the world. BUT the major internet population areas are the USA and the EU. Asia/India are catching up fast. So the average distance for the next couple of years is more likely to be somewhat less. Let’s use an overall average of 4,000 km, taking the USA/EU as the major users of SAFE for the near future.

At 4,000 km, with the speed of light in a medium/repeaters at approx 1/2 its speed in a vacuum, we see that the average delay is going to be about 26 mSec one way. Handshaking requires at least 3 transmissions, giving approx 80 mSec delay when sending a packet.

With large packets, a 1MB chunk may require 2 or 3 packets, so the time is approx 80 mSec delay for the handshake plus 3*26 mSec for the packets. Thus approx 160 mSec delay for each chunk due to electrical/light delay.
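The link delay figures above can be reproduced with a short sketch. The 4,000 km distance, the half-of-vacuum signal speed, the 3 handshake transmissions and the 3 data packets are all the post's own assumptions; the function names are just illustrative.

```python
# Napkin figures from the post: 4,000 km average hop distance and a
# signal speed of roughly half the vacuum speed of light.
SPEED_OF_LIGHT_KM_S = 299_792.458
SIGNAL_SPEED_KM_S = SPEED_OF_LIGHT_KM_S / 2
AVG_HOP_DISTANCE_KM = 4_000


def one_way_delay_ms(distance_km: float = AVG_HOP_DISTANCE_KM) -> float:
    """Propagation delay of a single transmission, in milliseconds."""
    return distance_km / SIGNAL_SPEED_KM_S * 1000


def chunk_link_delay_ms(handshake_transmissions: int = 3,
                        data_packets: int = 3) -> float:
    """Handshake transmissions plus one one-way delay per data packet."""
    return (handshake_transmissions + data_packets) * one_way_delay_ms()


print(f"one-way delay:     {one_way_delay_ms():.1f} ms")
print(f"handshake (3x):    {3 * one_way_delay_ms():.1f} ms")
print(f"chunk link delay:  {chunk_link_delay_ms():.1f} ms")
```

This only models propagation; it ignores queueing and any retransmissions.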

Data Rate

This is the second component in the communications delay issue and the major factor in the delay. A packet is not usable until it is fully received and checksum verified. In simple terms the real lag is the speed of the physical electrical/light layer and time required to receive the whole block over the physical layer.

With the current internet we can rely on the connecting backbone being faster than any ISP<–>Customer link, so it will not be a significant factor.

For a sender & receiver each having a 10MBit/Sec UP/Down link, the data rate delay for a 1MB chunk is approx 0.8 seconds. The delay will be controlled by the slower of the sender’s UP link speed and the receiver’s Down link speed.
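As a rough sketch of that data rate delay: the helper below is hypothetical and takes 1MB as 10^6 bytes, which is the reading that gives the 0.8 second figure. The slower of the two links sets the rate.

```python
def transfer_delay_s(chunk_bytes: int,
                     sender_up_mbit_s: float,
                     receiver_down_mbit_s: float) -> float:
    """Time to push a whole chunk through the slower of the two links."""
    bottleneck_bit_s = min(sender_up_mbit_s, receiver_down_mbit_s) * 1_000_000
    return chunk_bytes * 8 / bottleneck_bit_s


CHUNK_BYTES = 1_000_000  # assumption: 1MB taken as 10^6 bytes

# 10 Mbit/s on both sides, as in the post.
print(f"{transfer_delay_s(CHUNK_BYTES, 10, 10):.2f} s")
# A 1 Mbit/s uplink dominates even if the receiver has 100 Mbit/s down.
print(f"{transfer_delay_s(CHUNK_BYTES, 1, 100):.2f} s")
```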

Internet congestion

Some countries have “international” congestion due to the lack of adequate undersea links.
Many/most ISPs have congestion issues.

Most congestion occurs during “peak” times.

This will slow down packets across the internet by an indeterminate amount: sometimes a few milliseconds and sometimes hundreds of milliseconds.

SAFE Routing

Unsure at this time, I just don’t know the dynamics that will be required to successfully connect two nodes together. But it is assumed that the equivalent of one/few hop(s) will be required sometimes.

Areas with low impact on speed

Processor - most work in hops is data movement, which is very fast (microseconds)
Other Hardware - This works at nanosecond speeds and requires little work.

When data reconstruction (decryption) OR node processing (decisions) occurs, the processor has more work to do, but we have been assured that this will be low, as one would expect from decrypting and decision making. Expect less than a couple of milliseconds per chunk on a PC.

Routing processing, minimal - realise that PCs are already routing data when communicating on the net.

Summary

In my opinion, using the analysis above, the delays (lag) from the communication medium and from passing chunks/data from one node to the next many times to get a chunk/data from vault to receiver are by far the determining factor for the speed of the network.

Processor & hardware speeds play only a minor part in the delay equation. It is in terms of microseconds compared to milliseconds per hop. The largest processor/hardware delay will be the en/decrypting of the data at the client machine, but on modern processors that will still be much much less than even one average hop distance delay.

For each hop using fast connections (10MBit/s upload) we can see that the average expected delay will be on the order of (taking 0 mSec congestion delay):

150 mSecs   +  800 mSecs       + (max) 1 mSec    + 0 mSec ~= 950mSec  (ave per hop)
link delay  + transfer delay   + H/W processor   + congestion

For a network requiring 5 hops per chunk, the delays can be expected (for a USA/EU person) to be on the order of 4.5 - 5 seconds for the first chunk. Chunks can be fetched in parallel, so ~5 seconds for 1MB or 1GB. The hardware contributes about 10 mSecs of the whole delay: 1 in 400 (0.25%).

Caching allows often used chunks to have fewer hops.
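The per-hop sum and the 5 hop total can be checked with a few lines. This is only a sketch of the napkin model: every constant is one of the post's own estimates, and it assumes each hop waits for the whole chunk before forwarding it.

```python
# Per-hop estimates from the post, in milliseconds.
LINK_DELAY_MS = 150        # handshake plus propagation
TRANSFER_DELAY_MS = 800    # 1MB over a 10 Mbit/s link
HARDWARE_DELAY_MS = 1      # upper estimate for processor/hardware
CONGESTION_DELAY_MS = 0    # taken as zero for this estimate


def per_hop_delay_ms() -> int:
    """Sum of the four per-hop components."""
    return (LINK_DELAY_MS + TRANSFER_DELAY_MS
            + HARDWARE_DELAY_MS + CONGESTION_DELAY_MS)


def chunk_delay_s(hops: int) -> float:
    """Store-and-forward delay for one chunk over the given number of hops."""
    return hops * per_hop_delay_ms() / 1000


print(f"per hop: {per_hop_delay_ms()} ms")
print(f"5 hops:  {chunk_delay_s(5):.2f} s")
```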

What can be done to speed this up?

Yes, quantum entanglement communications :slight_smile: but seriously, I am really looking for practical solutions with today’s technologies.

I can think of

  • Increased caching to reduce hops overall. This gives an overall average speed-up, but is useless for private data that is usually accessed only occasionally.
  • Somehow reduce hops while keeping anonymity/security at promised levels. Do not know how though.
  • Have more vaults in rented farms with high bandwidth. This means the average distance is reduced and the data rate between those nodes increases 100 fold. Only good if enough hops occur between these “virtual” machines.
  • Reduce max chunk size. At 1/2 the size the average delay becomes 150+400+1 ~= 550 ms per hop and ~2.75 seconds for 5 hops, and then parallel getting means approx the same time for a 1/2MB file and a 1GB file.
  • EDIT: pass the chunk packet by packet through the “hop” node so that the transfer delay can be reduced to one packet size. This nullifies the benefit from reducing the chunk size.
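The chunk size suggestion can be explored with the same napkin model. The sketch below assumes, as the list does, that the 150 ms link delay and 1 ms hardware delay stay fixed while the transfer delay scales with chunk size on a 10 Mbit/s uplink. The chunk sizes are taken as decimal bytes.

```python
LINK_DELAY_MS = 150      # per-hop link delay used in the post
HARDWARE_DELAY_MS = 1    # per-hop hardware/processor delay used in the post
UPLINK_MBIT_S = 10       # upload speed assumed in the post
HOPS = 5


def per_hop_delay_ms(chunk_bytes: int) -> float:
    """Link + transfer + hardware delay for one store-and-forward hop."""
    transfer_ms = chunk_bytes * 8 / (UPLINK_MBIT_S * 1_000_000) * 1000
    return LINK_DELAY_MS + transfer_ms + HARDWARE_DELAY_MS


for chunk_bytes in (1_000_000, 500_000, 250_000):
    hop_ms = per_hop_delay_ms(chunk_bytes)
    total_s = hop_ms * HOPS / 1000
    print(f"{chunk_bytes:>9} bytes: {hop_ms:.0f} ms per hop, "
          f"{total_s:.2f} s for {HOPS} hops")
```

Below a certain chunk size the fixed link delay dominates, so shrinking chunks further buys less and less in this model.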


Can others please help out with these figures, as they seem to be rather high for a simple 5 hop chunk transfer? Have I made a major mistake in my napkin analysis?


#2

I think we could eliminate handshakes from the equation by relying on UDP (uTP, RUDP) by default in Crust.

The transfer delay is the main bottleneck in your equation. You call 10 MBit/s fast, though I guess that’s a matter of perspective, mainly dependent on where you live. Where I live 50-100 MBit/s is not uncommon (got about 95 MBit/s upload myself). This is one of those areas where “our” ideals may conflict with optimal performance. We want farming to be viable for everyone, though the network would perhaps be a whole lot faster if it’d only be viable for those with high-end connections and machines.

Anyway, the transfer delay could theoretically be solved by passing on every bit of a chunk as soon as it is received rather than waiting until the whole chunk has arrived before starting to pass it on to the next node. This would make it impossible to validate that the data is correct before transmission, which I guess would be problematic.


#3

Yea, I kinda took 10MBit/s up as an average in the general calculations. In Australia 80% have 1Mbit/s or 2Mbit/s maximum upload. ADSL is 1 and cable max is 2. A lucky few have fiber to the home and they can pay for 40Mbit/s upload (100Mbit/s down).

Yes I was a tad quick in my analysis and didn’t even consider that the 1MB chunk would be in a few packets until the end of my post.

You are right, if each packet is passed on as it comes into a “hop” node then the transfer delay reduces to the order of one packet. If packets remain at approx 1500 bytes then this is minimal time.
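To put a number on "minimal time": the sketch below compares how long one ~1500 byte packet takes to go onto the wire with the time for the whole 1MB chunk. The helper is hypothetical and 1MB is taken as 10^6 bytes. It only covers the per-hop wait before forwarding can start, not the end-to-end time.

```python
def serialization_ms(size_bytes: int, link_mbit_s: float) -> float:
    """Time to put size_bytes onto a link of the given speed, in ms."""
    return size_bytes * 8 / (link_mbit_s * 1_000_000) * 1000


PACKET_BYTES = 1_500      # approximate packet size mentioned in the post
CHUNK_BYTES = 1_000_000   # assumption: 1MB taken as 10^6 bytes

for link_mbit_s in (1, 2, 10):
    packet_ms = serialization_ms(PACKET_BYTES, link_mbit_s)
    chunk_ms = serialization_ms(CHUNK_BYTES, link_mbit_s)
    print(f"{link_mbit_s:>2} Mbit/s: packet {packet_ms:.1f} ms, "
          f"whole chunk {chunk_ms:.0f} ms")
```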

I did not think that “hop” nodes validated the chunk, since they have error detection in each packet. It is something that I need to find out from the crust? library.

Passing on packet by packet rather than waiting for the whole chunk would definitely speed things up, and I would think that it also allows slower link nodes/vaults to participate “better”.

EDIT: that would mean that the 5 hop example would be less than one second.


#4

I just made a wild guess there, so you’re probably right.

One thing I missed in my previous analysis in the vault routing burden topic is that the number of hops is not O(log(n)), but [formula missing in source], where k is the bucket size in the routing table of a node. In other words, the higher k is, the more knowledge a particular node has of the DHT. So the number of hops it takes to route can be reduced by increasing k. This increases the number of connections a node maintains, so it’s not without a cost. But I imagine that as hardware performance increases over time, we can increase k to reduce the number of hops.

A quick look at https://github.com/maidsafe/routing/blob/90b3044ad2df4847d98e48700077e47e7ac9db70/src/routing_table.rs reveals that BUCKET_SIZE is currently 1 and that the OPTIMAL_SIZE of the routing table is set at 64. So I think that means that 64 nodes are divided over 512 buckets. This seems rather low to me, but maybe maintaining more connections is very costly, or perhaps these are just testing values. It would be cool if one of the devs could comment on this?


#5

Interesting. Of course we need a few hops at least to keep anonymity.

BTW I never did find what Theta was in that equation. (it was always written as “O” previously)


#6

Big O describes an asymptotic upper bound, Big Theta an asymptotic bound from both below and above. If that still makes no sense, I guess one could say that Big O is like a worst-case scenario, and Big Theta more like an average?


#7

So if I was to put some figures to the equation, what would I put as either Theta or “O”? Sorry, this really did not make sense to me, but an example would explain it.

Or is it that the upper bound is log(n) and the average is log_k(n)?

maybe I should be reading the whitepaper :smile:


#8

I haven’t studied the mathematics behind it in detail (I’m not good at math at all, very slow at following complex equations), but as I understand it O(log(n)) is a significant simplification. I would use the Big Theta for any analysis, it’s probably closer to reality. But you’ll first have to know what k is going to be in the live SAFE network.

Incidentally, the Bittorrent DHT for trackerless torrents currently uses a k of 8 according to this (rather old) article, but it also uses only a 160 bit address space (SAFE uses 512 bits), so at most 8*160 connections are maintained.


#9

I plotted the equation with several k values, 2, 4 and 8. The Y axis is the amount of hops (on average), the X axis the amount of routing nodes in the network (up to 500K here).

It visualises nicely that you get diminishing returns as you increase k higher and higher.


#10

May I shamelessly tag @dirvine in the hope of some information on whether the current bucket and routing table sizes are going to be changed or not?


#11

At the moment not. It can be surprising to see some math here, but if you take a population of 7 billion nodes and look at the distribution of buckets then you will only get to 33 or 34 buckets. This is just down to the distribution of the xor addresses, and you can think of it like trying to get hashes close together (like a BTC mining attempt type scenario).
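One way to see where a figure like 33 comes from is the following sketch. It is not taken from the routing code. It assumes node IDs are uniformly random, and that bucket j holds the nodes sharing exactly j-1 leading bits with your own ID, so a random node lands in bucket j with probability 2^-j. The function name is illustrative.

```python
import math


def expected_nonempty_buckets(node_count: float,
                              address_bits: int = 512) -> float:
    """Expected number of buckets holding at least one node."""
    total = 0.0
    for j in range(1, address_bits + 1):
        p = 2.0 ** -j  # chance that a random ID falls into bucket j
        # P(bucket j is non-empty) = 1 - (1 - p)^n, computed stably
        # for tiny p via log1p/expm1.
        total += -math.expm1(node_count * math.log1p(-p))
    return total


nodes = 7e9
print(f"log2(nodes):              {math.log2(nodes):.1f}")
print(f"expected occupied buckets: {expected_nonempty_buckets(nodes):.1f}")
```

Under these assumptions the count comes out at about 33 for 7 billion nodes and grows by only one bucket each time the network doubles.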

So even with an RT size of 64 buckets we are fine for a long time. We then use 32 of these for the close nodes (this will be 29 most likely), so we offset the routing table if you look at it that way, but in fact until there are several billion nodes more than the population of the planet we are still comfortably large in terms of routing table size.

It’s not obvious at all though, but it is true :slight_smile:


#12

This is a large divergence between the SAFE DHT and Kademlia. In Kademlia no connections are maintained, and in fact many of the entries in a routing table are dead nodes. There are improvements, namely dead node removal (down list) and beta refresh optimisations, but many are not implemented.

We use a recursive approach instead of the Kad iterative approach and therefore can make real connections and keep them. This has the effect of reducing the refresh time to the physical minimum (like setting Kad refresh to near zero).

Also worth noting that traditional DHTs like Kad etc. will time data out if not republished (usually 24 hours). This is where vaults come in though.


#13

I don’t think that is a requirement? If we pick the relay node from outside the close group of either endpoint, then the two IPs will mean nothing to the relay node, so no information leaks.


#14

It’s not a black/white style of anonymity that SAFE provides. It’s more about reducing the opportunity for surveillance systems to determine the source/dest of the packets.

In the case of endpoints talking to their relay node and the two relay nodes talking to each other, anonymity is compromised if the 2 relay nodes are within a single/cooperating surveillance net. UK & USA spooks cast a wide net. So while this is not going to happen most of the time, it will happen in a significant number of cases. In practice this is only a problem if they target either the endpoints or the relay nodes, as all the required information was captured. But in all those cases the communication cannot be considered anonymous.

When more intermediate nodes are added, the opportunity for the full path to be captured reduces extremely quickly. Even one node between the relay nodes is a massive benefit to anonymity, but there can still be enough chance of the total path being captured in the surveillance net and of it being tied together.

Having 2 intervening nodes becomes quite good. Again not perfect.

By the time you have at least 2 nodes between the relay nodes, the increased web of communications between these nodes and between other nodes makes individual communications extremely difficult to follow, and for most that is sufficient anonymity. But time tagging can still, in a small number of cases, map individual src->dest communications.

But with only the 2 relay nodes, the net of communications is small enough that, with time tagging, analysis programs are able to follow the flow of communications.