Vault routing burden

The number of hops a chunk takes to reach the recipient is O(log(n)) in the worst case. For example, in a network with 1 million nodes that means about 20 hops. Popular chunks get cached, which reduces their hop count, so to take this into account, and for the sake of simplicity, let’s say we have a SAFE network with an average chunk hop count of 10.

This means that for every GET of a (1 MB) chunk a vault serves, 10 MB of both up- and download bandwidth is used for routing (by all intermediary vaults), while only 1 MB of upload bandwidth is used for actual farming (by the vault that had the chunk). Every vault is expected to route data for its position in the DHT, regardless of the size of the vault. This means that the chunk routing burden for a vault is equal to:

H = average hop count
S = average size of a chunk (almost 1 MB)
G = total network-wide GETs per day
C = average number of copies per chunk (usually 6)
V = number of vaults in the network

Daily vault routing burden = (H * S * G * C) / V

I’ve provided an example value for everything except G, which is anyone’s guess. I think 1 billion client GETs per day in a network of 1 million vaults is realistic, if anything on the low side. This results in a daily 60 GB up- and download burden for routing for every vault, which is about 0.7 MB/s.
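
For anyone who wants to check the arithmetic, here is the same calculation as a quick Python sketch; the variable names match the list above, and G is just my assumption:

```python
# Back-of-envelope version of the formula above; all figures are the
# example values from this post, and G in particular is just my guess.

H = 10                   # average hop count (halved from ~20 for caching)
S = 1                    # average chunk size in MB
G = 1_000_000_000        # assumed network-wide GETs per day
C = 6                    # average number of copies per chunk
V = 1_000_000            # number of vaults in the network

daily_mb = (H * S * G * C) / V         # MB per vault per day
daily_gb = daily_mb / 1000             # ~60 GB per vault per day
sustained_mb_s = daily_mb / 86_400     # ~0.69 MB/s around the clock

print(f"{daily_gb:.0f} GB/day per vault, {sustained_mb_s:.2f} MB/s sustained")
```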

We haven’t included other message types (PUTs and such), nor churn, nor actual farming; only routing. All this considered, I doubt it is feasible to farm on smartphone connections or in areas where telecommunication infrastructure is poor. I’m even concerned about whether it’s possible to farm on home connections in countries with high-quality infrastructure. Even if you don’t have a data cap, your ISP probably isn’t going to let you get away with a daily consumption of 60 GB of upload bandwidth.

Please make my day by poking some big holes in my reasoning?

10 Likes

Nice work!
Take 10,000-50,000 nodes instead; I think that’s more realistic.
In other posts I assumed the starting (usable) capacity in the single-digit PBs, like 4-5 PB, which is about 20 PB raw, assuming a network size in the low single-digit thousands of nodes.
5,000 nodes * 4 TB each = 20 PB raw.

I think if the network reaches 10-50K nodes, that would be a lot.
I’m not saying 1 million won’t happen, but it won’t happen in v1.0.
By v2 there will be workarounds and probably routing rewards, so I’d just consider up to 50K nodes.

Churn: well, I stated my concerns about it before, but many have said it’s not a big deal. I think churn will be recognized as a source of significant workload once people start doing some simple calculations… Apparently no one thinks churn could be a significant cause of GET requests (and network traffic). I think it will be, maybe up to 30% of all traffic.

2 Likes

I was wondering whether there is any mechanism in the network to drop chunk copies that arrive a lot later than the ones that have already reached the destination?

Should you not be using 4 copies, since the network only ensures 4? While there may be 6, that is not going to be the case every time. Maybe use 5 for averaging.

Another thing is that 0.7 MB/s is over 5 Mbit/s. That cannot happen for nearly all Australians, since we have a maximum of 2 Mbit/s and most can only get connections with 1 Mbit/s uploads, except the lucky few who have fibre.
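
A quick unit check on that, using the 0.7 MB/s from the opening post and decimal megabits:

```python
# Rough unit conversion: MB/s of routing burden vs typical upload speeds.
sustained_mb_s = 0.7                    # from the routing-burden estimate
required_mbit_s = sustained_mb_s * 8    # ~5.6 Mbit/s of upload, continuously

for upload_mbit_s in (1, 2):            # common Australian upload speeds
    factor = required_mbit_s / upload_mbit_s
    print(f"{upload_mbit_s} Mbit/s upload is ~{factor:.1f}x too slow")
```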

If you are correct with 1 billion GETs for 1 million vaults then indeed we may have issues.

BUT honestly I do not think that each and every vault is going to get 1000 requests per day on an ongoing basis.

The farming rate algorithm will adjust the farming rewards to aim for 6 copies, so if the network is in equilibrium there are 6 copies. 5 would indicate a severe shortage of storage space and would result in a huge climb in the farming rewards.

Depends on the client/vault ratio and the average data usage of a client per day. I assumed 1 user for every vault and 1 GB data usage per client. HD video streaming makes this quite feasible.

Routing rewards don’t reduce the load. I’m not concerned about incentives or profitability, just minimum bandwidth requirements to farm in the first place.

1 billion requests is 1 PB / day. There won’t be that many requests (unless the app guys can earn by stuffing their GET requests).

My guesstimate is 1% of capacity per day (or, if there’s 20 PB of raw capacity out there, that’d be 200 TB).
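
Putting the two guesses side by side (both are assumptions, not measurements):

```python
# 1 billion GETs of 1 MB chunks vs 1% of a 20 PB raw network per day.
gets_per_day = 1_000_000_000
chunk_mb = 1
op_estimate_tb = gets_per_day * chunk_mb / 1_000_000    # -> 1,000 TB = 1 PB/day

raw_capacity_pb = 20
my_guesstimate_tb = raw_capacity_pb * 1000 * 0.01       # -> 200 TB/day

print(f"{op_estimate_tb:,.0f} TB/day vs {my_guesstimate_tb:.0f} TB/day "
      f"(a factor of {op_estimate_tb / my_guesstimate_tb:.0f}x apart)")
```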

Routing rewards would encourage farmers to build nodes suitable for caching, which would reduce the minimum bandwidth requirements for the rest of the network. But it would have to be rewarded. Imagine getting a 10% cut for a “saved” GET. Maybe you’d buy a cheap SSD and keep a SAFE cache on it.

Just realized the solution here is to make sure we get (far) more vaults than clients. Hop count grows only logarithmically as vault count grows, so in the end the routing burden decreases by having many small vaults rather than fewer big ones.
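
A rough way to see it, assuming worst-case hops of about log2(V) and reusing the figures from the opening post, with total client demand (G) held fixed:

```python
# Per-vault routing burden as the vault count grows, client demand fixed.
from math import log2

S, G, C = 1, 1_000_000_000, 6          # MB per chunk, GETs/day, copies

for V in (10_000, 100_000, 1_000_000, 10_000_000):
    H = log2(V)                         # worst-case hops, no caching
    gb_per_day = H * S * G * C / V / 1000
    print(f"V = {V:>10,}: ~{H:.0f} hops, ~{gb_per_day:,.0f} GB/day per vault")
```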

Caching is already rewarded because it saves a vault download bandwidth. If it doesn’t have a chunk in its cache, it has to pass on the request, receive (download) the chunk, then pass it on (upload it). If it has the chunk in its cache, it only has to upload it. Not sure if this is incentive enough, but it is an incentive nonetheless.
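
To put numbers on that, a tiny sketch of the per-chunk bandwidth for an intermediary vault:

```python
# Bandwidth for an intermediary vault handling one GET, hit vs miss.
chunk_mb = 1
cache_miss = {"download": chunk_mb, "upload": chunk_mb}  # fetch, then pass on
cache_hit  = {"download": 0,        "upload": chunk_mb}  # serve from cache

saved_mb = cache_miss["download"] - cache_hit["download"]
print(f"A cache hit saves {saved_mb} MB of download per GET routed")
```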

As long as those clients are not the ones running those extra vaults. :smile:

And the opportunity for caching to help out increases too. If significant caching occurs, then the network-wide bandwidth will be reduced.

Also, I feel that if these figures/rates pan out, then changes to the GET strategy might be needed so that not every vault with the chunk responds each time, to reduce the overall load.

2 Likes

Indeed, but note that in my example I already halved the average hop count (from 20 to 10) to account for caching. This is very generous, I think.

1 Like

I wonder if there might be a way, without reducing security, to allow some nodes to be “skipped” on the return path. This would require some XOR (and/or IP) addressing to be carried through on some hops. Instead of the vault receiving the GET request sending the chunk to the node that actually told the vault to retrieve it, it sends it to the 2nd or 3rd node up the chain.

So when the network is large and more than 5 hops is the norm, every 2nd or 3rd node is skipped. The skipped nodes could be called relay nodes; all they do is relay the request and play no further part in returning the chunk.

With a small network the IP address is lost on the 1st hop and each node passes the chunk back.
With large networks the IP address is lost at the 1st non-relay node, and so on. So, say, every 2nd (or 3rd) node passes/relays the request including the IP address of the previous node, so that node can be bypassed on the way back.

In a small network there may be 3-4 nodes in the chain; in a large one it might be 15-20 nodes, with say 1/2 or 1/3 of them relays. That is still sufficient nodes to hide the real source/destination from the intervening nodes.
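
Very roughly, the bookkeeping could look like the sketch below. This is only an illustration of the idea; the names and the skip interval are made up, not the actual routing code.

```python
# Sketch: a GET request records the last non-relay node on its way out,
# so the chunk can skip the relay nodes on its way back.
from dataclasses import dataclass
from typing import Optional

SKIP_EVERY = 2   # every 2nd node acts as a pure relay (could be every 3rd)

@dataclass
class GetRequest:
    chunk_id: str
    hop: int = 0
    return_addr: Optional[str] = None   # last non-relay node seen so far

def forward(req: GetRequest, my_addr: str) -> GetRequest:
    """Called by each node before passing the request on."""
    req.hop += 1
    if req.hop % SKIP_EVERY != 0:
        # Non-relay node: the chunk should come back through me.
        req.return_addr = my_addr
    # Relay node: leave return_addr pointing at the previous non-relay node,
    # so whoever returns the chunk can bypass me.
    return req

def return_hop(req: GetRequest) -> Optional[str]:
    """Where the chunk is sent on the way back: straight to the last
    non-relay node, skipping the relays in between."""
    return req.return_addr
```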

1 Like

You’re not alone. Of the current SAFE system, this is by far where I have the most doubts.

Imagine if you could cut it down to 1 intermediate hop (Vault => Intermediary => User)! Wow.

Here’s where I would like to poke a hole in your theory ;-). If we take 1 million users and account for the fact that the distribution is almost random, some chunks might be 20 hops away at most, but others will be just 1 hop away. So the average distance to a chunk should be 10 without any caching. And now we add caching, which means we have to know the amount of RAM it is allowed to use; the higher, the better (which will be very good for those who bought Safecoin at the crowdsale). Given that people behave like sheep, and a lot of them will watch the latest episode of Homeland at the same time, the average chunk might be only 7 or 8 hops away. Another point to make is that the chunks are compressed. Let me know if you can poke some holes back :wink:

3 Likes

Hmm, if the Homeland episode you’re downloading is compressed, it’ll be slower to watch.

This is from the thread:

Should be no problem, I think. The chunks won’t arrive at the same time, so since the chunks are decrypted one by one, you can unzip the ones you already have while a number of them are still downloading. So as long as your download isn’t faster than 280 Mb/s it should be useful. The only big difference with Popcorn Time or BitTorrent is that we need to download the whole file first before we can watch anything.

2 Likes

Okay, I’ll wait & see when a workable WAN version is out.
It might be okay on x86, but ARM devices don’t exactly excel at that kind of workload.

I don’t see why streaming wouldn’t be possible?

3 Likes

I just had dinner, so another thought popped up :grinning:. What if Safenet won’t work for the average home user, and only the folks with fibre and fast cable connections are capable of running a vault? Let’s take this as a worst-case scenario. What would happen? SAFE would still be popular; people would still use it to download because it’s free. Where does that bandwidth come from? In that case it wouldn’t surprise me if Safecoin prices went up to a level where it would be possible to run a vault on Amazon AWS or some other cloud provider. People and companies go to great lengths to make money. Just look at the videos about Bitcoin mining in China. Your Netflix HD streaming is coming from Amazon as well, GB after GB, because Netflix can make money on it. They even have enough money to spend a couple of billion on content as well.

Self-encryption needs the whole file to encrypt and decrypt. You need all the chunks.

That can be fixed by dividing the video file into smaller files. The drawback is that a special video player is needed. The video format H.264 requires a commercial license; VP8 is royalty-free. (The same goes for H.265 and VP9.) The good thing is that since SAFE is a new system, new video solutions can easily be developed.

2 Likes

I agree! I think the dev team has thought about it already. But in my scenario I just talked about a 1.5 GB episode of Homeland that got shared by someone and is downloaded by a great number of users in the 24 hours after it has been on TV. That’s what you see now on KaT on Mondays, although I only look at it for technical reasons :wink:

1 Like

I’m pretty sure that’s not true; you only need the datamap (all the hashes), and then it’s possible to decrypt every chunk independently of the others.
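
A minimal sketch of why that is enough, assuming a simplified stand-in for self-encryption in which each chunk’s key material is derived only from its neighbours’ pre-encryption hashes, all of which live in the datamap (this is not the real algorithm, just the shape of it):

```python
# With the datamap in hand, any single chunk's key can be derived without
# touching the other chunks' contents.
import hashlib

def chunk_key(datamap: list, i: int) -> bytes:
    """Derive chunk i's key from its neighbours' pre-encryption hashes."""
    n = len(datamap)
    neighbour_hashes = datamap[(i - 1) % n] + datamap[(i + 1) % n]
    return hashlib.sha256(neighbour_hashes).digest()

# Because chunk_key() only reads the datamap, a client can fetch and decrypt
# chunks in any order instead of downloading the whole file first.
```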

6 Likes