Vault routing burden

I think your argument is quite solid, and even the estimate for G, as detailed here, is not crazy. What will happen, though, is that G will be determined by the available speed. So I think it is safe to assume that the initial decentralised experience will be slower than the conventional one (inevitably, as additional layers of abstraction are added). The challenge will be to balance and optimise all parameters, and “usability (limited by speed)” will be one of them.

As a bright spot, the success of the BitTorrent network shows that getting very usable speeds is not impossible.

safe network browsers will probably load the content of every possible link as soon as you enter a website :open_mouth: so browsing should be at perfect speed… (at least i would use that “super-fast” browser as soon as there is one …)

ps: sorry didn’t follow everything here … just assumed the problem with hops is reaction time of the network and not speed itself :wink:

If you disable JavaScript, they won’t.
Preloading like that would be undesirable and would create excessive traffic, which in SAFE also has the side effect of mining coins.
If someone creates such a plugin, it will probably be used for request stuffing.

hmmm - ok - i didn’t get your argument with java

but as a user I don’t really care about the excessive traffic on the safe network … and I don’t really care about additional farming rewards generated … the important thing is that i don’t have to wait 1.5s for a site to be loaded :open_mouth:
…yes I know this would be slightly crazy … but it would enhance my browsing experience

Good catch. It confirms for me the vital importance of the cache.

Here is a Swedish study on the consumption of data by application. We can see that nearly 75% of downstream data is file sharing and media streaming. If, to estimate consumption on the SAFE network, we remove the roughly 8% belonging to VoIP and part of the 13% that is web browsing, we can estimate that about 90% of SAFE data could be file sharing and streaming media.

I checked BitTorrent: today, the average size of the most downloaded files is about 630MB and, for example, the fifty most downloaded files account for about 65% of total traffic.
As these fifty files occupy about 31GB, we can expect this data to sit mostly in the cache and in relatively close positions (possibly fewer than 10 hops).
Other requested data will also be in the cache, though to a lesser extent and, on average, in more remote positions.
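For anyone who wants to check the arithmetic, here is a rough sketch of the hot-set size. The 630MB, fifty files and 65% figures are the ones quoted above; the 1 MB chunk size is only an assumption for illustration, and the constant names are made up.

```python
# Back-of-the-envelope size of the "hot set" that would ideally live in cache.
AVG_FILE_MB = 630        # average size of the most downloaded files (from the post)
TOP_FILES = 50           # number of files considered (from the post)
TRAFFIC_SHARE = 0.65     # share of total traffic those files account for (from the post)
ASSUMED_CHUNK_MB = 1     # assumption: chunk size used only for this illustration

hot_set_mb = TOP_FILES * AVG_FILE_MB
hot_set_gb = hot_set_mb / 1000                  # decimal gigabytes
hot_set_chunks = hot_set_mb // ASSUMED_CHUNK_MB

print(f"hot set: {hot_set_gb} GB in about {hot_set_chunks} chunks, "
      f"serving roughly {TRAFFIC_SHARE:.0%} of traffic")
```

So a fairly small amount of cached data would cover most requests, if the traffic mix on SAFE ends up looking like BitTorrent's.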

We will see, but it seems clear that, with the current implementation, the network could start to have problems once it has many users.

By the way, does anyone know the estimated size of the cache?

Not judging here, but I think from various topics and comments (not just by digipl) it’s clear what people think MaidSafe will be mostly used for.
For example, you won’t see anyone asking whether MaidSafe will make it possible to create anonymous uncensored blog posts.

BitTorrent uses its DHT for lookups only; the actual data transfer is directly peer-to-peer. In SAFE the DHT is used as a routing layer as well, which amplifies latency and aggregate bandwidth consumption. The latency is a non-issue for larger downloads and streaming because of parallel GETs, but the huge aggregate bandwidth consumption remains. It’s not a problem for clients, but it is for vaults (as per my OP).

I’m not overly pessimistic anymore though, because as I realised halfway through this thread, the solution is to encourage many small vaults over fewer big vaults (through the farming algorithm). Even though this increases the aggregate bandwidth consumption, the routing load per vault is reduced and thus vault accessibility (minimum requirements) is improved.

If this is a problem, intentional caching may reduce bandwidth significantly. If a client offers 50 gig, and the network only needs 2 gig for actual storage, the other 48 can hold the most requested chunks… Chances are the viral stuff is going to be the bulk of the traffic anyway… The fewer hops, the less traffic…

This would mess up the “Pay for content” part of the model. But it would be better to have a faster network that people could use than a slower one that paid developers to make stuff that few have the patience to use.

I saw that earlier post and the logic is sound. However I would request some further clarification if you would.

First of all, are you (as I suspect) talking about many vaults over different machines? In other words, have more people farm. Otherwise, I don’t see the difference if the same number of people farm only with more vaults set up per machine.

Secondly, how do you envision the farming algorithm compensating for this? What properties must be present in order for it to ease the problem of the download burden for routing for every vault?

There is no way to tell if 2 vaults with the same public IP belong to one or more owners (think university campus). A good reason to run several vaults is rewards (you want your vault(s) to stay not too much above average)
You’re right about the bandwidth.

I’m just talking about as it pertains to this topic. If I were to rephrase that for clarification, I might say:

Is the solution in @Seneca’s eyes to encourage many small vaults, each belonging to a separate IP address and/or owner, in order to spread the bandwidth out over as many different networks as possible? Because I don’t see how one machine with a set number of larger vaults would benefit (as it pertains to this topic on vault bandwidth) by reconfiguring to use a greater number of smaller vaults.

To which I believe you gave your answer @janitor. Thank you. However that first part of your response reminded me of another thought that I had.

If there are, say, two vaults running on one machine, what if the GET response (the chunk being routed) was routed through one of those vaults, then out to another random vault, and then back to the second vault? (the one on the same machine as the first vault) In that scenario the bandwidth is doubled for the same piece to be sent to the same machine.

The only way I can think to mitigate this is to have vaults on the same machine add each other’s address to their DHT (which would probably have to be done manually). Maybe not necessarily the same IP address, but certainly the same machine. However, would that cause any anonymity/security problems?

Regardless, I do believe that @Seneca is right in that encouraging more farms on more machines on more networks will mitigate this core characteristic of the network - namely that there will be a lot of bandwidth consumed by farmers. I would just like to dive a bit deeper into his reasoning in case I missed something.

You’re right, two vaults that run within the same box or on the same LAN in Vancouver might still send and receive chunks via Tasmania. And you’re also right that letting any Safe client or vault know the IP address of the other side would destroy anonymity.

At the same time it’s unlikely that a request from any of 20 Safe clients or vaults you own would end up in any of the 19 other vaults (I assume there will be 10-50 K vaults on the network in v1.0).

DHT allows for direct P2P transfer, but that’s why its privacy and security sucks - you can see the other guy’s IP address and even if it’s a firewall IP you can at least know the network from which he accesses the internet (if not the physical location).

What about just being a smart admin and adding all of your machine’s other vaults’ XOR addresses to the others’ DHT? No need to mess with IP addresses.

The chances of vaults on the same machine being on a single routing path are negligible surely?

Of course. Admittedly that was a small aside in a much bigger topic.

Multiple smaller vaults per machine will open the way for more people to farm (who otherwise couldn’t due to routing burden).

Let’s double the number of vaults in the OP scenario to two million. This almost halves the routing burden per vault, from about 0.7 MB/s to about 0.36 MB/s (not quite half, because the hop count increases slightly as well). People who could afford 0.7 MB/s before for one big vault can now (most likely) run two vaults of half the size with a combined routing burden of 0.72 MB/s. The benefit is that people with connections that can only afford between 0.36 MB/s and 0.7 MB/s for the routing burden can now run a vault as well, while they couldn’t run any before.
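A rough sketch of where the "almost halves" comes from. It assumes total client traffic stays fixed and that the hop count grows with log2 of the number of vaults; both are simplifying assumptions for illustration, and the function name is made up.

```python
import math

def burden_per_vault(base_burden, base_vaults, new_vaults):
    """Scale a known per-vault routing burden to a different network size.

    Assumptions (not from the OP): total client traffic is constant and the
    average hop count is proportional to log2(number of vaults).
    """
    hop_growth = math.log2(new_vaults) / math.log2(base_vaults)
    # More vaults share the load, but each request crosses slightly more hops.
    return base_burden * hop_growth * base_vaults / new_vaults

doubled = burden_per_vault(0.7, 1_000_000, 2_000_000)
print(f"per-vault burden after doubling: {doubled:.2f} MB/s")
print(f"two half-size vaults on one connection: {2 * doubled:.2f} MB/s")
```

Under those assumptions the result lands a bit above half of 0.7 MB/s, in the same ballpark as the 0.36 MB/s figure.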

By mitigating the extra earning rate above a certain vault size. That probably needs an approximate measure of the average or median vault size of the network.

A very simple cap would be:

personal farming rate = (general farming rate * your vault rank) / (average network-wide vault rank * 1.1)

This mitigates any benefits from having a vault size over 110% of the network average. Above that 110%, the odds of getting a farm attempt lessen proportionally to the number of additional chunks your vault has. In other words, you don’t earn more by having more chunks.
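A minimal sketch of that cap, for illustration only. Read literally, the formula keeps growing with vault rank, so this sketch clamps the ratio at 1 to get the "no extra earnings above 110%" behaviour described. The clamp is an assumption, and the function and parameter names are hypothetical.

```python
def personal_farming_rate(general_rate, vault_rank, avg_rank, headroom=1.1):
    """Farming rate for one vault, flat once the vault exceeds 110% of average.

    Assumption: the ratio from the formula is clamped at 1.0 so that a larger
    vault never earns at more than the general rate.
    """
    ratio = vault_rank / (avg_rank * headroom)
    return general_rate * min(ratio, 1.0)

# Vaults at half, exactly, and twice the 110% threshold (average rank = 100).
for rank in (55, 110, 220):
    print(rank, personal_farming_rate(1.0, rank, 100))
```

Using the median instead of the average for `avg_rank` would make the threshold less sensitive to a few very large vaults.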

Comprehending Kademlia Routing - A Theoretical Framework for the Hop Count Distribution

2^20 = 1048576. If you exclude half of the network with every hop, you need 20 hops for a million nodes.
As far as I understand it, there are 32 nodes in a group, and if you don’t hold the data the request gets passed on to the node that is closest in XOR to the data being requested.
Closeness depends on perspective: although you share closeness with Alice and Alice with Bob, you don’t share closeness with Bob.
Excluding the requester, there seem to be 30 available pathways for every hop.
Which gives me:
30^4 = 810000
30^5 = 24300000
An average of about 4 hops in a network of a million nodes.
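The same estimate as a small sketch, assuming every hop narrows the search space by a fixed factor (2 for plain halving, 30 for the pathways counted above). Real Kademlia hop counts follow a distribution, so treat this as a rough figure; the function name is made up.

```python
import math

def hops_estimate(nodes, fanout):
    """Hops needed if each hop divides the remaining candidates by `fanout`."""
    return math.log(nodes) / math.log(fanout)

print(hops_estimate(2**20, 2))    # halving the network each hop: about 20
print(hops_estimate(10**6, 30))   # 30 pathways per hop: a little over 4
```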

To circumvent the routing overhead of resending the data many times, a routing node should be introduced to the SAFE network.

It would be a persona that agrees to connect two IPs together with minimal latency (in exchange for safecoin?).

The normal SAFE network routing would negotiate the connection: the nodes would ask the network to connect a router node to their IPs.

This way the data is only sent and received twice, making video chat etc. possible.

I don’t think there are bad security implications: the data is encrypted, so routing nodes would only know the two IPs and the amount of information exchanged.