Vault routing burden

Not sure what you tried to say there. But I meant that A → Z could follow its shortest path as follows (remember that each node along the way is looking for the neighbour closest to Z):

A → F
F → X
X → C
C → Q
Q → L
L → Z

And on the way back from Z to A, remember that in this case each node on the path is looking for the neighbour closest to A:

Z → E
E → C
C → K
K → M
M → B
B → L
L → S
S → A

On the way from A to Z we might see the request pass through node C because Q is the closest to Z out of C's neighbours. But on the way from Z to A, node C might see K as the closest to A. So the way back might pass through some of the same nodes without retracing the same path.

It is going to be “random” whether the to/from paths have the same or a different number of hops, and if different, whether the return has more or fewer.

The thing I don't know is whether the routing protocol keeps history, i.e. remembers the node it got the request from, so that the return path is effectively remembered. (I don't think so, but if it does, that record would only be kept until the space is overwritten by new routing history.)
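
For illustration, here is a minimal Python sketch of that greedy forwarding, under the assumption that each node simply forwards to whichever of its neighbours is closest to the target by XOR distance and keeps no history of where the request came from (the function names and integer node IDs are mine for illustration, not actual SAFE routing code):

```python
def next_hop(neighbours, target):
    """Neighbour with the smallest XOR distance to the target."""
    return min(neighbours, key=lambda n: n ^ target)

def route(start, target, neighbours_of):
    """Follow greedy hops until the target is reached.

    With Kademlia-style routing tables the XOR distance shrinks on
    every hop, so this terminates. Note that nothing forces
    route(a, z) and route(z, a) to visit the same nodes.
    """
    path, current = [start], start
    while current != target:
        current = next_hop(neighbours_of[current], target)
        path.append(current)
    return path
```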

If anything, an intermediary node would only store the node in its DHT, not remember the path of the request. Keeping such path state would, I think, break the “statelessness” of the network.

So yes, the average number of hops would be the same whether requesting a chunk or receiving it back.

Which brings us back to the original point: if I'm an intermediary node, will I be downloading and uploading multiple, if not tens of, GBs of data per day? What might be a way to minimize the bandwidth requirements of running a vault, and how can that be done in a fair manner so that every vault is still considered equal by the network?

I would tend to assume that it couldn’t - and that the costs might just have to be eaten by the vault owner.

And that is the reason I doubt my “issue” is correct.

That's the whole point of the thread: to find holes in that assertion. I did suggest a possible change that could reduce it by, say, half or a third, if it can be done securely. Look a few posts up in the thread at the banter between @janitor and myself.

Another is to have two types of nodes: ones that purely store/farm, and full-featured vaults that relay data too. Vaults with low uplink speed would be marked as storage-only. This reduces the number of hops. In a small network it would be a security problem because of the scarcity of full-featured nodes; in large networks the number of possible hops is usually large, so it is less of a problem.

Just realized the average won’t be 10, because there are exponentially more nodes far away than close by… Half of the chunks will be 20 hops away, 25% of the chunks will be 19 hops away, 12.5% of the chunks will be 18 hops away, etc.
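
A quick back-of-envelope check of that distribution (assuming the 20-hop maximum used above, with the fraction of chunks halving at each hop closer):

```python
# Expected hop count if half the chunks are at the 20-hop maximum,
# a quarter at 19 hops, an eighth at 18, and so on.
max_hops = 20
expected = sum((max_hops - k) * 0.5 ** (k + 1) for k in range(max_hops))
print(round(expected, 2))  # ~19.0: barely below the maximum, not 10
```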

Thanx. Exponential thinking on an XOR level. I need coffee :wink:

I think your argument is quite solid, and even the estimate for G, as detailed here, is not crazy. What will happen though is that G will be determined by the available speed. So I think it is safe to assume that the initial decentralised experience will be slower than the conventional one (inevitably, as additional layers of abstraction are added). The challenge will be to balance and optimise all parameters, and “usability (limited by speed)” will be one of them.

As a bright spot, the success of the BitTorrent network proves that very usable speeds are not impossible.

SAFE network browsers will probably load the content of every possible link as soon as you enter a website :open_mouth: so browsing should feel perfectly fast… (at least I would use that “super-fast” browser as soon as there is one…)

PS: sorry, I didn't follow everything here… I just assumed the problem with hops is the reaction time (latency) of the network and not raw speed itself :wink:

If you disable JavaScript, they won't.
Prefetching every link would be undesirable anyway: it would create excessive traffic, which in SAFE also has the side effect of farming coins.
If someone creates such a plugin, it will probably be used for request stuffing.

Hmmm, OK, I didn't get your argument about JavaScript.

But as a user I don't really care about the excessive traffic on the SAFE network… and I don't really care about the additional farming rewards generated… the important thing is that I don't have to wait 1.5 s for a site to load :open_mouth:
…yes, I know this would be slightly crazy… but it would enhance my browsing experience.

Good catch. It confirms to me the vital importance of the cache.

Here is a Swedish study on data consumption by application. We can see that nearly 75% of downstream data is file sharing and media streaming. If, when calculating consumption in the SAFE network, we remove the roughly 8% belonging to VoIP and part of the 13% that is web browsing, we can estimate that about 90% of SAFE data could be file sharing and media streaming.

I checked BitTorrent and, today, the average size of the most downloaded files is about 630 MB. And, for example, the fifty most downloaded files take about 65% of total traffic.
As these fifty files occupy about 31 GB, we can expect this data to be located mostly in the cache, and also in relatively close positions (possibly fewer than 10 hops).
Other requested data, although to a lesser extent, will also be in the cache but, on average, in more remote positions.
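
A quick sanity check on those figures (the percentages and file size are the ones cited above; treating roughly half of the 13% web-browsing share as removable is my own simplification):

```python
# Share of the remaining traffic that is file sharing + streaming,
# after removing VoIP (8%) and ~7 of the 13% that is web browsing.
remaining = 100 - 8 - 7
print(round(75 / remaining * 100))  # ~88%, i.e. roughly 90%

# Cache footprint of the fifty most downloaded files at ~630 MB each.
print(50 * 0.63)                    # 31.5 GB, matching the ~31 GB above
```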

We will see, but it seems clear that, with the current implementation, the network could begin to have problems with many users.

By the way, does anyone know what the estimated cache size will be?

Not judging here, but I think from various topics and comments (not just by digipl) it's clear what people think MaidSafe will mostly be used for.
For example, you won't see anyone asking whether MaidSafe will make it possible to create anonymous, uncensored blog posts.

BitTorrent uses its DHT for lookups only; the actual data transfer is direct peer-to-peer. In SAFE the DHT is used as a routing layer as well, which amplifies latency and aggregate bandwidth consumption. The latency is a non-issue for larger downloads and streaming because of parallel GETs, but the huge aggregate bandwidth consumption remains. It's not a problem for clients, but it is for vaults (as per my OP).

I'm not overly pessimistic anymore though, because as I realised halfway through this thread, the solution is to encourage many small vaults over fewer big vaults (through the farming algorithm). Even though this increases the aggregate bandwidth consumption, the routing load per vault is reduced and thus vault accessibility (minimum requirements) is improved.
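
To make that trade-off concrete, here is a rough model. The network-wide GET rate is a made-up placeholder and the hop count comes from the estimate earlier in the thread; the point is only that aggregate traffic scales with hops, while per-vault routing load scales inversely with the number of vaults:

```python
# Toy model: relayed traffic is shared across all vaults, so per-vault
# routing load falls as the vault count grows. Numbers are illustrative.
chunk_mb   = 1.0       # SAFE chunks are up to ~1 MB
gets_per_s = 1_000     # network-wide GET rate (assumed)
avg_hops   = 20        # from the XOR-distance estimate above

aggregate_mb_s = gets_per_s * chunk_mb * avg_hops
for n_vaults in (10_000, 50_000, 200_000):
    print(n_vaults, "vaults:", aggregate_mb_s / n_vaults, "MB/s each")
```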

If this is a problem, intentional caching may reduce bandwidth significantly. If a client offers 50 GB and the network only needs 2 GB for actual storage, the other 48 can hold the most requested chunks… Chances are the viral stuff is going to be the bulk of the traffic anyway… The fewer the hops, the less the traffic…
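
As a toy sketch of that idea (the class and its least-popular eviction policy are my invention for illustration, not the actual SAFE caching design):

```python
from collections import Counter

class PopularityCache:
    """Spare vault space holding the most requested chunks."""
    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.hits = Counter()   # request counts per chunk name
        self.store = {}         # chunk name -> chunk data

    def get(self, name):
        self.hits[name] += 1
        return self.store.get(name)  # None: fetch via routing instead

    def put(self, name, data):
        if len(self.store) >= self.capacity:
            # evict the least requested chunk to make room
            coldest = min(self.store, key=lambda n: self.hits[n])
            del self.store[coldest]
        self.store[name] = data
```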

Such caching would mess up the “pay for content” part of the model. But it would be better to have a faster network that people could use than a slower one that paid developers to make stuff that few have the patience to use.

I saw that earlier post and the logic is sound. However I would request some further clarification if you would.

First of all, are you (as I suspect) talking about many vaults across different machines? In other words, have more people farm. Otherwise, I don't see the difference if the same number of people farm, just with more vaults set up per machine.

Secondly, how do you envision the farming algorithm compensating for this? What properties must be present for it to ease the routing download burden on every vault?

There is no way to tell whether two vaults with the same public IP belong to one owner or more (think university campus). A good reason to run several vaults is rewards (you want your vault(s) to stay not too far above average).
You're right about the bandwidth.

I'm just talking about it as it pertains to this topic. If I were to rephrase that for clarification, I might say:

Is the solution in @Seneca's eyes to encourage many small vaults, each belonging to separate IP addresses and/or owners, in order to spread the bandwidth across as many different networks as possible? Because I don't see how one machine with a set number of larger vaults would benefit (as it pertains to this topic on vault bandwidth) from reconfiguring to use a greater number of smaller vaults.

To which I believe you gave your answer, @janitor. Thank you. However, that first part of your response reminded me of another thought I had.

If there are, say, two vaults running on one machine, what if the GET response (the chunk being routed) was routed through one of those vaults, then out to another random vault, and then back to the second vault (the one on the same machine as the first)? In that scenario the bandwidth is doubled for the same piece to reach the same machine.

The only way I can think of to mitigate this is to have vaults on the same machine add each other's address to their DHT (which would probably have to be done manually). Maybe not necessarily the same IP address, but certainly the same machine. However, would that cause any anonymity/security problems?

Regardless, I do believe that @Seneca is right that encouraging more farms on more machines on more networks will mitigate this core characteristic of the network, namely that farmers will consume a lot of bandwidth. I would just like to dive a bit deeper into his reasoning in case I missed something.

You’re right, two vaults that run within the same box or on the same LAN in Vancouver might still send and receive chunks via Tasmania. And you’re also right that letting any Safe client or vault know the IP address of the other side would destroy anonymity.

At the same time it’s unlikely that a request from any of 20 Safe clients or vaults you own would end up in any of the 19 other vaults (I assume there will be 10-50 K vaults on the network in v1.0).

DHT allows for direct P2P transfer, but that's why its privacy and security suck: you can see the other guy's IP address, and even if it's a firewall IP you can at least learn the network from which he accesses the internet (if not the physical location).

What about just being a smart admin and adding each of your machine's vaults' XOR addresses to the others' DHTs? No need to mess with IP addresses.

The chances of vaults on the same machine being on a single routing path are negligible surely?
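
A rough estimate supports that (the ~20-hop path length and the 10-50K network sizes are the figures mentioned above; treating each hop as an independent draw is a simplification):

```python
# Chance that a ~20-hop route through a network of n_vaults passes
# through one of your own k other vaults.
def p_own_vault_on_path(n_vaults, k, hops=20):
    return 1 - (1 - k / n_vaults) ** hops

print(p_own_vault_on_path(10_000, 19))  # ~0.037, under 4%
print(p_own_vault_on_path(50_000, 19))  # ~0.0076, under 1%
```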

Of course. Admittedly that was a small aside in a much bigger topic.