Fast Ephemeral Routing

mav · July 17, 2020, 11:34pm

Just a brainstorm, not sure if it’s a good idea but am looking for comments and criticisms.

Fast Ephemeral Routing

Summary

Routing efficiency can be greatly increased by reducing the amount of data sent in the response route. This would reduce the necessary requirements for running a node.

When requesting a chunk, the section responsible for the chunk responds with metadata that allows the client to fetch the data starting much closer to the chunk location, reducing the number of nodes that need to route the full data chunk.

Motivation

The current routing design responds by sending the full chunk data through every hop on the response route. This adds work to the routing nodes and delays the response to the client.

The proposal changes the response to contain only an Ephemeral Voucher (EV) that can be used by the client to (almost) directly request the chunk, saving the chunk data from having to traverse the entire route.

This design also improves the performance of caching.

Detailed Design

Ephemeral Vouchers

Responses contain EV data, which are a type of metadata that allows the client to know which nodes can supply the chunk and how to make a request for the chunk to those nodes. The process is:

The client requests a chunk.
The network routes the request to the closest section via Secure Message Delivery (SMD) and Reliable Message Delivery (RMD) via nearest neighbors.
The elders nearest that chunk generate and store an EV, which contains
- A list of IPs:Ports of elder nodes in the section (to keep one degree of separation between the client and the adult nodes holding the chunk).
- The name of the chunk being requested
- The one-time ephemeral access code that allows the client to download the chunk
- The expiry time (only for use by the elders)
The elders route the EV response back to the client using SMD + RMD via nearest neighbours.
The client receives the response and uses the data in the EV to contact one or more nodes in the elder list.
The elder checks the request has a valid ephemeral access code and has not expired, then passes it on to the adult nodes holding the chunk.
The chunk is returned via the elders to the client, avoiding using nearest neighbours.

This has several speed benefits

Only metadata, which is lightweight and fast, is passed along the full route. The slower chunk data is only sent over a single hop, skipping almost the entire route.
The client can request from a single elder listed in the EV, and only make additional requests if that response fails or is too slow. This prevents the client from receiving multiple redundant responses via RMD. The client can choose between multiple parallel requests for speed with extra wastage, or single serial requests to reduce wasted data transfer but at the mercy of the speed constraints of the particular elder selected.
Any action taken by the client receiving the EV allows the elders to reward based on speed to the client (on the assumption the client acts on the first seen response).

Caching

Any node along the request route which has the chunk in their cache can respond to the client with their own EV. This means the client will be able to receive the data faster, and any lagging responses from further along the route would only contain metadata and can be discarded without having to include the entire chunk data in every discarded response. This greatly reduces waste and duplicate data.

Privacy

Routing is meant to assist in privacy by scrubbing IP addresses on each hop.

For this reason the client does not directly download the chunk from the node holding it, but instead acts at one-degree-of-separation away with the elders.

Privacy may be further enhanced by including extra encryption of the metadata so elders are not aware of the particular chunk being requested (only which node to send the encrypted metadata to), but that is not detailed here to avoid complexity of the EV mechanism.

Optimisation For Message Size

Routing must currently deal with message sizes ranging in the order of bytes (eg requests for data) up to thousands of kilobytes (eg full chunks being routed).

When EVs are used all messages are of roughly the same size, in the order of hundreds of bytes, which makes it easier to optimise the flow of traffic through routes.

Efficiency Analysis

TODO Equation for expected EV message size, with example result of calculation

TODO Equation for expected total bandwidth per chunk in current style routing with RMD, with example result of calculation

TODO Equation for expected total bandwidth per chunk using EV, with example result of calculation

Deployment

This is an extension to the existing routing and does not need to replace it. All the current work on routing is still required for EVs to function correctly.

EV Retention Period

Elders must retain EV information about each request in order to complete it. If the client never requests the chunk the elder should discard the EV, but how long should the elder wait until they discard it?

The elder needs to resolve the conflict between holding on to the EV for a long time and consuming memory vs discarding the EV too early and being unable to send data to the client.

Once the voucher has been used it can be discarded. This is to prevent the client repeating the request and putting excess load on the elder. The EV acts as a type of rate limiter, however the elder could possibly implement their own rate limit rules.

Drawbacks

There’s some possible complexity of trying to deploy this on an already live network, since nodes that have not upgraded would not understand EV requests. This might be an interesting thought experiment about how other potential changes to messages and routing might be possible in the future. BIP9 is worth looking at for historical context.

Alternatives

Use the current routing as-is.

Maybe the client could open their own ‘temporary server’ and allow the node holding the chunk to directly push it to them? This would probably mean the client IP is exposed to every node on the hop so seems unworkable from a privacy perspective. Maybe the first hop in the request route (ie closest to the client) could act as the recipient for the chunk and then push it to the client? The best way to ‘close the loop’ and ‘eliminate hops’ is an open question.

Maybe the need for EVs can be eliminated and clients can simply request closer to the data by some less time-dependent mechanism?

Maybe having many simultaneous connections for every client is adequate to reduce the routing distance? In a large network with long section prefixes this may not be possible.

Unresolved Questions

Which gives the best result: a) slightly more hops with a lot less overall data transfer or b) slightly less hops but a lot more overall data transfer? (See Efficiency Analysis above, lots of TODOs, any help is welcome).

The best way to referee elders to ensure they do not discard EVs too quickly leading to poor experience for clients.

Should this include a way for clients to clear EVs from elders so polite clients can help clear unused EVs from elders without needing to return the data?

Should this be used for PUT as well as GET? Seems like it’s possible.

Can EV spoofing be used as a kind of DOS?

What privacy implications are there? How private is the current routing and what changes does EV bring to privacy? What risks to privacy are introduced or mitigated by EVs?

Scorch · July 18, 2020, 12:18am

Interesting concept. Just going to think aloud for a sec.

I suspect that slightly less hops with more overall data transfer may be faster (unless we’re talking “lot“ of extra data. Like you mentioned, would need to be quantified, but just giving an impression here) because each time a new hop is made we need to block on a select/poll and handshake, so we reduce queuing/processing time with fewer hops. In general that should offer much greater marginal positive impact than the marginal negative impact of increased transmission/propagation delay from more data, because those benefits from heavily from “pipelining” of information on the wire.

Also, whats the definition of expiration time in this context? Wouldn’t that break down the asynchronous assumption if we used some notion of time here?

mav · July 18, 2020, 12:48am

Not sure. It could be defined as a fixed timeout, eg 60s, or it could be like bitcoin mempool and grow/shrink up to a fixed buffer size for EVs, say 100MB of memory and the oldest are purged if it gets full. Not sure the most appropriate design, but it seems difficult to enforce at the network level so probably ends up being up to the node anyhow depending how they want to manage the risk of failures.

I don’t think there’s any need for network time to be introduced, this feature can still work just on local time.

It’ll be interested to see how multi-section networks go.

From the last test network speeds, 1k in 3s compared to 900k in 11s is worth looking at, roughly the size of a metadata message vs a single full chunk.

For example, if there’s 5 hops to negotiate each way, in current routing that’s 5 metadata request hops and 5 chunk data response hops (5*3s+5*11s = 70s), vs EV it’s 5 metadata request hops and 5 metadata response hops and 2 metadata EV hops and 2 data chunk EV hops (5*3+5*3+2*3+2*11 = 58s). Only very rough approximation but EV looks good there.

As you say it will depend on the time for each handshake and validation etc vs the time for data transfer. Doesn’t help that both these will also change into the future depending on improvements to computation / networking / crypto primitives etc.

drehb · July 18, 2020, 1:43am

How do you see the caches being filled in this proposal? With the existing design the cache can fill with data as it passes through the node. I guess when a node sees many requests for the same data it can request it itself for the purpose of caching it? Also, isn’t the caching node then also exposing it’s IP by responding with an EV?

mav · July 18, 2020, 6:50am

As you say probably initially it could work by nodes observing which chunks are requested frequently as they pass via routing. Although to my knowledge routing messages only happen with elders and do not touch adults (see RMD “The Delivery Group is a subset of the Elders of the section”), so that would also have implications for cache hits/misses as well as opportunities to fill the cache.

But I reckon a system that allows sequentially named chunks to be requested as suggested here would probably be a good way to warm caches and maybe also allow an aspect of audit and accountability.

dirvine · July 18, 2020, 10:06am

An interesting thing we pushed recently is section knowledge. There has been a change here and I really like it. So until the network consists of millions of sections (to be worked out) each section will update any section that speaks to it and vice versa.

What this means is section messages have the dst address but also what the sender thinks is the current dst section key (a bls public key). If the dst elder detects the src elder is “out of date” then it processes the message but sends back a ProofChain which is section keys signed by previous section keys , plus the current set of Elders (signed by the last key in the chain, i.e. the “current” key). This means the src gets updated.

Conversely if the dst does not recodnise the src section key, it errors the message back with the last section key he knows of for that src. The src then updates the ProofChain and elder set. The message is now resent knowing the dst can actually process it as we have proven authority on the message.

So this removed the requirements we had (and I hated) that each section must update all neighbors and they must all know the same thing at the same time and so on. It was a very bad design and in fact was cemented in as a network invariant, which was horrible. Much of our test suite was formed around this invariant test and lots and lots of code created to try and achieve it. It was very wrong and easily fixed. All that we need to do is allow differences in knowledge and provide a simple path to update knowledge in real time.

So with this and for now, all section will know all elders of all sections. We can still have it massivly scalable by only knowing neighbors, but that means more work right now.

Therefor a client can ask fro data, his elders can directly ask the remote elders and send it back directly to them. The remote Elders get it from at least 3 Adults and confirm with each other the Adults did deliver (so we can penalise lost or failing Adults).

So the Adults in each section are not known to anyone except the section members. The same holds true for clients. This reduces hops significantly. We still have the code to hop of course as sometimes the network section splits and so on. That’s all cool though.

So this means things are simpler. I am just adding this as background info though.

I need to fully digest this proposal (we did used to use hashes etc. in a similar manner, but not quite as detailed IIRC).

Scorch · July 18, 2020, 12:30pm

I think this is a more general question than specific to this idea, but I’ve still got some reading to do in general, so bear with me

Is there any inherent issue to having the client contact the elder directly/does this method imply the elder of the destination section then acquires knowledge of the clients IP? Might strain the elders a bit more, but maybe offering one extra hop to the client’s section elder before redeeming the EV would be useful for hiding from the destination elder which clients are making routine requests to the section. In a similar way that we don’t want the client asking the adult directly for the chunk to keep it ambiguous as to who’s requesting the chunk and who’s supplying the chunk on both ends (even if that chunk’s identity is encrypted). Not sure exactly what the danger would be in letting that information leak to the destination section elders, but seems like something they don’t strictly need to know I suppose

tfa · July 18, 2020, 3:36pm

This doesn’t seem secure. Currently a client only knows the IP addresses of its direct contact nodes. With your proposal it will be able to know many more IP addresses.

An attacker could generate a database of all elders in the network with their IP addresses, just by fetching some data (starting from index.html of known safe sites and recursively fetching all data linked from them, until all sections are covered in the XOR space).

mav · July 18, 2020, 11:30pm

My feeling is that IP scrapers intending to abuse EVs can do this with current routing by spinning up many many clients to collect IPs. EVs don’t have any impact on those types of people. Maybe EVs make it slightly easier to gather IPs? But starting a client is presumably not much more effort than making a GET?

I think you and @Scorch raise a good point in general, what are the risks of exposing IPs and to what degree does it need to be protected vs public. This question applies to elders and to clients, both who desire some degree of privacy but also need some degree of connection to work properly.

There’s an aspect of making IP collection not too easy but not impossible which we could call the necessary and desirable level of privacy, but there’s also an aspect of security by obscurity which gets in the way of features and functionality for the mere appearance of privacy and security. I’m not exactly sure where the line is but I feel that EV is within the security and privacy aims of the project.

Maybe instead of EVs connecting clients at one-degree-of-separation it could be two-degrees away? You see where this leads… So in the end some analysis of privacy implications is required, which I feel is a very difficult problem because of the unknowns.

And then there’s this to consider

edit: Looking at Optimal Path Length for TOR, there is a discussion of optimal path lengths. “Tor’s design decision to build paths with precisely three routers is thought to strike the correct balance.”

dirvine · July 19, 2020, 8:36am

I agree. IMO privacy in many ways is like security/cryptography. It is about many different components that add privacy or security. No one thing provides all of either principle. So masking IP where we can, without degrading the system too much is one. Then many other things, such as masking traffic (quic), encrypting traffic (tls) and many more.

Our biggest privacy tool is self authentication or the ability to create an account unencumbered on the network. Doing this without permission is critical and where crypto currency and some projects have helped. I think a lot of projects miss this and ask for IP or email or actually pinpoint users/nodes on a map. These maps and fancy dashboards with all that info are used as marketing tool and I am always astonished, folk love it, they buy into these projects for the dashboard that tells them so much, but they then think these are privacy based networks. It’s an issue we need to get over, the more private and secure we are the more boring our charts/dashboards will be. It’s an interesting thing that the frailty of humanity is like this. We feel in control when we have loads of info on everyone, but want nobody to have info on us

In any case I digress, we will have many small parts, like masked IP, encrypted chunks, multi degrees of separation (client → client handlers ->data handlers → Adults) and more where we can.

It’s a constant struggle as many developers will say oh hey let’s log X Y Z and then realise even that can be a huge privacy nightmare. So boring is the answer many times and the humans desire of info needs suppressed. Then we need simple verifiable code.

tfa · July 19, 2020, 10:00am

I would say much easier. The difference between the two is that you get a random node IP address when you launch a client and so, at the beginning you will get many IP addresses but then getting a new one will be slower and slower. With a get request you can target explicitly a yet uncovered section and so, the whole mapping will be generated very quickly.

I think that @dirvine’s wording is not quite right because knowledge of a remote section is very limited: only its prefix and its public key are known. This is in contrast of elders of neighboring sections whose IP addresses and precise XOR names are known.

I may be wrong, but this is how I interpret this line of code:

github.com

maidsafe/sn_routing/blob/9bacd4060971f114eaf06d1bc29f34e50cf849c5/src/section/section_map.rs#L29§


      
          use xor_name::{Prefix, XorName};
          
          /// Container for storing information about sections in the network.
          #[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize)]
          pub struct SectionMap {
              // Our section.
              our: Proven<EldersInfo>,
              // Neighbour sections: maps section prefixes to their latest signed elders infos.
              neighbours: PrefixMap<Proven<EldersInfo>>,
              // BLS public keys of known sections excluding ours.
              keys: PrefixMap<Proven<(Prefix, bls::PublicKey)>>,
              // Indices of our section keys that are trusted by other sections.
              knowledge: PrefixMap<Proven<(Prefix, u64)>>,
          }
          
          impl SectionMap {
              pub fn verify(&self, history: &SectionProofChain) -> bool {
                  self.our.verify(history)
                      && self.neighbours.iter().all(|proven| proven.verify(history))
                      && self.keys.iter().all(|proven| proven.verify(history))
                      && self.knowledge.iter().all(|proven| proven.verify(history))

Note: I would like to be able to get all elders XOR names, so that a global view of the network could be generated easily, like I did in past test networks with a galaxy representation where nodes were stars and sections constellations. But maybe even this information is not safe for the network.

dirvine · July 19, 2020, 10:04am

Iff you are an Elder node you will get this. It does “expose” Elders to other Elders which is easier than having to get an elder per neighbor set. However it allows us to launch quicker and also analyse the network a bit better in early days. To be clear there are 7 elders per section and circa 60-120 nodes per section. So it’s a small % of all nodes and that is good. It’s all a balance to be honest and a fully working feature complete network will let us analyse these parts concretely and with focus. This will be prior to real safecoin. So our drive in house is to get feature complete then tweak where we feel we must. Some of your work and @mav as well as others will be crucial in these steps.

Topic		Replies	Views
Vault routing burden Safe-Node	69	6614	November 4, 2015
Network efficiency and performance improvement algorithm proposal Beginners	25	756	June 3, 2021
Internals of the Safe network Development	10	601	October 27, 2022
Caching Chunks (lru_time_cache) Development	3	1016	October 31, 2015
RFC 58 - Reliable Message Delivery RFCs	10	1701	June 15, 2020

Fast Ephemeral Routing

Fast Ephemeral Routing

Summary

Motivation

Detailed Design

Ephemeral Vouchers

Caching

Privacy

Optimisation For Message Size

Efficiency Analysis

Deployment

EV Retention Period

Drawbacks

Alternatives

Unresolved Questions

Related Topics