Hyperbolic embeddings for querying the RDF graph

Querying a graph is expensive. This idea is about making it faster.

In Euclidean space, area grows quadratically with radius and volume cubically, and so on up the dimensions. In hyperbolic space, area and volume both grow exponentially. It’s quite insane just how much room there is in hyperbolic space, so here’s a video to get some intuition about it:
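To make “exponentially” concrete: a Euclidean disk of radius r has area πr², while a disk in the hyperbolic plane (curvature −1) has area 2π(cosh r − 1), which grows like e^r. A quick sanity check in Python:

```python
import math

def euclidean_disk_area(r):
    # area of a disk of radius r in the Euclidean plane
    return math.pi * r * r

def hyperbolic_disk_area(r):
    # area of a disk of radius r in the hyperbolic plane (curvature -1)
    return 2 * math.pi * (math.cosh(r) - 1)

for r in (1, 5, 10, 20):
    print(r, euclidean_disk_area(r), hyperbolic_disk_area(r))
```

At r = 20 the hyperbolic disk already has over a hundred thousand times the Euclidean area, and each extra unit of radius multiplies it by roughly e.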

Anyway, the above implies an interesting property: graphs can be embedded in hyperbolic space with little distortion, in a way that mostly preserves the distances between nodes (trees can be embedded near-isometrically, with arbitrarily little distortion), and in a much smaller number of dimensions (for example, 2-3 instead of hundreds). Hyperbolic space can in fact be looked at as a kind of “continuous graph”.
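For reference, distances in the Poincaré ball model (the hyperbolic model used in the papers referenced below) have a simple closed form, so treating hyperbolic space as a “continuous graph” is cheap to compute:

```python
import math

def poincare_dist(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball:
    d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))."""
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - sum(a * a for a in u)) * (1 - sum(b * b for b in v))
    return math.acosh(1 + 2 * diff2 / denom)

# distances blow up near the boundary of the ball, which is where the
# exponentially growing "room" of hyperbolic space lives:
print(poincare_dist([0.0, 0.0], [0.9, 0.0]))   # ~2.94
print(poincare_dist([0.0, 0.0], [0.99, 0.0]))  # ~5.29
```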

So, how about using hyperbolic embeddings to index the RDF graph? Queries like “what is a short path between A and B?” could be turned into geometric problems: the nodes close to the geodesic between A and B become candidates for the search. In fact, we may even discover relationships that exist in real life but aren’t in the RDF database yet (“link prediction”).
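A hedged sketch of what such a query could look like: a node u lies near the geodesic between A and B exactly when the detour through it, d(A, u) + d(u, B), barely exceeds d(A, B), so candidates can be filtered by that slack. (The function names and the eps threshold below are illustrative, not from any existing API.)

```python
import math

def poincare_dist(u, v):
    # hyperbolic distance in the Poincaré ball model
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - sum(a * a for a in u)) * (1 - sum(b * b for b in v))
    return math.acosh(1 + 2 * diff2 / denom)

def path_candidates(a, b, embeddings, eps=0.5):
    """Return the ids of nodes whose detour is at most `eps` longer than
    the direct hyperbolic distance from a to b."""
    base = poincare_dist(embeddings[a], embeddings[b])
    return [
        node for node, pos in embeddings.items()
        if node not in (a, b)
        and poincare_dist(embeddings[a], pos) + poincare_dist(pos, embeddings[b]) <= base + eps
    ]
```

With a toy embedding {A: (0, 0), B: (0.8, 0), mid: (0.4, 0.05), off: (0, 0.8)}, only mid survives the filter.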

I believe the Safe Network would allow for developing an embedding for the RDF nodes. In practice, these embeddings are built gradually from positive and negative examples, such as “A and B are connected, but A and C, A and D, A and E aren’t”. After each such step, A’s position (“embedding”) in the N-dimensional hyperbolic space is adjusted to move it closer to B and, at the same time, farther from C, D, and E. (The negative examples don’t always need to be correct, because it’s a stochastic process anyway.)
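Here’s a minimal sketch of one such adjustment step. It uses a margin-based loss and a numerical gradient to stay short and readable; the real Poincaré-embeddings code uses a softmax loss with closed-form Riemannian gradients, and every name and hyperparameter below is illustrative:

```python
import math

def poincare_dist(u, v):
    # hyperbolic distance in the Poincaré ball model
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - sum(a * a for a in u)) * (1 - sum(b * b for b in v))
    return math.acosh(1 + 2 * diff2 / denom)

def update_step(a, positive, negatives, lr=0.1, margin=2.0, h=1e-5):
    """Move `a` closer to `positive` and (up to `margin`) away from each
    point in `negatives`, via one Riemannian SGD step on the Poincaré ball."""
    def loss(point):
        l = poincare_dist(point, positive)
        for n in negatives:
            l += max(0.0, margin - poincare_dist(point, n))
        return l

    # numerical Euclidean gradient (a closed form exists; this keeps it short)
    grad = []
    for i in range(len(a)):
        up, dn = a[:], a[:]
        up[i] += h
        dn[i] -= h
        grad.append((loss(up) - loss(dn)) / (2 * h))

    # rescale by the inverse Poincaré metric, (1 - |a|^2)^2 / 4, then step
    scale = (1 - sum(x * x for x in a)) ** 2 / 4
    a = [x - lr * scale * g for x, g in zip(a, grad)]

    # project back inside the unit ball if the step overshot
    norm = math.sqrt(sum(x * x for x in a))
    if norm >= 1 - 1e-5:
        a = [x / norm * (1 - 1e-5) for x in a]
    return a
```

Repeated over many sampled (positive, negatives) pairs, steps like this gradually shape the embedding; the occasional wrong negative example only adds a bit of noise to an already stochastic gradient.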

So, the Safe node responsible for a given RDF entity would need to request a list of connected entities, each with its current embedding (position), keep a cached list of random other RDF entities as negative examples, and then update the entity’s embedding accordingly. To be efficient, this process would probably need to be piggybacked on top of other sorts of communication.
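As a toy, in-memory illustration of that bookkeeping (the entity names, the cache handling, and the simplistic Euclidean pull/push step are all made up; a real node would fetch neighbour embeddings over the network and use proper hyperbolic updates):

```python
import random

def refresh_embedding(entity, graph, embeddings, negative_cache, lr=0.05):
    """One maintenance round for `entity`: pull its position toward the
    entities it is connected to, push it away from cached non-neighbours."""
    pos = embeddings[entity]
    neighbours = graph[entity]  # "request a list of connected entities"
    for nb in neighbours:       # attract toward neighbours
        pos = [x + lr * (y - x) for x, y in zip(pos, embeddings[nb])]
    for ng in negative_cache:   # repel from cached negative examples
        if ng == entity or ng in neighbours:
            continue
        pos = [x - lr * (y - x) for x, y in zip(pos, embeddings[ng])]
    embeddings[entity] = pos

random.seed(0)
graph = {"A": ["B"], "B": ["A"], "C": [], "D": []}
embeddings = {k: [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]
              for k in graph}
for _ in range(50):
    for entity in graph:
        cache = random.sample(list(graph), 2)  # refresh the negative cache
        refresh_embedding(entity, graph, embeddings, cache)
```

After a few rounds, the connected pair A and B ends up much closer together than either is to the unconnected C and D.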


Some basic references:

This paper sort of started it: Poincaré Embeddings for Learning Hierarchical Representations, https://arxiv.org/pdf/1705.08039.pdf; PyTorch implementation: https://github.com/facebookresearch/poincare-embeddings

Their newer work uses a different model of hyperbolic space, one with better numerical qualities for optimization, but it doesn’t have code on GitHub (yet): Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry, Maximilian Nickel et al., https://arxiv.org/pdf/1806.03417.pdf


I looked into volumes (“how many nodes can I fit here?”) in hyperbolic geometry and found that the number of dimensions doesn’t even matter: we can get the same (asymptotic) increase in volume in any dimension just by multiplying the distances by a constant (or, what amounts to the same thing, by changing the curvature of the space). It’s just another example of how misleading our intuition from Euclidean geometry can be here.


That makes sense. In Euclidean space the volume in dimension n grows as r^n; in hyperbolic space you essentially substitute e^r for r (asymptotically), so (e^r)^n becomes e^(nr), as if you had just multiplied the distances by n. (This is obviously a huge mental shortcut, but it should give some intuition for how this is possible.)
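A numeric check of that shortcut: the volume of a hyperbolic n-ball is proportional to ∫₀^r sinh^(n−1)(t) dt, whose growth factor per unit of radius tends to e^(n−1), exponential in the dimension just as the shortcut suggests, and rescaling distances in the 2D case reproduces the same factor. (Crude midpoint-rule integration; constants are dropped since only growth rates matter.)

```python
import math

def hyperball_volume(n, r, steps=20000):
    # volume of a radius-r ball in n-dimensional hyperbolic space, up to
    # the constant factor of the unit (n-1)-sphere's surface area:
    # V_n(r) ∝ ∫_0^r sinh(t)^(n-1) dt
    dt = r / steps
    return sum(math.sinh((i + 0.5) * dt) ** (n - 1) for i in range(steps)) * dt

# the growth factor per unit radius approaches e^(n-1) in every dimension...
for n in (2, 3, 5):
    print(n, hyperball_volume(n, 11) / hyperball_volume(n, 10))
# ...and stretching distances by 4 in the 2D case gives the same factor
# as n = 5 without adding any dimensions:
print(hyperball_volume(2, 11 * 4) / hyperball_volume(2, 10 * 4))  # ~ e^4
```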

All in all, nice find! I’m not sure yet if and where we could use it in practice (I’m not involved with the RDF graphs, so I can’t say what the applications would be there), but I love curved spaces, so I’d love to see this applied somewhere in the project :wink:

BTW, not really related to all this, but a fun fact: the event horizon of a rapidly spinning black hole, while being topologically a sphere, can have a hyperbolic geometry near the “poles” (near the axis of rotation).


That makes complete sense. Also, the formulas for volume lead to the same conclusion, just in a more complicated way, one that I can’t follow :sweat_smile:

Hyperbolic space makes little sense for the structure of the Safe Network itself because it doesn’t form a scale-free network. The way sections subdivide the address space is strictly Euclidean, for example. (I’m not sure how node ageing changed things, though, so maybe I’m a little wrong on some more subtle level.)

Social networks and various other types of knowledge fit terribly in Euclidean space but comfortably in hyperbolic space, where they can then be searched more easily. Not only that, but hyperbolic embedding also makes graph sharding (something pretty hard in general) not just possible but relatively simple:

This picture is from a paper about indexing and sharding a large spatial dataset using the Poincaré disk model of hyperbolic space; in their simulations, the subsections ended up with the same average number of entries.
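I can’t reconstruct the paper’s exact scheme, but to make “relatively simple” concrete, here is the crudest possible version of the idea (my own illustration, not the paper’s method): cut the Poincaré disk into angular sectors and use the sector index as the shard key. As long as the embedding spreads nodes roughly uniformly in angle, the shards stay balanced:

```python
import math

def shard_of(point, num_shards):
    """Map a point in the Poincaré disk to a shard by its angular sector.
    (Illustrative only; the paper's actual subdivision is not reproduced.)"""
    angle = math.atan2(point[1], point[0]) % (2 * math.pi)
    return int(angle / (2 * math.pi) * num_shards)
```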

Your profile picture makes it just a tiny bit obvious :laughing:

Yes, all of this makes sense for the Safe Network only if graphs get some sort of “first-class object” support from the network, because I’m skeptical that a large-scale index could be kept up to date purely on the client side.
