## Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

Octavian-Eugen Ganea, Gary Becigneul, Thomas Hofmann

ICML'18 submission

### Basic Concepts of Differential Geometry

• Manifold: generalization of a surface in n dimensions

• Tangent space at x: $$T_x M$$

• Riemannian metric: $$g=(g_x(\cdot,\cdot))_{x\in M}$$

• Length of a curve $$\gamma$$:
$$L(\gamma)=\int_{t=0}^1\sqrt{g_{\gamma(t)}(\gamma'(t),\gamma'(t))}dt$$
• Exponential map: $$\exp_x(v):T_x M\to M$$

https://commons.wikimedia.org/wiki/File:Tangentialvektor.svg

from http://inspirehep.net/record/1355197/plots

### Hyperbolic Geometry

from http://mathworld.wolfram.com/PoincareHyperbolicDisk.html

### Hyperbolic Geometry

• Geodesics
• Conformality (angle preserving)
• Metric tensor (gives geodesics, distances, needed for optimization)
• Distance $$d_{\mathbb D}(x,y) = \cosh^{-1}\left(1+ 2 \frac{\| x-y\|^2}{(1-\|x\|^2) \cdot (1-\|y\|^2)} \right)$$

### Poincaré Ball

• Manifold: $$\mathbb{D}^n=\{x\in\mathbb{R}^n\mid\Vert x\Vert_2<1\}$$
• Tangent space: $$T_x\mathbb{D}^n=\mathbb{R}^n$$
• Metric tensor: $$g_x(u,v)=\lambda_x^2 u^T v,\ \lambda_x:=\frac{2}{1-\Vert x\Vert^2}$$
• Distance $$d_{\mathbb D}(x,y) = \cosh^{-1}\left(1+ 2 \frac{\| x-y\|^2}{(1-\|x\|^2) \cdot (1-\|y\|^2)} \right)$$

Property 1:

• Volumes grow exponentially with radius, similar to trees in which number of nodes grows exponentially with depth
• Length of circle of radius $$r$$: $$2\pi\sinh(r)$$

### Properties of the Poincaré Ball

[1] Christopher De Sa et al., 2018, Representation Tradeoffs for Hyperbolic Embeddings

\frac{d(x,y)}{d(0,x) + d(0,y)}
$\frac{d(x,y)}{d(0,x) + d(0,y)}$

Figure taken from [1].

Property 2:

### Hyperbolic vs Euclidean Tree Embeddings

[1] https://www.cs.cornell.edu/~cdesa/papers/arxiv2018_hyperbolic.pdf

[2] https://en.wikipedia.org/wiki/Hyperbolic_metric_space

Trees can be embedded with arbitrary low distortions in $$\mathbb D_2$$, but NOT in the Euclidean space for any number of dimensions! [1]

\text{distortion}(G) = \sum_{x,y \in G, x \neq y}\frac{|d_G(x,y) - d_{M}(x,y)|}{d_G(x,y)}
$\text{distortion}(G) = \sum_{x,y \in G, x \neq y}\frac{|d_G(x,y) - d_{M}(x,y)|}{d_G(x,y)}$
• Graph distortion when embedding in manifold M:
• Reverse: given $$n$$ points on $$\mathbb D$$, can we find a tree quasi-isometrically* embedded into these points ?
• Yes for hyperbolic spaces (Gromov theorem [2])
• No for Euclidean spaces.
*:\ \forall x_1,...,x_n\in X, \exists \text{weighted tree } T \text{and embedding } f:T\to X, \text{ s.t. } \forall i,j
$*:\ \forall x_1,...,x_n\in X, \exists \text{weighted tree } T \text{and embedding } f:T\to X, \text{ s.t. } \forall i,j$
\vert d_T(f^{-1}(x_i),f^{-1}(x_j))- d_X(x_i,x_j)\vert=\mathcal{O}(\delta\log(n))
$\vert d_T(f^{-1}(x_i),f^{-1}(x_j))- d_X(x_i,x_j)\vert=\mathcal{O}(\delta\log(n))$

### Hyperbolic Geometry - Math

For $$x\in\mathbb{D}^n$$ and $$v\in T_x\mathbb{D}^n$$, we have the geodesic satisfying $$\gamma_{x,v}(0)=x,\ \dot{\gamma}_{x,v}(0)=v$$,

$$\gamma_{x,v}(t)=\dfrac{\left(\lambda_x\cosh(t)+\lambda_x^2\langle x,v\rangle\sinh(t)\right)x+\lambda_x\sinh(t) v}{1+(\lambda_x-1)\cosh(t)+\lambda_x^2\langle x,v\rangle\sinh(t)}$$
and the exponential map

$$\exp_x(v)=\frac{\lambda_x\left(\cosh(\lambda_x\Vert v\Vert)+\langle x,\frac{v}{\Vert v\Vert}\rangle\sinh(\lambda_x\Vert v\Vert)\right)}{1+(\lambda_x-1)\cosh(\lambda_x\Vert v\Vert)+\lambda_x\langle x,\frac{v}{\Vert v\Vert}\rangle\sinh(\lambda_x\Vert v\Vert)}x+$$ $$\frac{\frac{1}{\Vert v\Vert}\sinh(\lambda_x\Vert v\Vert)}{1+(\lambda_x-1)\cosh(\lambda_x\Vert v\Vert)+\lambda_x\langle x,\frac{v}{\Vert v\Vert}\rangle\sinh(\lambda_x\Vert v\Vert)}v$$

### Poincaré Embeddings - M.Nickel et al, 2017

Nickel, M., and Douwe K. "Poincaré embeddings for learning hierarchical representations." NIPS. 2017.

• does not explicitly model asymmetric relations; relies on heuristic:   score(is-a$$(u,v)) = (1 + \alpha (\|u\| - \|v\|)) d(u,v)$$
• loss:

### Modeling Entailment  - Order Embeddings

Vendrov, Ivan, et al. "Order-embeddings of images and language." ICLR 2016

• entailment region/cone: $$\mathcal{O}_x:=\lbrace y\mid\forall i\in[d],\ y_i\geq x_i\rbrace$$
• define a partial order on $$\mathbb{R}^d$$

• capacity scales linearly with dimension d
• heavy  cone intersections
• disjoint areas of the cones are bounded

### Hyperbolic Entailment Cones

• Goal 1: generalize entailment regions/cones to any Riemannian manifold
• Goal 2: derive cones in the hyperbolic space (d-dimensional Poinaré ball)

### Convex Cones in a Complete Riemannian Manifold

• Euclidean convex cones:

$$v_1, v_2 \in S \Longrightarrow \alpha v_1 + \beta v_2 \in S \quad (\forall \alpha, \beta \geq 0)$$

• Generalize convex cones for complete Riemannian manifolds:$$S \subseteq T_x\mathcal{M}, \quad \frak S_x := \exp_x \left( S \right) \subset\mathcal{M}$$

### Angular Cones in the Poincaré Ball

Goal:

• induce a partial order in $$\mathbb{D}^n$$
• avoid heavy cone intersections
• extend in all space directions
• scale capacity exponentially with dimension

### Angular Cones in the Poincaré Ball

Satisfy four intuitive properties:

• axial symmetry (depends on aperture angle $$\psi$$) $$S_x^\psi := \{ v: \angle(v,x) \le \psi(x) \}, \quad \frak S^\psi_x := \exp_x(S_x^\psi)$$
• continuous cone aperture function $$\psi$$
• rotation invariance (only depends on norm of x) $$\exists \tilde{\psi} \quad \text{s.t.} \quad \psi(x) = \tilde{\psi}(\Vert x\Vert)$$
\psi
$\psi$

### Angular Cones in the Poincaré Ball

Satisfy four intuitive properties:

• (hardest) transitivity of nested angular cones $$\forall x,x' \in \mathbb D^n: \quad x' \in \frak S^{\psi(x)}_x \; \Longrightarrow \; \frak S^{\psi(x')}_{x'} \subseteq \frak S^{\psi(x)}_x$$
\psi
$\psi$
\psi
$\psi$

### Angular Cones in the Poincaré Ball

Four properties $$\Longrightarrow$$ closed form expression of $$\psi$$

Lemma 1: If transitivity holds, then $$\forall x \in \text{Dom}(\psi): \quad \psi(x) \leq \frac{\pi}{2}$$

Proof idea:

• take any $$x' \in \partial \frak S^\psi_x$$, the cone border
• then $$\psi(x')$$ cannot be $$> \frac{\pi}{2}$$
• continuously move $$x'$$ towards $$x$$

### Angular Cones in the Poincaré Ball

Four properties $$\Longrightarrow$$ closed form expression of $$\psi$$

Theorem 2: If transitivity holds, then the function $$h : [0,1) \cap \text{Dom}(\tilde{\psi}) \rightarrow \mathbb R_+, \quad h(r):= \frac{r}{1 - r^2} \sin(\tilde{\psi}(r))$$ is non-increasing.

Proof idea:

• $$\forall x\in\mathbb D^n, \forall x' \in \partial \frak S^{\psi(x)}_x$$ transitivity and hyperbolic law of sines in $$\Delta$$ Oxx': $$\Longrightarrow \sin(\psi(\|x'\|)) \sinh(\|x'\|_{\mathbb D}) \leq \sin(\psi(\|x\|)) \sinh(\|x\|_{\mathbb D})$$

### Angular Cones in the Poincaré Ball

Four properties $$\Longrightarrow$$ closed form expression of $$\psi$$

Lemma 1 and Theorem 2 $$\Longrightarrow$$ $$\psi : \mathbb{D}^n \setminus \mathcal{B}^n(O,\epsilon) \rightarrow (0,\pi/2)$$ $$x \mapsto \arcsin ( K (1 - \|x\|^2)/\|x\| )$$

or, equivalently (more practically useful):

$$\frak S^{\psi(x)}_x =\bigg\{ y \in \mathbb{D}^n | \quad \Xi(x,y) \leq \arcsin \left( K \frac{1 - \|x\|^2}{\|x\|} \right) \bigg\}$$

where

$$\Xi(x,y)=\arccos \left( \frac{<x,y> (1 + \|x\|^2) - \|x\|^2 (1 + \|y\|^2)}{\|x\| \cdot \|x-y\| \sqrt{1 + \|x\|^2 \|y\|^2 - 2 <x,y>}} \right)$$

### Angular Cones in the Poincaré Ball

Four properties $$\Longrightarrow$$ closed form expression of $$\psi$$

Similar derivation can be done for Euclidean angular cones

### Learning with Entailment Cones

• Learning from positive and negative pairs of nodes in a DAG
• loss: $$\mathcal{L}=\sum_{(u,v)\in P} E(u,v)+\sum_{(u',v')\in N}\max(0,\gamma-E(u',v'))$$
• penalty: $$E(u,v):=\max(0,\Xi(u,v)-\psi(u)),$$

### Riemannian Optimization

• SGD becomes: $$u\leftarrow\exp_u(-\eta\nabla^R_u\mathcal{L}), \quad u \in \mathbb{D}$$
• Riemannian gradient: $$\nabla^R_u\mathcal{L}=(1/\lambda_u)^2\nabla_u\mathcal{L}, \quad \text{conformal factor } \lambda_u=\frac{2}{1-\|u\|^2}$$
• fast and efficient approximation exists for the hyperbolic space (i.e. via retractions)

### Experiments - Qualitative

• Left deck: uniform tree of depth 7 and branching factor 3
• Right deck: WordNet mammal subtree - 4230 edges, 1165 nodes
• Each left: Poincare embeddings
• Each right: our hyperbolic cones

### Quantitative Experiments - Link Prediction in DAGs

• Transitive closure of WordNet hierarchy: 82K nodes, 660K edges