### Current browse context:

stat.ML

### Change to browse by:

### References & Citations

# Statistics > Machine Learning

# Title: Linear Time Sinkhorn Divergences using Positive Features

(Submitted on 12 Jun 2020 (v1), last revised 26 Oct 2020 (this version, v3))

Abstract: Although Sinkhorn divergences are now routinely used in data sciences to compare probability distributions, the computational effort required to compute them remains expensive, growing in general quadratically in the size $n$ of the support of these distributions. Indeed, solving optimal transport (OT) with an entropic regularization requires computing a $n\times n$ kernel matrix (the neg-exponential of a $n\times n$ pairwise ground cost matrix) that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y)=-\log\dotp{\varphi(x)}{\varphi(y)}$ where $\varphi$ is a map from the ground space onto the positive orthant $\RR^r_+$, with $r\ll n$. This choice yields, equivalently, a kernel $k(x,y)=\dotp{\varphi(x)}{\varphi(y)}$, and ensures that the cost of Sinkhorn iterations scales as $O(nr)$. We show that usual cost functions can be approximated using this form. Additionaly, we take advantage of the fact that our approach yields approximation that remain fully differentiable with respect to input distributions, as opposed to previously proposed adaptive low-rank approximations of the kernel matrix, to train a faster variant of OT-GAN \cite{salimans2018improving}.

## Submission history

From: Meyer Scetbon [view email]**[v1]**Fri, 12 Jun 2020 10:21:40 GMT (2859kb,D)

**[v2]**Thu, 25 Jun 2020 09:52:20 GMT (2859kb,D)

**[v3]**Mon, 26 Oct 2020 07:55:17 GMT (3280kb,D)

Link back to: arXiv, form interface, contact.