ctransform(c, v, x, ν, ε)

Compute the c-transform

\[v^{c,ε}(x) = \begin{cases} - ε \log\bigg(\int \exp{\Big(\frac{v_y - c(x, y)}{ε}\Big)} \, ν(\mathrm{d}y)\bigg) & \text{if } ε > 0,\\ \min_{y} c(x, y) - v_y & \text{otherwise}. \end{cases}\]

wasserstein([rng, ]c, μ, ν[, ε; kwargs...])

Estimate the (entropic regularization of the) Wasserstein distance

\[W_{ε}(μ, ν) = \min_{π ∈ Π(μ,ν)} \int c(x, y) \,π(\mathrm{d}(x,y)) + ε \mathrm{KL}(π \,|\, μ ⊗ ν)\]

with respect to cost function c using stochastic optimization.

If both measures μ and ν are DiscreteMeasures, then the Wasserstein distance is approximated with stochastic averaged gradient descent (SAG). The convergence rate of SAG is $O(1 / k)$, where $k$ is the number of iterations, and hence converges faster than stochastic gradient descent (SGD) at the price of increased memory consumption.

If only one of the measures μ and ν is a DiscreteMeasure, then the Wasserstein distance is approximated with stochastic gradient descent with averaging (SGA). In this case, it is required that samples of the non-discrete measure λ can be obtained with rand(rng, λ). The convergence rate of SGA is $O(1/√k)$, where $k$ is the number of iterations.

If ε is nothing (the default), then the unregularized Wasserstein distance is approximated. Otherwise, the entropic regularization with ε > 0 is estimated.

The SAG algorithm uses a constant step size, whereas the SGA algorithm uses the step size schedule

\[ τᵢ = \frac{τ₁}{1 + \sqrt{(i - 1) / w}}\]

for the $i$th iteration, where $τ₁$ corresponds to the initial step size and $w$ indicates the number of iterations serving as a warm-up phase.

Keyword arguments

  • maxiters::Int = 10_000: maximum number of gradient steps
  • stepsize = 1: constant stepsize of SAG or initial step size $τ₁$ of SGA
  • warmup_phase = 1: warm-up phase $w$ of SGA
  • atol = 0: absolute tolerance
  • rtol = iszero(atol) ? typeof(float(atol))(1 // 10_000) : 0: relative tolerance
  • montecarlo_samples = 10_000: Number of Monte Carlo samples from μ for approximating an expectation with respect to μ in the semi-discrete optimal transport problem


Genevay et al. (2016). Stochastic Optimization for Large-Scale Optimal Transport. Advances in Neural Information Processing Systems (NIPS 2016), 29:3440-3448.

Peyré, Gabriel, & Marco Cuturi (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning, 11(5-6):355-607.