# StochasticOptimalTransport

- `StochasticOptimalTransport.DiscreteMeasure`
- `StochasticOptimalTransport.ctransform`
- `StochasticOptimalTransport.wasserstein`

`StochasticOptimalTransport.DiscreteMeasure` — Method

`DiscreteMeasure(xs::AbstractVector, ps::AbstractVector)`

Construct a discrete measure with support `xs` and corresponding weights `ps`.
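For example, a minimal usage sketch (the support points and weights below are made up; we assume the weights are meant to form a probability vector, which the signature itself does not enforce):

```julia
using StochasticOptimalTransport

# Discrete measure on three support points; the weights here sum to one,
# the usual convention for a probability measure.
μ = StochasticOptimalTransport.DiscreteMeasure([1.0, 2.5, 4.0], [0.2, 0.3, 0.5])
```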

`StochasticOptimalTransport.ctransform` — Method

`ctransform(c, v, x, ν, ε)`

Compute the c-transform

\[v^{c,ε}(x) = \begin{cases} - ε \log\bigg(\int \exp{\Big(\frac{v_y - c(x, y)}{ε}\Big)} \, ν(\mathrm{d}y)\bigg) & \text{if } ε > 0,\\ \min_{y} \big(c(x, y) - v_y\big) & \text{otherwise}. \end{cases}\]
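For a discrete `ν` with support `ys` and weights `ps`, the integral reduces to a weighted log-sum-exp. A hedged stand-alone sketch of this computation (`ctransform_sketch` is a hypothetical helper for illustration, not the package's `ctransform`; the unregularized branch is triggered here by `ε === nothing`):

```julia
# Hypothetical helper mirroring the c-transform formula above for a
# discrete measure ν with support `ys` and weights `ps`; `v[j]` is the
# dual potential at `ys[j]`.
function ctransform_sketch(c, v, x, ys, ps, ε)
    if ε === nothing || ε <= 0
        # Unregularized case: min over y of c(x, y) - v_y.
        return minimum(c(x, y) - vy for (vy, y) in zip(v, ys))
    end
    # Regularized case: -ε log ∑_j p_j exp((v_j - c(x, y_j)) / ε),
    # computed with a max shift for numerical stability.
    z = [(vj - c(x, y)) / ε for (vj, y) in zip(v, ys)]
    m = maximum(z)
    return -ε * (m + log(sum(p * exp(zj - m) for (p, zj) in zip(ps, z))))
end
```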

`StochasticOptimalTransport.wasserstein` — Method

`wasserstein([rng, ]c, μ, ν[, ε; kwargs...])`

Estimate the (entropic regularization of the) Wasserstein distance

\[W_{ε}(μ, ν) = \min_{π ∈ Π(μ,ν)} \int c(x, y) \,π(\mathrm{d}(x,y)) + ε \mathrm{KL}(π \,|\, μ ⊗ ν)\]

with respect to cost function `c` using stochastic optimization.

If both measures `μ` and `ν` are `DiscreteMeasure`s, then the Wasserstein distance is approximated with stochastic averaged gradient descent (SAG). SAG converges at rate $O(1/k)$, where $k$ is the number of iterations, and hence faster than stochastic gradient descent (SGD), at the price of increased memory consumption.
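To make the discrete-discrete case concrete, here is a hedged sketch of SAG applied to the semi-dual potentials, in the spirit of Genevay et al. (2016); `sag_semidual` is a hypothetical helper, not the package's internal implementation. The matrix of stored per-sample gradients is exactly the extra memory SAG pays for its faster convergence:

```julia
# Hedged sketch: SAG on the semi-dual potentials v for two discrete
# measures with supports/weights (xs, μp) and (ys, νp), regularization ε > 0.
function sag_semidual(rng, c, xs, μp, ys, νp, ε; stepsize=1.0, maxiters=10_000)
    n, m = length(xs), length(ys)
    v = zeros(m)
    grads = zeros(m, n)       # stored per-sample gradients (SAG's memory cost)
    avg = zeros(m)            # running average of the stored gradients
    cdf = cumsum(μp)          # for sampling indices according to μ
    for _ in 1:maxiters
        i = searchsortedfirst(cdf, rand(rng))   # draw x_i ~ μ
        z = [(v[j] - c(xs[i], ys[j])) / ε for j in 1:m]
        z .-= maximum(z)                        # stabilized softmax weights
        χ = νp .* exp.(z)
        χ ./= sum(χ)
        g = νp .- χ                             # gradient of the semi-dual term at v
        avg .+= (g .- grads[:, i]) ./ n         # refresh the average in O(m)
        grads[:, i] .= g
        v .+= stepsize .* avg                   # SAG step with constant step size
    end
    return v
end
```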

If only one of the measures `μ` and `ν` is a `DiscreteMeasure`, then the Wasserstein distance is approximated with stochastic gradient descent with averaging (SGA). In this case, it is required that samples of the non-discrete measure `λ` can be obtained with `rand(rng, λ)`. The convergence rate of SGA is $O(1/\sqrt{k})$, where $k$ is the number of iterations.
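The `rand(rng, λ)` requirement means that any object with such a sampling method can serve as the non-discrete measure. A minimal sketch (the `GaussianMeasure` type is hypothetical, purely for illustration):

```julia
using Random

# Hypothetical continuous measure: anything supporting `rand(rng, λ)`
# can play the role of the non-discrete measure in the semi-discrete case.
struct GaussianMeasure
    mean::Float64
    std::Float64
end

Base.rand(rng::Random.AbstractRNG, λ::GaussianMeasure) = λ.mean + λ.std * randn(rng)
```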

If `ε` is `nothing` (the default), then the unregularized Wasserstein distance is approximated. Otherwise, the entropic regularization with `ε > 0` is estimated.

The SAG algorithm uses a constant step size, whereas the SGA algorithm uses the step size schedule

\[ τᵢ = \frac{τ₁}{1 + \sqrt{(i - 1) / w}}\]

for the $i$th iteration, where $τ₁$ corresponds to the initial step size and $w$ indicates the number of iterations serving as a warm-up phase.
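Written out as code, the schedule is a one-liner (the function name is illustrative):

```julia
# SGA step size at iteration i, given initial step size τ₁ and
# warm-up length w (the formula above).
stepsize_schedule(i, τ₁, w) = τ₁ / (1 + sqrt((i - 1) / w))
```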

**Keyword arguments**

- `maxiters::Int = 10_000`: maximum number of gradient steps
- `stepsize = 1`: constant step size of SAG or initial step size $τ₁$ of SGA
- `warmup_phase = 1`: warm-up phase $w$ of SGA
- `atol = 0`: absolute tolerance
- `rtol = iszero(atol) ? typeof(float(atol))(1 // 10_000) : 0`: relative tolerance
- `montecarlo_samples = 10_000`: number of Monte Carlo samples from `μ` for approximating an expectation with respect to `μ` in the semi-discrete optimal transport problem
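Putting it together, a hedged end-to-end usage sketch based on the documented signature `wasserstein([rng, ]c, μ, ν[, ε; kwargs...])`; the support points, weights, and keyword values below are made up for illustration:

```julia
using Random
using StochasticOptimalTransport

rng = Random.MersenneTwister(42)
c(x, y) = abs(x - y)   # ground cost

μ = StochasticOptimalTransport.DiscreteMeasure([0.0, 1.0], [0.5, 0.5])
ν = StochasticOptimalTransport.DiscreteMeasure([0.0, 2.0], [0.3, 0.7])

# Unregularized Wasserstein distance (ε = nothing by default).
w = StochasticOptimalTransport.wasserstein(rng, c, μ, ν; maxiters=5_000)

# Entropic regularization with ε = 0.1 and a smaller constant step size.
wε = StochasticOptimalTransport.wasserstein(rng, c, μ, ν, 0.1; stepsize=0.5)
```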

**References**

Genevay, A., Cuturi, M., Peyré, G., & Bach, F. (2016). Stochastic Optimization for Large-Scale Optimal Transport. Advances in Neural Information Processing Systems (NIPS 2016), 29:3440-3448.

Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning, 11(5-6):355-607.