We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: How many moments does MMD compare?

Abstract: We present a new way of study of Mercer kernels, by corresponding to a special kernel $K$ a pseudo-differential operator $p({\mathbf x}, D)$ such that $\mathcal{F} p({\mathbf x}, D)^\dag p({\mathbf x}, D) \mathcal{F}^{-1}$ acts on smooth functions in the same way as an integral operator associated with $K$ (where $\mathcal{F}$ is the Fourier transform). We show that kernels defined by pseudo-differential operators are able to approximate uniformly any continuous Mercer kernel on a compact set.
The symbol $p({\mathbf x}, {\mathbf y})$ encapsulates a lot of useful information about the structure of the Maximum Mean Discrepancy distance defined by the kernel $K$. We approximate $p({\mathbf x}, {\mathbf y})$ with the sum of the first $r$ terms of the Singular Value Decomposition of $p$, denoted by $p_r({\mathbf x}, {\mathbf y})$. If ordered singular values of the integral operator associated with $p({\mathbf x}, {\mathbf y})$ die down rapidly, the MMD distance defined by the new symbol $p_r$ differs from the initial one only slightly. Moreover, the new MMD distance can be interpreted as an aggregated result of comparing $r$ local moments of two probability distributions.
The latter results holds under the condition that right singular vectors of the integral operator associated with $p$ are uniformly bounded. But even if this is not satisfied we can still hold that the Hilbert-Schmidt distance between $p$ and $p_r$ vanishes. Thus, we report an interesting phenomenon: the MMD distance measures the difference of two probability distributions with respect to a certain number of local moments, $r^\ast$, and this number $r^\ast$ depends on the speed with which singular values of $p$ die down.
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2106.14277 [cs.LG]
  (or arXiv:2106.14277v1 [cs.LG] for this version)

Submission history

From: Rustem Takhanov [view email]
[v1] Sun, 27 Jun 2021 16:44:17 GMT (67kb)

Link back to: arXiv, form interface, contact.