Stats, Money, and NYC


Parametric T-SNE

The original (non-parametric) TSNE paper has almost 6000 citations and everyone uses it. One year after it came out, the same author published a second paper describing parametric TSNE, in which a neural network is trained to minimize the TSNE loss. That paper has fewer than 200 citations and hardly anyone knows about it, even though parametric TSNE is much more useful.
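For reference, here is the objective in the standard t-SNE form: Gaussian input affinities with per-point bandwidths set by the perplexity, and a one-degree-of-freedom Student-t kernel in the output. The only change in the parametric version is that each embedding y_i is the output of a network f with weights θ, so the usual t-SNE gradient with respect to y_i gets pushed into θ by the chain rule. This is a sketch of the standard formulation, not a transcription of the paper's own notation.

```latex
% Parametric t-SNE: embeddings come from a network with weights \theta
y_i = f(x_i;\theta)

% Input affinities (\sigma_i chosen per point to match a target perplexity)
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k\neq i}\exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}

% Output affinities (Student-t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k\neq l}\left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% Loss, its gradient w.r.t. each embedding, and the chain rule into \theta
C(\theta) = \sum_{i\neq j} p_{ij}\log\frac{p_{ij}}{q_{ij}},
\qquad
\frac{\partial C}{\partial y_i} = 4\sum_{j}(p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1},
\qquad
\frac{\partial C}{\partial \theta} = \sum_i \left(\frac{\partial y_i}{\partial \theta}\right)^{\!\top}\frac{\partial C}{\partial y_i}
```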

You train the network once, and then you can embed an arbitrary number of new data points without re-running the optimization. This is what everyone actually wants to do.
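To make "train once, embed anything" concrete, here is a minimal NumPy-only sketch. It is not a reproduction of the paper's network or training procedure: it assumes a one-hidden-layer tanh MLP trained with Adam directly on the t-SNE KL loss, and the names `ParametricTSNE`, `joint_probabilities`, `_sq_dists` and all hyperparameter defaults are hypothetical choices for illustration.

```python
import numpy as np

def _sq_dists(A):
    # Pairwise squared Euclidean distances between the rows of A.
    s = np.sum(A * A, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * A @ A.T, 0.0)

def joint_probabilities(X, perplexity=10.0, tol=1e-5, max_iter=50):
    # Symmetric t-SNE input affinities P (entries sum to 1, zero diagonal).
    # Binary-search each point's Gaussian precision to hit the target perplexity.
    n = X.shape[0]
    D = _sq_dists(X)
    P = np.zeros((n, n))
    target = np.log(perplexity)  # target entropy in nats
    for i in range(n):
        mask = np.arange(n) != i
        d = D[i, mask]
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shift by min for stability
            p = w / w.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if abs(H - target) < tol:
                break
            if H > target:  # too flat: sharpen the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, mask] = p
    return (P + P.T) / (2.0 * n)

class ParametricTSNE:
    """Hypothetical minimal parametric t-SNE: a one-hidden-layer tanh MLP
    f(x) = tanh(x W1 + b1) W2 + b2, trained with Adam on the t-SNE KL loss."""

    def __init__(self, n_components=2, hidden=32, perplexity=10.0,
                 lr=0.01, n_iter=300, seed=0):
        self.n_components, self.hidden = n_components, hidden
        self.perplexity, self.lr, self.n_iter = perplexity, lr, n_iter
        self.rng = np.random.default_rng(seed)

    def _forward(self, Xs):
        Hh = np.tanh(Xs @ self.W1 + self.b1)
        return Hh, Hh @ self.W2 + self.b2

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mu, self.sd = X.mean(axis=0), X.std(axis=0) + 1e-8
        Xs = (X - self.mu) / self.sd  # standardize features
        d, h, k = X.shape[1], self.hidden, self.n_components
        self.W1 = self.rng.normal(0.0, 1.0 / np.sqrt(d), (d, h))
        self.b1 = np.zeros(h)
        self.W2 = self.rng.normal(0.0, 1.0 / np.sqrt(h), (h, k))
        self.b2 = np.zeros(k)
        params = [self.W1, self.b1, self.W2, self.b2]
        m = [np.zeros_like(q) for q in params]  # Adam first moments
        v = [np.zeros_like(q) for q in params]  # Adam second moments
        P = joint_probabilities(Xs, self.perplexity)
        nz = P > 0
        self.losses_ = []
        for t in range(1, self.n_iter + 1):
            Hh, Y = self._forward(Xs)
            num = 1.0 / (1.0 + _sq_dists(Y))  # Student-t kernel, 1 dof
            np.fill_diagonal(num, 0.0)
            Q = num / num.sum()
            self.losses_.append(float(np.sum(P[nz] * np.log(P[nz] / np.maximum(Q[nz], 1e-12)))))
            # Standard t-SNE gradient with respect to the embedding Y ...
            M = (P - Q) * num
            dY = 4.0 * (M.sum(axis=1)[:, None] * Y - M @ Y)
            # ... pushed back through the MLP with the chain rule.
            dZ = (dY @ self.W2.T) * (1.0 - Hh ** 2)
            grads = [Xs.T @ dZ, dZ.sum(axis=0), Hh.T @ dY, dY.sum(axis=0)]
            for q, g, mq, vq in zip(params, grads, m, v):
                mq[:] = 0.9 * mq + 0.1 * g
                vq[:] = 0.999 * vq + 0.001 * g * g
                m_hat, v_hat = mq / (1 - 0.9 ** t), vq / (1 - 0.999 ** t)
                q -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-8)  # in-place update
        return self

    def transform(self, X):
        # Embedding new points is just a forward pass: no re-optimization.
        Xs = (np.asarray(X, dtype=float) - self.mu) / self.sd
        return self._forward(Xs)[1]

# Usage: two Gaussian blobs in 10-D; fit once, then embed points never seen in training.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (40, 10)), rng.normal(6, 1, (40, 10))])
model = ParametricTSNE().fit(X_train)
X_new = np.vstack([rng.normal(0, 1, (5, 10)), rng.normal(6, 1, (5, 10))])
Y_new = model.transform(X_new)
print(Y_new.shape)  # (10, 2)
```

The design choice that matters is the split between `fit` (slow, done once) and `transform` (one matrix multiply per layer). Non-parametric TSNE has no equivalent of `transform` because the embedding coordinates themselves are the parameters being optimized.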

So why does everyone use non-parametric TSNE? I think it's because everyone uses scikit-learn, which only has non-parametric TSNE.

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
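>>> # recent scikit-learn versions require perplexity < n_samples (the default is 30)
>>> X_embedded = TSNE(n_components=2, perplexity=3).fit_transform(X)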
>>> X_embedded.shape
(4, 2)

Nothing is easier than fit_transform. The only (good) parametric TSNE implementation I could find is in MATLAB. Yikes.

My main question about parametric TSNE: if a data point moves a tiny amount in the original space, is it guaranteed to move only a tiny amount in the embedded space? That seems important for a lot of applications, like tracking changes over time.
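One partial answer, at least for the usual kind of encoder: a feedforward network built from linear layers and 1-Lipschitz activations (tanh, ReLU) is Lipschitz continuous, with ||f(x) - f(x')|| <= ||W1||_2 · ||W2||_2 · ||x - x'||, where ||W||_2 is the largest singular value. So a tiny move in the input moves the embedding by at most L times as much. The catch is that nothing keeps L small for a trained network, and this product bound is often very loose. The sketch below assumes a hypothetical random one-hidden-layer tanh network (not a trained parametric TSNE model) and checks the bound against random small perturbations.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # Hypothetical one-hidden-layer tanh network, standing in for an encoder.
    return np.tanh(x @ W1 + b1) @ W2 + b2

def lipschitz_upper_bound(W1, W2):
    # tanh is 1-Lipschitz and biases cancel in differences, so
    # ||f(x) - f(x')|| <= ||W1||_2 * ||W2||_2 * ||x - x'||.
    # np.linalg.norm(W, 2) is the spectral norm (largest singular value).
    return np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 2)), rng.normal(size=2)
L = lipschitz_upper_bound(W1, W2)

x = rng.normal(size=10)
ratios = []
for _ in range(1000):
    dx = 1e-3 * rng.normal(size=10)  # a tiny move in the input space
    dy = mlp(x + dx, W1, b1, W2, b2) - mlp(x, W1, b1, W2, b2)
    ratios.append(np.linalg.norm(dy) / np.linalg.norm(dx))

print(max(ratios) <= L)  # True: no perturbation exceeds the bound
```

In practice the observed ratios sit well below L, which is the "often very loose" part: the bound guarantees continuity, but not that "tiny" stays tiny by any particular factor.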
