Neighbor Embedding on genomics & equivalent affinity matcher formulation#

We illustrate the basic usage of TorchDR by running several neighbor embedding methods on the SNARE-seq gene expression dataset, coloring the embeddings with the provided cell type labels.

# Author: Titouan Vayer <[email protected]>
#         Hugues Van Assel <[email protected]>
#
# License: BSD 3-Clause License

import urllib.request
import matplotlib.pyplot as plt
import numpy as np

from torchdr import (
    SNE,
    TSNE,
    UMAP,
    AffinityMatcher,
    EntropicAffinity,
    NormalizedGaussianAffinity,
)

Load the SNARE-seq dataset (gene expression) with cell type labels#

def load_numpy_from_url(url, delimiter="\t"):
    """Download a delimited text file and return it as a float NumPy array."""
    response = urllib.request.urlopen(url)
    data = response.read().decode("utf-8")
    rows = [line.split(delimiter) for line in data.split("\n") if line]  # drop empty lines
    return np.array(rows, dtype=float)


url_x = "https://rsinghlab.github.io/SCOT/data/snare_rna.txt"
X = load_numpy_from_url(url_x)

url_y = "https://rsinghlab.github.io/SCOT/data/SNAREseq_types.txt"
Y = load_numpy_from_url(url_y)
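
As a quick sanity check, we can look at the shapes of the downloaded arrays. The printed values depend on the hosted files, so treat this purely as a verification step:

# Inspect the downloaded data (shapes depend on the hosted files).
print(f"Gene expression matrix X: {X.shape}")
print(f"Cell type labels Y: {Y.shape}")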

Run neighbor embedding methods#

params = {
    "optimizer": "Adam",
    "optimizer_kwargs": None,
    "max_iter": 100,
    "lr": 1e0,
}
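
The optimizer can be tuned further if needed. Assuming optimizer_kwargs is forwarded to the underlying torch.optim optimizer (here Adam), a variant could look like the sketch below; it is not the configuration used in this example:

# Hypothetical variant (assumes optimizer_kwargs is passed to torch.optim.Adam).
params_tuned = {
    "optimizer": "Adam",
    "optimizer_kwargs": {"betas": (0.9, 0.999), "weight_decay": 0.0},
    "max_iter": 100,
    "lr": 1e0,
}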

sne = SNE(early_exaggeration_coeff=1, **params)

umap = UMAP(early_exaggeration_coeff=1, **params)

tsne = TSNE(early_exaggeration_coeff=1, **params)

all_methods = {
    "TSNE": tsne,
    "SNE": sne,
    "UMAP": umap,
}

for method_name, method in all_methods.items():
    print(f"--- Computing {method_name} ---")
    method.fit(X)
--- Computing TSNE ---
--- Computing SNE ---
--- Computing UMAP ---

Plot the different embeddings#

fig = plt.figure(figsize=(12, 4))

for i, (method_name, method) in enumerate(all_methods.items()):
    ax = fig.add_subplot(1, 3, i + 1)
    emb = method.embedding_.detach().numpy()  # get the embedding
    ax.scatter(emb[:, 0], emb[:, 1], c=Y, s=10)
    ax.set_title(f"{method_name}", fontsize=24)
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
[Figure: TSNE, SNE and UMAP embeddings of the SNARE-seq data, colored by cell type]

Using AffinityMatcher#

We can reproduce the same embeddings using the class torchdr.AffinityMatcher. The latter takes two affinities as input and minimizes a matching loss between them.

To reproduce the SNE algorithm, we match a torchdr.EntropicAffinity with a torchdr.NormalizedGaussianAffinity via the cross-entropy loss.

sne_affinity_matcher = AffinityMatcher(
    n_components=2,
    # SNE matches an EntropicAffinity
    affinity_in=EntropicAffinity(sparsity=False),
    # with a Gaussian kernel normalized by row
    affinity_out=NormalizedGaussianAffinity(normalization_dim=1),
    loss_fn="cross_entropy_loss",  # and the cross_entropy loss
    **params,
)
sne_affinity_matcher.fit(X)

fig = plt.figure(figsize=(8, 4))
two_sne_dict = {"SNE": sne, "SNE (with affinity matcher)": sne_affinity_matcher}
for i, (method_name, method) in enumerate(two_sne_dict.items()):
    ax = fig.add_subplot(1, 2, i + 1)
    emb = method.embedding_.detach().numpy()  # get the embedding
    ax.scatter(emb[:, 0], emb[:, 1], c=Y, s=10)
    ax.set_title(f"{method_name}", fontsize=15)
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
[Figure: SNE and SNE (with affinity matcher) embeddings, colored by cell type]

On the efficiency of using the torchdr API rather than AffinityMatcher directly#

Note

Calling torchdr.SNE leverages the sparsity of the input affinity and therefore significantly reduces the computational cost of the algorithm compared to using torchdr.AffinityMatcher with the corresponding affinities. In TorchDR, it is therefore recommended to use the dedicated class for a given algorithm whenever it is available.
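
As a rough illustration, one can time both formulations on the same data. The snippet below is only a sketch reusing the objects defined above; actual timings depend on your hardware and TorchDR version:

import time

# Dedicated class: can exploit a sparse entropic affinity internally.
start = time.perf_counter()
SNE(**params).fit(X)
print(f"SNE class: {time.perf_counter() - start:.1f} s")

# Generic AffinityMatcher with a dense (non-sparse) entropic affinity.
start = time.perf_counter()
AffinityMatcher(
    n_components=2,
    affinity_in=EntropicAffinity(sparsity=False),
    affinity_out=NormalizedGaussianAffinity(normalization_dim=1),
    loss_fn="cross_entropy_loss",
    **params,
).fit(X)
print(f"AffinityMatcher: {time.perf_counter() - start:.1f} s")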

Total running time of the script: (1 minute 45.844 seconds)

Gallery generated by Sphinx-Gallery