Mapping new data to the Human Neural Organoid Cell Atlas (HNOCA)
We built the HNOCA as a comprehensive collection of (healthy) neural organoid datasets. One can view it as a baseline of how neural organoids can "look like" and therefore it can be very valuable to compare new experimental data, e.g. from disease models to the HNOCA. Moreover, we highly encourage the community to help further expand the atlas over time and submit new datasets to be inclued.
Here we'll show how you can map new data to the HNOCA and how to perform quantitative comparisons. Analogous to our analysis in the manuscript, we map scRNA-seq data from a brain organoid model of fragile X syndrome (Kang et al.) and then perform DE analysis. First, let's load all the necessary libraries and the HNOCA data.
import os
import scarches
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import hnoca.map as hmap
import hnoca.stats as stats
os.chdir("/home/fleckj/scratch/hnoca/")
# Read reference and query data
hnoca_adata = sc.read("HNOCA_hv3k.h5ad")
query_adata = sc.read("KangWen2021_FXS.h5ad")
# Format batch columns
query_adata.obs["batch"] = query_adata.obs["sample"].astype(str).copy()
# Load reference model
hnoca_model = scarches.models.scPoli.load(
"scpoli_HNOCA/",
adata=hnoca_adata,
)
print(hnoca_model)
Now we have eveything together to start the reference mapping. Here we are finetuning the HNOCA scPoli model for 30 epochs (n_epochs
) retraining only the weights for the new batch covariates (retrain="partial"
).
mapper = hmap.AtlasMapper(hnoca_model)
mapper.map_query(
query_adata,
retrain='partial',
n_epochs=30,
pretraining_epochs=20,
eta=10,
batch_size=1024
)
mapper.save("HNOCA_FXS_mapper/")
With this finetuned model we can then compute a weighted kNN graph between the query and reference data.
mapper = hmap.AtlasMapper.load("HNOCA_FXS_mapper/")
mapper.compute_wknn(k=100)
Using the wkNN graph, we can now transfer cell type labels from the HNOCA to the query data and compute a UMAP from the latent space of the scPoli model.
ct_transfer = mapper.transfer_labels(label_key="level_1_pca")
ct_transfer[["best_label", "best_score"]]
X_latent = mapper.get_latent_representation(query_adata)
query_adata.obsm["X_latent"] = X_latent
query_adata.obs["best_label"] = ct_transfer["best_label"]
sc.pp.neighbors(query_adata, use_rep="X_latent", method="rapids")
sc.tl.umap(query_adata, min_dist=0.3, method="rapids")
sc.pl.umap(query_adata, color=["condition", "best_label"], wspace=0.4)
Looks quite good! Now we can quantify the difference between the FXS dataset and the HNOCA. For this, we first use the wkNN graph to find matched (meta-)cells for each cell in the FXS dataset. Then we perform a differential expression analysis between the matched HNOCA cells and the query using an F-test.
matched_adata = mapper.get_matched_expression()
matched_adata
paired_de = stats.test_de_paired(
query_adata,
matched_adata,
num_threads=4,
var_names=query_adata.var_names,
adjust_method="holm",
)
paired_de
plt.scatter(paired_de['coef'], -np.log10(paired_de['pval']), s=10)
plt.ylabel('-Log10(P)')
plt.show()