Reference mapping with HNOCA-tools
In this tutorial we'll explore how to perform reference mapping using the HNOCA toolbox. Analogous to our analysis in the HNOCA manuscript, we map an organoid morphogen screen dataset (Amin at al.) to a reference atlas of the developing human brain (Braun at al.). We already trained a scANVI model for this. Based on this model HNOCA-tools provides functions to map the organoid data to the reference atlas and make quantitative comparisons.
To get started, we first load all necessary pacakges and load the data & model.
import os
import scvi
import scanpy as sc
import pandas as pd
import numpy as np
import hnoca.map as hmap
os.chdir("/home/fleckj/scratch/hnoca/")
# Read reference and query data
ref_adata = sc.read("BraunLinnarsson_2023_devbrain_hv2k.h5ad")
query_adata = sc.read("AminPasca_2023_organoid_screen_hv2k.h5ad")
# Format batch columns
query_adata.obs["batch"] = query_adata.obs["orig.ident"].astype(str).copy()
ref_adata.obs["batch"] = ref_adata.obs["Donor"].astype(str).copy()
# Load reference model
ref_model = scvi.model.SCANVI.load(
"scarches_BraunLinnarsson/model.pt",
adata=ref_adata,
)
print(ref_model)
Now we can use the AtlasMapper
from HNOCA-tools to map the organoid data to the reference atlas. Under the hood this is using the transfer learning functionality of scANVI to finetune te model to the query dataset. We finetune for 100 epochs (max_epochs
) with a batch size of 1024 (batch_size
) and use retrain="partial"
to only retrain the weights corresponding to the new batch covariates.
# Initiate mapper
mapper = hmap.AtlasMapper(ref_model)
# Map query data to reference
mapper.map_query(
query_adata,
retrain="partial",
max_epochs=100,
batch_size=1024
)
We then compute a weighted kNN graph between the query and reference data.
mapper.compute_wknn(k=100)
Based on this wkNN mapping, we can now transfer labels from the reference and to estimate cell type composition of the query. We can also compute presence scores with respect to the reference data to assess which reference cell types are covered in the screen.
subregion_transfer = mapper.transfer_labels(label_key="CellClass")
subregion_transfer[["best_label", "best_score"]]
Let's check how this looks on a UMAP. We can see that this matches the original author annotations ("class"
) quite well.
query_adata.obs["best_label"] = subregion_transfer["best_label"]
query_adata.obs["best_score"] = subregion_transfer["best_score"]
sc.pl.umap(query_adata, color=["class", "best_label", "best_score"], ncols=1)
Next, we'd like to know which cell types of the reference atlas are covered by the organoid screen. For this, we can compute the per-condition presence scores based on the wkNN graph. (Depending on the number of conditions this can take a while, even if you are running it on a GPU.)
presence_scores = mapper.get_presence_scores(split_by="condition")
Now let's check out the presence scores on a umap of the reference atlas.
ref_adata.obs["max_presence_scores"] = presence_scores["max"]
sc.pl.umap(ref_adata, color=["CellClass", "Subregion", "max_presence_scores"], ncols=1, cmap="Blues")
From here on one can perform various other analysis based on the presence scores, e.g. compare them with the organoid atlas to reveal cell types for which coverage was improved by the screen.