Learning Spatial Relationships with MISTy

Here, we show how to use LIANA’s implementation of MISTy, a framework presented in Tanevski et al., 2022.

MISTy is a tool that helps us better understand how different features, such as genes or cell types, interact with each other in space. MISTy does so by learning both intra- and extracellular relationships - i.e. those that occur within and between cells/spots. A major advantage of MISTy is its flexibility. It can model different perspectives, or “views,” each describing a different way markers are related to each other. Each of these views can describe a different spatial context, i.e. define a relationship among the observed expressions of the markers, such as intracellular regulation or paracrine regulation.

MISTy has only one fixed view - i.e. the intraview, which contains the target (dependent) variables. The other views we refer to as extra views, and they contain the independent variables used to predict the intra view. MISTy can fit any number of extra views, and each extra view can contain any number of variables. The extra views can thus simultaneously learn the dependencies of target variables across different modalities, such as cell type proportions, pathways, or genes, etc.

MISTy represents each view represents as a potential source of variation in the measurements of the target variables in the intra view. MISTy further analyzes each view to determine how it contributes to the overall expression or abundance of each target variable. It explains this contribution by identifying the interactions between measurements that led to the observed results.

606732de01e64071a089bc3b955b80e9

To showcase MISTy, we use a single 10x Visium slide from Kuppe et al. (2022).

Environment

pip install squidpy
pip install "decoupler>=1.4.0"

Import generic packages

[1]:

import scanpy as sc
import squidpy as sq
import decoupler as dc
import plotnine as p9
import liana as li

Import Helper functions needed to create MISTy objects.

[2]:

from liana.method import MistyData, genericMistyData, lrMistyData

Import Pre-defined Single view models

[3]:

from liana.method.sp import RandomForestModel, LinearModel, RobustLinearModel

Load and Normalize Data

We will use an ischemic 10X Visium spatial slide from Kuppe et al., 2022. It is a tissue sample obtained from a patient with myocardial infarction, specifically focusing on the ischemic zone of the heart tissue.

The slide provides spatially-resolved information about the cellular composition and gene expression patterns within the tissue.

[4]:

adata = sc.read("kuppe_heart19.h5ad", backup_url='https://figshare.com/ndownloader/files/41501073?private_link=4744950f8768d5c8f68c')

[5]:

adata.obs.head()

[5]:

	in_tissue	array_row	array_col	sample	n_genes_by_counts	log1p_n_genes_by_counts	total_counts	log1p_total_counts	pct_counts_in_top_50_genes	pct_counts_in_top_100_genes	pct_counts_in_top_200_genes	pct_counts_in_top_500_genes	mt_frac	celltype_niche	molecular_niche
AAACAAGTATCTCCCA-1	1	50	102	Visium_19_CK297	3125	8.047510	7194.0	8.881142	24.770642	31.387267	39.797053	54.503753	0.085630	ctniche_1	molniche_9
AAACAATCTACTAGCA-1	1	3	43	Visium_19_CK297	3656	8.204398	10674.0	9.275660	35.956530	42.167885	49.456624	61.045531	0.033275	ctniche_5	molniche_3
AAACACCAATAACTGC-1	1	59	19	Visium_19_CK297	3013	8.011023	7339.0	8.901094	33.247036	39.910069	47.227143	59.326884	0.029139	ctniche_5	molniche_3
AAACAGAGCGACTCCT-1	1	14	94	Visium_19_CK297	4774	8.471149	14235.0	9.563529	22.739726	29.884089	37.850369	51.099403	0.149194	ctniche_7	molniche_2
AAACAGCTTTCAGAAG-1	1	43	9	Visium_19_CK297	2734	7.913887	6920.0	8.842316	35.664740	42.268786	50.000000	62.384393	0.025601	ctniche_5	molniche_3

Normalize data

[6]:

adata.layers['counts'] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

Spot clusters

[7]:

sq.pl.spatial_scatter(adata, color=[None, 'celltype_niche'], size=1.3, palette='Set1')

This slide comes with estimated cell type proportions using cell2location; See Kuppe et al., 2022. Let’s extract from .obsm them to an independent AnnData object.

[8]:

# Rename to more informative names
full_names = {'Adipo': 'Adipocytes',
              'CM': 'Cardiomyocytes',
              'Endo': 'Endothelial',
              'Fib': 'Fibroblasts',
              'PC': 'Pericytes',
              'prolif': 'Proliferating',
              'vSMCs': 'Vascular_SMCs',
              }
# but only for the ones that are in the data
adata.obsm['compositions'].columns = [full_names.get(c, c) for c in adata.obsm['compositions'].columns]

[9]:

comps = li.ut.obsm_to_adata(adata, 'compositions')

[10]:

comps.var

[10]:


Adipocytes
Cardiomyocytes
Endothelial
Fibroblasts
Lymphoid
Mast
Myeloid
Neuronal
Pericytes
Proliferating
Vascular_SMCs

[11]:

# check key cell types
sq.pl.spatial_scatter(comps,
                      color=['Vascular_SMCs','Cardiomyocytes',
                             'Endothelial', 'Fibroblasts'],
                      size=1.3, ncols=2, img_alpha=0
                      )

Funcomics

Before we run MISTy, let’s estimate pathway activities as a way to make the data a bit more interpretable. We will use decoupler-py with pathways genesets from PROGENy. See this tutorial for details.

[12]:

# obtain genesets
progeny = dc.get_progeny(organism='human', top=500)

/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`

Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Downloading annotations for all proteins from the following resources: `['PROGENy']`

[13]:

# use multivariate linear model to estimate activity
dc.run_mlm(
    mat=adata,
    net=progeny,
    source='source',
    target='target',
    weight='weight',
    verbose=True,
    use_raw=False,
)

Running mlm on mat with 4113 samples and 17703 targets for 14 sources.

100%|██████████| 1/1 [00:03<00:00,  3.05s/it]

[14]:

# extract progeny activities as an AnnData object
acts_progeny = li.ut.obsm_to_adata(adata, 'mlm_estimate')

[15]:

# Check how the pathway activities look like
sq.pl.spatial_scatter(acts_progeny, color=['Hypoxia', 'JAK-STAT'], cmap='RdBu_r', size=1.3)

Formatting & Running MISTy

The implementation of MISTy in LIANA relies on MuData objects (Bredikhin et al., 2022) and extends them to a very simple child class we call “MistyData”. To make it easier to use, we provide functions to construct “MistyData” objects that transform the data into a format that MISTy can use.

Briefly, a “MistyData” object is just a MuData object with intra as one of the modalities - this is the view in which the (target) variables explained by all other views are stored. MISTy is flexible to any other view that is appended, provided it also contains a spatial neighbors graph.

Writing a MistyData object will simply result in a MuData object being written to disk. To read it back as MistyData, use mudata.read_h5ad and pass back it to the MistyData() function.

Let’s use genericMistyData to construct a MuData object with the intra view and the cell type proportions as the first view. Then it additionally build a ‘juxta’ view for the spots that are neighbors of each other, and a ‘para’ view for all surrounding spots within a certain radius, or bandwidth.

In this case, we will use cell type compositions per spot as the intra view, and we will use the PROGENy pathway activities as the juxta and para views:

[16]:

misty = genericMistyData(intra=comps, extra=acts_progeny, cutoff=0.05, bandwidth=200, coord_type='generic', n_rings=1)

/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/mudata/_core/mudata.py:491: UserWarning: Cannot join columns with the same name because var_names are intersecting.

[17]:

misty

[17]:

MuData object with n_obs × n_vars = 4113 × 39
  obs:      'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
  3 modalities
    intra:  4113 x 11
      obs:  'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
      obsm: 'spatial'
    juxta:  4113 x 14
      obsm: 'spatial'
      layers:       'weighted'
      obsp: 'spatial_connectivities'
    para:   4113 x 14
      obsm: 'spatial'
      layers:       'weighted'
      obsp: 'spatial_connectivities'

Learn Relationships with MISTy

Now that we have constructed the object, let’s learn the relationships across the views.

[18]:

misty(model=RandomForestModel, n_jobs=-1, verbose = True)

Now learning: Vascular_SMCs: 100%|██████████| 11/11 [00:36<00:00,  3.32s/it]

Specifically, we will use the RandomForestModel to fit an individual random forrest model for each target in the intra view, using the juxta and para views as predictors.

MISTy returns two DataFrames: * target_metrics - the metrics that describe the target variables from the intra view, including R-squared across different views as well as the estimated contributions to the predictive performance of each view per target. * interactions - feature importances per view

if inplace is true (Default), these are appended to the MuData object.

Let’s check the variance explained when predicting each target variables in the intra view, with other variables (predictors) in the intra view itself. We can see that it explains itself relatively well (as expected).

[19]:

misty.uns['target_metrics'].head()

[19]:

	target	intra_R2	multi_R2	gain_R2	intra	juxta	para
0	Adipocytes	0.215315	0.279096	0.063781	0.502775	0.244016	0.253209
1	Cardiomyocytes	0.877470	0.892321	0.014851	0.780417	0.037872	0.181711
2	Endothelial	0.740765	0.740783	0.000018	0.955776	0.007447	0.036778
3	Fibroblasts	0.935028	0.935156	0.000128	0.966020	0.000000	0.033980
4	Lymphoid	0.108310	0.139882	0.031572	0.541302	0.142696	0.316001

[20]:

li.pl.target_metrics(misty, stat='intra_R2', return_fig=True)

[20]:

<Figure Size: (500 x 500)>

MISTy additionally calculate gain_R2, or in other words the performance gain when we additionally consider the other views (in addition to intra). When we look at the variance explained by the other views, we see that they explain a bit less (as expected), but still there is still some gain of predictive performance:

[21]:

li.pl.target_metrics(misty, stat='gain_R2')

[21]:

<Figure Size: (500 x 500)>

We can also check the contribution to the predictive performance of each view per target:

[22]:

li.pl.contributions(misty, return_fig=True)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[22]:

<Figure Size: (500 x 500)>

Finally, using the information above we know which variables are best explained by our model, and we know which view explains them best. So, we can now also see what are the specific variables that explain each target best:

[23]:

# this information is stored here:
misty.uns['interactions'].head()

[23]:

	target	predictor	view	importances
0	Adipocytes	Cardiomyocytes	intra	0.063997
1	Adipocytes	Endothelial	intra	0.077043
2	Adipocytes	Fibroblasts	intra	0.054170
3	Adipocytes	Lymphoid	intra	0.048505
4	Adipocytes	Mast	intra	0.131783

[24]:

li.pl.interactions(misty, view='juxta', return_fig=True, figure_size=(7,5))

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[24]:

<Figure Size: (700 x 500)>

Linear Misty

We can also use a Linear model, while a bit more simplistic is much faster and more interpretable.

Moreover, we will bypass predicting the intraview with features within the intraview features (bypass_intra). This will allow us to see how well the other views explain the intraview, excluding the intraview itself.

[25]:

misty(model=LinearModel, k_cv=10, seed=1337, bypass_intra=True, verbose = True)

Now learning: Vascular_SMCs: 100%|██████████| 11/11 [00:03<00:00,  2.91it/s]

Let’s check the joined R-squared for views:

[26]:

li.pl.target_metrics(misty, stat='gain_R2', return_fig=True)

[26]:

<Figure Size: (500 x 500)>

and their contributions per target:

[27]:

li.pl.contributions(misty, return_fig=True)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[27]:

<Figure Size: (500 x 500)>

Since this is a linear model, the coefficients would not be directly comparable (as are importances in a Random Forest). Thus, we use the coefficients’ t-values, as calculated by Ordinary Least Squares, which are signed and directly comparable.

Let’s explore the t-values for each target-prediction interaction:

[28]:

(
    li.pl.interactions(misty, view='juxta', return_fig=True, figure_size=(7,5)) +
    p9.scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[28]:

<Figure Size: (700 x 500)>

Feature importances

Regardless of the model, each target is predicted independently, and the interpretation of feature importances depends on the model used. By default, we use a random forest, so the feature importances are the mean decrease in Gini impurity of the features. On the other hand, when we use a linear model, the feature importances are the t-values of the model coefficients.

Build Custom Misty Views

As we previously mentioned, one can build any view structure that they deem relevant for their data. So, let’s explore how to build custom views. Here, we will just use two distinct prior knowledge sources to check which one achieves better predictive performance.

So, let’s also estimate Transcription Factor activities with decoupler:

[29]:

# get TF prior knowledge
net = dc.get_collectri()

[30]:

# Estimate activities
dc.run_ulm(
    mat=adata,
    net=net,
    verbose=True,
    use_raw=False,
)

Running ulm on mat with 4113 samples and 17703 targets for 694 sources.

100%|██████████| 1/1 [00:01<00:00,  1.45s/it]

[31]:

# extract activities
acts_tfs = li.ut.obsm_to_adata(adata, 'ulm_estimate')

In addition to the features, we also need to provide spatial weights for the spots. Here, we will use LIANA’s inbuilt radial kernel function to compute spatial weights based on the spatial coordinates of the spots. However, this can be replaced by any other spatial weights matrix, such as those calculated via squidpy.gr.spatial_neighbors.

[32]:

# Calculate spatial neighbors
li.ut.spatial_neighbors(acts_tfs, cutoff=0.1, bandwidth=200, set_diag=False)

Visualize the weights for a specific spot:

[33]:

li.pl.connectivity(acts_tfs, idx=0, figure_size=(6,5))

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[33]:

<Figure Size: (600 x 500)>

[34]:

# transfer spatial information to progeny activities
# NOTE: spatial connectivities can differ between views, but in this case we will use the same
acts_progeny.obsm['spatial'] = acts_tfs.obsm['spatial']
acts_progeny.obsp['spatial_connectivities'] = acts_tfs.obsp['spatial_connectivities']

Build an object with custom views:

[35]:

misty = MistyData(data={"intra": comps, "TFs": acts_tfs, "Pathways": acts_progeny})

[36]:

misty

[36]:

MuData object with n_obs × n_vars = 4113 × 719
  obs:      'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
  3 modalities
    intra:  4113 x 11
      obs:  'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
      uns:  'spatial', 'log1p', 'celltype_niche_colors'
      obsm: 'compositions', 'mt', 'spatial'
    TFs:    4113 x 694
      obs:  'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
      uns:  'spatial', 'log1p', 'celltype_niche_colors'
      obsm: 'compositions', 'mt', 'spatial', 'mlm_estimate', 'mlm_pvals', 'ulm_estimate', 'ulm_pvals'
      layers:       'weighted'
      obsp: 'spatial_connectivities'
    Pathways:       4113 x 14
      obs:  'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
      uns:  'spatial', 'log1p', 'celltype_niche_colors'
      obsm: 'compositions', 'mt', 'spatial', 'mlm_estimate', 'mlm_pvals'
      layers:       'weighted'
      obsp: 'spatial_connectivities'

Run Misty as before:

[37]:

misty(model=LinearModel, verbose=True, bypass_intra=True)

Now learning: Vascular_SMCs: 100%|██████████| 11/11 [00:36<00:00,  3.31s/it]

We can see that Cardiomyocytes and Fibroblasts are relatively well explained by TFs & Pathways.

[38]:

li.pl.target_metrics(misty, stat='gain_R2')

[38]:

<Figure Size: (500 x 500)>

We also see that the two views explain the targets similarly well.

[39]:

li.pl.contributions(misty, return_fig=True)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[39]:

<Figure Size: (500 x 500)>

Plot cell type x Trascription factor interactions

[40]:

(
    li.pl.interactions(misty, view='TFs', top_n=20) +
    p9.labs(x='Transcription Factor', y='Cell type') +
    p9.theme_bw(base_size=14) +
    p9.theme(axis_text_x=p9.element_text(rotation=90, size=13)) +
    # change to blue-red
    p9.scale_fill_gradient2(low='blue', mid='white', high='red')
)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[40]:

<Figure Size: (640 x 480)>

Ligand-Receptor Misty

Finally, we provide a utility function that builds an object with receptors in the intra view and ligands in the para view (or in their surrounding).

For the sake of computational speed, let’s identify the highly variable genes

[41]:

sc.pp.highly_variable_genes(adata)
hvg = adata.var[adata.var['highly_variable']].index

Build LR Misty object:

[42]:

misty = lrMistyData(adata[:, hvg], bandwidth=200, set_diag=False, cutoff=0.01, nz_threshold=0.1)

/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/mudata/_core/mudata.py:491: UserWarning: Cannot join columns with the same name because var_names are intersecting.

[43]:

misty(bypass_intra=True, model=LinearModel, verbose=True)

Now learning: CHL1:  25%|██▍       | 22/89 [00:06<00:19,  3.39it/s]      /home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/method/sp/_misty/_Misty.py:317: RuntimeWarning: invalid value encountered in divide
Now learning: NPR3:  46%|████▌     | 41/89 [00:12<00:14,  3.27it/s]       /home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/method/sp/_misty/_Misty.py:317: RuntimeWarning: invalid value encountered in divide
Now learning: CLEC12A:  92%|█████████▏| 82/89 [00:25<00:02,  3.12it/s]  /home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/method/sp/_misty/_Misty.py:317: RuntimeWarning: invalid value encountered in divide
Now learning: CSF3R: 100%|██████████| 89/89 [00:27<00:00,  3.25it/s]

Let’s now explore the top interactions between the ligands and receptors:

[44]:

(
    li.pl.interactions(misty, view='extra', return_fig=True, figure_size=(6, 5), top_n=25, key=abs) +
    p9.scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
    p9.labs(y='Receptor', x='Ligand')
)

Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[44]:

<Figure Size: (600 x 500)>

In contrast to any other other functions in LIANA, misty will infer all possible interactions between ligands and receptors - i.e. not only those that were annotated specifically as ligand-receptor interactions.

While this can be seen as a limitation, it can also be seen as an advantage of MISTy, as it allows us to explore potential ligand-receptor interactions that were not previously annotated!

While MISTy provides a flexible framework for the inference of spatially-informed interactions, it only summarizes relationships between the variables on the level of the whole slide (or niche); one should thus consider LIANA+’s bivariate local functions. These are simple spatially-informed spatially-informed metrics calculated at the spot-/cell-level. Thus, one can use them to visualize and explore the local distribution of interactions with spatial context.

Citing MISTy:

If you use MISTy via LIANA+, please cite MISTy’s original publication (Tanevski et al., 2022)

[ ]: