# Spatially-informed Bivariate Metrics

This tutorial provides an overview of the local scores implemented in LIANA+. These scores are used to identify spatially co-expressed ligand-receptor pairs. However, there also applicable to other types of spatially-informed bivariate analyses.

It provides brief explanations of the mathematical formulations of the scores; these include adaptation of bivariate Moran’s R, Pearson correlation, Spearman correlation, weighted Jaccard similarity, and Cosine similarity. The tutorial also showcases interaction categories (masks) and significance testing.

## Environement Setup

```
[1]:
```

```
import pandas as pd
import scanpy as sc
import decoupler as dc
import liana as li
from mudata import MuData
```

## Load and Normalize Data

To showcase LIANA’s local functions, we will use an ischemic 10X Visium spatial slide from Kuppe et al., 2022. It is a tissue sample obtained from a patient with myocardial infarction, focusing on the ischemic zone of the heart tissue.

The slide provides spatially-resolved information about the cellular composition and gene expression patterns within the tissue.

```
[2]:
```

```
adata = sc.read("kuppe_heart19.h5ad", backup_url='https://figshare.com/ndownloader/files/41501073?private_link=4744950f8768d5c8f68c')
adata.layers['counts'] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
```

```
[3]:
```

```
adata.obs.head()
```

```
[3]:
```

in_tissue | array_row | array_col | sample | n_genes_by_counts | log1p_n_genes_by_counts | total_counts | log1p_total_counts | pct_counts_in_top_50_genes | pct_counts_in_top_100_genes | pct_counts_in_top_200_genes | pct_counts_in_top_500_genes | mt_frac | celltype_niche | molecular_niche | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

AAACAAGTATCTCCCA-1 | 1 | 50 | 102 | Visium_19_CK297 | 3125 | 8.047510 | 7194.0 | 8.881142 | 24.770642 | 31.387267 | 39.797053 | 54.503753 | 0.085630 | ctniche_1 | molniche_9 |

AAACAATCTACTAGCA-1 | 1 | 3 | 43 | Visium_19_CK297 | 3656 | 8.204398 | 10674.0 | 9.275660 | 35.956530 | 42.167885 | 49.456624 | 61.045531 | 0.033275 | ctniche_5 | molniche_3 |

AAACACCAATAACTGC-1 | 1 | 59 | 19 | Visium_19_CK297 | 3013 | 8.011023 | 7339.0 | 8.901094 | 33.247036 | 39.910069 | 47.227143 | 59.326884 | 0.029139 | ctniche_5 | molniche_3 |

AAACAGAGCGACTCCT-1 | 1 | 14 | 94 | Visium_19_CK297 | 4774 | 8.471149 | 14235.0 | 9.563529 | 22.739726 | 29.884089 | 37.850369 | 51.099403 | 0.149194 | ctniche_7 | molniche_2 |

AAACAGCTTTCAGAAG-1 | 1 | 43 | 9 | Visium_19_CK297 | 2734 | 7.913887 | 6920.0 | 8.842316 | 35.664740 | 42.268786 | 50.000000 | 62.384393 | 0.025601 | ctniche_5 | molniche_3 |

Spot clusters

```
[4]:
```

```
sc.pl.spatial(adata, color=[None, 'celltype_niche'], size=1.3, palette='Set1')
```

## Background

Here, we will demonstrate how to use the spatially-informed bivariate metrics to assess the spatial relationship between two variables. Specifically, we focus on local bivariate similarity metrics. In contrast to other spatial Methods, including Misty; focusing on local spatial relationships enables us to pinpoint the exact location of spatial relationships, and to identify spatial relationships that might occur only in a specific sub-region of our samples.

Following the initial concept of LIANA, and inspired by scHOT, we have natively re-implemented **6** local bivariate metrics, including scHOT’s (default) masked Spearman & SpatialDM’s local Moran’s R.

As part of the LIANA+ manuscript, we performed two distinct tasks to evaluate the ability of these metrics to preserve biological information, and saw that on average when used to identify local ligand-receptor relationships, spatially-weighted Cosine similarity did best. Thus, we will focus on it throughout this tutorial. However, we expect that other scoring functions might be better suited for other tasks, e.g. Spatially-weighted Jaccard Similarity should be well suited for categorical data; thus we encourage you to explore them.

### Available Local Functions

```
[5]:
```

```
li.mt.bivariate.show_functions()
```

```
[5]:
```

name | metadata | reference | |
---|---|---|---|

0 | pearson | weighted Pearson correlation coefficient | None |

1 | spearman | weighted Spearman correlation coefficient | None |

2 | cosine | weighted Cosine similarity | None |

3 | jaccard | weighted Jaccard similarity | None |

4 | product | simple weighted product | If vars are z-scaled = Lee's static (Lee 2021;... |

5 | norm_product | normalized weighted product | None |

6 | morans | Moran's R | Li, Z., Wang, T., Liu, P. and Huang, Y., 2022.... |

7 | masked_spearman | masked & weighted Spearman correlation | Ghazanfar, S., Lin, Y., Su, X., Lin, D.M., Pat... |

### How do they work?

The local functions work are quite simple, as they are simply weighted versions of well-known similarity metrics. For example, the spatially-weighted version of Cosine similarity is defined as:

where for each spot **i**, we perform summation over all spots **n**, where **w** represents the spatial connectivity weights from spot **i** to every other spot **j**; for variables **x** and **y**.

### Spatial Connectivity

The way that spatially-informed methods usually work is by making use of weights based on the proximity (or spatial connectivity) between spots/cells. These spatial connectivities are then used to calculate the metric of interest, e.g. Cosine similarity, in a spatially-informed manner.

The spatial weights in LIANA+ are by default defined as a family of radial kernels that use the inverse Euclidean distance between cells/spots to bind the weights between 0 and 1, with spots that are closest having the highest spatial connectivity to one another (1), while those that are thought to be too far to be in contact are assigned 0.

Key parameters of spatial_neighbors include: - `bandwidth`

controls the radius of the spatial connectivities where higher values will result in a broader area being considered (controls the radius relative to the coordinates stored in `adata.obsm['spatial']`

) - `cutoff`

controls the minimum value that will be considered to have a spatial relationship (anything lower than the `cutoff`

is set to 0). - `kernel`

controls the distribution (shape) of the weights (‘gaussian’ by default) -
`set_diag`

sets the diagonal (i.e. the weight for each spot to itself) to 1 if True. **NOTE**: Here we set it to True as we expect many cells to be neighbors of themselves within a visium spot

As choosing an optimal bandwidith can be tricky, we provide the `query_bandwidth`

function which uses a set of coordinates to provide an estimate of how many cell or spot neighbors are being considered for each spot over a range of bandwidths.

```
[6]:
```

```
plot, _ = li.ut.query_bandwidth(coordinates=adata.obsm['spatial'], start=0, end=500, interval_n=20)
plot
```

```
[6]:
```

```
<Figure Size: (640 x 480)>
```

Here, we can see that a bandwidth of 150-200 (pixels) roughly includes 6 neighbours i.e. the first ring of neighbours in the hexagonal grid of 10x Visium. So, we will build the spatial graph with a bandwidth of 200.

```
[7]:
```

```
li.ut.spatial_neighbors(adata, bandwidth=200, cutoff=0.1, kernel='gaussian', set_diag=True)
```

Let’s visualize the spatial weights for a single spot to all other spots in the dataset:

```
[8]:
```

```
li.pl.connectivity(adata, idx=0, size=1.3, figure_size=(6, 5))
```

```
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt
```

```
[8]:
```

```
<Figure Size: (600 x 500)>
```

LIANA’s connectivities are flexible and can be defined in any way that fits the user. We have thus aligned LIANA’s `spatial_neighbors`

function to Squidpy’s `spatial_neighbors`

function. A perfectly viable solution would be to use Squidpy’s nearest neighbors graph, which one can use to easily replace LIANA’s radial kernel connecitivies.

## Bivariate Ligand-Receptor Relationships

Now that we have covered the basics, let’s see how these scores look for potential ligand-receptor interactions on our 10X Visium Slide. Note that LIANA+ will take the presence of heteromeric complexes into account at the individual spot-level!

```
[9]:
```

```
li.mt.bivariate(adata,
resource_name='consensus', # NOTE: uses HUMAN gene symbols!
local_name='cosine', # Name of the function
global_name="morans", # Name global function
n_perms=100, # Number of permutations to calculate a p-value
mask_negatives=False, # Whether to mask LowLow/NegativeNegative interactions
add_categories=True, # Whether to add local categories to the results
nz_prop=0.2, # Minimum expr. proportion for ligands/receptors and their subunits
use_raw=False,
verbose=True
)
```

```
Using `.X`!
Using resource `consensus`.
100%|██████████| 100/100 [01:26<00:00, 1.15it/s]
100%|██████████| 100/100 [00:45<00:00, 2.20it/s]
```

Now that this is done, we can extract and explore the newly-created AnnData object that counts our local scores

```
[10]:
```

```
adata
```

```
[10]:
```

```
AnnData object with n_obs × n_vars = 4113 × 17703
obs: 'in_tissue', 'array_row', 'array_col', 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'mt_frac', 'celltype_niche', 'molecular_niche'
var: 'gene_ids', 'feature_types', 'genome', 'SYMBOL', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'mt', 'rps', 'mrp', 'rpl', 'duplicated'
uns: 'spatial', 'log1p', 'celltype_niche_colors'
obsm: 'compositions', 'mt', 'spatial', 'local_scores'
layers: 'counts'
obsp: 'spatial_connectivities'
```

```
[ ]:
```

```
```

```
[11]:
```

```
lrdata = adata.obsm['local_scores']
```

In addition to the local bivariate scores, we can also get the “global” scores for each pair of variables, which we can choose the best pairs of variables to visualize:

```
[12]:
```

```
lrdata.var.sort_values("mean", ascending=False).head(3)
```

```
[12]:
```

ligand | receptor | ligand_means | ligand_props | receptor_means | receptor_props | morans | morans_pvals | mean | std | |
---|---|---|---|---|---|---|---|---|---|---|

interaction | ||||||||||

TIMP1^CD63 | TIMP1 | CD63 | 3.048890 | 0.996839 | 2.887192 | 0.994165 | 0.033729 | 0.0 | 0.981958 | 0.013757 |

LGALS1^ITGB1 | LGALS1 | ITGB1 | 3.534027 | 0.997326 | 2.457884 | 0.988330 | 0.149732 | 0.0 | 0.978951 | 0.027114 |

LUM^ITGB1 | LUM | ITGB1 | 3.173511 | 0.996353 | 2.457884 | 0.988330 | -0.103130 | 0.0 | 0.963635 | 0.026913 |

```
[13]:
```

```
lrdata.var.sort_values("std", ascending=False).head(3)
```

```
[13]:
```

ligand | receptor | ligand_means | ligand_props | receptor_means | receptor_props | morans | morans_pvals | mean | std | |
---|---|---|---|---|---|---|---|---|---|---|

interaction | ||||||||||

S100A1^RYR2 | S100A1 | RYR2 | 0.697087 | 0.572332 | 0.430792 | 0.406516 | 0.271809 | 0.0 | 0.489952 | 0.259712 |

VTN^ITGAV_ITGB5 | VTN | ITGAV_ITGB5 | 0.434514 | 0.395089 | 1.108789 | 0.814977 | 0.161359 | 0.0 | 0.545211 | 0.234249 |

LPL^VLDLR | LPL | VLDLR | 0.791770 | 0.609531 | 0.612294 | 0.540725 | 0.220095 | 0.0 | 0.572516 | 0.231662 |

We can also use Global bivariate Moran’s R (or Lee’s statistic) - an extension of univariate Moran’s I, as proposed by Anselin 2019 and Lee and Li, 2019; implemented initially in [SEAGAL] SpatialDM.

A useful property of Bivariate Moran’s R is that it is bound between -1 (perfect negative spatial co-expression) and 1 (perfect positive spatial co-expression). Thus, it can be used to identify pairs of variables that are spatially clustered together, or apart.

```
[14]:
```

```
lrdata.var.sort_values("morans", ascending=False).head()
```

```
[14]:
```

ligand | receptor | ligand_means | ligand_props | receptor_means | receptor_props | morans | morans_pvals | mean | std | |
---|---|---|---|---|---|---|---|---|---|---|

interaction | ||||||||||

APOE^LRP1 | APOE | LRP1 | 1.179983 | 0.752006 | 1.937682 | 0.956236 | 0.284863 | 0.0 | 0.802385 | 0.142088 |

S100A1^RYR2 | S100A1 | RYR2 | 0.697087 | 0.572332 | 0.430792 | 0.406516 | 0.271809 | 0.0 | 0.489952 | 0.259712 |

SERPING1^LRP1 | SERPING1 | LRP1 | 1.711175 | 0.931437 | 1.937682 | 0.956236 | 0.243950 | 0.0 | 0.906663 | 0.077607 |

VTN^ITGAV_ITGB1 | VTN | ITGAV_ITGB1 | 0.434514 | 0.395089 | 1.713479 | 0.929006 | 0.235151 | 0.0 | 0.592883 | 0.231178 |

MDK^LRP1 | MDK | LRP1 | 1.127114 | 0.789691 | 1.937682 | 0.956236 | 0.227208 | 0.0 | 0.829393 | 0.118597 |

From these Global summaries, we see that the average Cosine similarity largely represents **coverage** - e.g. *TIMP1 & CD63* is ubiquoutesly and uniformly distributed across the slide.

On the other hand, among most variable interactions and with with the highest global morans R is e.g. **VTN&ITGAV_ITGB5**. This interaction is thus more likely to represent biological relationships, with distinct spatial clustering patterns.

So, let’s visualize both:

```
[15]:
```

```
sc.pl.spatial(lrdata, color=['VTN^ITGAV_ITGB5', 'TIMP1^CD63'], size=1.4, vmax=1, cmap='magma')
```

```
... storing 'ligand' as categorical
... storing 'receptor' as categorical
```

As expected, we see that the **TIMP1 & CD63** interaction is uniformly distributed across the slide, while **VTN&ITGAV_ITGB5** shows a clear spatial pattern.

We can also see that this is the case when we look at the individual genes:

```
[16]:
```

```
sc.pl.spatial(adata, color=['VTN', 'ITGAV', 'ITGB5',
'TIMP1', 'CD63'],
size=1.4, ncols=2)
```

While Moran’s R provides a sound summary of spatial clustering, it is limited to two variables at a time and is thus not fit for complex, or non-linear, spatial relationships between variables. Thus, we also encourage the user to explore LIANA+’s re-implementation of MISTy.

### Permutation-based p-values

In addition to the local scores, we also calculated permutation-based p-values based on a null distribution generated by shuffling the spot labels. Let’s see how these look for the two interactions from above:

```
[17]:
```

```
sc.pl.spatial(lrdata, layer='pvals', color=['VTN^ITGAV_ITGB5', 'TIMP1^CD63'], size=1.4, cmap="magma_r")
```

These largely agree with what we saw above for *VTN&ITGAV_ITGB1* as appears to be specific to a certain region.

### Local Categories

Did you notice that we used `mask_negatives`

as a parameter when first estimating the interaction? This essentially means that we mask interactions in which both members are negative (or lowly expressed) when calculating the p-values, i.e. such which occur at places in which both members of the interaction are highly expressed. The locations at which both members are highly- expressed is defined as follows:

For each interaction, we define the category of both **x** and **y** for each spot as follows:

$$ Category_{i}(x) =

$$

Then we combine the categories of **x** and **y** for each spot, such that high-high are positive (1), high-low (or low-high) are -1; and low-low are 0. When working with non-negative values (i.e. gene expression); the features will be z-scaled (across observations).

```
[18]:
```

```
sc.pl.spatial(lrdata, layer='cats', color=['VTN^ITGAV_ITGB5', 'TIMP1^CD63'], size=1.4, cmap="coolwarm")
```

Here, we can distinguish areas in which the interaction between interaction members is positive (high-high) in Red (1), while interactions in which one is high the other is low or negative (High-low) are in Blue (-1). We also see that some interactions are neither, these are predominantly interactions in which both members are lowly-expressed (i.e. low-low); we see those in white (0).

When set to `mask_negatives=False`

, we also return interactions that are not between necessarily positive/high magnitude genes ; when set to `mask_negatives=True`

, we mask interactions that are negative (low-low) or uncategorized; for both the p-values and the local scores.

## Identify Intercellular Patterns

Now that we have estimated ligand-receptor scores, we can use non-negative matrix factorization (NMF) to identify coordinated cell-cell communication signatures. This would ultimately decompose the ligand-receptor scores into a basis matrix (`W`

) and a coefficient matrix (`H`

). We will use a very simple utility function (around sklearn’s NMF implementation) to do so, along with a simple `k`

(component number) selection procedure.

Basis Matrix (

`W`

): Each basis vector represents a characteristic pattern of ligand-receptor expression in the dataset. The values in`W`

(factor score) indicate the strengths of factor in each spot; high values indicate high influence by the associated communication signature, while low values mean a weak influence.Coefficient Matrix (

`H`

): Each row of`H`

represents the participation of the corresponding sample in the identified factor. The elements of each basis vector indicate the contribution of different interactions to the pattern (factor).

By decomposing the ligand-receptor interactions into W and H, NMF can potentially identify underlying CCC processes, with additive and non-negative relationships between the features. This property aligns well with the biological intuition that genes work together in a coordinated manner, but assumes linearity and only captures additive effects. Thus, alternative decomposition or clustering approaches can be used to a similar end.

In this particular scenario, we chose NMF as it is a well-established method for decomposition of non-negative matrices, it is fast, and it is easy to interpret. Also, in our case the ligand-receptor local scores already encode the spatial relationships between the features, so we don’t necessarily need to use a spatial-aware decomposition methods (e.g. SpatialDE, Spatial NMF, Chrysalis, or MEFISTO).

One limitation of NMF is that it requires the number of components (factors) to be specified - a somewhat an arbitrary choice. To aid the selection of `n_components`

, we provide a simple function that estimates an elbow based on reconstruction error. Another limitation is that it only accepts non-negative values, so it won’t work with metrics that can be negative (e.g. Pearson correlation). In this case, we use Cosine similarity with non-negative values, which results also in non-negative
local scores.

```
[19]:
```

```
li.multi.nmf(lrdata, n_components=None, inplace=True, random_state=0, max_iter=200, verbose=True)
```

```
0%| | 0/10 [00:00<?, ?it/s]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
10%|█ | 1/10 [00:02<00:22, 2.51s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
20%|██ | 2/10 [00:07<00:29, 3.73s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
30%|███ | 3/10 [00:12<00:30, 4.29s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
40%|████ | 4/10 [00:17<00:28, 4.71s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
50%|█████ | 5/10 [00:23<00:25, 5.17s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
60%|██████ | 6/10 [00:30<00:23, 5.81s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
70%|███████ | 7/10 [00:38<00:19, 6.39s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
80%|████████ | 8/10 [00:46<00:14, 7.02s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
90%|█████████ | 9/10 [00:55<00:07, 7.65s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
100%|██████████| 10/10 [01:05<00:00, 6.55s/it]
Estimated rank: 3
/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py:1710: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence.
```

```
[20]:
```

```
# Extract the variable loadings
lr_loadings = li.ut.get_variable_loadings(lrdata, varm_key='NMF_H').set_index('index')
```

```
[21]:
```

```
# Extract the factor scores
factor_scores = li.ut.get_factor_scores(lrdata, obsm_key='NMF_W')
```

Convert NMF Factor scores to an AnnData object for plotting

```
[22]:
```

```
nmf = sc.AnnData(X=lrdata.obsm['NMF_W'],
obs=lrdata.obs,
var=pd.DataFrame(index=lr_loadings.columns),
uns=lrdata.uns,
obsm=lrdata.obsm)
```

```
[23]:
```

```
sc.pl.spatial(nmf, color=[*nmf.var.index, None], size=1.4, ncols=2)
```

Wee see that Factor 2 is largely covering the ischemic areas of the side, let’s check the interactions contributing the most to it:

```
[24]:
```

```
lr_loadings.sort_values("Factor2", ascending=False).head(10)
```

```
[24]:
```

Factor1 | Factor2 | Factor3 | |
---|---|---|---|

index | |||

APOE^LRP1 | 1.301646 | 1.277514 | 0.314726 |

MDK^LRP1 | 1.424609 | 1.208298 | 0.373209 |

C3^LRP1 | 0.861049 | 1.197818 | 0.176564 |

SERPING1^LRP1 | 1.533601 | 1.184908 | 0.552800 |

VCAN^ITGB1 | 1.528520 | 1.164367 | 0.655848 |

C3^CD81 | 0.742810 | 1.164207 | 0.228812 |

C1QB^LRP1 | 1.506631 | 1.162397 | 0.559764 |

PSAP^LRP1 | 1.571949 | 1.156683 | 0.680581 |

FBN1^ITGB1 | 1.530816 | 1.147669 | 0.701900 |

S100A9^CD68 | 1.294533 | 1.146112 | 0.376383 |

From here on, one can estimate pathway activities or look for downstream effects of these interactions, plot the interactions, etc. These utility functions in liana are covered in other tutorials, but also applicable here.

## Beyond Ligand-Receptors

While protein-mediated ligand-receptor interactions are interesting, cell-cell communication is not limited to those alone. Rather it is a complex process that involves a variety of different mechanisms such as signalling pathways, metabolite-mediated signalling, and distinct cell types.

So, if such diverse mechanisms are involved in cell-cell communication, why should we limit ourselves to ligand-receptor interactions? Let’s see how we can use LIANA+ to explore other types of cell-cell communication.

One simple approach would be to check relationships e.g. between transcription factors and cell type proportions.

### Extract Cell type Composition

This slide comes with estimated cell type proportions using cell2location; See Kuppe et al., 2022. Let’s extract from .obsm them to an independent AnnData object.

```
[25]:
```

```
# let's extract those
comps = li.ut.obsm_to_adata(adata, 'compositions')
# check key cell types
sc.pl.spatial(comps, color=['vSMCs','CM', 'Endo', 'Fib'], size=1.3, ncols=2)
```

### Estimate Transcription Factor Activity

```
[26]:
```

```
# Get transcription factor resource
net = dc.get_collectri()
```

```
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
```

While multi-omics datasets might be even more of an interest, for the sake of simplicity (and because the general lack of spatial mutli-omics data at current times), let’s instead use enrichment analysis to estimate the activity of transcription factors in each spot. We will use one of decoupler-py’s enrichment methods with CollectTRI to do so. Refer to this tutorial for more info.

```
[27]:
```

```
# run enrichment
dc.run_ulm(adata, net=net, use_raw=False, verbose=True)
```

```
Running ulm on mat with 4113 samples and 17703 targets for 694 sources.
```

```
100%|██████████| 1/1 [00:01<00:00, 1.79s/it]
```

#### Extract highly-variable TF activities

To reduce the number of TFs for the sake of computational speed, we will only focus on the top 50 most variable TFs.

Note we will use the simple coefficient of variation to identify the most variable TFs, but one can also use more sophisticated or spatially-informed methods to extract those (light-weight suggestions are welcome).

```
[28]:
```

```
est = li.ut.obsm_to_adata(adata, 'ulm_estimate')
est.var['cv'] = est.X.std(axis=0) / est.X.mean(axis=0)
top_tfs = est.var.sort_values('cv', ascending=False, key=abs).head(50).index
```

Create MuData object with TF activities and cell type proportions, and transfer spatial connectivities and other information from the original AnnData object.

```
[29]:
```

```
mdata = MuData({"tf":est, "comps":comps})
mdata.obsp = adata.obsp
mdata.uns = adata.uns
mdata.obsm = adata.obsm
```

Define Interactions of interest:

```
[30]:
```

```
from itertools import product
```

```
[31]:
```

```
interactions = list(product(comps.var.index, top_tfs))
```

### Estimate Cosine Similarity

```
[32]:
```

```
li.mt.bivariate(mdata,
x_mod="comps",
y_mod="tf",
x_transform=sc.pp.scale,
y_transform=sc.pp.scale,
local_name="cosine",
interactions=interactions,
mask_negatives=True,
add_categories=True,
x_use_raw=False,
y_use_raw=False,
xy_sep="<->",
x_name='celltype',
y_name='tf'
)
```

To make the distributions comparable, we simply z-scale the TF activities and cell type proportions via the `x_transform`

& `y_transform`

parameters.

The type of transformation will affect the interpretation of the results, and different types of transformation might be more appropriate for different types of data. We provide zero-inflated minmax `zi_minmax`

& `neg_to_zero`

transformation functions via `li.fun.transform`

.

One can explore how different transformations affect the results, to also get a better feeling how these local metrics work.

```
[33]:
```

```
mdata.mod['local_scores'].var.sort_values("mean", ascending=False).head(5)
```

```
[33]:
```

celltype | tf | celltype_means | celltype_props | tf_means | tf_props | mean | std | |
---|---|---|---|---|---|---|---|---|

interaction | ||||||||

CM<->TBX20 | CM | TBX20 | -4.452886e-09 | 1.0 | -2.968591e-09 | 1.0 | 0.619449 | 0.161595 |

Fib<->SNAI2 | Fib | SNAI2 | -1.804437e-09 | 1.0 | -1.382432e-09 | 1.0 | 0.563575 | 0.194297 |

Fib<->SOX6 | Fib | SOX6 | -1.804437e-09 | 1.0 | 1.461012e-08 | 1.0 | 0.505218 | 0.246272 |

Myeloid<->SNAI2 | Myeloid | SNAI2 | -4.758476e-09 | 1.0 | -1.382432e-09 | 1.0 | 0.459984 | 0.220001 |

CM<->HAND1 | CM | HAND1 | -4.452886e-09 | 1.0 | -1.059379e-08 | 1.0 | 0.456131 | 0.220627 |

#### Let’s plot the results

```
[34]:
```

```
sc.pl.spatial(mdata.mod['local_scores'], color=['Myeloid<->SNAI2', 'CM<->HAND1'], size=1.4, cmap="coolwarm", vmax=1, vmin=-1)
```

```
... storing 'celltype' as categorical
... storing 'tf' as categorical
```

Plot categories

```
[35]:
```

```
sc.pl.spatial(mdata.mod['local_scores'], layer='cats', color=['Myeloid<->SNAI2', 'CM<->HAND1'], cmap='coolwarm')
```

Plot variables (without transformations)

```
[36]:
```

```
sc.pl.spatial(mdata.mod['tf'], color=['SNAI2', 'HAND1'], cmap='coolwarm', size=1.4, vcenter=0)
```

```
[37]:
```

```
sc.pl.spatial(mdata.mod['comps'], color=['Myeloid', 'CM'], cmap='viridis', size=1.4)
```

```
[ ]:
```

```
```