Multi-Modal Ligand-Receptor Inference¶
[1]:
import numpy as np
import pandas as pd
import scanpy as sc
import liana as li
import mudata as mu
from matplotlib import pyplot as plt
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
Infer Ligand-Receptor Interactions between RNA and Proteins¶
Download Processed CITE-seq Data¶
[2]:
prot = sc.read('citeseq_prot.h5ad', backup_url='https://figshare.com/ndownloader/files/47625196')
rna = sc.read('citeseq_rna.h5ad', backup_url='https://figshare.com/ndownloader/files/47625193')
Load Processed CITE-Seq Data¶
Here, we will use a very simple dataset to demonstrate the multi-modal ligand-receptor inference. We used a CITE-Seq dataset from 10X and followed the muon CITE-seq tutorial to process the RNA and Protein data.
Some minor differences are notable in the clustering due to the different versions of the packages used in the tutorial and the LIANA+ environment.
[3]:
mdata = mu.MuData({'rna': rna, 'prot': prot})
# make sure that cell type is accessible
mdata.obs['celltype'] = mdata.mod['rna'].obs['celltype'].astype('category')
# inspect the object
mdata
[3]:
MuData object with n_obs × n_vars = 3885 × 17838 obs: 'celltype' var: 'gene_ids', 'feature_types', 'genome' 2 modalities rna: 3885 x 17806 obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'celltype' var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm' uns: 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap' obsm: 'X_pca', 'X_umap' obsp: 'connectivities', 'distances' prot: 3885 x 32 var: 'gene_ids', 'feature_types', 'genome' uns: 'neighbors', 'pca', 'umap' obsm: 'X_pca', 'X_umap' varm: 'PCs' layers: 'counts' obsp: 'connectivities', 'distances'
Infer Interactions¶
We see that we have two modalities, once for RNA and one for Proteins. We will next infer the ligand-receptor interactions between these two modalities.
CITE-seq data often focuses on antibody tagging of surface proteins, primarily receptors. To ensure only the protein modality is used for receptors, we append 'AB:'
to receptor names in the antibody data. This step is necessary only when both RNA and antibody data have matching feature names.
[4]:
# Obtain a ligand-receptor resource of interest
resource = li.rs.select_resource(resource_name='consensus')
# Append AB: to the receptor names
resource['receptor'] = 'AB:' + resource['receptor']
# Append AB: to the protein modality
mdata.mod['prot'].var_names = 'AB:' + mdata.mod['prot'].var['gene_ids']
While running LIANA+ with multimodal is largely analogous to when working with uni-modal data, there are a couple of things to keep in mind. Here, we need to ensure that the correct data is passed from each modality as well as to ensure that the correct modalities are used. Moreover, we need to ensure that data from the different modalities is comparable, which often requires the transformation of the data.
In this case, we use zero-inflated min-max normalization to ensure that the data from the two modalities is comparable. Essentially, a min-max normalization in which any value bellow 0.5 (by default) following normalization is set to 0. This normalization was originally introduced by the CiteFuse method, which is a ligand-receptor method for CITE-seq data.
[5]:
li.mt.rank_aggregate(adata=mdata,
groupby='celltype',
# pass our modified resource
resource=resource,
# NOTE: Essential arguments when handling multimodal data
mdata_kwargs={
# Ligand-Receptor pairs are directed so we need to correctly pass
# `RNA` with ligands as `x_mod` and receptors as `y_mod`
'x_mod': 'rna',
'y_mod': 'prot',
# We use .X from the x_mod
'x_use_raw':False,
# We use .X from the y_mod
'y_use_raw':False,
# NOTE: we need to ensure that the modalities are correctly transformed
'x_transform':li.ut.zi_minmax,
'y_transform':li.ut.zi_minmax,
},
verbose=True
)
Using provided `resource`.
Using `.X`!
Transforming rna using zi_minmax
Using `.X`!
Converting to sparse csr matrix!
Using `.X`!
46 features of mat are empty, they will be removed.
Transforming prot using zi_minmax
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
0.69 of entities in the resource are missing from the data.
Generating ligand-receptor stats for 3885 samples and 37 features
Assuming that counts were `natural` log-normalized!
Running CellPhoneDB
100%|██████████| 1000/1000 [00:04<00:00, 221.32it/s]
Running Connectome
Running log2FC
Running NATMI
Running SingleCellSignalR
[6]:
mdata.uns['liana_res'].head()
[6]:
source | target | ligand_complex | receptor_complex | lr_means | cellphone_pvals | expr_prod | scaled_weight | lr_logfc | spec_weight | lrscore | specificity_rank | magnitude_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
335 | pre-B | CD4+ memory T | HLA-DRA | AB:CD4 | 0.775442 | 0.0 | 0.600403 | 1.294984 | 0.643030 | 0.054068 | 0.968058 | 0.003006 | 0.000006 |
305 | mature B | CD4+ memory T | HLA-DRA | AB:CD4 | 0.772048 | 0.0 | 0.594934 | 1.284592 | 0.640910 | 0.053575 | 0.967917 | 0.003511 | 0.000022 |
450 | pre-B | CD4+ naïve T | HLA-DRA | AB:CD4 | 0.756579 | 0.0 | 0.572285 | 1.246018 | 0.640179 | 0.051536 | 0.967308 | 0.003890 | 0.000050 |
420 | mature B | CD4+ naïve T | HLA-DRA | AB:CD4 | 0.753185 | 0.0 | 0.567072 | 1.235627 | 0.638059 | 0.051066 | 0.967163 | 0.003890 | 0.000089 |
324 | pDC | CD4+ memory T | HLA-DRA | AB:CD4 | 0.751015 | 0.0 | 0.561047 | 1.220208 | 0.594210 | 0.050524 | 0.966993 | 0.003890 | 0.000138 |
Note that feature-wise min-max binds all features (i.e. genes and proteins) to the same limits, which essentially neglects the biological differences between features. This is typically not the case when working with untransformed data, as in that case we mostly care about cells being comparable, while features are typically with variable limits / distributions. As such, there might be some subtle differences from the interpretation of LIANA+ results, depending on the transformation of choice.
A benchmark of different normalizations for CCC are pending and alternative normalizations should also be explored. While other normalizations and subsequent transformations can also be used, the single-cell methods in LIANA+ require the data to be non-negative.
Plot Results & More¶
[7]:
li.pl.dotplot(adata = mdata,
colour='lr_means',
size='specificity_rank',
inverse_size=True, # we inverse sign since we want small p-values to have large sizes
source_labels=['CD4+ naïve T', 'NK', 'Treg', 'CD8+ memory T'],
target_labels=['CD14 mono', 'mature B', 'CD8+ memory T', 'CD16 mono'],
figure_size=(9, 5),
# finally, since cpdbv2 suggests using a filter to FPs
# we filter the pvals column to <= 0.05
filter_fun=lambda x: x['cellphone_pvals'] <= 0.05,
cmap='plasma'
)
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[7]:
<Figure Size: (900 x 500)>
Metabolite-mediated CCC from Transcriptomics Data¶
Recently, tools such as NeuronChat, MEBOCOST, scConnect, Cellinker, and CellPhoneDBv5 have proposed approaches, such as enrichment, expression average, among others, to infer metabolite-mediated CCC events from transcriptomics data. Similarly, we can use LIANA+ to infer metabolite-mediated CCC events from transcriptomics data, as described in the MetalinksDB manuscript.
Briefly, we use a univariate linear regression model to estimate metabolite abundances for each cell. To do so, we make use of production-degradation enzyme prior knowledge to infer the metabolite abundances. Optionally, we also take transporters into account. We then use these inferred metabolite abundances to infer metabolite-mediated CCC events.
Focus on Transcriptomics Data¶
[8]:
adata = mdata.mod['rna']
Obtain MetalinksDB Prior Knowledge¶
Here, we will use MetalinksDB which contains prior knowledge about metabolite-receptor interactions as well as such for the production and degradation enzymes for metabolites. We will use the latter type of prior knowledge to infer the metabolite abundances for each cell.
[9]:
metalinks = li.resource.get_metalinks(biospecimen_location='Blood',
source=['CellPhoneDB', 'Cellinker', 'scConnect', # Ligand-Receptor resources
'recon', 'hmr', 'rhea', 'hmdb' # Production-Degradation resources
],
types=['pd', 'lr'], # NOTE: we obtain both ligand-receptor and production-degradation sets
)
Prepare the Metabolite-Receptor Resource¶
[10]:
resource = metalinks[metalinks['type']=='lr'].copy()
resource = resource[['metabolite', 'gene_symbol']]\
.rename(columns={'gene_symbol':'receptor'}).drop_duplicates()
resource.head()
[10]:
metabolite | receptor | |
---|---|---|
173 | Oxoglutaric acid | OXGR1 |
351 | Acetaldehyde | TRPA1 |
410 | Calcitriol | VDR |
843 | ADP | P2RY1 |
1071 | ADP | P2RY6 |
Prepare the Production-Degradation Network¶
[11]:
pd_net = metalinks[metalinks['type'] == 'pd']
# we need to aggregate the production-degradation values
pd_net = pd_net[['metabolite', 'gene_symbol', 'mor']].groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index()
pd_net.head()
[11]:
metabolite | gene_symbol | mor | |
---|---|---|---|
0 | (+)-Limonene | CYP2C19 | -1.0 |
1 | (+)-Limonene | CYP2C9 | -1.0 |
2 | (R)-Lipoic acid | CEL | 1.0 |
3 | (R)-Lipoic acid | DLD | -1.0 |
4 | (R)-Lipoic acid | LIPA | 1.0 |
Prepare the transporter network¶
[12]:
t_net = metalinks[metalinks['type'] == 'pd']
t_net = t_net[['metabolite', 'gene_symbol', 'transport_direction']].dropna()
# Note that we treat export as positive and import as negative
t_net['mor'] = t_net['transport_direction'].apply(lambda x: 1 if x == 'out' else -1 if x == 'in' else None)
t_net = t_net[['metabolite', 'gene_symbol', 'mor']].dropna().groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index()
t_net = t_net[t_net['mor']!=0]
[13]:
meta = li.mt.fun.estimate_metalinks(adata,
resource,
pd_net=pd_net,
t_net=t_net, # (Optional)
use_raw=False,
# keyword arguments passed to decoupler-py
source='metabolite', target='gene_symbol',
weight='mor', min_n=3)
# pass cell type information
meta.obs['celltype'] = adata.obs['celltype']
Essentially, we now have a dataset with two modalities, one for RNA and one for Metabolites. The metabolites are estimated as t-values. Let’s visualize a couple:
[14]:
with plt.rc_context({"figure.figsize": (5, 5), "figure.dpi": (100)}):
sc.pl.umap(meta.mod['metabolite'], color=['Prostaglandin J2', 'Metanephrine', 'celltype'], cmap='coolwarm')
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

Infer Metabolite-Receptor Interactions¶
We will next infer the putative ligand-receptor interactions between these two modalities.
[15]:
li.mt.rank_aggregate(adata=meta,
groupby='celltype',
# pass our modified resource
resource=resource.rename(columns={'metabolite':'ligand'}),
# NOTE: Essential arguments when handling multimodal data
mdata_kwargs={
'x_mod': 'metabolite',
'y_mod': 'receptor',
'x_use_raw':False,
'y_use_raw':False,
'x_transform':li.ut.zi_minmax,
'y_transform':li.ut.zi_minmax,
},
verbose=True
)
Using `.X`!
Converting to sparse csr matrix!
Using provided `resource`.
Transforming metabolite using zi_minmax
Using `.X`!
Using `.X`!
9 features of mat are empty, they will be removed.
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
0.42 of entities in the resource are missing from the data.
Transforming receptor using zi_minmax
Generating ligand-receptor stats for 3885 samples and 190 features
Assuming that counts were `natural` log-normalized!
Running CellPhoneDB
100%|██████████| 1000/1000 [00:05<00:00, 187.30it/s]
Running Connectome
Running log2FC
Running NATMI
Running SingleCellSignalR
Explore Results¶
[16]:
meta.uns['liana_res'].head()
[16]:
source | target | ligand_complex | receptor_complex | lr_means | cellphone_pvals | expr_prod | scaled_weight | lr_logfc | spec_weight | lrscore | specificity_rank | magnitude_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1921 | pre-B | NK | Prostaglandin J2 | PTGDR | 0.498391 | 0.0 | 0.170128 | 1.251592 | 0.361967 | 0.069165 | 0.736382 | 0.001834 | 7.902015e-07 |
1845 | mature B | NK | Prostaglandin J2 | PTGDR | 0.482969 | 0.0 | 0.163384 | 1.204874 | 0.341928 | 0.066424 | 0.732437 | 0.002293 | 3.160181e-06 |
1793 | Treg | NK | Prostaglandin J2 | PTGDR | 0.472940 | 0.0 | 0.158999 | 1.174495 | 0.321561 | 0.064641 | 0.729763 | 0.002528 | 7.109003e-06 |
1713 | CD4+ naïve T | NK | Prostaglandin J2 | PTGDR | 0.461330 | 0.0 | 0.153923 | 1.139323 | 0.345938 | 0.062577 | 0.726551 | 0.002954 | 1.263573e-05 |
1687 | CD4+ memory T | NK | Prostaglandin J2 | PTGDR | 0.416394 | 0.0 | 0.134274 | 1.003199 | 0.259653 | 0.054589 | 0.712777 | 0.003029 | 5.050294e-05 |
[17]:
li.pl.dotplot(adata = meta,
colour='lr_means',
size='cellphone_pvals',
inverse_size=True, # we inverse sign since we want small p-values to have large sizes
source_labels=['CD4+ naïve T', 'NK', 'Treg', 'CD8+ memory T'],
target_labels=['CD14 mono', 'mature B', 'CD8+ memory T', 'CD16 mono'],
figure_size=(12, 6),
# Filter to top 10 acc to magnitude rank
top_n=10,
orderby='magnitude_rank',
orderby_ascending=True,
cmap='plasma'
)
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/plotting/_common.py:104: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[17]:
<Figure Size: (1200 x 600)>
Our inferred metabolite-protein interactions remain purely hypothetical and require validation.
Our metabolite estimation approach, like other approaches predicting metabolite-receptor interactions from transcriptomics data, infers metabolite abundances from gene expression, assuming a linear relationship between enzymatic gene expression and metabolite abundance. Thereby, it overlooks the complex, non-linear nature of metabolite fluxes influenced by cell states and microenvironments. Finally, our method treats each metabolite independently - simplifications and limitations that more sophisticated methods or multi-omics integration may address.
Next Steps¶
From here on one may follow-up with any of the other LIANA+ functionalities, such as plotting the results, or cross-conditional analyses.