Multi-Modal Ligand-Receptor Inference#

This notebook shows how to 1. analyse single-cell cite-seq data; 2. how to infer metabolite-receptor interactions from transcriptomics data.

import numpy as np
import pandas as pd
import scanpy as sc
import liana as li
import mudata as mu
from matplotlib import pyplot as plt

Infer Ligand-Receptor Interactions between RNA and Proteins#

Download Processed CITE-seq Data#

prot = sc.read('data/citeseq_prot.h5ad', backup_url='https://figshare.com/ndownloader/files/47625196')
rna = sc.read('data/citeseq_rna.h5ad', backup_url='https://figshare.com/ndownloader/files/47625193')

Load Processed CITE-Seq Data#

Here, we will use a very simple dataset to demonstrate the multi-modal ligand-receptor inference. We used a CITE-Seq dataset from 10X and followed the muon CITE-seq tutorial to process the RNA and Protein data.

Some minor differences are notable in the clustering due to the different versions of the packages used in the tutorial and the LIANA+ environment.

mdata = mu.MuData({'rna': rna, 'prot': prot})
# make sure that cell type is accessible
mdata.obs['celltype'] = mdata.mod['rna'].obs['celltype'].astype('category')
# inspect the object
mdata

MuData object with n_obs × n_vars = 3885 × 17838
  obs:	'celltype'
  var:	'gene_ids', 'feature_types', 'genome'
  2 modalities
    rna:	3885 x 17806
      obs:	'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'celltype'
      var:	'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
      uns:	'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
      obsm:	'X_pca', 'X_umap'
      obsp:	'connectivities', 'distances'
    prot:	3885 x 32
      var:	'gene_ids', 'feature_types', 'genome'
      uns:	'neighbors', 'pca', 'umap'
      obsm:	'X_pca', 'X_umap'
      varm:	'PCs'
      layers:	'counts'
      obsp:	'connectivities', 'distances'

Infer Interactions#

We see that we have two modalities, once for RNA and one for Proteins. We will next infer the ligand-receptor interactions between these two modalities.

CITE-seq data often focuses on antibody tagging of surface proteins, primarily receptors. To ensure only the protein modality is used for receptors, we append 'AB:' to receptor names in the antibody data. This step is necessary only when both RNA and antibody data have matching feature names.

# Obtain a ligand-receptor resource of interest
resource = li.rs.select_resource(resource_name='consensus')
# Append AB: to the receptor names
resource['receptor'] = 'AB:' + resource['receptor']

# Append AB: to the protein modality
mdata.mod['prot'].var_names = 'AB:' + mdata.mod['prot'].var['gene_ids']

While running LIANA+ with multimodal is largely analogous to when working with uni-modal data, there are a couple of things to keep in mind. Here, we need to ensure that the correct data is passed from each modality as well as to ensure that the correct modalities are used. Moreover, we need to ensure that data from the different modalities is comparable, which often requires the transformation of the data.

In this case, we use zero-inflated min-max normalization to ensure that the data from the two modalities is comparable. Essentially, a min-max normalization in which any value bellow 0.5 (by default) following normalization is set to 0. This normalization was originally introduced by the CiteFuse method, which is a ligand-receptor method for CITE-seq data.

li.mt.rank_aggregate(adata=mdata,
                     groupby='celltype',
                     # pass our modified resource
                     resource=resource,
                     # NOTE: Essential arguments when handling multimodal data
                     mdata_kwargs={
                     # Ligand-Receptor pairs are directed so we need to correctly pass
                     # `RNA` with ligands as `x_mod` and receptors as `y_mod`
                     'x_mod': 'rna',
                     'y_mod': 'prot',
                     # We use .X from the x_mod
                     'x_use_raw':False,
                     # We use .X from the y_mod
                     'y_use_raw':False,
                     # NOTE: we need to ensure that the modalities are correctly transformed
                     'x_transform':li.ut.zi_minmax,
                     'y_transform':li.ut.zi_minmax,
                    },
                  verbose=True
                  )

Using provided `resource`.
Transforming rna using zi_minmax
Transforming prot using zi_minmax
Generating ligand-receptor stats for 3885 samples and 37 features
Assuming that counts were `natural` log-normalized!
Running CellPhoneDB
Running Connectome
Running log2FC
Running NATMI
Running SingleCellSignalR

mdata.uns['liana_res'].head()

	source	target	ligand_complex	receptor_complex	lr_means	expr_prod	scaled_weight	lr_logfc	spec_weight	lrscore	specificity_rank	magnitude_rank
1287	pre-B	CD4+ memory T	HLA-DRA	AB:CD4	0.775442	0.600403	1.294984	0.643030	0.054068	0.968058	0.003006	0.000006
923	mature B	CD4+ memory T	HLA-DRA	AB:CD4	0.772048	0.594934	1.284592	0.640910	0.053575	0.967917	0.003511	0.000022
1297	pre-B	CD4+ naïve T	HLA-DRA	AB:CD4	0.756579	0.572285	1.246019	0.640179	0.051536	0.967308	0.003890	0.000050
936	mature B	CD4+ naïve T	HLA-DRA	AB:CD4	0.753185	0.567072	1.235627	0.638059	0.051066	0.967163	0.003890	0.000089
1146	pDC	CD4+ memory T	HLA-DRA	AB:CD4	0.751015	0.561047	1.220208	0.594210	0.050524	0.966993	0.003890	0.000138

Note that feature-wise min-max binds all features (i.e. genes and proteins) to the same limits, which essentially neglects the biological differences between features. This is typically not the case when working with untransformed data, as in that case we mostly care about cells being comparable, while features are typically with variable limits / distributions. As such, there might be some subtle differences from the interpretation of LIANA+ results, depending on the transformation of choice.

A benchmark of different normalizations for CCC are pending and alternative normalizations should also be explored. While other normalizations and subsequent transformations can also be used, the single-cell methods in LIANA+ require the data to be non-negative.

Plot Results & More#

li.pl.dotplot(adata = mdata,
              colour='lr_means',
              size='specificity_rank',
              inverse_size=True, # we inverse sign since we want small p-values to have large sizes
              source_labels=['CD4+ naïve T', 'NK', 'Treg', 'CD8+ memory T'],
              target_labels=['CD14 mono', 'mature B', 'CD8+ memory T', 'CD16 mono'],
              figure_size=(9, 5),
              # finally, since cpdbv2 suggests using a filter to FPs
              # we filter the pvals column to <= 0.05
              filter_fun=lambda x: x['cellphone_pvals'] <= 0.05,
              cmap='plasma'
             )

../_images/0f15421139b9e4dc0b854809c53109ca748131dd0f08343761132869b497ca5e.png

Metabolite-mediated CCC from Transcriptomics Data#

Recently, tools such as NeuronChat, MEBOCOST, scConnect, Cellinker, and CellPhoneDBv5 have proposed approaches, such as enrichment, expression average, among others, to infer metabolite-mediated CCC events from transcriptomics data. Similarly, we can use LIANA+ to infer metabolite-mediated CCC events from transcriptomics data, as described in the MetalinksDB manuscript.

Briefly, we use a univariate linear regression model to estimate metabolite abundances for each cell. To do so, we make use of production-degradation enzyme prior knowledge to infer the metabolite abundances. Optionally, we also take transporters into account. We then use these inferred metabolite abundances to infer metabolite-mediated CCC events.

Focus on Transcriptomics Data#

adata = mdata.mod['rna']

Obtain MetalinksDB Prior Knowledge#

Here, we will use MetalinksDB which contains prior knowledge about metabolite-receptor interactions as well as such for the production and degradation enzymes for metabolites. We will use the latter type of prior knowledge to infer the metabolite abundances for each cell.

metalinks = li.resource.get_metalinks(biospecimen_location='Blood',
                                      source=['CellPhoneDB', 'Cellinker', 'scConnect', # Ligand-Receptor resources
                                              'recon', 'hmr', 'rhea', 'hmdb' # Production-Degradation resources
                                              ],
                                      types=['pd', 'lr'], # NOTE: we obtain both ligand-receptor and production-degradation sets
                                     )

Prepare the Metabolite-Receptor Resource#

resource = metalinks[metalinks['type']=='lr'].copy()
resource = resource[['metabolite', 'gene_symbol']]\
    .rename(columns={'metabolite': 'source', 'gene_symbol': 'receptor'}).drop_duplicates()
resource.head()

	source	receptor
173	Oxoglutaric acid	OXGR1
351	Acetaldehyde	TRPA1
410	Calcitriol	VDR
843	ADP	P2RY1
1071	ADP	P2RY6

Prepare the Production-Degradation Network#

pd_net = metalinks[metalinks['type'] == 'pd']
# we need to aggregate the production-degradation values
pd_net = pd_net[['metabolite', 'gene_symbol', 'mor']].groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index()
pd_net = pd_net.rename(columns={'metabolite': 'source', 'gene_symbol': 'target', 'mor': 'weight'})
pd_net.head()

	source	target	weight
0	(+)-Limonene	CYP2C19	-1.0
1	(+)-Limonene	CYP2C9	-1.0
2	(R)-Lipoic acid	CEL	1.0
3	(R)-Lipoic acid	DLD	-1.0
4	(R)-Lipoic acid	LIPA	1.0

Prepare the transporter network#

t_net = metalinks[metalinks['type'] == 'pd']
t_net = t_net[['metabolite', 'gene_symbol', 'transport_direction']].dropna()
# Note that we treat export as positive and import as negative
t_net['mor'] = t_net['transport_direction'].apply(lambda x: 1 if x == 'out' else -1 if x == 'in' else None)
t_net = t_net[['metabolite', 'gene_symbol', 'mor']].dropna().groupby(['metabolite', 'gene_symbol']).agg('mean').reset_index()
t_net = t_net[t_net['mor']!=0]
t_net = t_net.rename(columns={'metabolite': 'source', 'gene_symbol': 'target', 'mor': 'weight'})

meta = li.mt.fun.estimate_metalinks(adata,
                                    resource,
                                    pd_net=pd_net,
                                    t_net=t_net,
                                    use_raw=False,
                                    tmin=3)
# pass cell type information
meta.obs['celltype'] = adata.obs['celltype']

Essentially, we now have a dataset with two modalities, one for RNA and one for Metabolites. The metabolites are estimated as t-values. Let’s visualize a couple:

with plt.rc_context({"figure.figsize": (5, 5), "figure.dpi": (100)}):
    sc.pl.umap(meta.mod['metabolite'], color=['Prostaglandin J2', 'Metanephrine', 'celltype'], cmap='coolwarm')
    

../_images/336a498354f0b20f6f7e20796509586386174c5310bea87ac1e688166a60694e.png

Infer Metabolite-Receptor Interactions#

We will next infer the putative ligand-receptor interactions between these two modalities.

li.mt.rank_aggregate(adata=meta,
                     groupby='celltype',
                     # pass our modified resource
                     resource=resource.rename(columns={'source': 'ligand'}),
                     # NOTE: Essential arguments when handling multimodal data
                     mdata_kwargs={
                     'x_mod': 'metabolite',
                     'y_mod': 'receptor',
                     'x_use_raw':False,
                     'y_use_raw':False,
                     'x_transform':li.ut.zi_minmax,
                     'y_transform':li.ut.zi_minmax,
                    },
                  verbose=True
                  )

Using provided `resource`.
Transforming metabolite using zi_minmax
Transforming receptor using zi_minmax
Generating ligand-receptor stats for 3885 samples and 190 features
Assuming that counts were `natural` log-normalized!
Running CellPhoneDB
Running Connectome
Running log2FC
Running NATMI
Running SingleCellSignalR

Explore Results#

meta.uns['liana_res'].head()

	source	target	ligand_complex	receptor_complex	lr_means	expr_prod	scaled_weight	lr_logfc	spec_weight	lrscore	specificity_rank	magnitude_rank
3446	pre-B	NK	Prostaglandin J2	PTGDR	0.498391	0.170128	1.251590	0.361967	0.069165	0.736271	0.001834	7.902015e-07
2567	mature B	NK	Prostaglandin J2	PTGDR	0.482969	0.163385	1.204872	0.341928	0.066424	0.732325	0.002293	3.160182e-06
1973	Treg	NK	Prostaglandin J2	PTGDR	0.472941	0.159000	1.174493	0.321560	0.064641	0.729651	0.002528	7.109003e-06
1073	CD4+ naïve T	NK	Prostaglandin J2	PTGDR	0.461330	0.153923	1.139322	0.345939	0.062577	0.726438	0.002999	1.263573e-05
781	CD4+ memory T	NK	Prostaglandin J2	PTGDR	0.416395	0.134274	1.003198	0.259653	0.054589	0.712661	0.003013	5.050295e-05

li.pl.dotplot(adata = meta,
              colour='lr_means',
              size='cellphone_pvals',
              inverse_size=True, # we inverse sign since we want small p-values to have large sizes
              source_labels=['CD4+ naïve T', 'NK', 'Treg', 'CD8+ memory T'],
              target_labels=['CD14 mono', 'mature B', 'CD8+ memory T', 'CD16 mono'],
              figure_size=(12, 6),
              # Filter to top 10 acc to magnitude rank
              top_n=10,
              orderby='magnitude_rank',
              orderby_ascending=True,
              cmap='plasma'
             )

../_images/24a5421c1cca778f342dba6b53c4ed01a4404a7f7a7f0c9b7bb0cd99d320bc55.png

Our metabolite estimation approach, like other approaches predicting metabolite-receptor interactions from transcriptomics data, infers metabolite abundances from gene expression, assuming a linear relationship between enzymatic gene expression and metabolite abundance. Thereby, it overlooks the complex, non-linear nature of metabolite fluxes influenced by cell states and microenvironments. Finally, our method treats each metabolite independently - simplifications and limitations that [more sophisticated methods](https://www.sciencedirect.com/science/article/pii/S2212877821002532) or [multi-omics integration](https://liana-py.readthedocs.io/en/latest/notebooks/sma.html) may address. Thus, any inferred metabolite-protein interactions remain purely hypothetical and require validation.

Next Steps#

From here on one may follow-up with any of the other LIANA+ functionalities, such as plotting the results, or cross-conditional analyses.

Multi-Modal Ligand-Receptor Inference

Contents