Intercellular Context Factorization with Tensor-Cell2cell


Background

Tensor decomposition of cell-cell communication patterns, as proposed by Armingol and Baghdassarian et al., 2022, enables us to decipher context-driven intercellular communication by simultaneously accounting for an arbitrary number of “contexts”. These contexts could represent samples from longitudinal sampling points, multiple conditions, or cellular niches.

The power of Tensor-cell2cell lies in its ability to decompose latent patterns of intercellular communication in an untargeted manner; in principle, it can handle cell-cell communication results from any experimental design, regardless of complexity.

Simply put, Tensor-cell2cell uses LIANA’s output by sample to build a 4D tensor with dimensions for 1) contexts, 2) interactions, 3) sender cell types, and 4) receiver cell types. This tensor is then decomposed into a set of factors, which can be interpreted as low-dimensional latent variables (vectors) that capture the CCC patterns across contexts. We will combine LIANA with Tensor-cell2cell to decipher potential changes in ligand-receptor interactions.
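To make this concrete, here is a minimal, standalone sketch (not part of the tutorial pipeline) of a non-negative PARAFAC decomposition of a random 4D tensor with tensorly; the toy dimensions mirror the tensor built later in this tutorial:

import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# toy tensor with the dimension order used throughout this tutorial:
# (contexts, interactions, senders, receivers)
rng = np.random.default_rng(0)
toy = tl.tensor(rng.random((16, 418, 7, 7)))

# decompose into 6 factors; each factor consists of four loading vectors,
# one per tensor dimension
weights, loadings = non_negative_parafac(toy, rank=6)
for dim, mat in zip(["contexts", "interactions", "senders", "receivers"], loadings):
    print(dim, mat.shape)  # e.g. contexts (16, 6)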

Extensive tutorials combining LIANA & Tensor-cell2cell are available here.

Load Packages

Install required packages via pip with the following command:

pip install liana cell2cell decoupler omnipath seaborn==0.11
[1]:
import pandas as pd
import scanpy as sc
import plotnine as p9

import liana as li
import cell2cell as c2c
import decoupler as dc # needed for pathway enrichment

import warnings
warnings.filterwarnings('ignore')
from collections import defaultdict

%matplotlib inline
[2]:
# NOTE: to use CPU instead of GPU, set use_gpu = False
use_gpu = True

if use_gpu:
    import torch
    import tensorly as tl

    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        tl.set_backend('pytorch')
else:
    device = "cpu"

device
[2]:
'cuda'

Load & Prep Data

As a simple example, we will look at ~25k PBMCs from 8 pooled lupus patients, each sampled before and after IFN-beta stimulation (Kang et al., 2018; GSE96583). Note that by focusing on PBMCs, for the purpose of this tutorial, we assume that coordinated communication events occur among them.

The dataset is downloaded from Figshare, where it was deposited following preprocessing for pertpy.

[3]:
# load data as from CCC chapter
adata = li.testing.datasets.kang_2018()

Showcase anndata object

[4]:
adata
[4]:
AnnData object with n_obs × n_vars = 24673 × 15706
    obs: 'nCount_RNA', 'nFeature_RNA', 'tsne1', 'tsne2', 'condition', 'cluster', 'cell_type', 'patient', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.4', 'seurat_clusters', 'sample', 'cell_abbr'
    var: 'name'
    obsm: 'X_pca', 'X_umap'
    layers: 'counts'
[5]:
adata.obs.head()
[5]:
nCount_RNA nFeature_RNA tsne1 tsne2 condition cluster cell_type patient nCount_SCT nFeature_SCT integrated_snn_res.0.4 seurat_clusters sample cell_abbr
index
AAACATACATTTCC-1 3017.0 877 -27.640373 14.966629 ctrl 9 CD14+ Monocytes patient_1016 1704.0 711 1 1 ctrl&1016 CD14
AAACATACCAGAAA-1 2481.0 713 -27.493646 28.924885 ctrl 9 CD14+ Monocytes patient_1256 1614.0 662 1 1 ctrl&1256 CD14
AAACATACCATGCA-1 703.0 337 -10.468194 -5.984389 ctrl 3 CD4 T cells patient_1488 908.0 337 6 6 ctrl&1488 CD4T
AAACATACCTCGCT-1 3420.0 850 -24.367997 20.429285 ctrl 9 CD14+ Monocytes patient_1256 1738.0 653 1 1 ctrl&1256 CD14
AAACATACCTGGTA-1 3158.0 1111 27.952170 24.159738 ctrl 4 Dendritic cells patient_1039 1857.0 928 12 12 ctrl&1039 DCs
[6]:
adata.obs["cell_type"].cat.categories
[6]:
Index(['CD4 T cells', 'CD14+ Monocytes', 'B cells', 'NK cells', 'CD8 T cells',
       'FCGR3A+ Monocytes', 'Dendritic cells', 'Megakaryocytes'],
      dtype='object')
[7]:
sample_key = 'sample'
condition_key = 'condition'
groupby = 'cell_type'

Basic QC

Note that this data has already been largely pre-processed & annotated; we refer the user to the Quality Control and other relevant chapters of the best-practices book for information about pre-processing and annotation steps.

[8]:
# filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# log1p normalize the data
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

In addition to the basic QC steps, one needs to ensure that the cell groups on which the analysis is run are well defined and stable across samples.
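A quick sanity check (not part of the original pipeline) is to count cells per cell type and sample; very small or absent groups in some samples can distort the per-sample results:

# cells per cell type (rows) and sample (columns)
pd.crosstab(adata.obs[groupby], adata.obs[sample_key])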

Show pre-computed UMAP

[9]:
sc.pl.umap(adata, color=[condition_key, groupby], frameon=False)
[Figure: UMAP embedding colored by condition and cell type]

Ligand-Receptor Inference by Sample

Before we decompose the CCC patterns across contexts/samples with Tensor-cell2cell, we need to run liana on each sample. This is because Tensor-cell2cell uses LIANA’s output by sample to build a 4D tensor that is later decomposed into CCC patterns. To do so, liana provides a utility function called by_sample that runs a given LIANA method on each sample within the AnnData object and returns a long-format pandas.DataFrame with the results.

In this example, we will use liana’s rank_aggregate method, which provides a robust rank consensus that combines the predictions of multiple ligand-receptor methods. Nevertheless, any other method can be used.
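For instance, a single method could be run per sample through the same interface. The sketch below assumes that individual methods expose by_sample analogously to rank_aggregate; check the liana documentation for the exact call:

# hypothetical sketch: CellPhoneDB-style scoring per sample,
# instead of the rank aggregate
li.mt.cellphonedb.by_sample(
    adata,
    groupby=groupby,
    sample_key=sample_key,
    use_raw=False,
    verbose=True,
)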

[10]:
li.mt.rank_aggregate.by_sample(
    adata,
    groupby=groupby,
    sample_key=sample_key, # sample key by which we wish to loop
    use_raw=False,
    verbose=True, # use 'full' to show all verbose information
    n_perms=None, # exclude permutations for speed
    return_all_lrs=True, # return all LR values
    )
Now running: stim&1488: 100%|██████████| 16/16 [01:30<00:00,  5.64s/it]

Check results

[11]:
adata.uns["liana_res"].sort_values("magnitude_rank").head(10)
[11]:
sample source target ligand_complex receptor_complex lr_means expr_prod scaled_weight lr_logfc spec_weight lrscore magnitude_rank
328398 ctrl&1488 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.380374 4.891490 1.928655 2.288309 0.129141 0.984287 1.151878e-08
464128 stim&1015 NK cells NK cells B2M KLRD1 2.471536 2.552514 1.650699 0.789530 0.096218 0.977725 1.176411e-08
222411 ctrl&1244 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.504210 5.661785 1.703509 2.209408 0.116978 0.985015 1.284372e-08
138180 ctrl&1016 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.299348 4.729413 1.456102 1.823954 0.112280 0.983242 1.449280e-08
646800 stim&1256 NK cells NK cells B2M KLRD1 2.561136 2.982985 1.479990 0.931471 0.089038 0.979089 1.467004e-08
0 ctrl&101 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.627803 6.208072 1.545891 2.279768 0.132816 0.984116 1.765464e-08
186788 ctrl&1039 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.547695 5.674415 1.301281 1.767435 0.104337 0.981838 2.530672e-08
43757 ctrl&107 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.544711 5.852476 1.065290 1.678028 0.104096 0.982827 2.729847e-08
464129 stim&1015 CD8 T cells NK cells B2M KLRD1 2.458305 2.537012 1.609142 0.755273 0.095633 0.977658 4.705531e-08
138181 ctrl&1016 CD14+ Monocytes CD14+ Monocytes TIMP1 CD63 2.284614 4.683657 1.445400 2.444503 0.111194 0.983162 5.796964e-08

Even on a small subset of interactions and cell types, interpretation becomes challenging. To overcome this, we can use Tensor-cell2cell to find the variable CCC patterns across contexts/samples.

Building a Tensor

Before we can decompose the tensor, we need to build it. To do so, we will use the to_tensor_c2c function from liana. This function takes as input the long-format pandas.DataFrame produced by by_sample, and returns a cell2cell.tensor.PreBuiltTensor object, which contains the tensor itself along with useful utility methods.

Note that the way in which we build the tensor can impact the results we obtain. This is largely controlled by the how, lr_fill, and cell_fill parameters, which are out of the scope of this tutorial. For more information, please refer to the Tensor-cell2cell documentation, as well as the c2c.tensor.external_scores.dataframes_to_tensor function.
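As a hedged illustration only (argument names should be verified against your installed versions), the join behaviour could be altered along these lines:

# illustrative, commented-out alternative: 'inner' keeps only LR pairs and
# cell types shared by all samples, while the 'outer*' options keep the
# union and fill missing entries (via lr_fill / cell_fill)
# tensor = li.multi.to_tensor_c2c(
#     adata,
#     sample_key=sample_key,
#     score_key='magnitude_rank',
#     how='outer',            # e.g. 'inner', 'outer', 'outer_lrs', 'outer_cells'
#     lr_fill=float('nan'),   # value for LR pairs missing in a sample
#     cell_fill=float('nan'), # value for cell types missing in a sample
# )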

[12]:
tensor = li.multi.to_tensor_c2c(adata,
                                sample_key=sample_key,
                                score_key='magnitude_rank', # can be any score from liana
                                how='outer_cells' # how to join the samples
                                )
100%|██████████| 16/16 [00:46<00:00,  2.90s/it]

We can check the shape of the tensor, represented as (Contexts, Interactions, Senders, Receivers).

[13]:
tensor.tensor.shape
[13]:
torch.Size([16, 418, 7, 7])

One can save the tensor to disk using the c2c.io.export_variable_with_pickle function:

[14]:
c2c.io.export_variable_with_pickle(tensor, "tensor_tutorial.pkl")
tensor_tutorial.pkl  was correctly saved.
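The saved tensor can be restored in a later session with plain pickle (a minimal sketch; cell2cell also provides a matching loader):

import pickle

# reload the pickled tensor object
with open("tensor_tutorial.pkl", "rb") as f:
    tensor = pickle.load(f)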

Build Metadata

[15]:
context_dict = adata.obs[[sample_key, condition_key]].drop_duplicates()
context_dict = dict(zip(context_dict[sample_key], context_dict[condition_key]))
context_dict = defaultdict(lambda: 'Unknown', context_dict)

tensor_meta = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                                  metadata_dicts=[context_dict, None, None, None],
                                                  fill_with_order_elements=True
                                                  )
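Assuming the cell2cell convention, tensor_meta is a list with one metadata table per tensor dimension, each with Element and Category columns (the sample_col and group_col arguments used below); the first table maps each sample to its condition:

# inspect the context metadata: one row per sample (Element),
# labelled with its condition (Category)
tensor_meta[0].head()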

Running Tensor-cell2cell

Let’s now run the Tensor decomposition pipeline of Tensor-cell2cell. This function includes optimal rank estimation, as well as PARAFAC decomposition of the tensor. For more information, please refer to the Tensor-cell2cell manuscript.

Note

Optimal Rank Estimation

Here, we have omitted the optimal rank estimation step, as the optimal rank was precomputed. This can be a computationally intensive process, and we recommend using a GPU for this step.

If your machine does not have a GPU, you could use Google Colab to estimate the optimal rank. This is done automatically by setting the rank parameter to None.
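For reference, below is a commented sketch of how the elbow analysis might be invoked explicitly on the tensor object (argument names follow the cell2cell documentation; verify them before use):

# computationally heavy; uncomment to run
# fig, error = tensor.elbow_rank_selection(
#     upper_rank=20,       # max number of factors to test
#     runs=1,              # repetitions per rank; more runs = more robust
#     init='random',       # initialization of the factorization
#     automatic_elbow=True,
#     random_state=0,
# )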

[16]:
tensor = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                    tensor_meta,
                                                    copy_tensor=True, # Whether to output a new tensor or modify the original
                                                    rank=6, # Number of factors for the factorization. If None, it is automatically determined by an elbow analysis. Here, it was precomputed.
                                                    tf_optimization='regular', # To define how robust we want the analysis to be.
                                                    random_state=0, # Random seed for reproducibility
                                                    device=device, # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
                                                    elbow_metric='error', # Metric to use in the elbow analysis.
                                                    smooth_elbow=False, # Whether to smooth the metric of the elbow analysis.
                                                    upper_rank=20, # Max number of factors to try in the elbow analysis
                                                    tf_init='random', # Initialization method of the tensor factorization
                                                    tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
                                                    cmaps=None, # Color palettes used to color each of the dimensions. Must be a list of palettes.
                                                    sample_col='Element', # Column containing the elements in the tensor metadata
                                                    group_col='Category', # Column containing the major groups in the tensor metadata
                                                    output_fig=False, # Whether to output the figures. If False, figures won't be saved as files even if a folder was passed in output_folder.
                                                    )
Running Tensor Factorization

Plot Tensor Decomposition results

[17]:
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=tensor,
                                                 metadata = tensor_meta, # This is the metadata for each dimension
                                                 sample_col='Element',
                                                 group_col='Category',
                                                 meta_cmaps = ['viridis', 'Dark2_r', 'tab20', 'tab20'],
                                                 fontsize=10, # Font size of the figures generated
                                                 )
[Figure: factor loadings for each tensor dimension (contexts, LR pairs, sender and receiver cells) across the six factors]

Factorization Results

To get a more detailed look, we can access the factors and loadings of the decomposition. As expected, for each factor we get four loading vectors, one for each dimension of the tensor. We can access them as follows:

[18]:
factors = tensor.factors
[19]:
factors.keys()
[19]:
odict_keys(['Contexts', 'Ligand-Receptor Pairs', 'Sender Cells', 'Receiver Cells'])

Here, we clearly see that Factor 6 is associated with IFN-beta stimulation, which is further supported by significance testing:

[20]:
_ = c2c.plotting.context_boxplot(context_loadings=factors['Contexts'],
                                 metadict=context_dict,
                                 nrows=2,
                                 figsize=(8, 6),
                                 statistical_test='t-test_ind',
                                 pval_correction='fdr_bh',
                                 cmap='plasma',
                                 verbose=False,
                                )
[Figure: boxplots of context loadings per factor, grouped by condition, with adjusted p-values]

The cell types associated with the factor of interest, in this case Factor 6, are CD14+ Monocytes, FCGR3A+ Monocytes, and Dendritic cells:

[21]:
c2c.plotting.ccc_networks_plot(factors,
                               included_factors=['Factor 6'],
                               network_layout='circular',
                               ccc_threshold=0.05, # Only important communication
                               nrows=1,
                               panel_size=(8, 8), # This changes the size of each figure panel.
                              )
[21]:
(<Figure size 800x800 with 1 Axes>, <Axes: title={'center': 'Factor 6'}>)
[Figure: circular network of sender-receiver communication for Factor 6]
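The network plot summarizes the sender and receiver loadings of Factor 6; the same information can be read directly from the loading tables:

# sender and receiver cell-type loadings for Factor 6, highest first
factors['Sender Cells'].sort_values('Factor 6', ascending=False).head()
factors['Receiver Cells'].sort_values('Factor 6', ascending=False).head()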

We can also check the loadings of each factor, which are the weights assigned to each interaction, sender, and receiver cell type. So, let’s check the ligand-receptor interactions with the highest loadings in Factor 6.

[22]:
lr_loadings = factors['Ligand-Receptor Pairs']
lr_loadings.sort_values("Factor 6", ascending=False).head(10)
[22]:
Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6
TNFSF13B^HLA-DPB1 0.000104 2.876963e-02 1.024479e-04 1.813596e-01 1.175934e-01 0.331742
LGALS9^CD47 0.027514 6.518205e-02 6.209285e-06 6.604026e-02 1.530567e-01 0.249330
TNFSF13B^CD40 0.004380 6.468394e-42 0.000000e+00 6.402386e-02 4.540565e-02 0.242946
CCL8^CCR1 0.021408 0.000000e+00 0.000000e+00 5.262714e-17 8.407791e-45 0.235348
LGALS9^PTPRC 0.024144 8.636896e-02 8.615649e-03 7.164904e-02 1.608317e-01 0.230816
CCL2^CCR1 0.000718 0.000000e+00 0.000000e+00 6.474526e-02 0.000000e+00 0.225154
LGALS1^CD69 0.027558 1.308006e-01 2.494596e-10 3.079193e-02 2.857613e-01 0.224754
LGALS9^CD44 0.022000 7.814220e-02 7.626680e-05 8.022206e-02 1.576868e-01 0.205645
CCL8^CCR5 0.015222 0.000000e+00 0.000000e+00 1.890149e-17 6.600298e-38 0.197040
LGALS9^HAVCR2 0.038709 5.201748e-02 3.091704e-13 5.378049e-02 4.273469e-20 0.192646

Though anecdotal, in this example we can see that among the interactions with the highest loadings in the stimulation-associated factor is CCL8->CCR1, an interaction previously associated with IFN-beta stimulation.
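As a quick illustration (a sketch using the plotnine import from the top of the notebook), the loadings of this interaction can be visualized across factors:

# bar plot of CCL8^CCR1 loadings across the six factors
ccl8 = (lr_loadings.loc['CCL8^CCR1']
        .reset_index()
        .rename(columns={'index': 'factor', 'CCL8^CCR1': 'loading'}))
(p9.ggplot(ccl8, p9.aes(x='factor', y='loading'))
 + p9.geom_col()
 + p9.theme_bw()
 + p9.labs(x='', y='loading', title='CCL8^CCR1 loadings per factor'))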

Downstream Analysis

Let’s also perform a basic enrichment analysis on the results above. We will use decoupler with pathway genesets from PROGENy.

[23]:
# load PROGENy pathways
net = dc.get_progeny(organism='human', top=5000)
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[24]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')
[25]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, net, lr_sep="^")
lr_progeny.head()
[25]:
source interaction weight
60 JAK-STAT LGALS9^PTPRC 1.307807
1568 Androgen SEMA4D^MET -0.831693
1960 Androgen HGF^MET -1.288956
2352 Androgen TIMP3^MET -1.122612
3030 NFkB SELE^CD44 3.332552
[26]:
# run enrichment analysis
estimate, pvals = dc.run_ulm(lr_loadings.transpose(), lr_progeny, source="source", target="interaction", use_raw=False)

Check Enrichment results for Factor 6

[27]:
dc.plot_barplot(estimate, 'Factor 6', vertical=True, cmap='coolwarm', vmin=-7, vmax=7)
[Figure: bar plot of PROGENy pathway enrichment scores for Factor 6]

We can see that the most enriched PROGENy pathway in Factor 6 is the JAK-STAT signaling pathway, consistent with the expected response to IFN-beta, a cytokine that signals through JAK-STAT.
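To go beyond the bar plot, the enrichment scores and p-values for Factor 6 can be tabulated directly (assuming, as above, that run_ulm returns score and p-value matrices indexed by factor):

# pathways ranked by their enrichment score in Factor 6
factor6 = pd.DataFrame({
    'score': estimate.loc['Factor 6'],
    'pval': pvals.loc['Factor 6'],
}).sort_values('score', ascending=False)
factor6.head()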

Outlook & Further Analysis

There are different ways to explore these results downstream of the tensor decomposition, but these are out of scope for this tutorial.

Stay tuned for more in-depth tutorials with Tensor-cell2cell and liana! In the meantime, we refer the user to the extensive Tensor-cell2cell x LIANA tutorials.