Intercellular Context Factorization with Tensor-Cell2cell


Background

Tensor decomposition of cell-cell communication patterns, as proposed by Armingol and Baghdassarian et al., 2022, enables us to decipher context-driven intercellular communication by simultaneously accounting for an arbitrary number of “contexts”. These contexts could represent samples from longitudinal sampling points, multiple conditions, or cellular niches.

The power of Tensor-cell2cell lies in its ability to decompose latent patterns of intercellular communication in an untargeted manner; in principle, it can handle cell-cell communication results from any experimental design, regardless of complexity.

Simply put, Tensor-cell2cell uses LIANA’s output by sample to build a 4D tensor with dimensions for 1) contexts, 2) interactions, 3) sender cell types, and 4) receiver cell types. This tensor is then decomposed into a set of factors, which can be interpreted as low-dimensional latent variables (vectors) that capture the CCC patterns across contexts. We will combine LIANA with Tensor-cell2cell to decipher potential changes in ligand-receptor interactions.
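To make this concrete, here is a minimal, standalone sketch (not part of the tutorial pipeline) of a non-negative PARAFAC decomposition of a random 4D tensor with tensorly; the toy dimensions mirror the tensor built later in this tutorial:

import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# toy tensor with the dimension order used throughout this tutorial:
# (contexts, interactions, senders, receivers)
rng = np.random.default_rng(0)
toy = tl.tensor(rng.random((16, 418, 7, 7)))

# decompose into 6 factors; each factor consists of four loading vectors,
# one per tensor dimension
weights, loadings = non_negative_parafac(toy, rank=6)
for dim, mat in zip(["contexts", "interactions", "senders", "receivers"], loadings):
    print(dim, mat.shape)  # e.g. contexts (16, 6)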

Extensive tutorials combining LIANA & Tensor-cell2cell are available here.

Load Packages

Install required packages via pip with the following command:

pip install liana cell2cell decoupler omnipath seaborn==0.11
[1]:
import pandas as pd
import scanpy as sc
import plotnine as p9

import liana as li
import cell2cell as c2c
import decoupler as dc # needed for pathway enrichment

import warnings
warnings.filterwarnings('ignore')
from collections import defaultdict

%matplotlib inline
[2]:
# NOTE: to use CPU instead of GPU, set use_gpu = False
use_gpu = True

if use_gpu:
    import torch
    import tensorly as tl

    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        tl.set_backend('pytorch')
else:
    device = "cpu"

device
[2]:
'cuda'

Load & Prep Data

As a simple example, we will look at ~25k PBMCs from 8 pooled lupus patients, each sampled before and after IFN-beta stimulation (Kang et al., 2018; GSE96583). Note that by focusing on PBMCs, for the purpose of this tutorial, we assume that coordinated communication events occur among them.

The dataset is downloaded from Figshare, where it was deposited following preprocessing for pertpy.

[3]:
# load data as from CCC chapter
adata = li.testing.datasets.kang_2018()

Showcase anndata object

[4]:
adata
[4]:
AnnData object with n_obs × n_vars = 24673 × 15706
    obs: 'nCount_RNA', 'nFeature_RNA', 'tsne1', 'tsne2', 'condition', 'cluster', 'cell_type', 'patient', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.4', 'seurat_clusters', 'sample', 'cell_abbr'
    var: 'name'
    obsm: 'X_pca', 'X_umap'
    layers: 'counts'
[5]:
adata.obs.head()
[5]:
nCount_RNA nFeature_RNA tsne1 tsne2 condition cluster cell_type patient nCount_SCT nFeature_SCT integrated_snn_res.0.4 seurat_clusters sample cell_abbr
index
AAACATACATTTCC-1 3017.0 877 -27.640373 14.966629 ctrl 9 CD14+ Monocytes patient_1016 1704.0 711 1 1 ctrl&1016 CD14
AAACATACCAGAAA-1 2481.0 713 -27.493646 28.924885 ctrl 9 CD14+ Monocytes patient_1256 1614.0 662 1 1 ctrl&1256 CD14
AAACATACCATGCA-1 703.0 337 -10.468194 -5.984389 ctrl 3 CD4 T cells patient_1488 908.0 337 6 6 ctrl&1488 CD4T
AAACATACCTCGCT-1 3420.0 850 -24.367997 20.429285 ctrl 9 CD14+ Monocytes patient_1256 1738.0 653 1 1 ctrl&1256 CD14
AAACATACCTGGTA-1 3158.0 1111 27.952170 24.159738 ctrl 4 Dendritic cells patient_1039 1857.0 928 12 12 ctrl&1039 DCs
[6]:
adata.obs["cell_type"].cat.categories
[6]:
Index(['CD4 T cells', 'CD14+ Monocytes', 'B cells', 'NK cells', 'CD8 T cells',
       'FCGR3A+ Monocytes', 'Dendritic cells', 'Megakaryocytes'],
      dtype='object')
[7]:
sample_key = 'sample'
condition_key = 'condition'
groupby = 'cell_type'

Basic QC

Note that this data has already been largely pre-processed & annotated; we refer the user to the Quality Control and other relevant chapters of the best-practices book for information about pre-processing and annotation steps.

[8]:
# filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# log1p normalize the data
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

In addition to the basic QC steps, one needs to ensure that the cell groups on which the analysis is run are well defined and stable across samples.
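A quick sanity check (not part of the original pipeline) is to count cells per cell type and sample; very small or absent groups in some samples can distort the per-sample results:

# cells per cell type (rows) and sample (columns)
pd.crosstab(adata.obs[groupby], adata.obs[sample_key])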

Show pre-computed UMAP

[9]:
sc.pl.umap(adata, color=[condition_key, groupby], frameon=False)
[Figure: UMAP embedding colored by condition and cell type]

Ligand-Receptor Inference by Sample

Before we decompose the CCC patterns across contexts/samples with Tensor-cell2cell, we need to run liana on each sample. This is because Tensor-cell2cell uses LIANA’s output by sample to build a 4D tensor that is later decomposed into CCC patterns. To do so, liana provides a utility function called by_sample that runs a given LIANA method on each sample within the AnnData object and returns a long-format pandas.DataFrame with the results.

In this example, we will use liana’s rank_aggregate method, which provides a robust rank consensus that combines the predictions of multiple ligand-receptor methods. Nevertheless, any other method can be used.
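For instance, a single method could be run per sample through the same interface. The sketch below assumes that individual methods expose by_sample analogously to rank_aggregate; check the liana documentation for the exact call:

# hypothetical sketch: CellPhoneDB-style scoring per sample,
# instead of the rank aggregate
li.mt.cellphonedb.by_sample(
    adata,
    groupby=groupby,
    sample_key=sample_key,
    use_raw=False,
    verbose=True,
)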

[10]:
li.mt.rank_aggregate.by_sample(
    adata,
    groupby=groupby,
    sample_key=sample_key, # sample key by which we wish to loop
    use_raw=False,
    verbose=True, # use 'full' to show all verbose information
    n_perms=None, # exclude permutations for speed
    return_all_lrs=True, # return all LR values
    )
Now running: stim&1488: 100%|██████████| 16/16 [01:30<00:00,  5.64s/it]

Check results

[11]:
adata.uns["liana_res"].sort_values("magnitude_rank").head(10)
[11]:
sample source target ligand_complex receptor_complex lr_means expr_prod scaled_weight lr_logfc spec_weight lrscore magnitude_rank
328398 ctrl&1488 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.380374 4.891490 1.928655 2.288309 0.129141 0.984287 1.151878e-08
464128 stim&1015 NK cells NK cells B2M KLRD1 2.471536 2.552514 1.650699 0.789530 0.096218 0.977725 1.176411e-08
222411 ctrl&1244 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.504210 5.661785 1.703509 2.209408 0.116978 0.985015 1.284372e-08
138180 ctrl&1016 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.299348 4.729413 1.456102 1.823954 0.112280 0.983242 1.449280e-08
646800 stim&1256 NK cells NK cells B2M KLRD1 2.561136 2.982985 1.479990 0.931471 0.089038 0.979089 1.467004e-08
0 ctrl&101 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.627803 6.208072 1.545891 2.279768 0.132816 0.984116 1.765464e-08
186788 ctrl&1039 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.547695 5.674415 1.301281 1.767435 0.104337 0.981838 2.530672e-08
43757 ctrl&107 FCGR3A+ Monocytes CD14+ Monocytes TIMP1 CD63 2.544711 5.852476 1.065290 1.678028 0.104096 0.982827 2.729847e-08
464129 stim&1015 CD8 T cells NK cells B2M KLRD1 2.458305 2.537012 1.609142 0.755273 0.095633 0.977658 4.705531e-08
138181 ctrl&1016 CD14+ Monocytes CD14+ Monocytes TIMP1 CD63 2.284614 4.683657 1.445400 2.444503 0.111194 0.983162 5.796964e-08

Even on a small subset of interactions and cell types, interpretation becomes challenging. To overcome this, we can use Tensor-cell2cell to find the variable CCC patterns across contexts/samples.

Building a Tensor

Before we can decompose the tensor, we need to build it. To do so, we will use the to_tensor_c2c function from liana. This function takes as input the long-format pandas.DataFrame produced by by_sample, and returns a cell2cell.tensor.PreBuiltTensor object, which contains the tensor itself along with useful utility methods.

Note that the way in which we build the tensor can impact the results we obtain. This is largely controlled by the how, lr_fill, and cell_fill parameters, which are out of the scope of this tutorial. For more information, please refer to the Tensor-cell2cell documentation, as well as the c2c.tensor.external_scores.dataframes_to_tensor function.
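As a hedged illustration only (argument names should be verified against your installed versions), the join behaviour could be altered along these lines:

# illustrative, commented-out alternative: 'inner' keeps only LR pairs and
# cell types shared by all samples, while the 'outer*' options keep the
# union and fill missing entries (via lr_fill / cell_fill)
# tensor = li.multi.to_tensor_c2c(
#     adata,
#     sample_key=sample_key,
#     score_key='magnitude_rank',
#     how='outer',            # e.g. 'inner', 'outer', 'outer_lrs', 'outer_cells'
#     lr_fill=float('nan'),   # value for LR pairs missing in a sample
#     cell_fill=float('nan'), # value for cell types missing in a sample
# )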

[12]:
tensor = li.multi.to_tensor_c2c(adata,
                                sample_key=sample_key,
                                score_key='magnitude_rank', # can be any score from liana
                                how='outer_cells' # how to join the samples
                                )
100%|██████████| 16/16 [00:46<00:00,  2.90s/it]

We can check the shape of the tensor, represented as (Contexts, Interactions, Senders, Receivers).

[13]:
tensor.tensor.shape
[13]:
torch.Size([16, 418, 7, 7])

One can save the tensor to disk using the c2c.io.export_variable_with_pickle function:

[14]:
c2c.io.export_variable_with_pickle(tensor, "tensor_tutorial.pkl")
tensor_tutorial.pkl  was correctly saved.
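The saved tensor can be restored in a later session with plain pickle (a minimal sketch; cell2cell also provides a matching loader):

import pickle

# reload the pickled tensor object
with open("tensor_tutorial.pkl", "rb") as f:
    tensor = pickle.load(f)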

Build Metadata

[15]:
context_dict = adata.obs[[sample_key, condition_key]].drop_duplicates()
context_dict = dict(zip(context_dict[sample_key], context_dict[condition_key]))
context_dict = defaultdict(lambda: 'Unknown', context_dict)

tensor_meta = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                                  metadata_dicts=[context_dict, None, None, None],
                                                  fill_with_order_elements=True
                                                  )
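Assuming the cell2cell convention, tensor_meta is a list with one metadata table per tensor dimension, each with Element and Category columns (the sample_col and group_col arguments used below); the first table maps each sample to its condition:

# inspect the context metadata: one row per sample (Element),
# labelled with its condition (Category)
tensor_meta[0].head()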

Running Tensor-cell2cell

Let’s now run the Tensor decomposition pipeline of Tensor-cell2cell. This function includes optimal rank estimation, as well as PARAFAC decomposition of the tensor. For more information, please refer to the Tensor-cell2cell manuscript.

Note

Optimal Rank Estimation

Here, we have omitted the optimal rank estimation step, as the optimal rank was precomputed. This can be a computationally intensive process, and we recommend using a GPU for this step.

If your machine does not have a GPU, you could use Google Colab to estimate the optimal rank. This is done automatically by setting the rank parameter to None.
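For reference, below is a commented sketch of how the elbow analysis might be invoked explicitly on the tensor object (argument names follow the cell2cell documentation; verify them before use):

# computationally heavy; uncomment to run
# fig, error = tensor.elbow_rank_selection(
#     upper_rank=20,       # max number of factors to test
#     runs=1,              # repetitions per rank; more runs = more robust
#     init='random',       # initialization of the factorization
#     automatic_elbow=True,
#     random_state=0,
# )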

[16]:
tensor = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                    tensor_meta,
                                                    copy_tensor=True, # Whether to output a new tensor or modify the original
                                                    rank=6, # Number of factors for the factorization. If None, it is automatically determined by an elbow analysis. Here, it was precomputed.
                                                    tf_optimization='regular', # To define how robust we want the analysis to be.
                                                    random_state=0, # Random seed for reproducibility
                                                    device=device, # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
                                                    elbow_metric='error', # Metric to use in the elbow analysis.
                                                    smooth_elbow=False, # Whether to smooth the metric of the elbow analysis.
                                                    upper_rank=20, # Max number of factors to try in the elbow analysis
                                                    tf_init='random', # Initialization method of the tensor factorization
                                                    tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
                                                    cmaps=None, # Color palettes used to color each of the dimensions. Must be a list of palettes.
                                                    sample_col='Element', # Column containing the elements in the tensor metadata
                                                    group_col='Category', # Column containing the major groups in the tensor metadata
                                                    output_fig=False, # Whether to output the figures. If False, figures won't be saved as files even if a folder was passed in output_folder.
                                                    )
Running Tensor Factorization

Plot Tensor Decomposition results

[17]:
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=tensor,
                                                 metadata = tensor_meta, # This is the metadata for each dimension
                                                 sample_col='Element',
                                                 group_col='Category',
                                                 meta_cmaps = ['viridis', 'Dark2_r', 'tab20', 'tab20'],
                                                 fontsize=10, # Font size of the figures generated
                                                 )
[Figure: factor loadings for each tensor dimension (contexts, LR pairs, sender and receiver cells) across the six factors]

Factorization Results

To get a more detailed look, we can access the factors and loadings of the decomposition. As expected, for each factor we get four loading vectors, one for each dimension of the tensor. We can access them as follows:

[18]:
factors = tensor.factors
[19]:
factors.keys()
[19]:
odict_keys(['Contexts', 'Ligand-Receptor Pairs', 'Sender Cells', 'Receiver Cells'])

Here, we clearly see that Factor 6 is associated with IFN-beta stimulation, which is further supported by significance testing:

[20]:
_ = c2c.plotting.context_boxplot(context_loadings=factors['Contexts'],
                                 metadict=context_dict,
                                 nrows=2,
                                 figsize=(8, 6),
                                 statistical_test='t-test_ind',
                                 pval_correction='fdr_bh',
                                 cmap='plasma',
                                 verbose=False,
                                )
[Figure: boxplots of context loadings per factor, grouped by condition, with adjusted p-values]

The cell types associated with the factor of interest, in this case Factor 6, are CD14+ Monocytes, FCGR3A+ Monocytes, and Dendritic cells:

[21]:
c2c.plotting.ccc_networks_plot(factors,
                               included_factors=['Factor 6'],
                               network_layout='circular',
                               ccc_threshold=0.05, # Only important communication
                               nrows=1,
                               panel_size=(8, 8), # This changes the size of each figure panel.
                              )
[21]:
(<Figure size 800x800 with 1 Axes>, <Axes: title={'center': 'Factor 6'}>)
[Figure: circular network of sender-receiver communication for Factor 6]
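The network plot summarizes the sender and receiver loadings of Factor 6; the same information can be read directly from the loading tables:

# sender and receiver cell-type loadings for Factor 6, highest first
factors['Sender Cells'].sort_values('Factor 6', ascending=False).head()
factors['Receiver Cells'].sort_values('Factor 6', ascending=False).head()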

We can also check the loadings of each factor, which are the weights assigned to each interaction, sender, and receiver cell type. So, let’s check the ligand-receptor interactions with the highest loadings in Factor 6.

[22]:
lr_loadings = factors['Ligand-Receptor Pairs']
lr_loadings.sort_values("Factor 6", ascending=False).head(10)
[22]:
Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6
TNFSF13B^HLA-DPB1 0.000104 2.876963e-02 1.024479e-04 1.813596e-01 1.175934e-01 0.331742
LGALS9^CD47 0.027514 6.518205e-02 6.209285e-06 6.604026e-02 1.530567e-01 0.249330
TNFSF13B^CD40 0.004380 6.468394e-42 0.000000e+00 6.402386e-02 4.540565e-02 0.242946
CCL8^CCR1 0.021408 0.000000e+00 0.000000e+00 5.262714e-17 8.407791e-45 0.235348
LGALS9^PTPRC 0.024144 8.636896e-02 8.615649e-03 7.164904e-02 1.608317e-01 0.230816
CCL2^CCR1 0.000718 0.000000e+00 0.000000e+00 6.474526e-02 0.000000e+00 0.225154
LGALS1^CD69 0.027558 1.308006e-01 2.494596e-10 3.079193e-02 2.857613e-01 0.224754
LGALS9^CD44 0.022000 7.814220e-02 7.626680e-05 8.022206e-02 1.576868e-01 0.205645
CCL8^CCR5 0.015222 0.000000e+00 0.000000e+00 1.890149e-17 6.600298e-38 0.197040
LGALS9^HAVCR2 0.038709 5.201748e-02 3.091704e-13 5.378049e-02 4.273469e-20 0.192646

Though anecdotal, in this example we can see that among the interactions with the highest loadings in the stimulation-associated factor is CCL8->CCR1, an interaction previously associated with IFN-beta stimulation.
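As a quick illustration (a sketch using the plotnine import from the top of the notebook), the loadings of this interaction can be visualized across factors:

# bar plot of CCL8^CCR1 loadings across the six factors
ccl8 = (lr_loadings.loc['CCL8^CCR1']
        .reset_index()
        .rename(columns={'index': 'factor', 'CCL8^CCR1': 'loading'}))
(p9.ggplot(ccl8, p9.aes(x='factor', y='loading'))
 + p9.geom_col()
 + p9.theme_bw()
 + p9.labs(x='', y='loading', title='CCL8^CCR1 loadings per factor'))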

Downstream Analysis

Let’s also perform a basic enrichment analysis on the results above. We will use decoupler with pathway genesets from PROGENy.

[23]:
# load PROGENy pathways
net = dc.get_progeny(organism='human', top=5000)
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[24]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')
[25]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, net, lr_sep="^")
lr_progeny.head()
[25]:
source interaction weight
60 JAK-STAT LGALS9^PTPRC 1.307807
1568 Androgen SEMA4D^MET -0.831693
1960 Androgen HGF^MET -1.288956
2352 Androgen TIMP3^MET -1.122612
3030 NFkB SELE^CD44 3.332552
[26]:
# run enrichment analysis
estimate, pvals = dc.run_ulm(lr_loadings.transpose(), lr_progeny, source="source", target="interaction", use_raw=False)

Check Enrichment results for Factor 6

[27]:
dc.plot_barplot(estimate, 'Factor 6', vertical=True, cmap='coolwarm', vmin=-7, vmax=7)
[Figure: bar plot of PROGENy pathway enrichment scores for Factor 6]

We can see that the most enriched PROGENy pathway in Factor 6 is the JAK-STAT signaling pathway, consistent with the expected response to IFN-beta, a cytokine that signals through JAK-STAT.
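To go beyond the bar plot, the enrichment scores and p-values for Factor 6 can be tabulated directly (assuming, as above, that run_ulm returns score and p-value matrices indexed by factor):

# pathways ranked by their enrichment score in Factor 6
factor6 = pd.DataFrame({
    'score': estimate.loc['Factor 6'],
    'pval': pvals.loc['Factor 6'],
}).sort_values('score', ascending=False)
factor6.head()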

Outlook & Further Analysis

There are different ways to explore these results downstream of the tensor decomposition, but these are out of scope for this tutorial.

Stay tuned for more in-depth tutorials with Tensor-cell2cell and liana! In the meantime, we refer the user to the extensive Tensor-cell2cell x LIANA tutorials.