Intercellular Context Factorization with Tensor-Cell2cell
Background
Tensor decomposition of cell-cell communication patterns, as proposed by Armingol and Baghdassarian et al., 2022, enables us to decipher context-driven intercellular communication by simultaneously accounting for an unlimited number of “contexts”. These contexts could represent samples coming from longtidinal sampling points, multiple conditions, or cellular niches.
The power of Tensor-cell2cell is in its ability to decompose latent patterns of intercellular communication in an untargeted manner, in theory being able to handle cell-cell communication results coming from any experimental design, regardless of its complexity.
Simply put, tensor_cell2cell uses LIANA’s output by sample to build a 4D tensor, represented by 1) contexts, 2) interactions, 3) sender, and 4) receiver cell types. This tensor is then decomposed into a set of factors, which can be interpreted as low-dimensionality latent variables (vectors) that capture the CCC patterns across contexts. We will combine LIANA with tensor_cell2cell to decipher potential ligand-receptor interaction changes.
Extensive tutorials combining LIANA & Tensor-cell2cell are available here.
Load Packages
Install required packages via pip with the following command:
pip install liana cell2cell decoupler omnipath seaborn==0.11
[1]:
import pandas as pd
import scanpy as sc
import plotnine as p9
import liana as li
import cell2cell as c2c
import decoupler as dc # needed for pathway enrichment
import warnings
warnings.filterwarnings('ignore')
from collections import defaultdict
%matplotlib inline
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
[2]:
# NOTE: to use CPU instead of GPU, set use_gpu = False
use_gpu = True
if use_gpu:
import torch
import tensorly as tl
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
tl.set_backend('pytorch')
else:
device = "cpu"
device
[2]:
'cuda'
Load & Prep Data
As a simple example, we will look at ~25k PBMCs from 8 pooled patient lupus samples, each before and after IFN-beta stimulation (Kang et al., 2018; GSE96583). Note that by focusing on PBMCs, for the purpose of this tutorial, we assume that coordinated events occur among them.
This dataset is downloaded from a link on Figshare; preprocessed for pertpy.
[3]:
# load data as from CCC chapter
adata = li.testing.datasets.kang_2018()
Showcase anndata object
[4]:
adata
[4]:
AnnData object with n_obs × n_vars = 24673 × 15706
obs: 'nCount_RNA', 'nFeature_RNA', 'tsne1', 'tsne2', 'condition', 'cluster', 'cell_type', 'patient', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.4', 'seurat_clusters', 'sample', 'cell_abbr'
var: 'name'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
[5]:
adata.obs.head()
[5]:
nCount_RNA | nFeature_RNA | tsne1 | tsne2 | condition | cluster | cell_type | patient | nCount_SCT | nFeature_SCT | integrated_snn_res.0.4 | seurat_clusters | sample | cell_abbr | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
index | ||||||||||||||
AAACATACATTTCC-1 | 3017.0 | 877 | -27.640373 | 14.966629 | ctrl | 9 | CD14+ Monocytes | patient_1016 | 1704.0 | 711 | 1 | 1 | ctrl&1016 | CD14 |
AAACATACCAGAAA-1 | 2481.0 | 713 | -27.493646 | 28.924885 | ctrl | 9 | CD14+ Monocytes | patient_1256 | 1614.0 | 662 | 1 | 1 | ctrl&1256 | CD14 |
AAACATACCATGCA-1 | 703.0 | 337 | -10.468194 | -5.984389 | ctrl | 3 | CD4 T cells | patient_1488 | 908.0 | 337 | 6 | 6 | ctrl&1488 | CD4T |
AAACATACCTCGCT-1 | 3420.0 | 850 | -24.367997 | 20.429285 | ctrl | 9 | CD14+ Monocytes | patient_1256 | 1738.0 | 653 | 1 | 1 | ctrl&1256 | CD14 |
AAACATACCTGGTA-1 | 3158.0 | 1111 | 27.952170 | 24.159738 | ctrl | 4 | Dendritic cells | patient_1039 | 1857.0 | 928 | 12 | 12 | ctrl&1039 | DCs |
[6]:
adata.obs["cell_type"].cat.categories
[6]:
Index(['CD4 T cells', 'CD14+ Monocytes', 'B cells', 'NK cells', 'CD8 T cells',
'FCGR3A+ Monocytes', 'Dendritic cells', 'Megakaryocytes'],
dtype='object')
[7]:
sample_key = 'sample'
condition_key = 'condition'
groupby = 'cell_type'
Basic QC
Note that this data has been largely pre-processed & annotated, we refer the user to the Quality Control and other relevant chapters from the best-practices book for information about pre-processing and annotation steps.
[8]:
# filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# log1p normalize the data
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
In addition to the basic QC steps, one needs to ensure that the cell groups on which they run the analysis are well defined, and stable across samples.
Show pre-computed UMAP
[9]:
sc.pl.umap(adata, color=[condition_key, groupby], frameon=False)
... storing 'sample' as categorical
Ligand-Receptor Inference by Sample
Before we decompose the CCC patterns across contexts/samples with tensor_cell2cell
, we need to run liana
on each sample. This is because tensor_cell2cell
uses LIANA’s output by sample to build a 4D tensor, that is later decomposed into CCC patterns. To do so, liana provides a utility function called by_sample
that runs each method in LIANA on each sample within the AnnData
object, and returns a long-format pandas.DataFrame
with the results.
In this example, we will use liana’s rank_aggregate method, which provides a robust rank consensus that combines the predictions of multiple ligand-receptor methods. Nevertheless, any other method can be used.
[10]:
li.mt.rank_aggregate.by_sample(
adata,
groupby=groupby,
sample_key=sample_key, # sample key by which we which to loop
use_raw=False,
verbose=True, # use 'full' to show all verbose information
n_perms=None, # exclude permutations for speed
return_all_lrs=True, # return all LR values
)
Now running: stim&1488: 100%|██████████| 16/16 [01:30<00:00, 5.64s/it]
Check results
[11]:
adata.uns["liana_res"].sort_values("magnitude_rank").head(10)
[11]:
sample | source | target | ligand_complex | receptor_complex | lr_means | expr_prod | scaled_weight | lr_logfc | spec_weight | lrscore | magnitude_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
328398 | ctrl&1488 | FCGR3A+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.380374 | 4.891490 | 1.928655 | 2.288309 | 0.129141 | 0.984287 | 1.151878e-08 |
464128 | stim&1015 | NK cells | NK cells | B2M | KLRD1 | 2.471536 | 2.552514 | 1.650699 | 0.789530 | 0.096218 | 0.977725 | 1.176411e-08 |
222411 | ctrl&1244 | FCGR3A+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.504210 | 5.661785 | 1.703509 | 2.209408 | 0.116978 | 0.985015 | 1.284372e-08 |
138180 | ctrl&1016 | FCGR3A+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.299348 | 4.729413 | 1.456102 | 1.823954 | 0.112280 | 0.983242 | 1.449280e-08 |
646800 | stim&1256 | NK cells | NK cells | B2M | KLRD1 | 2.561136 | 2.982985 | 1.479990 | 0.931471 | 0.089038 | 0.979089 | 1.467004e-08 |
0 | ctrl&101 | FCGR3A+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.627803 | 6.208072 | 1.545891 | 2.279768 | 0.132816 | 0.984116 | 1.765464e-08 |
186788 | ctrl&1039 | FCGR3A+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.547695 | 5.674415 | 1.301281 | 1.767435 | 0.104337 | 0.981838 | 2.530672e-08 |
43757 | ctrl&107 | FCGR3A+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.544711 | 5.852476 | 1.065290 | 1.678028 | 0.104096 | 0.982827 | 2.729847e-08 |
464129 | stim&1015 | CD8 T cells | NK cells | B2M | KLRD1 | 2.458305 | 2.537012 | 1.609142 | 0.755273 | 0.095633 | 0.977658 | 4.705531e-08 |
138181 | ctrl&1016 | CD14+ Monocytes | CD14+ Monocytes | TIMP1 | CD63 | 2.284614 | 4.683657 | 1.445400 | 2.444503 | 0.111194 | 0.983162 | 5.796964e-08 |
Even on a small subset interactions and cell types, the interpretation becomes challenging. To overcome this, we can use Tensor-cell2cell to find the variable CCC patterns across contexts/samples.
Building a Tensor
Before we can decompose the tensor, we need to build it. To do so, we will use the to_tensor_c2c
function from liana
. This function takes as input the pandas.DataFrame
with the results from liana.by_sample
, and returns a cell2cell.tensor.PrebuiltTensor
object. This object contains the tensor, as well as other useful utility functions.
Note that the way that we build the tensor can impact the results that we obtain. This is largely controlled by the how
, lr_fill
, and cell_fill
parameters, but these are out of the scope of this tutorial. For more information, please refer to the tensor_cell2cell documentation, as well as the c2c.tensor.external_scores.dataframes_to_tensor
function.
[12]:
tensor = li.multi.to_tensor_c2c(adata,
sample_key=sample_key,
score_key='magnitude_rank', # can be any score from liana
how='outer_cells' # how to join the samples
)
100%|██████████| 16/16 [00:46<00:00, 2.90s/it]
We can check the shape of the tensor, represented as (Contexts, Interactions, Senders, Receivers).
[13]:
tensor.tensor.shape
[13]:
torch.Size([16, 418, 7, 7])
One can save the tensor to disk, by using the c2c.io.export_variable_with_pickle
function
[14]:
c2c.io.export_variable_with_pickle(tensor, "tensor_tutorial.pkl")
tensor_tutorial.pkl was correctly saved.
Build Metadata
[15]:
context_dict = adata.obs[[sample_key, condition_key]].drop_duplicates()
context_dict = dict(zip(context_dict[sample_key], context_dict[condition_key]))
context_dict = defaultdict(lambda: 'Unknown', context_dict)
tensor_meta = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
metadata_dicts=[context_dict, None, None, None],
fill_with_order_elements=True
)
Running Tensor-cell2cell
Let’s now run the Tensor decomposition pipeline of Tensor-cell2cell. This function includes optimal rank estimation, as well as PARAFAC decomposition of the tensor. For more information, please refer to the Tensor-cell2cell manuscript.
Note
Optimal Rank Estimation
Here, we have omitted the optimal rank estimation step, as the optimal rank was precomputed. This can be a computationally intensive process, and we recommend using a GPU for this step.
If your machine does not have a GPU, you could use Google Colab to estimate the optimal rank. This is done automatically by setting the rank
parameter to None.
[16]:
tensor = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
tensor_meta,
copy_tensor=True, # Whether to output a new tensor or modifying the original
rank=6, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis. Here, it was precomuputed.
tf_optimization='regular', # To define how robust we want the analysis to be.
random_state=0, # Random seed for reproducibility
device=device, # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
elbow_metric='error', # Metric to use in the elbow analysis.
smooth_elbow=False, # Whether smoothing the metric of the elbow analysis.
upper_rank=20, # Max number of factors to try in the elbow analysis
tf_init='random', # Initialization method of the tensor factorization
tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
cmaps=None, # Color palettes to use in color each of the dimensions. Must be a list of palettes.
sample_col='Element', # Columns containing the elements in the tensor metadata
group_col='Category', # Columns containing the major groups in the tensor metadata
output_fig=False, # Whether to output the figures. If False, figures won't be saved a files if a folder was passed in output_folder.
)
Running Tensor Factorization
Plot Tensor Decomposition results
[17]:
factors, axes = c2c.plotting.tensor_factors_plot(interaction_tensor=tensor,
metadata = tensor_meta, # This is the metadata for each dimension
sample_col='Element',
group_col='Category',
meta_cmaps = ['viridis', 'Dark2_r', 'tab20', 'tab20'],
fontsize=10, # Font size of the figures generated
)
Factorization Results
To get a more detailed look we can access the factors and loadings of the decomposition. As expected, for each factor we get four vectors, one for each dimension of the tensor. We can access those as follows:
[18]:
factors = tensor.factors
[19]:
factors.keys()
[19]:
odict_keys(['Contexts', 'Ligand-Receptor Pairs', 'Sender Cells', 'Receiver Cells'])
Here, we see clearly that Factor 6 is associated with the IFN-beta stimulation, further supported by significance testing:
[20]:
_ = c2c.plotting.context_boxplot(context_loadings=factors['Contexts'],
metadict=context_dict,
nrows=2,
figsize=(8, 6),
statistical_test='t-test_ind',
pval_correction='fdr_bh',
cmap='plasma',
verbose=False,
)
The cell types associated with Factors of interest, in this case Factor 6 are CD14+ Monocytes, FCGR3A+ Monocytes, and Dendritic cells:
[21]:
c2c.plotting.ccc_networks_plot(factors,
included_factors=['Factor 6'],
network_layout='circular',
ccc_threshold=0.05, # Only important communication
nrows=1,
panel_size=(8, 8), # This changes the size of each figure panel.
)
[21]:
(<Figure size 800x800 with 1 Axes>, <Axes: title={'center': 'Factor 6'}>)
We can also check the loadings of each factor, which are the weights assigned to each interaction, sender, and receiver cell type. So, let’s check the ligand-receptor interactions with the highest loadings in Factor 6
.
[22]:
lr_loadings = factors['Ligand-Receptor Pairs']
lr_loadings.sort_values("Factor 6", ascending=False).head(10)
[22]:
Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | |
---|---|---|---|---|---|---|
TNFSF13B^HLA-DPB1 | 0.000104 | 2.876963e-02 | 1.024479e-04 | 1.813596e-01 | 1.175934e-01 | 0.331742 |
LGALS9^CD47 | 0.027514 | 6.518205e-02 | 6.209285e-06 | 6.604026e-02 | 1.530567e-01 | 0.249330 |
TNFSF13B^CD40 | 0.004380 | 6.468394e-42 | 0.000000e+00 | 6.402386e-02 | 4.540565e-02 | 0.242946 |
CCL8^CCR1 | 0.021408 | 0.000000e+00 | 0.000000e+00 | 5.262714e-17 | 8.407791e-45 | 0.235348 |
LGALS9^PTPRC | 0.024144 | 8.636896e-02 | 8.615649e-03 | 7.164904e-02 | 1.608317e-01 | 0.230816 |
CCL2^CCR1 | 0.000718 | 0.000000e+00 | 0.000000e+00 | 6.474526e-02 | 0.000000e+00 | 0.225154 |
LGALS1^CD69 | 0.027558 | 1.308006e-01 | 2.494596e-10 | 3.079193e-02 | 2.857613e-01 | 0.224754 |
LGALS9^CD44 | 0.022000 | 7.814220e-02 | 7.626680e-05 | 8.022206e-02 | 1.576868e-01 | 0.205645 |
CCL8^CCR5 | 0.015222 | 0.000000e+00 | 0.000000e+00 | 1.890149e-17 | 6.600298e-38 | 0.197040 |
LGALS9^HAVCR2 | 0.038709 | 5.201748e-02 | 3.091704e-13 | 5.378049e-02 | 4.273469e-20 | 0.192646 |
Though anecdotal, in this example we can see that within the interactions with the highest loadings in the stimulation-associated factor is CCL8->CCR1
- previously associated with IFN-beta stimulation.
Downstream Analysis
Let’s also perform a basic enrichment analysis on the results above. We will use decoupler with pathway genesets from PROGENy.
[23]:
# load PROGENy pathways
net = dc.get_progeny(organism='human', top=5000)
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[24]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')
[25]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, net, lr_sep="^")
lr_progeny.head()
[25]:
source | interaction | weight | |
---|---|---|---|
60 | JAK-STAT | LGALS9^PTPRC | 1.307807 |
1568 | Androgen | SEMA4D^MET | -0.831693 |
1960 | Androgen | HGF^MET | -1.288956 |
2352 | Androgen | TIMP3^MET | -1.122612 |
3030 | NFkB | SELE^CD44 | 3.332552 |
[26]:
# run enrichment analysis
estimate, pvals = dc.run_ulm(lr_loadings.transpose(), lr_progeny, source="source", target="interaction", use_raw=False)
Check Enrichment results for Factor 5
[27]:
dc.plot_barplot(estimate, 'Factor 6', vertical=True, cmap='coolwarm', vmin=-7, vmax=7)
We can see that the most enriched PROGENy pathway in Factor 6 is the JAK-STAT signaling pathway, which is consistent with what we would expect.
Outlook & Further Analysis
There are different ways to explore these results downstream of the tensor decomposition, but these are out of scope for this tutorial.
Stay tuned for more in-depth tutorials with Tensor-cell2cell
and liana
! In the meantime, we refer the user to the extensive Tensor-cell2cell x LIANA tutorials