Intercellular Context Factorization with MOFA¶
Background¶
Here, we will adapt the statistical framework of multi-omics factor analysis (MOFA) to obtain intercellular communication programmes - in the form of ligand-receptor interaction scores observed to change across samples. This application of MOFA is inspired by and is in line with the factorization proposed by Tensor-cell2cell Armingol and Baghdassarian et al., 2022 - see existing tutorial.
Such factorization approaches essentially enable us to decipher context-driven intercellular communication by simultaneously accounting for an unlimited number of “contexts” in an untargeted manner. Similarly to Tensor-cell2cell, this application of MOFA is able to handle cell-cell communication results coming from any experimental design, regardless of its complexity.
Simply put, we will use LIANA’s output by sample to build a multi-view structure represented by samples and interactions by cell type pairs (views). We will then use MOFA+ to capture the CCC patterns across samples. To do so, we combine liana with the MuData/muon infrastructure.
Load Packages¶
mofa, decoupler, omnipath, and marsilea can be installed via pip with the following commands:
pip install decoupler
pip install mofax
pip install muon
pip install omnipath
pip install marsilea
[1]:
import numpy as np
import pandas as pd
import scanpy as sc
import plotnine as p9
import liana as li
# load muon and mofax
import muon as mu
import mofax as mofa
import decoupler as dc
Load & Prep Data¶
As a simple example, we will look at ~25k PBMCs from 8 pooled patient lupus samples, each before and after IFN-beta stimulation (Kang et al., 2018; GSE96583). Note that by focusing on PBMCs, for the purpose of this tutorial, we assume that coordinated events occur among them.
This dataset is downloaded from a link on Figshare; preprocessed for pertpy.
[2]:
adata = li.testing.datasets.kang_2018()
[3]:
adata
[3]:
AnnData object with n_obs × n_vars = 24673 × 15706
obs: 'nCount_RNA', 'nFeature_RNA', 'tsne1', 'tsne2', 'condition', 'cluster', 'cell_type', 'patient', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.4', 'seurat_clusters', 'sample', 'cell_abbr'
var: 'name'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
Define columns of interest from .obs
Note that we use cell abbreviations because MOFA will use them as labels for the views.
[4]:
sample_key = 'sample'
condition_key = 'condition'
groupby = 'cell_abbr'
Basic Preparation¶
Note that this data has been largely pre-processed & annotated, we refer the user to the Quality Control and other relevant chapters from the best-practices book for information about pre-processing and annotation steps.
[5]:
# filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# log1p normalize the data
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
Showcase the data¶
[6]:
# Show pre-computed UMAP
sc.pl.umap(adata, color=[condition_key, sample_key, 'cell_type', groupby], frameon=False, ncols=2)
... storing 'sample' as categorical
/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

Ligand-Receptor Inference by Sample¶
Before we decompose the CCC patterns across contexts/samples with MOFA
, we first need to run liana
on each sample. To do so, liana provides a utility function called by_sample
that runs each method in LIANA on each sample within the AnnData
object, and returns a long-format pandas.DataFrame
with the results.
In this example, we will use liana’s rank_aggregate
method, which provides a robust rank consensus that combines the predictions of multiple ligand-receptor methods. Nevertheless, any other method can be used.
[7]:
li.mt.rank_aggregate.by_sample(
adata,
groupby=groupby,
resource_name='consensus', # NOTE: uses HUMAN gene symbols!
sample_key=sample_key, # sample key by which we which to loop
expr_prop = 0.1,
use_raw=False,
n_perms=100, # reduce permutations for speed
return_all_lrs=False, # we don't return all LR values to utilize MOFA's flexible views
verbose=True, # use 'full' to show all information
)
Now running: ctrl&101: 0%| | 0/16 [00:00<?, ?it/s]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&107: 6%|▋ | 1/16 [00:05<01:18, 5.23s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&1015: 12%|█▎ | 2/16 [00:10<01:14, 5.31s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&1016: 19%|█▉ | 3/16 [00:17<01:18, 6.06s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&1039: 25%|██▌ | 4/16 [00:23<01:12, 6.02s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&1244: 31%|███▏ | 5/16 [00:29<01:05, 5.96s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&1256: 38%|███▊ | 6/16 [00:36<01:03, 6.40s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: ctrl&1488: 44%|████▍ | 7/16 [00:43<00:59, 6.60s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&101: 50%|█████ | 8/16 [00:53<01:01, 7.70s/it] /home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&107: 56%|█████▋ | 9/16 [01:13<01:20, 11.46s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1015: 62%|██████▎ | 10/16 [01:21<01:01, 10.27s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1016: 69%|██████▉ | 11/16 [01:30<00:50, 10.08s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1039: 75%|███████▌ | 12/16 [01:43<00:43, 10.83s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1244: 81%|████████▏ | 13/16 [01:52<00:30, 10.22s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1256: 88%|████████▊ | 14/16 [02:03<00:20, 10.47s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1488: 94%|█████████▍| 15/16 [02:14<00:10, 10.65s/it]/home/dbdimitrov/anaconda3/envs/test-env/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:150: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
Now running: stim&1488: 100%|██████████| 16/16 [02:28<00:00, 9.31s/it]
Check results
[8]:
adata.uns["liana_res"].sort_values("magnitude_rank").head()
[8]:
sample | source | target | ligand_complex | receptor_complex | lr_means | cellphone_pvals | expr_prod | scaled_weight | lr_logfc | spec_weight | lrscore | specificity_rank | magnitude_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5806 | ctrl&1039 | FGR3 | CD14 | TIMP1 | CD63 | 2.547695 | 0.0 | 5.674415 | 1.301281 | 1.767435 | 0.104337 | 0.981838 | 0.014324 | 0.000002 |
15798 | stim&1015 | NK | NK | B2M | KLRD1 | 2.471536 | 0.0 | 2.552514 | 1.650699 | 0.789530 | 0.096218 | 0.977725 | 0.032177 | 0.000002 |
22885 | stim&1256 | NK | NK | B2M | KLRD1 | 2.561136 | 0.0 | 2.982985 | 1.479990 | 0.931471 | 0.089038 | 0.979089 | 0.032345 | 0.000003 |
7899 | ctrl&1244 | FGR3 | CD14 | TIMP1 | CD63 | 2.504210 | 0.0 | 5.661785 | 1.703509 | 2.209408 | 0.116978 | 0.985015 | 0.001211 | 0.000004 |
1399 | ctrl&107 | FGR3 | CD14 | TIMP1 | CD63 | 2.544711 | 0.0 | 5.852476 | 1.065290 | 1.678028 | 0.104096 | 0.982827 | 0.008478 | 0.000004 |
[9]:
adata.uns["liana_res"]['source'].unique()
[9]:
array(['FGR3', 'CD14', 'NK', 'CD8T', 'B', 'DCs', 'CD4T'], dtype=object)
Note
by_sample
We see that in addition to the usual results, we also get sample
as a column which corresponds to the name of the sample_key
in the AnnData
object.
Now that we have obtained results by sample, we can use a dotplot by sample to visualize the ligand-receptor interactions. Let’s pick arbitrarily the interactions with the highest magnitude_rank
.
[10]:
(li.pl.dotplot_by_sample(adata, sample_key=sample_key,
colour="magnitude_rank",
size="specificity_rank",
source_labels=["CD4T", "B", "FGR3"],
target_labels=["CD8T", 'DCs', 'CD14'],
ligand_complex=["B2M"],
inverse_colour=True,
inverse_size=True,
receptor_complex=["KLRD1", "LILRB2", "CD3D"],
figure_size=(12, 8),
size_range=(0.5, 5),
) +
# rotate facet labels
p9.theme(strip_text=p9.element_text(size=10, colour="black", angle=90))
)
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[10]:
<Figure Size: (1200 x 800)>
Even on a small subset interactions and cell types, we can see that interpretation becomes challenging. To overcome this, we can use MOFA to find the variable CCC patterns across contexts/samples.
Create a Multi-View Structure¶
Before we can identify the variable CCC patterns across contexts/samples, we need to create a multi-view structure. In this case, we will use the lrs_to_views
function from liana to create a list of views (stored in a MuData object), where each view corresponds to a pair of potentially interacting cell types. The scores of interactions between cell type pairs represent those inferred with liana, stored by default in adata.uns['liana_res']
. Here, we will use liana’s aggregate
‘magnitude_rank’.
[11]:
adata.write_h5ad("../../test.h5ad")
[12]:
mdata = li.multi.lrs_to_views(adata,
sample_key=sample_key,
score_key='magnitude_rank',
obs_keys=['patient', 'condition'], # add those to mdata.obs
lr_prop = 0.3, # minimum required proportion of samples to keep an LR
lrs_per_sample = 20, # minimum number of interactions to keep a sample in a specific view
lrs_per_view = 20, # minimum number of interactions to keep a view
samples_per_view = 5, # NOTE: minimum number of samples to keep a view
min_variance = 0, # minimum variance to keep an interaction
lr_fill = 0, # fill missing LR values across samples with this
verbose=True
)
51%|█████ | 22/43 [00:00<00:00, 24.20it/s]100%|██████████| 43/43 [00:01<00:00, 28.25it/s]
The lr_fill
parameter controls how we deal with missing interaction scores. The default is np.nan
and in that case the scores would be imputed by MOFA.
Here, we fill the missing ligand-receptors with 0s because the ligand-receptor interaction missing here, i.e. those when return_all_lrs=False
, are such which are not expressed above a certain proportion of cells per cell type (that is controlled via expr_prop
when running liana).
Given the assumption that the ligand-receptor interactions occur at the across cell type level, we could thus assume that the genes that are not present in a sufficient proportion of cells are unlikely to be involved in interactions that are relevant to the cell type as a whole.
Note
View Representation
MOFA supports the flexible representation of views, where each view can represent a different type of features (e.g. genes, proteins, metabolites, etc.). In this case, we simply allow for different ligand-receptor to be used in each cell type pair (view).
[13]:
mdata
[13]:
MuData object with n_obs × n_vars = 16 × 1755 obs: 'patient', 'condition' 36 modalities FGR3&CD14: 16 x 75 FGR3&DCs: 16 x 87 CD14&CD14: 16 x 81 NK&NK: 7 x 26 FGR3&NK: 16 x 39 B&NK: 7 x 26 DCs&NK: 16 x 43 FGR3&FGR3: 16 x 79 DCs&CD14: 16 x 79 CD14&NK: 16 x 41 NK&CD8T: 14 x 33 CD14&DCs: 16 x 86 FGR3&CD8T: 16 x 46 CD8T&CD8T: 12 x 29 DCs&DCs: 16 x 89 CD14&FGR3: 16 x 79 DCs&FGR3: 16 x 82 B&CD8T: 14 x 36 DCs&CD8T: 16 x 51 CD4T&CD8T: 11 x 27 CD14&CD8T: 16 x 50 FGR3&CD4T: 12 x 33 DCs&CD4T: 13 x 37 CD14&CD4T: 11 x 31 CD8T&CD14: 16 x 33 NK&CD14: 15 x 45 B&CD14: 14 x 40 NK&FGR3: 16 x 43 CD8T&FGR3: 16 x 35 CD4T&CD14: 15 x 32 B&FGR3: 16 x 36 CD4T&FGR3: 16 x 37 CD8T&DCs: 12 x 38 B&DCs: 12 x 50 NK&DCs: 14 x 46 CD4T&DCs: 11 x 35
Fitting a MOFA model¶
Now that the putative ligand-receptor interactions across samples aretransformed into a multi-view representation, we can use MOFA to run an intercellular communication factor analysis.
We will attempt to capture the variability across samples and the different cell-type pairs by reducing the data into a number of factors, where each factor captures the coordinated communication events across the cell types.
[14]:
mu.tl.mofa(mdata,
use_obs='union',
convergence_mode='medium',
outfile='models/mofatalk.h5ad',
n_factors=4,
)
#########################################################
### __ __ ____ ______ ###
### | \/ |/ __ \| ____/\ _ ###
### | \ / | | | | |__ / \ _| |_ ###
### | |\/| | | | | __/ /\ \_ _| ###
### | | | | |__| | | / ____ \|_| ###
### |_| |_|\____/|_|/_/ \_\ ###
### ###
#########################################################
Loaded view='FGR3&CD14' group='group1' with N=16 samples and D=75 features...
Loaded view='FGR3&DCs' group='group1' with N=16 samples and D=87 features...
Loaded view='CD14&CD14' group='group1' with N=16 samples and D=81 features...
Loaded view='NK&NK' group='group1' with N=16 samples and D=26 features...
Loaded view='FGR3&NK' group='group1' with N=16 samples and D=39 features...
Loaded view='B&NK' group='group1' with N=16 samples and D=26 features...
Loaded view='DCs&NK' group='group1' with N=16 samples and D=43 features...
Loaded view='FGR3&FGR3' group='group1' with N=16 samples and D=79 features...
Loaded view='DCs&CD14' group='group1' with N=16 samples and D=79 features...
Loaded view='CD14&NK' group='group1' with N=16 samples and D=41 features...
Loaded view='NK&CD8T' group='group1' with N=16 samples and D=33 features...
Loaded view='CD14&DCs' group='group1' with N=16 samples and D=86 features...
Loaded view='FGR3&CD8T' group='group1' with N=16 samples and D=46 features...
Loaded view='CD8T&CD8T' group='group1' with N=16 samples and D=29 features...
Loaded view='DCs&DCs' group='group1' with N=16 samples and D=89 features...
Loaded view='CD14&FGR3' group='group1' with N=16 samples and D=79 features...
Loaded view='DCs&FGR3' group='group1' with N=16 samples and D=82 features...
Loaded view='B&CD8T' group='group1' with N=16 samples and D=36 features...
Loaded view='DCs&CD8T' group='group1' with N=16 samples and D=51 features...
Loaded view='CD4T&CD8T' group='group1' with N=16 samples and D=27 features...
Loaded view='CD14&CD8T' group='group1' with N=16 samples and D=50 features...
Loaded view='FGR3&CD4T' group='group1' with N=16 samples and D=33 features...
Loaded view='DCs&CD4T' group='group1' with N=16 samples and D=37 features...
Loaded view='CD14&CD4T' group='group1' with N=16 samples and D=31 features...
Loaded view='CD8T&CD14' group='group1' with N=16 samples and D=33 features...
Loaded view='NK&CD14' group='group1' with N=16 samples and D=45 features...
Loaded view='B&CD14' group='group1' with N=16 samples and D=40 features...
Loaded view='NK&FGR3' group='group1' with N=16 samples and D=43 features...
Loaded view='CD8T&FGR3' group='group1' with N=16 samples and D=35 features...
Loaded view='CD4T&CD14' group='group1' with N=16 samples and D=32 features...
Loaded view='B&FGR3' group='group1' with N=16 samples and D=36 features...
Loaded view='CD4T&FGR3' group='group1' with N=16 samples and D=37 features...
Loaded view='CD8T&DCs' group='group1' with N=16 samples and D=38 features...
Loaded view='B&DCs' group='group1' with N=16 samples and D=50 features...
Loaded view='NK&DCs' group='group1' with N=16 samples and D=46 features...
Loaded view='CD4T&DCs' group='group1' with N=16 samples and D=35 features...
Model options:
- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (FGR3&CD14): gaussian
- View 1 (FGR3&DCs): gaussian
- View 2 (CD14&CD14): gaussian
- View 3 (NK&NK): gaussian
- View 4 (FGR3&NK): gaussian
- View 5 (B&NK): gaussian
- View 6 (DCs&NK): gaussian
- View 7 (FGR3&FGR3): gaussian
- View 8 (DCs&CD14): gaussian
- View 9 (CD14&NK): gaussian
- View 10 (NK&CD8T): gaussian
- View 11 (CD14&DCs): gaussian
- View 12 (FGR3&CD8T): gaussian
- View 13 (CD8T&CD8T): gaussian
- View 14 (DCs&DCs): gaussian
- View 15 (CD14&FGR3): gaussian
- View 16 (DCs&FGR3): gaussian
- View 17 (B&CD8T): gaussian
- View 18 (DCs&CD8T): gaussian
- View 19 (CD4T&CD8T): gaussian
- View 20 (CD14&CD8T): gaussian
- View 21 (FGR3&CD4T): gaussian
- View 22 (DCs&CD4T): gaussian
- View 23 (CD14&CD4T): gaussian
- View 24 (CD8T&CD14): gaussian
- View 25 (NK&CD14): gaussian
- View 26 (B&CD14): gaussian
- View 27 (NK&FGR3): gaussian
- View 28 (CD8T&FGR3): gaussian
- View 29 (CD4T&CD14): gaussian
- View 30 (B&FGR3): gaussian
- View 31 (CD4T&FGR3): gaussian
- View 32 (CD8T&DCs): gaussian
- View 33 (B&DCs): gaussian
- View 34 (NK&DCs): gaussian
- View 35 (CD4T&DCs): gaussian
######################################
## Training the model with seed 1 ##
######################################
Converged!
#######################
## Training finished ##
#######################
Warning: Output file models/mofatalk.h5ad already exists, it will be replaced
Saving model in models/mofatalk.h5ad...
Saved MOFA embeddings in .obsm['X_mofa'] slot and their loadings in .varm['LFs'].
Exploring the MOFA model¶
For convenience, we provide simple getter function to access the model parameters, in addition to those available via the MuData API & the MOFA model itself.
Explore Metadata Associations to the Factor Scores¶
[15]:
dc.get_metadata_associations(
mdata,
obs_keys = ['patient', 'condition'], # Metadata columns to associate to PCs
obsm_key='X_mofa', # Where the PCs are stored
uns_key='mofa_anova', # Where the results are stored
inplace=True,
)
dc.plot_associations(
mdata,
uns_key='mofa_anova', # Summary statistics from the anova tests
obsm_key='X_mofa', # where the PCs are stored
stat_col='p_adj', # Which summary statistic to plot
obs_annotation_cols = ['patient', 'condition'], # which sample annotations to plot
titles=['Principle component scores', 'Adjusted p-values from ANOVA'],
figsize=(7, 5),
n_factors=10,
)

[16]:
# obtain the factor scores as a dataframe
factor_scores = li.ut.get_factor_scores(mdata, obsm_key='X_mofa', obs_keys=['patient', 'condition'])
factor_scores.head()
[16]:
sample | Factor1 | Factor2 | Factor3 | Factor4 | patient | condition | |
---|---|---|---|---|---|---|---|
0 | ctrl&101 | 0.371332 | -0.006307 | -0.017642 | 0.008625 | patient_101 | ctrl |
1 | ctrl&1015 | 0.311543 | -0.010989 | -0.017098 | 0.009555 | patient_1015 | ctrl |
2 | ctrl&1016 | 0.312934 | -0.018745 | -0.015433 | 0.009975 | patient_1016 | ctrl |
3 | ctrl&1039 | 0.276368 | 0.194729 | -0.015888 | 0.010346 | patient_1039 | ctrl |
4 | ctrl&107 | 0.359088 | -0.003679 | -0.014282 | 0.008719 | patient_107 | ctrl |
Let’s check if any of the factors are associated with the sample condition:
[17]:
# we use a paired t-test as the samples are paired
from scipy.stats import ttest_rel
[18]:
# split in control and stimulated
group1 = factor_scores[factor_scores['condition']=='ctrl']
group2 = factor_scores[factor_scores['condition']=='stim']
# get all columns that contain factor & loop
factors = [col for col in factor_scores.columns if 'Factor' in col]
for factor in factors:
print(ttest_rel(group1[factor], group2[factor]))
TtestResult(statistic=29.575916427002756, pvalue=1.3014800352111366e-08, df=7)
TtestResult(statistic=1.0332699913631296, pvalue=0.3358493202415181, df=7)
TtestResult(statistic=-1.00167320005426, pvalue=0.34986209595328777, df=7)
TtestResult(statistic=1.087734179209837, pvalue=0.3127402330324311, df=7)
We can see that the first factor is associated with the sample condition, let’s plot the factor scores:
[19]:
# scatterplot
(p9.ggplot(factor_scores) +
p9.aes(x='condition', colour='condition', y='Factor1') +
p9.geom_violin() +
p9.geom_jitter(size=4, width=0.2) +
p9.theme_bw(base_size=16) +
p9.theme(figure_size=(5, 4)) +
p9.scale_colour_manual(values=['#1f77b4', '#c20019']) +
p9.labs(x='Condition', y='Factor 1')
)
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[19]:
<Figure Size: (500 x 400)>
Explore Ligand-Receptor loadings¶
Now that we have identified a factor that is associated with the sample condition, we can check the ligand-receptor loadings with the highest loadings:
[20]:
variable_loadings = li.ut.get_variable_loadings(mdata,
varm_key='LFs',
view_sep=':',
pair_sep="&",
variable_sep="^") # get loadings for factor 1
variable_loadings.head()
[20]:
ligand_complex | receptor_complex | source | target | Factor1 | Factor2 | Factor3 | Factor4 | |
---|---|---|---|---|---|---|---|---|
1082 | HLA-DRA | LAG3 | DCs | CD8T | -3.289048 | 7.110284 | 0.045253 | 0.147521 |
174 | CCL8 | CCR1 | CD14 | CD14 | -2.320195 | -0.074425 | 0.009899 | 0.001225 |
236 | TIMP1 | CD63 | CD14 | CD14 | 2.289073 | 0.416292 | -0.003782 | 0.002598 |
1083 | HLA-DRB1 | LAG3 | DCs | CD8T | -2.063325 | 6.271296 | -0.029832 | 0.031709 |
166 | CCL2 | CCR1 | CD14 | CD14 | -2.009148 | -0.065779 | 0.000148 | -0.090824 |
[21]:
# here we will just assign the size of the dots, but this can be replace by any other statistic
variable_loadings['size'] = 4.5
[22]:
my_plot = li.pl.dotplot(liana_res = variable_loadings,
size='size',
colour='Factor1',
orderby='Factor1',
top_n=15,
source_labels=['NK', 'B', 'CD4T', 'CD8T', 'CD14'],
orderby_ascending=False,
size_range=(0.1, 5),
figure_size=(8, 5)
)
# change colour, with mid as white
my_plot + p9.scale_color_gradient2(low='#1f77b4', mid='lightgray', high='#c20019')
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[22]:
<Figure Size: (800 x 500)>
Here, we can see that certain interactions from Factor 1 have high positive loadings. These are interactions that are associated with the samples with high factor scores (i.e. the stimulated samples with high scores in Factor 1).
Explore the model¶
Finally, we can also explore the MOFA model itself and we will specifically check the variance explained by each pair of cell types.
[23]:
model = mofa.mofa_model("models/mofatalk.h5ad")
model
[23]:
MOFA+ model: mofatalk.h5ad
Samples (cells): 16
Features: 1755
Groups: group1 (16)
Views: B&CD14 (40), B&CD8T (36), B&DCs (50), B&FGR3 (36), B&NK (26), CD14&CD14 (81), CD14&CD4T (31), CD14&CD8T (50), CD14&DCs (86), CD14&FGR3 (79), CD14&NK (41), CD4T&CD14 (32), CD4T&CD8T (27), CD4T&DCs (35), CD4T&FGR3 (37), CD8T&CD14 (33), CD8T&CD8T (29), CD8T&DCs (38), CD8T&FGR3 (35), DCs&CD14 (79), DCs&CD4T (37), DCs&CD8T (51), DCs&DCs (89), DCs&FGR3 (82), DCs&NK (43), FGR3&CD14 (75), FGR3&CD4T (33), FGR3&CD8T (46), FGR3&DCs (87), FGR3&FGR3 (79), FGR3&NK (39), NK&CD14 (45), NK&CD8T (33), NK&DCs (46), NK&FGR3 (43), NK&NK (26)
Factors: 4
Expectations: W, Z
[24]:
# get variance explained by view and factor
rsq = model.get_r2()
factor1_rsq = rsq[rsq['Factor']=='Factor1']
# separate view column
factor1_rsq[['source', 'target']] = factor1_rsq['View'].str.split(pat='&', n=1, expand=True)
[25]:
(p9.ggplot(factor1_rsq.reset_index()) +
p9.aes(x='target', y='source') +
p9.geom_tile(p9.aes(fill='R2')) +
p9.scale_fill_gradient2(low='white', high='#c20019') +
p9.theme_bw(base_size=16) +
p9.theme(figure_size=(5, 4)) +
p9.labs(x='Target', y='Source', fill='R²')
)
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[25]:
<Figure Size: (500 x 400)>
Here, we can see that views that include CD14+ Monocytes have the highest variance explained both as source and as target of intercellular communication events. In particular, we see that putative autocrine interactions that occur between CD14+ Monocytes are highly explained by Factor 1.
Pathway enrichment¶
Let’s also perform an enrichment analysis on the ligand-receptor interactions that are associated with the factor of interest. We will use decoupler with pathway genesets from PROGENy to look for enrichments across the cell type pairs (views).
[26]:
# load PROGENy pathways
net = dc.get_progeny(organism='human', top=5000)
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[27]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, net, lr_sep="^")
lr_progeny.head()
[27]:
source | interaction | weight | |
---|---|---|---|
60 | JAK-STAT | LGALS9^PTPRC | 1.307807 |
1568 | Androgen | SEMA4D^MET | -0.831693 |
1960 | Androgen | HGF^MET | -1.288956 |
2352 | Androgen | TIMP3^MET | -1.122612 |
3030 | NFkB | SELE^CD44 | 3.332552 |
[28]:
lr_loadings = li.ut.get_variable_loadings(mdata,
varm_key='LFs',
view_sep=':',
)
lr_loadings.set_index('variable', inplace=True)
# pivot views to wide
lr_loadings = lr_loadings.pivot(columns='view', values='Factor1')
# replace NaN with 0
lr_loadings.replace(np.nan, 0, inplace=True)
lr_loadings.head()
[28]:
view | B&CD14 | B&CD8T | B&DCs | B&FGR3 | B&NK | CD14&CD14 | CD14&CD4T | CD14&CD8T | CD14&DCs | CD14&FGR3 | ... | FGR3&CD4T | FGR3&CD8T | FGR3&DCs | FGR3&FGR3 | FGR3&NK | NK&CD14 | NK&CD8T | NK&DCs | NK&FGR3 | NK&NK |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
variable | |||||||||||||||||||||
ADAM10^AXL | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.00000 | -0.081331 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ADAM10^CD44 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.116787 | -0.02685 | -0.038585 | 0.007103 | 0.001729 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ADAM17^RHBDF2 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.00000 | -0.147594 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ADAM28^ITGA4 | 0.0 | 0.0 | 0.0 | -0.290671 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ADM^CALCRL | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.257759 | 0.0 | ... | 0.000000 | 0.00000 | -0.035830 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 36 columns
[29]:
# run pathway enrichment analysis
estimate, pvals = dc.run_mlm(lr_loadings.transpose(), lr_progeny,
source="source", target="interaction",
use_raw=False, min_n=5)
# pivot columns to long
estimate = (estimate.
melt(ignore_index=False, value_name='estimate', var_name='pathway').
reset_index().
rename(columns={'index':'view'})
)
[30]:
## p9 tile plot
(p9.ggplot(estimate) +
p9.aes(x='pathway', y='view') +
p9.geom_tile(p9.aes(fill='estimate')) +
p9.scale_fill_gradient2(low='#1f77b4', high='#c20019') +
p9.theme_bw(base_size=14) +
p9.theme(figure_size=(8, 8))
)
Fontsize 0.00 < 1.0 pt not allowed by FreeType. Setting fontsize = 1 pt

[30]:
<Figure Size: (800 x 800)>
Some of these interactions are not expressed in the data, so we use the lr_fill
parameter to fill them with 0s. This is a rather arbitrary choice, but it is a simple way to deal with missing values. Additionally, since we the views (cell type pairs) are rather sparse, it’s possible that the enrichment analysis will not be very informative. Be sure to check the results carefully.
Outlook & Further Analysis¶
This tutorial is just a short introduction of the use of MOFA, we thus refer the users to the available MOFA & muon tutorials for more applications & details.
Similary, consider citing both muon & MOFA+ if you use them in your work!
[31]:
model.close()
[ ]: