Prior Knowledge

LIANA+ typically relies heavily on prior knowledge to infer intercellular communication and the intracellular signaling pathways that are activated in response to communication. This notebook provides a brief overview of the prior knowledge typically used by LIANA+.

[1]:

import liana as li
import omnipath as op
import decoupler as dc

Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`

Ligand-Receptor Interactions

In the simplest case, and for reproducibility, LIANA+ provides frozen sets of interactions from multiple resources. These are accessible through the select_resource function of the resource module. The currently supported resources are:

[2]:

li.resource.show_resources()

[2]:

['baccin2019',
 'cellcall',
 'cellchatdb',
 'cellinker',
 'cellphonedb',
 'celltalkdb',
 'connectomedb2020',
 'consensus',
 'embrace',
 'guide2pharma',
 'hpmr',
 'icellnet',
 'italk',
 'kirouac2010',
 'lrdb',
 'mouseconsensus',
 'ramilowski2015']

By default, LIANA+ uses the consensus resource, which is composed of multiple expert-curated ligand-receptor resources, including CellPhoneDB, CellChat, ICELLNET, connectomeDB2020, and CellTalkDB.

[3]:

li.resource.select_resource().head()

[3]:

	ligand	receptor
0	LGALS9	PTPRC
1	LGALS9	MET
2	LGALS9	CD44
3	LGALS9	LRP1
4	LGALS9	CD47

All of these resources were pre-generated using the OmniPath meta-database. However, any custom resource can also be passed, including resources provided by the user or generated with the omnipath client package.

Via this client, in addition to ligand-receptor interactions, users can obtain the PubMed IDs of the publications used to support each interaction (the references column), as well as the databases that reported the interaction in the first place.

Users can also modify the resource according to their preferences, for example by keeping only CellChatDB interactions that are licensed for commercial use:

[4]:

# Query OmniPath for CellChatDB ligand-receptor pairs whose license permits commercial use
ligrec = op.interactions.import_intercell_network(
    interactions_params = {'license':'commercial'},
    transmitter_params = {'database':'CellChatDB'},
    receiver_params = {'database':'CellChatDB'},
    )

# Rename the gene-symbol columns to the ligand/receptor names used by LIANA+
ligrec = ligrec.rename(columns={'genesymbol_intercell_source':'ligand',
                                'genesymbol_intercell_target':'receptor'})
# Move ligand, receptor and references to the front, keeping all other columns
first_cols = ['ligand', 'receptor', 'references']
ligrec = ligrec[first_cols + [col for col in ligrec.columns if col not in first_cols]]
ligrec.head()

[4]:

	ligand	receptor	references	source	target	is_stimulation	is_inhibition	consensus_direction	consensus_stimulation	consensus_inhibition	...	aspect_intercell_target	category_source_intercell_target	uniprot_intercell_target	entity_type_intercell_target	consensus_score_intercell_target	transmitter_intercell_target	receiver_intercell_target	secreted_intercell_target	plasma_membrane_transmembrane_intercell_target	plasma_membrane_peripheral_intercell_target
0	LCK	CTLA4	ProtMapper:9973379;SIGNOR:9973379;SPIKE:939833...	P06239	P16410	True	True	True	False	True	...	functional	resource_specific	P16410	protein	8	False	True	False	True	False
1	CD86	CTLA4	BioGRID:11279501;CellChatDB:23954143;CellTalkD...	P42081	P16410	True	False	True	True	False	...	functional	resource_specific	P16410	protein	8	False	True	False	True	False
2	CD80	CTLA4	BioGRID:11279502;CellChatDB:23954143;ICELLNET:...	P33681	P16410	True	False	True	True	False	...	functional	resource_specific	P16410	protein	8	False	True	False	True	False
3	ICOSLG	CTLA4	ICELLNET:21530327;connectomeDB2020:21530327	O75144	P16410	False	False	False	False	False	...	functional	resource_specific	P16410	protein	8	False	True	False	True	False
4	LCK	CD8A	NetPath:8814252;SPIKE:16818755;SPIKE_LC:16818755	P06239	P01732	False	False	False	False	False	...	functional	resource_specific	P01732	protein	6	False	True	True	True	False

5 rows × 45 columns

This function provides a rich set of annotations, such as the mode of action (stimulation or inhibition), the curation effort and the type of signalling. For a more comprehensive overview of the available information, please refer to the OmniPath documentation.
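As a small illustration of how the references column can be used, the sketch below splits one such string into the databases that report an interaction and the PubMed IDs they cite. It assumes the Database:PMID entries separated by ";" that are visible in the output above (e.g. row 3); parse_references is a hypothetical helper, not part of LIANA+ or the omnipath client.

```python
from collections import defaultdict


def parse_references(references):
    """Map each database in an OmniPath `references` string to its PubMed IDs.

    Hypothetical helper; assumes entries look like "Database:PMID" and are
    separated by ";".
    """
    if not isinstance(references, str) or not references:
        return {}  # missing references (e.g. NaN) yield an empty mapping
    by_db = defaultdict(set)
    for item in references.split(";"):
        db, _, pmid = item.partition(":")  # split on the first colon only
        by_db[db].add(pmid)
    return dict(by_db)


refs = parse_references("ICELLNET:21530327;connectomeDB2020:21530327")
databases = sorted(refs)
pmids = set().union(*refs.values())
print(databases, pmids)
```

Applied column-wise, e.g. ligrec['references'].map(parse_references), this would give a per-interaction view of the supporting databases and publications.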

Annotating Ligand-Receptors

In addition to ligand-receptors, we can also obtain other annotations via OmniPath. While these can be tissue locations, TF regulons, cytokine signatures, or other types of annotations, the most common use case is to obtain the pathways that are associated with each ligand-receptor interaction.

Pathway Annotations

We commonly use PROGENy pathway weights to assign interactions to canonical pathways, requiring that all members of an interaction (i.e. including complex subunits) are present in the same pathway with weights of the same sign. This ensures not only that the interaction belongs to the pathway, but also that its members are likely to act in the same direction within it.
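The sign-consistency rule can be sketched in plain pandas as follows. This is a hypothetical re-implementation for illustration, not LIANA+'s own code: it assumes complex subunits are joined by "_" (as in ITGAV_ITGB3 further below), that pathway weights come in a long source/target/weight table like the PROGENy one, and it summarises each retained interaction by the mean of its members' weights, which may differ from how generate_lr_geneset computes its weight. The toy pathways and genes are made up.

```python
import pandas as pd


def assign_lr_to_pathways(lr_pairs, net, lr_sep="^", complex_sep="_"):
    """Keep (pathway, interaction) pairs whose members all share a weight sign.

    Hypothetical sketch; not the LIANA+ implementation.
    """
    # Look-up of target -> weight for every pathway
    pathway_weights = {p: sub.set_index("target")["weight"]
                       for p, sub in net.groupby("source")}
    rows = []
    for ligand, receptor in zip(lr_pairs["ligand"], lr_pairs["receptor"]):
        # Complex subunits are assumed to be joined by complex_sep
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        for pathway, weights in pathway_weights.items():
            if not all(m in weights.index for m in members):
                continue  # at least one member is missing from this pathway
            w = weights.loc[members]
            if (w > 0).all() or (w < 0).all():
                # Summarising by the mean is an assumption of this sketch
                rows.append({"source": pathway,
                             "interaction": f"{ligand}{lr_sep}{receptor}",
                             "weight": w.mean()})
    return pd.DataFrame(rows, columns=["source", "interaction", "weight"])


# Toy data: in P2, A and B have opposite signs, so A^B is not assigned to it
toy_net = pd.DataFrame({
    "source": ["P1", "P1", "P1", "P2", "P2"],
    "target": ["A", "B", "C", "A", "B"],
    "weight": [1.0, 2.0, 3.0, 1.0, -1.0],
})
toy_pairs = pd.DataFrame({"ligand": ["A", "A"], "receptor": ["B", "B_C"]})
print(assign_lr_to_pathways(toy_pairs, toy_net))
```

Here A^B and A^B_C are both kept for P1 only: in P2, A and B have opposite signs, and C is absent.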

[5]:

# Load the top 2500 PROGENy target genes per pathway; decoupler is used as a proxy because it returns them in a convenient long format
progeny = dc.get_progeny(top=2500)
progeny.head()

Downloading annotations for all proteins from the following resources: `['PROGENy']`

[5]:

	source	target	weight	p_value
0	Androgen	TMPRSS2	11.490631	0.000000e+00
1	Androgen	NKX3-1	10.622551	2.242078e-44
2	Androgen	MBOAT2	10.472733	4.624285e-44
3	Androgen	KLK2	10.176186	1.944414e-40
4	Androgen	SARG	11.386852	2.790209e-40

[6]:

# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')

Then we use the generate_lr_geneset function from LIANA+ to assign the interactions to pathways. This function takes the ligand-receptor interactions and the pathway weights, and returns a dataframe of pathway-annotated interactions, in which each interaction is named by its ligand and receptor joined by lr_sep.

[7]:

# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, progeny, lr_sep="^")
lr_progeny.head()

[7]:

	source	interaction	weight
1960	Androgen	HGF^MET	-1.288956
3030	NFkB	SELE^CD44	3.332552
3075	TNFa	SELE^CD44	3.322682
4251	TNFa	FN1^CD44	2.590177
6950	NFkB	LAMB3^CD44	4.055408

We can additionally perform enrichment analysis of ligand-receptor scores using this newly generated dataframe. For an example, see the application with Tensor-cell2cell.
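To make the idea concrete, the sketch below scores each pathway with a signed weighted mean of interaction-level scores, using a table shaped like lr_progeny (source, interaction, weight). It is a deliberately simple stand-in chosen for illustration, not the enrichment method used in the Tensor-cell2cell application; decoupler provides proper enrichment methods. The function name, the min_n threshold, and the toy pathways, interactions and scores are all hypothetical.

```python
import pandas as pd


def weighted_mean_enrichment(scores, lr_geneset, min_n=2):
    """Signed weighted mean of interaction scores per pathway (simplified sketch).

    scores: pd.Series indexed by interaction name (e.g. "L1^R1").
    lr_geneset: DataFrame with source, interaction and weight columns.
    """
    # Attach each interaction's score to all of its pathway memberships
    df = lr_geneset.merge(scores.rename("score"),
                          left_on="interaction", right_index=True)
    result = {}
    for pathway, sub in df.groupby("source"):
        if len(sub) < min_n:
            continue  # skip pathways with too few scored interactions
        result[pathway] = (sub["weight"] * sub["score"]).sum() / sub["weight"].abs().sum()
    return pd.Series(result, name="enrichment", dtype=float)


toy_geneset = pd.DataFrame({
    "source": ["P_A", "P_A", "P_B", "P_B"],
    "interaction": ["L1^R1", "L2^R1", "L1^R1", "L3^R1"],
    "weight": [2.0, 1.0, 1.0, -1.0],
})
toy_scores = pd.Series({"L1^R1": 0.5, "L2^R1": 1.0, "L3^R1": 0.2})
print(weighted_mean_enrichment(toy_scores, toy_geneset))
```

Dividing by the sum of absolute weights keeps pathways with different numbers or magnitudes of weights on a comparable scale; negative weights flip the contribution of the corresponding interaction.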

Disease Annotations

As another example, we can also annotate ligand-receptors to diseases in which both the ligand and the receptor are involved.

[8]:

diseases = op.requests.Annotations.get(
    resources = ['DisGeNet']
    )

Downloading annotations for all proteins from the following resources: `['DisGeNet']`

[9]:

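# Keep gene symbol, annotation label (e.g. 'disease') and value
diseases = diseases[['genesymbol', 'label', 'value']]
# One row per gene, one column per label; multiple values are joined with '; '
diseases = diseases.pivot_table(index='genesymbol',
                                columns='label', values='value',
                                aggfunc=lambda x: '; '.join(x)).reset_index()
# Keep the disease names and expand them back to one row per gene-disease pair
diseases = diseases[['genesymbol', 'disease']].copy()
diseases['disease'] = diseases['disease'].str.split('; ')
diseases = diseases.explode('disease')
# Assign diseases to interactions (no weights, hence weight=None)
lr_diseases = li.rs.generate_lr_geneset(lr_pairs, diseases, source='disease', target='genesymbol', weight=None, lr_sep="^")
lr_diseases.sort_values("interaction").head()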

[9]:

	disease	interaction
693653	Hypertensive disease	ACE^AGTR2
693926	Malignant neoplasm of stomach	ACE^AGTR2
693991	Neoplasm Metastasis	ACE^AGTR2
694293	Stomach Neoplasms	ACE^AGTR2
693759	Left Ventricular Hypertrophy	ACE^AGTR2

Let’s check a protein of interest, SPP1:

[10]:

lr_diseases[lr_diseases['interaction'].str.contains('SPP1')]

[10]:

	disease	interaction
31124	Acute Kidney Insufficiency	SPP1^CD44
31159	Acute kidney injury	SPP1^CD44
32630	Kidney Failure, Acute	SPP1^CD44
33038	Mammary Neoplasms, Experimental	SPP1^CD44
33163	Neoplasm Metastasis	SPP1^CD44
464305	Cerebral Hemorrhage	SPP1^ITGAV_ITGB3
1108109	Mammary Neoplasms, Experimental	SPP1^S1PR1

Following similar procedures, one may annotate ligand-receptors with any of the annotations available via OmniPath. The available annotation resources are listed by op.requests.Annotations.resources().
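The same "both ligand and receptor" rule can be applied to any gene-level annotation table. The helper below is a hypothetical sketch, not LIANA+'s generate_lr_geneset: it keeps a label for an interaction only if every ligand and receptor member, including each complex subunit joined by "_", carries that label. Requiring every subunit is an assumption of this sketch, and the toy genes and labels are made up.

```python
import pandas as pd


def shared_annotations(lr_pairs, annotations, label_col, gene_col="genesymbol",
                       lr_sep="^", complex_sep="_"):
    """Hypothetical helper: labels carried by every ligand and receptor member."""
    # Collect the set of labels attached to each gene
    gene_to_labels = {}
    for gene, label in zip(annotations[gene_col], annotations[label_col]):
        gene_to_labels.setdefault(gene, set()).add(label)
    rows = []
    for ligand, receptor in zip(lr_pairs["ligand"], lr_pairs["receptor"]):
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        # A label is kept only if every member (incl. complex subunits) has it
        shared = set.intersection(*(gene_to_labels.get(m, set()) for m in members))
        for label in sorted(shared):
            rows.append({label_col: label,
                         "interaction": f"{ligand}{lr_sep}{receptor}"})
    return pd.DataFrame(rows, columns=[label_col, "interaction"])


toy_annotations = pd.DataFrame({
    "genesymbol": ["L1", "L1", "R1", "R2", "R3"],
    "disease": ["D1", "D2", "D1", "D1", "D2"],
})
toy_pairs = pd.DataFrame({"ligand": ["L1", "L1"], "receptor": ["R1", "R2_R3"]})
print(shared_annotations(toy_pairs, toy_annotations, label_col="disease"))
```

Only L1^R1 is annotated (to D1): the subunits of the R2_R3 complex share no label, so L1^R2_R3 is dropped.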

Intracellular Signaling

Beyond assigning ligand-receptor interactions to pathways, we can also obtain the intracellular signaling pathways that are activated in response to these interactions. This again relies on the omnipath client package, this time in combination with decoupler, which enables enrichment analysis of pathways, transcription factors and other annotations.

One specific scenario, which relies heavily on OmniPath knowledge and on enrichment analysis with decoupler, is presented in the Differential Analysis Vignette.

There, to find putative causal networks linking deregulated cell-cell communication (CCC) interactions to transcription factors (TFs), we use:

1) a protein-protein interaction network

[11]:

ppis = op.interactions.OmniPath().get(genesymbols = True)
ppis.head()

[11]:

	source	target	source_genesymbol	target_genesymbol	is_directed	is_stimulation	is_inhibition	consensus_direction	consensus_stimulation	consensus_inhibition	curation_effort	references	sources	n_sources	n_primary_sources	n_references	references_stripped
0	P0DP23	P48995	CALM1	TRPC1	True	False	True	True	False	True	3	TRIP:11290752;TRIP:11983166;TRIP:12601176	TRIP	1	1	3	11290752;11983166;12601176
1	P0DP25	P48995	CALM3	TRPC1	True	False	True	True	False	True	3	TRIP:11290752;TRIP:11983166;TRIP:12601176	TRIP	1	1	3	11290752;11983166;12601176
2	P0DP24	P48995	CALM2	TRPC1	True	False	True	True	False	True	3	TRIP:11290752;TRIP:11983166;TRIP:12601176	TRIP	1	1	3	11290752;11983166;12601176
3	Q03135	P48995	CAV1	TRPC1	True	True	False	True	True	False	13	DIP:19897728;HPRD:12732636;IntAct:19897728;Lit...	DIP;HPRD;IntAct;Lit-BM-17;TRIP	5	5	8	10980191;12732636;14551243;16822931;18430726;1...
4	P14416	P48995	DRD2	TRPC1	True	True	False	True	True	False	1	TRIP:18261457	TRIP	1	1	1	18261457

2) transcription factor regulons

These are provided via the CollecTRI resource:

[12]:

dc.get_collectri().head()

[12]:

	source	target	weight	PMID
0	MYC	TERT	1	10022128;10491298;10606235;10637317;10723141;1...
1	SPI1	BGLAP	1	10022617
2	SMAD3	JUN	1	10022869;12374795
3	SMAD4	JUN	1	10022869;12374795
4	STAT5A	IL2	1	10022878;11435608;17182565;17911616;22854263;2...

These are then linked using a modification of the integer linear programming (ILP) problem proposed in CARNIVAL, solved with CORNETO, a unified omics-driven framework for network inference.
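CARNIVAL and CORNETO solve an optimisation problem over the whole network, which is beyond a short example. As a much simpler illustration of what linking a receptor to a TF means, the sketch below builds a signed, directed graph from a table with the same columns as ppis and enumerates short paths from a receptor to a TF with breadth-first search, multiplying edge signs along each path. This is plain path enumeration, not the CARNIVAL ILP. Deriving edge signs from consensus_stimulation and consensus_inhibition (and dropping ambiguous or undirected edges) is an assumption, and the toy network and function names are hypothetical.

```python
from collections import deque

import pandas as pd


def signed_edges(ppis):
    """Keep directed edges with an unambiguous consensus sign (+1 or -1)."""
    directed = ppis[ppis["is_directed"]]
    stim = directed["consensus_stimulation"] & ~directed["consensus_inhibition"]
    inhib = directed["consensus_inhibition"] & ~directed["consensus_stimulation"]
    mask = stim | inhib  # drop edges that are both or neither
    edges = directed.loc[mask, ["source_genesymbol", "target_genesymbol"]].copy()
    edges["sign"] = stim[mask].map({True: 1, False: -1})
    return edges


def signed_paths(edges, start, end, max_len=3):
    """Breadth-first enumeration of simple paths start -> end with their sign."""
    graph = {}
    for src, tgt, sign in edges.itertuples(index=False):
        graph.setdefault(src, []).append((tgt, sign))
    found = []
    queue = deque([([start], 1)])
    while queue:
        path, sign = queue.popleft()
        if path[-1] == end:
            found.append((path, sign))
            continue
        if len(path) - 1 == max_len:
            continue  # path already has max_len edges
        for nxt, s in graph.get(path[-1], []):
            if nxt not in path:  # simple paths only, no cycles
                queue.append((path + [nxt], sign * s))
    return found


toy_ppis = pd.DataFrame({
    "source_genesymbol": ["R", "R", "K1", "K2", "K1"],
    "target_genesymbol": ["K1", "K2", "TF", "TF", "K2"],
    "is_directed": [True, True, True, True, False],
    "consensus_stimulation": [True, True, True, False, True],
    "consensus_inhibition": [False, False, False, True, False],
})
edges = signed_edges(toy_ppis)
for path, sign in signed_paths(edges, "R", "TF"):
    print(" -> ".join(path), "+" if sign > 0 else "-")
```

The overall sign indicates whether a path would be expected to activate (+) or repress (-) the TF; an ILP formulation such as CARNIVAL's additionally chooses a single consistent subnetwork that best explains the measured TF activities.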

Metabolite-Receptor Interactions

Via LIANA+, we also provide access to the MetalinksDB knowledge graph, a customisable database of metabolite-receptor interactions that is part of the BioCypher ecosystem. For more information, please refer to Farr et al., 2023.

Specifically, to enable lightweight access, we have converted the MetalinksDB knowledge graph into an SQLite database.

This database is queried using sqlite3, and we provide basic queries that can be customized according to the user’s needs, e.g. by disease, pathway or location.

We can first check the values within the different tables of the database:

[13]:

li.resource.get_metalinks_values(table_name='disease', column_name='disease')[0:5]

[13]:

['Diabetes mellitus type 2',
 'Obesity',
 'Pancreatic cancer',
 'Colorectal cancer',
 'Schizophrenia']

We can then obtain metabolite-receptor interactions whose metabolites have been reported to be associated with particular locations or diseases:

[14]:

li.resource.get_metalinks(source=['Stich', 'CellPhoneDB', 'NeuronChat'],
                          tissue_location='Brain',
                          biospecimen_location='Cerebrospinal Fluid (CSF)',
                          disease='Schizophrenia',
                          ).head()

[14]:

	metabolite	hmdb	uniprot	gene_symbol
0	Dopamine	HMDB0000073	A5X5Y0	HTR3E
1	Dopamine	HMDB0000073	O95264	HTR3B
2	Dopamine	HMDB0000073	P08908	HTR1A
3	Dopamine	HMDB0000073	P14416	DRD2
4	Dopamine	HMDB0000073	P21728	DRD1

This database can be filtered further according to the user’s needs and queried like any other standard relational database (RDBMS).

For such cases, we also provide a utility function to print the database schema:

[15]:

li.rs.describe_metalinks()

Schema of table: metabolites
============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: metabolite, Type: TEXT, Primary Key: 0
Column ID: 2, Name: pubchem, Type: TEXT, Primary Key: 0
Column ID: 3, Name: metabolite_subclass, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: proteins
=========================
Column ID: 0, Name: uniprot, Type: TEXT, Primary Key: 1
Column ID: 1, Name: gene_symbol, Type: TEXT, Primary Key: 0
Column ID: 2, Name: protein_type, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: edges
======================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: uniprot, Type: TEXT, Primary Key: 2
Column ID: 2, Name: db_score, Type: REAL, Primary Key: 0
Column ID: 3, Name: experiment_score, Type: REAL, Primary Key: 0
Column ID: 4, Name: combined_score, Type: REAL, Primary Key: 0
Column ID: 5, Name: interaction_mode, Type: TEXT, Primary Key: 0
Column ID: 6, Name: mor, Type: INTEGER, Primary Key: 0

Foreign Keys:
ID: 0, Seq: 0, Table: proteins, From: uniprot, To: uniprot
ID: 1, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: source
=======================
Column ID: 0, Name: hmdb, Type: VARCHAR(255), Primary Key: 1
Column ID: 1, Name: uniprot, Type: VARCHAR(255), Primary Key: 2
Column ID: 2, Name: source, Type: VARCHAR(255), Primary Key: 3

Foreign Keys:
ID: 0, Seq: 0, Table: edges, From: hmdb, To: hmdb
ID: 0, Seq: 1, Table: edges, From: uniprot, To: uniprot
----------------------------------------
Schema of table: cell_location
==============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: cell_location, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: tissue_location
================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: tissue_location, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: biospecimen_location
=====================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: biospecimen_location, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: disease
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: disease, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: pathway
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: pathway, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
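Because MetalinksDB is exposed as an SQLite database, filters such as the disease query above can also be written directly in SQL. The database file location and the queries that get_metalinks runs internally are not shown here, so the sketch below builds a tiny in-memory database with a reduced version of the schema printed above (metabolites, proteins, edges, disease) and joins the tables on hmdb and uniprot. The Dopamine, DRD1 and DRD2 identifiers and their Schizophrenia annotation mirror the query output above; the HMDB_TOY, P_TOY and TOYR rows are made up.

```python
import sqlite3

# Tiny in-memory stand-in with a reduced version of the MetalinksDB schema
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE metabolites (hmdb TEXT PRIMARY KEY, metabolite TEXT);
CREATE TABLE proteins (uniprot TEXT PRIMARY KEY, gene_symbol TEXT);
CREATE TABLE edges (hmdb TEXT, uniprot TEXT, PRIMARY KEY (hmdb, uniprot));
CREATE TABLE disease (hmdb TEXT, disease TEXT, PRIMARY KEY (hmdb, disease));
""")
con.executemany("INSERT INTO metabolites VALUES (?, ?)",
                [("HMDB0000073", "Dopamine"), ("HMDB_TOY", "Toy metabolite")])
con.executemany("INSERT INTO proteins VALUES (?, ?)",
                [("P14416", "DRD2"), ("P21728", "DRD1"), ("P_TOY", "TOYR")])
con.executemany("INSERT INTO edges VALUES (?, ?)",
                [("HMDB0000073", "P14416"), ("HMDB0000073", "P21728"),
                 ("HMDB_TOY", "P_TOY")])
con.executemany("INSERT INTO disease VALUES (?, ?)",
                [("HMDB0000073", "Schizophrenia"), ("HMDB_TOY", "Obesity")])

# Metabolite-receptor edges whose metabolite is linked to a given disease
query = """
SELECT m.metabolite, e.hmdb, e.uniprot, p.gene_symbol
FROM edges AS e
JOIN metabolites AS m ON m.hmdb = e.hmdb
JOIN proteins AS p ON p.uniprot = e.uniprot
WHERE e.hmdb IN (SELECT hmdb FROM disease WHERE disease = ?)
ORDER BY p.gene_symbol
"""
rows = con.execute(query, ("Schizophrenia",)).fetchall()
for row in rows:
    print(row)
```

Against the real database, the same parameterised query could be passed to pandas.read_sql_query together with a connection to the MetalinksDB file; using "?" placeholders rather than string formatting keeps user-supplied filter values safe.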