Prior Knowledge

LIANA+ typically relies heavily on prior knowledge to infer intercellular communication and the intracellular signaling pathways that are activated in response to communication. This notebook provides a brief overview of the prior knowledge typically used by LIANA+.

[1]:
import liana as li
import omnipath as op
import decoupler as dc
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`

Ligand-Receptor Interactions

In the simplest case, for reproducibility purposes, LIANA+ provides a frozen set of interactions across resources. These are accessible through the select_resource function in the resource module. The resources that are currently supported are:

[2]:
li.resource.show_resources()
[2]:
['baccin2019',
 'cellcall',
 'cellchatdb',
 'cellinker',
 'cellphonedb',
 'celltalkdb',
 'connectomedb2020',
 'consensus',
 'embrace',
 'guide2pharma',
 'hpmr',
 'icellnet',
 'italk',
 'kirouac2010',
 'lrdb',
 'mouseconsensus',
 'ramilowski2015']

By default, liana uses the consensus resource, which is composed of multiple expert-curated ligand-receptor resources, including CellPhoneDB, CellChat, ICELLNET, connectomeDB2020, and CellTalkDB.

[3]:
li.resource.select_resource().head()
[3]:
ligand receptor
0 LGALS9 PTPRC
1 LGALS9 MET
2 LGALS9 CD44
3 LGALS9 LRP1
4 LGALS9 CD47

All of these resources were pre-generated using the OmniPath meta-database. However, any custom resource can also be passed, including those provided by the user or generated using the omnipath client package.

Via this client, in addition to ligand-receptor interactions, users can obtain the PubMed IDs of the references (references) that were used to support each interaction, as well as the database that reported the interaction in the first place.

Users can also modify the resource according to their preferences, for example:

[4]:
ligrec = op.interactions.import_intercell_network(
    interactions_params = {'license':'commercial'},
    transmitter_params = {'database':'CellChatDB'},
    receiver_params = {'database':'CellChatDB'},
    )
ligrec.head()

ligrec = ligrec.rename(columns={'genesymbol_intercell_source':'ligand', 'genesymbol_intercell_target':'receptor'})
ligrec = ligrec[['ligand', 'receptor', 'references'] + [col for col in ligrec.columns if col not in ['ligand', 'receptor', 'references']]]
ligrec.head()
[4]:
ligand receptor references source target is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition ... aspect_intercell_target category_source_intercell_target uniprot_intercell_target entity_type_intercell_target consensus_score_intercell_target transmitter_intercell_target receiver_intercell_target secreted_intercell_target plasma_membrane_transmembrane_intercell_target plasma_membrane_peripheral_intercell_target
0 LCK CTLA4 ProtMapper:9973379;SIGNOR:9973379;SPIKE:939833... P06239 P16410 True True True False True ... functional resource_specific P16410 protein 8 False True False True False
1 CD86 CTLA4 BioGRID:11279501;CellChatDB:23954143;CellTalkD... P42081 P16410 True False True True False ... functional resource_specific P16410 protein 8 False True False True False
2 CD80 CTLA4 BioGRID:11279502;CellChatDB:23954143;ICELLNET:... P33681 P16410 True False True True False ... functional resource_specific P16410 protein 8 False True False True False
3 ICOSLG CTLA4 ICELLNET:21530327;connectomeDB2020:21530327 O75144 P16410 False False False False False ... functional resource_specific P16410 protein 8 False True False True False
4 LCK CD8A NetPath:8814252;SPIKE:16818755;SPIKE_LC:16818755 P06239 P01732 False False False False False ... functional resource_specific P01732 protein 6 False True True True False

5 rows × 45 columns

This function provides a rich list of annotations, such as the mode of action (inhibition or stimulation), the curation effort, types of signalling, etc. For a more comprehensive overview of the information that is available, please refer to the OmniPath documentation.
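
The references column packs several `database:PMID` entries into one semicolon-separated string. As a minimal sketch of how such a value could be unpacked, the helper below (`parse_references` is a hypothetical name, not part of omnipath or liana) splits one string into pairs, using the untruncated value from the ICOSLG-CTLA4 row shown above:

```python
def parse_references(references, entry_sep=";", field_sep=":"):
    """Split a 'database:PMID;database:PMID' string into (database, pmid) pairs.

    Hypothetical helper for illustration; the separators are inferred from
    the table output above, not from the omnipath API.
    """
    pairs = []
    for entry in references.split(entry_sep):
        if not entry:
            continue  # skip empty fragments, e.g. from a trailing separator
        database, _, pmid = entry.partition(field_sep)
        pairs.append((database, pmid))
    return pairs


# Value copied from the ICOSLG-CTLA4 row of the output above
refs = "ICELLNET:21530327;connectomeDB2020:21530327"
pairs = parse_references(refs)

databases = sorted({database for database, _ in pairs})
pmids = sorted({pmid for _, pmid in pairs})

print(pairs)
print(databases, pmids)
```

Here two databases report the interaction but both cite the same PubMed ID, which is why counting unique PMIDs and counting reporting databases can give different numbers.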

Annotating Ligand-Receptors

In addition to ligand-receptors, we can also obtain other annotations via OmniPath. While these can be tissue locations, TF regulons, cytokine signatures, or other types of annotations, the most common use case is to obtain the pathways that are associated with each ligand-receptor interaction.

Pathway Annotations

We commonly use PROGENy pathway weights to assign interactions to canonical pathways, requiring that all members of an interaction (including complex subunits) are present in the same pathway with the same weight sign. This ensures not only that the members of the interaction belong to the same pathway, but also that they are likely to be active in the same direction.

[5]:
# load PROGENy pathways, we use decoupler as a proxy as it formats the data in a more convenient way
progeny = dc.get_progeny(top=2500)
progeny.head()
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[5]:
source target weight p_value
0 Androgen TMPRSS2 11.490631 0.000000e+00
1 Androgen NKX3-1 10.622551 2.242078e-44
2 Androgen MBOAT2 10.472733 4.624285e-44
3 Androgen KLK2 10.176186 1.944414e-40
4 Androgen SARG 11.386852 2.790209e-40
[6]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')

Then we use the generate_lr_geneset function from liana to assign the interactions to pathways. This function takes the ligand-receptor interactions and the pathway annotations, and returns a dataframe with annotated interactions.

[7]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, progeny, lr_sep="^")
lr_progeny.head()
[7]:
source interaction weight
1960 Androgen HGF^MET -1.288956
3030 NFkB SELE^CD44 3.332552
3075 TNFa SELE^CD44 3.322682
4251 TNFa FN1^CD44 2.590177
6950 NFkB LAMB3^CD44 4.055408
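
To make the assignment rule concrete, here is a minimal pure-Python sketch of the idea described above: an interaction is kept for a pathway only if every member, including complex subunits, has a weight in that pathway and all weights share the same sign. This is an illustration rather than liana's implementation. The function name `annotate_interactions`, the toy gene and pathway names, and the choice to summarise member weights by their mean are all assumptions.

```python
from statistics import mean


def annotate_interactions(lr_pairs, net, lr_sep="^", complex_sep="_"):
    """Assign ligand-receptor pairs to pathways (illustrative sketch only).

    lr_pairs: list of (ligand, receptor); complex subunits joined by complex_sep.
    net: dict mapping pathway -> {gene: weight}.
    """
    annotated = []
    for pathway, weights in net.items():
        for ligand, receptor in lr_pairs:
            members = ligand.split(complex_sep) + receptor.split(complex_sep)
            # every member must be annotated to this pathway
            if any(member not in weights for member in members):
                continue
            member_weights = [weights[member] for member in members]
            all_positive = all(w > 0 for w in member_weights)
            all_negative = all(w < 0 for w in member_weights)
            # keep only sign-coherent interactions
            if all_positive or all_negative:
                # assumption: summarise the interaction by the mean weight
                annotated.append(
                    (pathway, f"{ligand}{lr_sep}{receptor}", mean(member_weights))
                )
    return annotated


# Hypothetical toy network and interactions (not real PROGENy weights)
toy_net = {
    "PathwayA": {"LIG1": 2.0, "REC1": 4.0, "LIG2": 1.0, "REC2A": 3.0, "REC2B": -1.0},
    "PathwayB": {"LIG1": -1.0, "REC1": -3.0, "LIG2": 1.5},
}
toy_pairs = [("LIG1", "REC1"), ("LIG2", "REC2A_REC2B")]

result = annotate_interactions(toy_pairs, toy_net)
for row in result:
    print(row)
```

In this toy input, `LIG2^REC2A_REC2B` is dropped from PathwayA because one subunit has the opposite sign, and from PathwayB because the receptor subunits are not annotated there.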

We can additionally perform enrichment analysis of certain ligand-receptor scores using this newly-generated dataframe. For example, see the application with Tensor-cell2cell.

Disease Annotations

As another example, we can also annotate ligand-receptors to diseases in which both the ligand and the receptor are involved.

[8]:
diseases = op.requests.Annotations.get(
    resources = ['DisGeNet']
    )
Downloading annotations for all proteins from the following resources: `['DisGeNet']`
[9]:
diseases = diseases[['genesymbol', 'label', 'value']]
diseases = diseases.pivot_table(index='genesymbol',
                                columns='label', values='value',
                                aggfunc=lambda x: '; '.join(x)).reset_index()
diseases = diseases[['genesymbol', 'disease']]
diseases['disease'] = diseases['disease'].str.split('; ')
diseases = diseases.explode('disease')
lr_diseases = li.rs.generate_lr_geneset(lr_pairs, diseases, source='disease', target='genesymbol', weight=None, lr_sep="^")
lr_diseases.sort_values("interaction").head()
[9]:
disease interaction
693653 Hypertensive disease ACE^AGTR2
693926 Malignant neoplasm of stomach ACE^AGTR2
693991 Neoplasm Metastasis ACE^AGTR2
694293 Stomach Neoplasms ACE^AGTR2
693759 Left Ventricular Hypertrophy ACE^AGTR2
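
When no weights are involved, the same idea reduces to a set intersection: an interaction is annotated with a term only if every member carries that term. The sketch below illustrates this with hypothetical names (`shared_annotations`, toy genes and diseases). It is not the liana implementation.

```python
def shared_annotations(lr_pairs, gene_to_terms, lr_sep="^", complex_sep="_"):
    """Return (term, interaction) rows for terms shared by all members.

    Illustrative sketch; gene_to_terms maps gene -> set of annotation terms.
    """
    rows = []
    for ligand, receptor in lr_pairs:
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        # genes without annotations contribute an empty set, so nothing is shared
        term_sets = [gene_to_terms.get(member, set()) for member in members]
        shared = set.intersection(*term_sets)
        for term in sorted(shared):
            rows.append((term, f"{ligand}{lr_sep}{receptor}"))
    return rows


# Hypothetical toy annotations (not real DisGeNet content)
toy_terms = {
    "LIG1": {"Disease X", "Disease Y"},
    "REC1": {"Disease Y", "Disease Z"},
    "REC2": {"Disease Z"},
}
toy_rows = shared_annotations([("LIG1", "REC1"), ("LIG1", "REC2")], toy_terms)
print(toy_rows)
```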

Let’s check a protein of interest:

[10]:
lr_diseases[lr_diseases['interaction'].str.contains('SPP1')]
[10]:
disease interaction
31124 Acute Kidney Insufficiency SPP1^CD44
31159 Acute kidney injury SPP1^CD44
32630 Kidney Failure, Acute SPP1^CD44
33038 Mammary Neoplasms, Experimental SPP1^CD44
33163 Neoplasm Metastasis SPP1^CD44
464305 Cerebral Hemorrhage SPP1^ITGAV_ITGB3
1108109 Mammary Neoplasms, Experimental SPP1^S1PR1

Following similar procedures, one may annotate ligand-receptors to any of the annotations available via OmniPath.

See op.requests.Annotations.resources()

Intracellular Signaling

While we can obtain the pathways that are associated with each ligand-receptor interaction, we can also obtain the intracellular signaling pathways that are activated in response to the interaction. This is again done using the omnipath client package, but this time in combination with decoupler, which enables the enrichment of pathways, transcription factors, and other annotations.

One specific scenario, heavily reliant on OmniPath knowledge and enrichment analysis with decoupler, is presented in the Differential Analysis Vignette.

There, to find putative causal networks between deregulated CCC interactions and transcription factors (TFs), we use:

1) a protein-protein interaction network

[11]:
ppis = op.interactions.OmniPath().get(genesymbols = True)
ppis.head()
[11]:
source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition curation_effort references sources n_sources n_primary_sources n_references references_stripped
0 P0DP23 P48995 CALM1 TRPC1 True False True True False True 3 TRIP:11290752;TRIP:11983166;TRIP:12601176 TRIP 1 1 3 11290752;11983166;12601176
1 P0DP25 P48995 CALM3 TRPC1 True False True True False True 3 TRIP:11290752;TRIP:11983166;TRIP:12601176 TRIP 1 1 3 11290752;11983166;12601176
2 P0DP24 P48995 CALM2 TRPC1 True False True True False True 3 TRIP:11290752;TRIP:11983166;TRIP:12601176 TRIP 1 1 3 11290752;11983166;12601176
3 Q03135 P48995 CAV1 TRPC1 True True False True True False 13 DIP:19897728;HPRD:12732636;IntAct:19897728;Lit... DIP;HPRD;IntAct;Lit-BM-17;TRIP 5 5 8 10980191;12732636;14551243;16822931;18430726;1...
4 P14416 P48995 DRD2 TRPC1 True True False True True False 1 TRIP:18261457 TRIP 1 1 1 18261457

2) transcription factor regulons

Provided via the CollecTRI resource:

[12]:
dc.get_collectri().head()
[12]:
source target weight PMID
0 MYC TERT 1 10022128;10491298;10606235;10637317;10723141;1...
1 SPI1 BGLAP 1 10022617
2 SMAD3 JUN 1 10022869;12374795
3 SMAD4 JUN 1 10022869;12374795
4 STAT5A IL2 1 10022878;11435608;17182565;17911616;22854263;2...

These are then linked using a modification of the ILP problem proposed in CARNIVAL, solved using CORNETO - a Unified Omics-Driven Framework for Network Inference.
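
To give a feel for what linking receptors to TFs over a directed protein-protein interaction network means, the sketch below runs a plain breadth-first search on a toy edge list. This is deliberately not the ILP formulation from CARNIVAL and does not use CORNETO. It is a much simpler stand-in that only shows how directed source-target edges, like those in the network above, can connect a receptor to a downstream TF. All node names and the function name are hypothetical.

```python
from collections import deque


def shortest_directed_path(edges, start, goal):
    """Breadth-first search over directed (source, target) edges.

    Returns the list of nodes on one shortest path, or None if unreachable.
    """
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical toy network (placeholder names, not OmniPath content)
toy_edges = [
    ("RECEPTOR_A", "KINASE_B"),
    ("KINASE_B", "TF_C"),
    ("RECEPTOR_A", "ADAPTOR_D"),
    ("ADAPTOR_D", "KINASE_B"),
    ("TF_C", "RECEPTOR_A"),
]
path = shortest_directed_path(toy_edges, "RECEPTOR_A", "TF_C")
print(path)
```

Unlike this search, the ILP-based approach can also take edge signs and measured activities into account, so the sketch should be read only as an illustration of the inputs.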

Metabolite-Receptor Interactions

Via LIANA+ we also provide access to the MetalinksDB knowledge graph - a customisable database of metabolite-receptor interactions, part of the BioCypher ecosystem. For more information, please refer to Farr et al., 2023.

Specifically, to enable light-weight access, we have converted the MetalinksDB knowledge graph into a database.

This database is queried using sqlite3, and we provide basic queries that can be customized according to the user’s needs - e.g. by disease, pathway, or location.

We can check first the values within different tables of the database:

[13]:
li.resource.get_metalinks_values(table_name='disease', column_name='disease')[0:5]
[13]:
['Diabetes mellitus type 2',
 'Obesity',
 'Pancreatic cancer',
 'Colorectal cancer',
 'Schizophrenia']

Then we can obtain metabolite-receptor interactions, the metabolites of which have been reported to be associated with certain locations or diseases:

[14]:
li.resource.get_metalinks(source=['Stich', 'CellPhoneDB', 'NeuronChat'],
                          tissue_location='Brain',
                          biospecimen_location='Cerebrospinal Fluid (CSF)',
                          disease='Schizophrenia',
                          ).head()
[14]:
metabolite hmdb uniprot gene_symbol
0 Dopamine HMDB0000073 A5X5Y0 HTR3E
1 Dopamine HMDB0000073 O95264 HTR3B
2 Dopamine HMDB0000073 P08908 HTR1A
3 Dopamine HMDB0000073 P14416 DRD2
4 Dopamine HMDB0000073 P21728 DRD1

This database can be filtered further according to the user’s needs, and can be queried like any other standard relational database.

For such cases, we also provide a utility function to print the database schema:

[15]:
li.rs.describe_metalinks()
Schema of table: metabolites
============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: metabolite, Type: TEXT, Primary Key: 0
Column ID: 2, Name: pubchem, Type: TEXT, Primary Key: 0
Column ID: 3, Name: metabolite_subclass, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: proteins
=========================
Column ID: 0, Name: uniprot, Type: TEXT, Primary Key: 1
Column ID: 1, Name: gene_symbol, Type: TEXT, Primary Key: 0
Column ID: 2, Name: protein_type, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: edges
======================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: uniprot, Type: TEXT, Primary Key: 2
Column ID: 2, Name: db_score, Type: REAL, Primary Key: 0
Column ID: 3, Name: experiment_score, Type: REAL, Primary Key: 0
Column ID: 4, Name: combined_score, Type: REAL, Primary Key: 0
Column ID: 5, Name: interaction_mode, Type: TEXT, Primary Key: 0
Column ID: 6, Name: mor, Type: INTEGER, Primary Key: 0

Foreign Keys:
ID: 0, Seq: 0, Table: proteins, From: uniprot, To: uniprot
ID: 1, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: source
=======================
Column ID: 0, Name: hmdb, Type: VARCHAR(255), Primary Key: 1
Column ID: 1, Name: uniprot, Type: VARCHAR(255), Primary Key: 2
Column ID: 2, Name: source, Type: VARCHAR(255), Primary Key: 3

Foreign Keys:
ID: 0, Seq: 0, Table: edges, From: hmdb, To: hmdb
ID: 0, Seq: 1, Table: edges, From: uniprot, To: uniprot
----------------------------------------
Schema of table: cell_location
==============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: cell_location, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: tissue_location
================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: tissue_location, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: biospecimen_location
=====================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: biospecimen_location, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: disease
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: disease, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: pathway
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: pathway, Type: TEXT, Primary Key: 2

Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
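
As a rough illustration of what a custom query against this schema could look like, the sketch below builds a small in-memory SQLite database with a simplified subset of the tables and columns printed above, then joins them to retrieve metabolite-receptor pairs for a given disease. The two example rows are copied from the get_metalinks output earlier in this notebook; everything else (the reduced schema and the query itself) is an assumption for illustration. How to open the MetalinksDB file that ships with liana is not shown here.

```python
import sqlite3

# In-memory stand-in with a simplified subset of the schema printed above
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE metabolites (hmdb TEXT PRIMARY KEY, metabolite TEXT);
    CREATE TABLE proteins (uniprot TEXT PRIMARY KEY, gene_symbol TEXT);
    CREATE TABLE edges (
        hmdb TEXT,
        uniprot TEXT,
        PRIMARY KEY (hmdb, uniprot),
        FOREIGN KEY (hmdb) REFERENCES metabolites (hmdb),
        FOREIGN KEY (uniprot) REFERENCES proteins (uniprot)
    );
    CREATE TABLE disease (
        hmdb TEXT,
        disease TEXT,
        PRIMARY KEY (hmdb, disease),
        FOREIGN KEY (hmdb) REFERENCES metabolites (hmdb)
    );
    """
)

# Rows taken from the get_metalinks output shown earlier
con.execute("INSERT INTO metabolites VALUES (?, ?)", ("HMDB0000073", "Dopamine"))
con.executemany(
    "INSERT INTO proteins VALUES (?, ?)",
    [("P14416", "DRD2"), ("P21728", "DRD1")],
)
con.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("HMDB0000073", "P14416"), ("HMDB0000073", "P21728")],
)
con.execute("INSERT INTO disease VALUES (?, ?)", ("HMDB0000073", "Schizophrenia"))

# Join edges to their metabolite, protein, and disease annotations
query = """
    SELECT m.metabolite, m.hmdb, p.uniprot, p.gene_symbol
    FROM edges AS e
    JOIN metabolites AS m ON m.hmdb = e.hmdb
    JOIN proteins AS p ON p.uniprot = e.uniprot
    JOIN disease AS d ON d.hmdb = e.hmdb
    WHERE d.disease = ?
    ORDER BY p.uniprot
"""
rows = con.execute(query, ("Schizophrenia",)).fetchall()
for row in rows:
    print(row)

# PRAGMA table_info exposes column id, name, type, and primary-key position
columns = [
    (cid, name, col_type, pk)
    for cid, name, col_type, _notnull, _default, pk in con.execute(
        "PRAGMA table_info(edges)"
    )
]
print(columns)
con.close()
```

Passing the disease name as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query safe to reuse with user-supplied values.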