Prior Knowledge
LIANA+ typically relies heavily on prior knowledge to infer intercellular communication and the intracellular signaling pathways that are activated in response to communication. This notebook provides a brief overview of the prior knowledge typically used by LIANA+.
[1]:
import liana as li
import omnipath as op
import decoupler as dc
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Ligand-Receptor Interactions
In the simplest case, for reproducibility purposes, LIANA+ provides a frozen set of interactions across resources. These are accessible through the select_resource function in the resource module. The resources that are currently supported are:
[2]:
li.resource.show_resources()
[2]:
['baccin2019',
'cellcall',
'cellchatdb',
'cellinker',
'cellphonedb',
'celltalkdb',
'connectomedb2020',
'consensus',
'embrace',
'guide2pharma',
'hpmr',
'icellnet',
'italk',
'kirouac2010',
'lrdb',
'mouseconsensus',
'ramilowski2015']
By default, liana uses the consensus resource, which is composed of multiple expert-curated ligand-receptor resources, including CellPhoneDB, CellChat, ICELLNET, connectomeDB2020, and CellTalkDB.
[3]:
li.resource.select_resource().head()
[3]:
| | ligand | receptor |
---|---|---|
0 | LGALS9 | PTPRC |
1 | LGALS9 | MET |
2 | LGALS9 | CD44 |
3 | LGALS9 | LRP1 |
4 | LGALS9 | CD47 |
All of these resources were pre-generated using the OmniPath meta-database. However, any custom resource can also be passed, including those provided by the user or generated using the omnipath client package.
Via this client, in addition to ligand-receptor interactions, users can obtain the PubMed IDs of the references (the references column) that were used to support each interaction, as well as the databases that reported the interaction in the first place.
Users can also modify the resource according to their preferences, for example:
[4]:
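# Build an intercellular network from interactions licensed for commercial use,
# with transmitter and receiver annotations taken from CellChatDB
ligrec = op.interactions.import_intercell_network(
    interactions_params={'license': 'commercial'},
    transmitter_params={'database': 'CellChatDB'},
    receiver_params={'database': 'CellChatDB'},
)
# Rename the gene symbol columns to ligand and receptor
ligrec = ligrec.rename(columns={'genesymbol_intercell_source': 'ligand',
                                'genesymbol_intercell_target': 'receptor'})
# Move the key columns to the front and keep all remaining columns after them
first_cols = ['ligand', 'receptor', 'references']
ligrec = ligrec[first_cols + [col for col in ligrec.columns if col not in first_cols]]
ligrec.head()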
[4]:
| | ligand | receptor | references | source | target | is_stimulation | is_inhibition | consensus_direction | consensus_stimulation | consensus_inhibition | ... | aspect_intercell_target | category_source_intercell_target | uniprot_intercell_target | entity_type_intercell_target | consensus_score_intercell_target | transmitter_intercell_target | receiver_intercell_target | secreted_intercell_target | plasma_membrane_transmembrane_intercell_target | plasma_membrane_peripheral_intercell_target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LCK | CTLA4 | ProtMapper:9973379;SIGNOR:9973379;SPIKE:939833... | P06239 | P16410 | True | True | True | False | True | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
1 | CD86 | CTLA4 | BioGRID:11279501;CellChatDB:23954143;CellTalkD... | P42081 | P16410 | True | False | True | True | False | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
2 | CD80 | CTLA4 | BioGRID:11279502;CellChatDB:23954143;ICELLNET:... | P33681 | P16410 | True | False | True | True | False | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
3 | ICOSLG | CTLA4 | ICELLNET:21530327;connectomeDB2020:21530327 | O75144 | P16410 | False | False | False | False | False | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
4 | LCK | CD8A | NetPath:8814252;SPIKE:16818755;SPIKE_LC:16818755 | P06239 | P01732 | False | False | False | False | False | ... | functional | resource_specific | P01732 | protein | 6 | False | True | True | True | False |
5 rows × 45 columns
This function provides a rich list of annotations, such as the mode of action (inhibition or stimulation), the curation effort, the types of signalling, etc. For a more comprehensive overview of the information that is available, please refer to the OmniPath documentation.
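The references column shown above appears to pack entries of the form database:PMID, separated by semicolons. The following is a minimal sketch, under that assumption, of how such a string could be split into the supporting databases and PubMed IDs. The helper name parse_references is hypothetical and not part of liana or omnipath; the input string is the untruncated value from row 3 of the table.

```python
def parse_references(references):
    """Group PubMed IDs by reporting database.

    Assumes the 'database:PMID;database:PMID' layout seen in the table above.
    """
    by_database = {}
    for entry in references.split(";"):
        if not entry:
            continue  # skip empty fragments, e.g. from a trailing semicolon
        database, _, pmid = entry.partition(":")
        by_database.setdefault(database, set()).add(pmid)
    return by_database


# value of the references column for ICOSLG -> CTLA4 in the output above
refs = parse_references("ICELLNET:21530327;connectomeDB2020:21530327")
databases = sorted(refs)
pmids = sorted(set().union(*refs.values()))
print(databases, pmids)
```

Applied to a whole column, a helper like this would make it possible to filter interactions by the number of distinct publications or databases that support them.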
Annotating Ligand-Receptors
In addition to ligand-receptors, we can also obtain other annotations via OmniPath. While these can be tissue locations, TF regulons, cytokine signatures, or other types of annotations, the most common use case is to obtain the pathways that are associated with each ligand-receptor interaction.
Pathway Annotations
We commonly use PROGENy pathway weights to assign interactions to canonical pathways, requiring that all members of an interaction (including complex subunits) are present in the same pathway with the same weight sign. This ensures that the members not only belong to the same pathway, but are also likely to be active in the same direction.
[5]:
# load PROGENy pathways, we use decoupler as a proxy as it formats the data in a more convenient way
progeny = dc.get_progeny(top=2500)
progeny.head()
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[5]:
| | source | target | weight | p_value |
---|---|---|---|---|
0 | Androgen | TMPRSS2 | 11.490631 | 0.000000e+00 |
1 | Androgen | NKX3-1 | 10.622551 | 2.242078e-44 |
2 | Androgen | MBOAT2 | 10.472733 | 4.624285e-44 |
3 | Androgen | KLK2 | 10.176186 | 1.944414e-40 |
4 | Androgen | SARG | 11.386852 | 2.790209e-40 |
[6]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')
Then we use the generate_lr_geneset function from liana to assign the interactions to pathways. This function takes the ligand-receptor interactions and the pathway annotations, and returns a dataframe with annotated interactions.
[7]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, progeny, lr_sep="^")
lr_progeny.head()
[7]:
| | source | interaction | weight |
---|---|---|---|
1960 | Androgen | HGF^MET | -1.288956 |
3030 | NFkB | SELE^CD44 | 3.332552 |
3075 | TNFa | SELE^CD44 | 3.322682 |
4251 | TNFa | FN1^CD44 | 2.590177 |
6950 | NFkB | LAMB3^CD44 | 4.055408 |
We can additionally perform enrichment analysis of ligand-receptor scores using this newly generated dataframe. For an example, see the application with Tensor-cell2cell.
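To make the assignment rule from this section concrete, here is a minimal pure-Python sketch of it: an interaction is kept for a pathway only if every member (complex subunits included) has a weight in that pathway and all weights share one sign. This is an illustration, not liana's implementation. The function name, the "_" subunit separator, the choice to average the member weights, and all gene names, pathway names and weights below are assumptions made up for the example.

```python
from statistics import mean


def annotate_interactions(lr_pairs, net, lr_sep="^", complex_sep="_"):
    """Return (source, interaction, weight) rows for sign-consistent interactions."""
    # net is an iterable of (source, target, weight) triples
    by_source = {}
    for source, target, weight in net:
        by_source.setdefault(source, {})[target] = weight

    rows = []
    for ligand, receptor in lr_pairs:
        # split complexes into their subunits (separator is an assumption)
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        for source, weights in by_source.items():
            if not all(member in weights for member in members):
                continue  # at least one member is missing from this gene set
            member_weights = [weights[member] for member in members]
            same_sign = (all(w > 0 for w in member_weights)
                         or all(w < 0 for w in member_weights))
            if same_sign:
                # averaging is an assumption made for this sketch
                rows.append((source, f"{ligand}{lr_sep}{receptor}", mean(member_weights)))
    return rows


# toy inputs with hypothetical names and weights
toy_net = [
    ("PathwayA", "LIG1", 4.0), ("PathwayA", "REC1", 2.0),
    ("PathwayB", "LIG1", 3.0), ("PathwayB", "REC1", -1.0),
    ("PathwayC", "LIG2", 1.0), ("PathwayC", "SUB1", 2.0), ("PathwayC", "SUB2", 3.0),
]
toy_pairs = [("LIG1", "REC1"), ("LIG2", "SUB1_SUB2"), ("LIG3", "REC3")]
annotated = annotate_interactions(toy_pairs, toy_net)
for row in annotated:
    print(row)
```

In this toy input, LIG1^REC1 is dropped for PathwayB because its two weights have opposite signs, and LIG3^REC3 is dropped everywhere because neither gene has any weight.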
Disease Annotations
As another example, we can also annotate ligand-receptors to diseases in which both the ligand and the receptor are involved.
[8]:
diseases = op.requests.Annotations.get(
resources = ['DisGeNet']
)
Downloading annotations for all proteins from the following resources: `['DisGeNet']`
[9]:
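# keep only the columns needed for the annotation
diseases = diseases[['genesymbol', 'label', 'value']]
# one row per gene and one column per label; multiple values are joined with '; '
diseases = diseases.pivot_table(index='genesymbol',
                                columns='label', values='value',
                                aggfunc=lambda x: '; '.join(x)).reset_index()
diseases = diseases[['genesymbol', 'disease']]
# split the joined values back into one disease per row
diseases['disease'] = diseases['disease'].str.split('; ')
diseases = diseases.explode('disease')
# annotate the interactions with diseases; the annotation is unweighted, hence weight=None
lr_diseases = li.rs.generate_lr_geneset(lr_pairs, diseases, source='disease', target='genesymbol', weight=None, lr_sep="^")
lr_diseases.sort_values("interaction").head()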
[9]:
| | disease | interaction |
---|---|---|
693653 | Hypertensive disease | ACE^AGTR2 |
693926 | Malignant neoplasm of stomach | ACE^AGTR2 |
693991 | Neoplasm Metastasis | ACE^AGTR2 |
694293 | Stomach Neoplasms | ACE^AGTR2 |
693759 | Left Ventricular Hypertrophy | ACE^AGTR2 |
Let’s check a protein of interest:
[10]:
lr_diseases[lr_diseases['interaction'].str.contains('SPP1')]
[10]:
| | disease | interaction |
---|---|---|
31124 | Acute Kidney Insufficiency | SPP1^CD44 |
31159 | Acute kidney injury | SPP1^CD44 |
32630 | Kidney Failure, Acute | SPP1^CD44 |
33038 | Mammary Neoplasms, Experimental | SPP1^CD44 |
33163 | Neoplasm Metastasis | SPP1^CD44 |
464305 | Cerebral Hemorrhage | SPP1^ITGAV_ITGB3 |
1108109 | Mammary Neoplasms, Experimental | SPP1^S1PR1 |
Following similar procedures, one may annotate ligand-receptors to any of the annotations available via OmniPath.
See op.requests.Annotations.resources() for the available annotation resources.
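For unweighted annotations such as the diseases above, the idea reduces to a set intersection: an interaction receives a term only if every member carries it. The sketch below illustrates this in plain Python under that assumption. It is not liana's implementation; the function name, the "_" subunit separator, and all gene and disease names are hypothetical placeholders.

```python
def shared_annotations(lr_pairs, gene_to_terms, lr_sep="^", complex_sep="_"):
    """Return (term, interaction) rows for terms shared by all members of an interaction."""
    rows = []
    for ligand, receptor in lr_pairs:
        # split complexes into their subunits (separator is an assumption)
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        term_sets = [gene_to_terms.get(member, set()) for member in members]
        # keep only the terms annotated to every member
        shared = set.intersection(*term_sets)
        for term in sorted(shared):
            rows.append((term, f"{ligand}{lr_sep}{receptor}"))
    return rows


# toy annotation with hypothetical genes and diseases
gene_to_terms = {
    "LIG1": {"disease X", "disease Y"},
    "REC1": {"disease Y", "disease Z"},
    "LIG2": {"disease X"},
    "SUB1": {"disease X"},
    "SUB2": {"disease Z"},
}
shared = shared_annotations([("LIG1", "REC1"), ("LIG2", "SUB1_SUB2")], gene_to_terms)
print(shared)
```

Here LIG2^SUB1_SUB2 receives no annotation because the subunit SUB2 does not share a disease with the other two members.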
Intracellular Signaling
While we can obtain the pathways that are associated with each ligand-receptor interaction, we can also obtain the intracellular signaling pathways that are activated in response to the interaction. This is again done using the omnipath client package, but this time in combination with decoupler, which enables the enrichment of pathways, transcription factors, and other annotations.
One specific scenario, heavily reliant on OmniPath knowledge and enrichment analysis with decoupler, is presented in the Differential Analysis Vignette.
There, to find putative causal networks between deregulated CCC interactions and transcription factors (TFs) we use:
1) A Protein-Protein Interaction Network
[11]:
ppis = op.interactions.OmniPath().get(genesymbols = True)
ppis.head()
[11]:
| | source | target | source_genesymbol | target_genesymbol | is_directed | is_stimulation | is_inhibition | consensus_direction | consensus_stimulation | consensus_inhibition | curation_effort | references | sources | n_sources | n_primary_sources | n_references | references_stripped |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P0DP23 | P48995 | CALM1 | TRPC1 | True | False | True | True | False | True | 3 | TRIP:11290752;TRIP:11983166;TRIP:12601176 | TRIP | 1 | 1 | 3 | 11290752;11983166;12601176 |
1 | P0DP25 | P48995 | CALM3 | TRPC1 | True | False | True | True | False | True | 3 | TRIP:11290752;TRIP:11983166;TRIP:12601176 | TRIP | 1 | 1 | 3 | 11290752;11983166;12601176 |
2 | P0DP24 | P48995 | CALM2 | TRPC1 | True | False | True | True | False | True | 3 | TRIP:11290752;TRIP:11983166;TRIP:12601176 | TRIP | 1 | 1 | 3 | 11290752;11983166;12601176 |
3 | Q03135 | P48995 | CAV1 | TRPC1 | True | True | False | True | True | False | 13 | DIP:19897728;HPRD:12732636;IntAct:19897728;Lit... | DIP;HPRD;IntAct;Lit-BM-17;TRIP | 5 | 5 | 8 | 10980191;12732636;14551243;16822931;18430726;1... |
4 | P14416 | P48995 | DRD2 | TRPC1 | True | True | False | True | True | False | 1 | TRIP:18261457 | TRIP | 1 | 1 | 1 | 18261457 |
2) Transcription Factor Regulons
Provided via the CollecTRI resource:
[12]:
dc.get_collectri().head()
[12]:
| | source | target | weight | PMID |
---|---|---|---|---|
0 | MYC | TERT | 1 | 10022128;10491298;10606235;10637317;10723141;1... |
1 | SPI1 | BGLAP | 1 | 10022617 |
2 | SMAD3 | JUN | 1 | 10022869;12374795 |
3 | SMAD4 | JUN | 1 | 10022869;12374795 |
4 | STAT5A | IL2 | 1 | 10022878;11435608;17182565;17911616;22854263;2... |
These are then linked using a modification of the ILP problem proposed in CARNIVAL, solved using CORNETO - a Unified Omics-Driven Framework for Network Inference.
Metabolite-Receptor Interactions
Via LIANA+ we also provide access to the MetalinksDB knowledge graph - a customisable database of metabolite-receptor interactions, part of the BioCypher ecosystem. For more information please refer to Farr et al., 2023.
Specifically, to enable light-weight access, we have converted the MetalinksDB knowledge graph into a database.
This database is queried using sqlite3 and we provide basic queries to customize according to the user’s needs - e.g. disease, pathway, location.
We can check first the values within different tables of the database:
[13]:
li.resource.get_metalinks_values(table_name='disease', column_name='disease')[0:5]
[13]:
['Diabetes mellitus type 2',
'Obesity',
'Pancreatic cancer',
'Colorectal cancer',
'Schizophrenia']
Then we can obtain metabolite-receptor interactions, the metabolites of which have been reported to be associated with certain locations or diseases:
[14]:
li.resource.get_metalinks(source=['Stich', 'CellPhoneDB', 'NeuronChat'],
tissue_location='Brain',
biospecimen_location='Cerebrospinal Fluid (CSF)',
disease='Schizophrenia',
).head()
[14]:
| | metabolite | hmdb | uniprot | gene_symbol |
---|---|---|---|---|
0 | Dopamine | HMDB0000073 | A5X5Y0 | HTR3E |
1 | Dopamine | HMDB0000073 | O95264 | HTR3B |
2 | Dopamine | HMDB0000073 | P08908 | HTR1A |
3 | Dopamine | HMDB0000073 | P14416 | DRD2 |
4 | Dopamine | HMDB0000073 | P21728 | DRD1 |
This database can further be filtered according to the user’s needs, and can be queried like any other relational database.
For such cases, we also provide a utility function to print the database schema:
[15]:
li.rs.describe_metalinks()
Schema of table: metabolites
============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: metabolite, Type: TEXT, Primary Key: 0
Column ID: 2, Name: pubchem, Type: TEXT, Primary Key: 0
Column ID: 3, Name: metabolite_subclass, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: proteins
=========================
Column ID: 0, Name: uniprot, Type: TEXT, Primary Key: 1
Column ID: 1, Name: gene_symbol, Type: TEXT, Primary Key: 0
Column ID: 2, Name: protein_type, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: edges
======================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: uniprot, Type: TEXT, Primary Key: 2
Column ID: 2, Name: db_score, Type: REAL, Primary Key: 0
Column ID: 3, Name: experiment_score, Type: REAL, Primary Key: 0
Column ID: 4, Name: combined_score, Type: REAL, Primary Key: 0
Column ID: 5, Name: interaction_mode, Type: TEXT, Primary Key: 0
Column ID: 6, Name: mor, Type: INTEGER, Primary Key: 0
Foreign Keys:
ID: 0, Seq: 0, Table: proteins, From: uniprot, To: uniprot
ID: 1, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: source
=======================
Column ID: 0, Name: hmdb, Type: VARCHAR(255), Primary Key: 1
Column ID: 1, Name: uniprot, Type: VARCHAR(255), Primary Key: 2
Column ID: 2, Name: source, Type: VARCHAR(255), Primary Key: 3
Foreign Keys:
ID: 0, Seq: 0, Table: edges, From: hmdb, To: hmdb
ID: 0, Seq: 1, Table: edges, From: uniprot, To: uniprot
----------------------------------------
Schema of table: cell_location
==============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: cell_location, Type: TEXT, Primary Key: 2
Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: tissue_location
================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: tissue_location, Type: TEXT, Primary Key: 2
Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: biospecimen_location
=====================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: biospecimen_location, Type: TEXT, Primary Key: 2
Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: disease
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: disease, Type: TEXT, Primary Key: 2
Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
Schema of table: pathway
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 1
Column ID: 1, Name: pathway, Type: TEXT, Primary Key: 2
Foreign Keys:
ID: 0, Seq: 0, Table: metabolites, From: hmdb, To: hmdb
----------------------------------------
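As an illustration of querying such a schema directly, the following sketch builds a small in-memory SQLite database that mirrors a reduced subset of the tables and columns printed above, then joins them to retrieve the receptors of metabolites linked to a disease. It is an assumption-laden toy: the score columns are omitted, the rows are limited to two of the dopamine entries from the earlier output, and it does not open the actual MetalinksDB file shipped with liana.

```python
import sqlite3

# in-memory stand-in for the database; reduced subset of the schema above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metabolites (hmdb TEXT PRIMARY KEY, metabolite TEXT);
CREATE TABLE proteins (uniprot TEXT PRIMARY KEY, gene_symbol TEXT);
CREATE TABLE edges (hmdb TEXT, uniprot TEXT, PRIMARY KEY (hmdb, uniprot));
CREATE TABLE disease (hmdb TEXT, disease TEXT, PRIMARY KEY (hmdb, disease));
""")

# rows taken from the get_metalinks output shown earlier
conn.execute("INSERT INTO metabolites VALUES ('HMDB0000073', 'Dopamine')")
conn.executemany("INSERT INTO proteins VALUES (?, ?)",
                 [("P14416", "DRD2"), ("P21728", "DRD1")])
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("HMDB0000073", "P14416"), ("HMDB0000073", "P21728")])
conn.execute("INSERT INTO disease VALUES ('HMDB0000073', 'Schizophrenia')")

# join the edges to both node tables and filter by disease
query = """
SELECT m.metabolite, m.hmdb, p.uniprot, p.gene_symbol
FROM edges AS e
JOIN metabolites AS m ON m.hmdb = e.hmdb
JOIN proteins AS p ON p.uniprot = e.uniprot
JOIN disease AS d ON d.hmdb = e.hmdb
WHERE d.disease = ?
ORDER BY p.uniprot
"""
rows = conn.execute(query, ("Schizophrenia",)).fetchall()
conn.close()

for row in rows:
    print(row)
```

The same join pattern extends to the other location and pathway tables, since each of them references metabolites through the hmdb column.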