Prior Knowledge
LIANA+ typically relies heavily on prior knowledge to infer intercellular communication and the intracellular signaling pathways that are activated in response to it. This notebook provides a brief overview of the prior knowledge most commonly used by LIANA+.
[1]:
import liana as li
import omnipath as op
import decoupler as dc
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Ligand-Receptor Interactions
In the simplest case, and for reproducibility, LIANA+ provides frozen sets of interactions from multiple resources. These are accessible through the select_resource function in the resource module (also available as li.rs). The currently supported resources are:
[2]:
li.resource.show_resources()
[2]:
['baccin2019',
'cellcall',
'cellchatdb',
'cellinker',
'cellphonedb',
'celltalkdb',
'connectomedb2020',
'consensus',
'embrace',
'guide2pharma',
'hpmr',
'icellnet',
'italk',
'kirouac2010',
'lrdb',
'mouseconsensus',
'ramilowski2015']
By default, LIANA+ uses the consensus resource, which is composed of multiple expert-curated ligand-receptor resources, including CellPhoneDB, CellChat, ICELLNET, connectomeDB2020, and CellTalkDB.
[3]:
resource = li.rs.select_resource('consensus')
resource.head()
[3]:
| | ligand | receptor |
---|---|---|
0 | LGALS9 | PTPRC |
1 | LGALS9 | MET |
2 | LGALS9 | CD44 |
3 | LGALS9 | LRP1 |
4 | LGALS9 | CD47 |
All ligand-receptor resources in LIANA+ were pre-generated using the OmniPath meta-database. However, any custom resource can also be passed, including ones provided by the user or generated with the omnipath client package.
Via this client, in addition to ligand-receptor interactions, users can obtain the PubMed IDs of the references used to support each interaction (the references column), as well as the databases that originally reported each interaction.
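Users can also tailor the resource to their needs. For example, the cell below retrieves only interactions whose transmitters and receivers are annotated by CellChatDB, restricted to resources whose license permits commercial use, and renames the columns to the ligand/receptor format used by LIANA+: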
[4]:
ligrec = op.interactions.import_intercell_network(
interactions_params = {'license':'commercial'},
transmitter_params = {'database':'CellChatDB'},
receiver_params = {'database':'CellChatDB'},
)
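# rename the intercell gene-symbol columns to the ligand/receptor names used by LIANA+
ligrec = ligrec.rename(columns={'genesymbol_intercell_source':'ligand', 'genesymbol_intercell_target':'receptor'})
# move ligand, receptor, and references to the front, keeping all other columns
first_cols = ['ligand', 'receptor', 'references']
ligrec = ligrec[first_cols + [col for col in ligrec.columns if col not in first_cols]]
ligrec.head()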
[4]:
| | ligand | receptor | references | source | target | is_stimulation | is_inhibition | consensus_direction | consensus_stimulation | consensus_inhibition | ... | aspect_intercell_target | category_source_intercell_target | uniprot_intercell_target | entity_type_intercell_target | consensus_score_intercell_target | transmitter_intercell_target | receiver_intercell_target | secreted_intercell_target | plasma_membrane_transmembrane_intercell_target | plasma_membrane_peripheral_intercell_target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LCK | CTLA4 | ProtMapper:9973379;SIGNOR:9973379;SPIKE:939833... | P06239 | P16410 | True | True | True | False | True | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
1 | CD86 | CTLA4 | BioGRID:11279501;CellChatDB:23954143;CellTalkD... | P42081 | P16410 | True | False | True | True | False | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
2 | CD80 | CTLA4 | BioGRID:11279502;CellChatDB:23954143;ICELLNET:... | P33681 | P16410 | True | False | True | True | False | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
3 | ICOSLG | CTLA4 | ICELLNET:21530327;connectomeDB2020:21530327 | O75144 | P16410 | False | False | False | False | False | ... | functional | resource_specific | P16410 | protein | 8 | False | True | False | True | False |
4 | LCK | CD8A | NetPath:8814252;SPIKE:16818755;SPIKE_LC:16818755 | P06239 | P01732 | False | False | False | False | False | ... | functional | resource_specific | P01732 | protein | 6 | False | True | True | True | False |
5 rows × 45 columns
This function returns a rich set of annotations, such as the mode of action (stimulation or inhibition), the curation effort, the type of signalling, etc. For a more comprehensive overview of the available information, please refer to the OmniPath documentation.
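The references column packs one database:PMID pair per supporting record, separated by semicolons, as in the table above. The minimal sketch below assumes that format holds for every row. It uses a hypothetical helper, parse_references, which is not part of LIANA+ or omnipath, to split the string, count distinct PubMed IDs per interaction, and reduce the table to the two columns found in a LIANA+ resource. The toy rows are copied from the two table rows whose references are shown in full.

```python
import pandas as pd

# Toy rows copied from the table above (only rows whose `references` string is shown in full)
ligrec = pd.DataFrame({
    "ligand": ["ICOSLG", "LCK"],
    "receptor": ["CTLA4", "CD8A"],
    "references": [
        "ICELLNET:21530327;connectomeDB2020:21530327",
        "NetPath:8814252;SPIKE:16818755;SPIKE_LC:16818755",
    ],
})


def parse_references(refs) -> list[tuple[str, str]]:
    """Hypothetical helper: split 'DB:PMID;DB:PMID' into (database, pmid) tuples."""
    if not isinstance(refs, str) or not refs:
        return []  # missing references (e.g. NaN) yield no records
    pairs = []
    for entry in refs.split(";"):
        database, _, pmid = entry.partition(":")
        pairs.append((database, pmid))
    return pairs


# number of distinct PubMed IDs supporting each interaction
ligrec["n_pmids"] = ligrec["references"].apply(
    lambda refs: len({pmid for _, pmid in parse_references(refs)})
)

# a resource passed to LIANA+ only needs the ligand and receptor columns
resource = ligrec[["ligand", "receptor"]].drop_duplicates().reset_index(drop=True)
print(ligrec[["ligand", "receptor", "n_pmids"]])
```

The same n_pmids column could then be used to filter out weakly supported interactions before passing the table on.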
Homology Mapping
LIANA+ also provides on-demand homology mapping, extending beyond the pre-translated mouse resource (mouseconsensus). It uses the HCOP database to obtain homologous genes across species; specifically, the orthology tables are downloaded from the frequently updated Bulk Download FTP section of HCOP: https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/.
The homology mapping is accessible through the resource module:
[5]:
# let's say we are interested in zebrafish homologs of human genes
map_df = li.rs.get_hcop_orthologs(url='https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/human_zebrafish_hcop_fifteen_column.txt.gz',
columns=['human_symbol', 'zebrafish_symbol'],
# NOTE: HCOP integrates multiple resources, so for confidence we keep only mappings supported by at least 3 of them
min_evidence=3
)
# rename the columns to source (original organism) and target (target organism)
map_df = map_df.rename(columns={'human_symbol':'source', 'zebrafish_symbol':'target'})
map_df.tail()
[5]:
| | source | target |
---|---|---|
132672 | ZYG11B | zyg11 |
132673 | ZYG11B | zyg11l |
132674 | ZYX | zyx |
132676 | ZZEF1 | zzef1 |
132677 | ZZZ3 | zzz3 |
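To make the min_evidence threshold concrete, the sketch below approximates what such a filter amounts to. It assumes, as in the HCOP bulk files, that each row has a support column listing the resources that predict the mapping as a comma-separated string. This is our own approximation, not LIANA+'s implementation; the filter_by_evidence helper, the GENE1/gene1a placeholder, and all support strings are made up for illustration.

```python
import pandas as pd

# Toy stand-in for an HCOP table; the `support` strings and GENE1/gene1a are made up
hcop = pd.DataFrame({
    "human_symbol": ["ZYX", "ZYG11B", "ZYG11B", "GENE1"],
    "zebrafish_symbol": ["zyx", "zyg11", "zyg11l", "gene1a"],
    "support": [
        "Ensembl,HGNC,OMA,Panther",
        "Ensembl,OMA,OrthoDB",
        "Ensembl,Panther,OMA",
        "Ensembl",
    ],
})


def filter_by_evidence(df, min_evidence=3):
    """Hypothetical helper: keep mappings backed by at least `min_evidence` resources."""
    # count the comma-separated resources supporting each mapping
    n_sources = df["support"].str.split(",").str.len()
    return df.loc[n_sources >= min_evidence]


map_df = (
    filter_by_evidence(hcop, min_evidence=3)[["human_symbol", "zebrafish_symbol"]]
    .rename(columns={"human_symbol": "source", "zebrafish_symbol": "target"})
)
print(map_df)
```

Raising min_evidence trades coverage for confidence: the single-source GENE1 mapping is dropped here.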
Now that we’ve obtained the homologous genes, let’s convert the resource to those genes:
[6]:
zfish = li.rs.translate_resource(resource,
map_df=map_df,
columns=['ligand', 'receptor'],
replace=True,
# NOTE: we need to define a redundancy threshold for the mapping;
# here, we keep a gene's mappings only if it maps to at most 3 zebrafish genes
one_to_many=3
)
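The role of one_to_many can be illustrated with a minimal sketch of a translation step. It assumes that one_to_many caps how many target-organism genes a source gene may map to. translate_pairs is a hypothetical helper, not LIANA+'s translate_resource. It also ignores complex subunits, which a real implementation would have to translate one by one. All gene names are made up.

```python
import pandas as pd

# Toy mapping (source = human, target = other organism); target symbols are made up
map_df = pd.DataFrame({
    "source": ["LGALS9", "CD44", "CD44", "CD44", "MET"],
    "target": ["lgals9l1", "cd44a", "cd44b", "cd44c", "met"],
})
resource = pd.DataFrame({"ligand": ["LGALS9", "LGALS9"], "receptor": ["CD44", "MET"]})


def translate_pairs(resource, map_df, one_to_many=1):
    """Hypothetical helper: translate simple ligand/receptor pairs via a source->target map."""
    # keep only source genes with at most `one_to_many` target genes
    n_targets = map_df.groupby("source")["target"].transform("size")
    mapping = map_df.loc[n_targets <= one_to_many]
    out = resource
    for col in ["ligand", "receptor"]:
        # an inner merge drops untranslatable genes and expands one-to-many mappings
        out = (
            out.merge(mapping, left_on=col, right_on="source", how="inner")
            .drop(columns=[col, "source"])
            .rename(columns={"target": col})
        )
    return out[["ligand", "receptor"]].drop_duplicates().reset_index(drop=True)


print(translate_pairs(resource, map_df, one_to_many=3))
print(translate_pairs(resource, map_df, one_to_many=1))
```

With one_to_many=3, CD44 expands into three receptor rows; with one_to_many=1, it is dropped and only the one-to-one LGALS9-MET pair survives.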
Obtain Mouse Homologs
[7]:
map_df = li.rs.get_hcop_orthologs(url='https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/human_mouse_hcop_fifteen_column.txt.gz',
columns=['human_symbol', 'mouse_symbol'],
# NOTE: HCOP integrates multiple resources, so for confidence we keep only mappings supported by at least 3 of them
min_evidence=3
)
# rename the columns to source (original organism) and target (target organism)
map_df = map_df.rename(columns={'human_symbol':'source', 'mouse_symbol':'target'})
# translate the consensus resource to mouse gene symbols
mouse = li.rs.translate_resource(resource,
map_df=map_df,
columns=['ligand', 'receptor'],
replace=True,
# here, we are stricter and keep only genes that map to a single mouse gene
one_to_many=1
)
mouse
[7]:
| | ligand | receptor |
---|---|---|
0 | Lgals9 | Ptprc |
1 | Lgals9 | Met |
2 | Lgals9 | Cd44 |
3 | Lgals9 | Lrp1 |
4 | Lgals9 | Cd47 |
... | ... | ... |
4619 | Bmp2 | Actr2 |
4620 | Bmp15 | Actr2 |
4621 | Csf1 | Csf3r |
4622 | Il36g | Ifnar1 |
4623 | Il36g | Ifnar2 |
4055 rows × 2 columns
If you use the HCOP-based functions, please cite the original HCOP papers:
Eyre, T.A., Wright, M.W., Lush, M.J. and Bruford, E.A., 2007. HCOP: a searchable database of human orthology predictions. Briefings in bioinformatics, 8(1), pp.2-5.
Yates, B., Gray, K.A., Jones, T.E. and Bruford, E.A., 2021. Updates to HCOP: the HGNC comparison of orthology predictions tool. Briefings in Bioinformatics, 22(6), p.bbab155.
All LIANA+ methods accept a resource parameter, which can be used to pass any custom resource, not only those obtained via homology conversion.
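A custom resource is simply a table with ligand and receptor columns, as in the outputs above, with complex subunits joined by an underscore (e.g. ITGAV_ITGB3). The sketch below is a minimal pre-flight check one might run before passing such a table via the resource parameter. check_resource is a hypothetical helper, not part of LIANA+, and the toy rows are made up.

```python
import pandas as pd


def check_resource(resource: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy a custom ligand-receptor table before passing it via `resource=`."""
    missing = {"ligand", "receptor"} - set(resource.columns)
    if missing:
        raise ValueError(f"resource is missing columns: {sorted(missing)}")
    return (
        resource[["ligand", "receptor"]]  # keep only the two required columns
        .dropna()                         # drop incomplete pairs
        .astype(str)
        .drop_duplicates()                # one row per unique interaction
        .reset_index(drop=True)
    )


# Toy custom resource with a duplicate pair, a complex receptor, and an incomplete row
custom = pd.DataFrame({
    "ligand": ["SPP1", "SPP1", "SPP1", None],
    "receptor": ["CD44", "CD44", "ITGAV_ITGB3", "CD47"],
    "note": ["a", "b", "c", "d"],
})
tidy = check_resource(custom)
print(tidy)
```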
Annotating Ligand-Receptors
Beyond ligand-receptor pairs, we can also obtain other annotations via OmniPath, such as tissue locations, TF regulons, or cytokine signatures. The most common use case, however, is to obtain the pathways associated with each ligand-receptor interaction.
Pathway Annotations
We commonly use PROGENy pathway weights to assign interactions to canonical pathways. An interaction is assigned to a pathway only if all of its members (including complex subunits) are present in that pathway with the same weight sign. This ensures not only that the interaction belongs to the pathway, but also that its members are likely to act in the same direction.
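The same-sign rule can be sketched in plain pandas. This is a minimal illustration, not LIANA+'s generate_lr_geneset. Averaging the member weights into a single interaction weight is our own illustrative choice, and the aggregation LIANA+ actually uses may differ. assign_pathways is hypothetical, and the pathways, genes, and weights are made up.

```python
import numpy as np
import pandas as pd

# Toy pathway weights in long (source, target, weight) format; all values are made up
net = pd.DataFrame({
    "source": ["PathA", "PathA", "PathA", "PathB", "PathB"],
    "target": ["L1", "R1", "R2", "L1", "R1"],
    "weight": [2.0, 1.0, -1.5, 1.0, -2.0],
})
# Toy ligand-receptor pairs; complex subunits would be joined by "_"
lr_pairs = pd.DataFrame({"ligand": ["L1", "L1"], "receptor": ["R1", "R2"]})


def assign_pathways(lr_pairs, net, lr_sep="^", complex_sep="_"):
    """Hypothetical helper: assign an interaction to a pathway only if all members share one weight sign."""
    # one target -> weight lookup per pathway
    lookups = {pw: grp.set_index("target")["weight"] for pw, grp in net.groupby("source")}
    rows = []
    for ligand, receptor in lr_pairs[["ligand", "receptor"]].itertuples(index=False):
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        for pathway, weights in lookups.items():
            # every member (incl. complex subunits) must be present in the pathway ...
            if not all(m in weights.index for m in members):
                continue
            member_weights = weights.loc[members]
            # ... and all their weights must have the same non-zero sign
            signs = set(np.sign(member_weights))
            if len(signs) != 1 or 0 in signs:
                continue
            rows.append({
                "source": pathway,
                "interaction": f"{ligand}{lr_sep}{receptor}",
                "weight": member_weights.mean(),  # illustrative aggregation
            })
    return pd.DataFrame(rows, columns=["source", "interaction", "weight"])


print(assign_pathways(lr_pairs, net))
```

Here L1^R1 is kept only for PathA (both weights positive). It is excluded from PathB because the signs disagree, and L1^R2 is excluded everywhere.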
[8]:
# load PROGENy pathway weights; we use decoupler as a proxy, since it returns them in a convenient long format
progeny = dc.get_progeny(top=2500)
progeny.head()
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[8]:
| | source | target | weight | p_value |
---|---|---|---|---|
0 | Androgen | TMPRSS2 | 11.490631 | 0.000000e+00 |
1 | Androgen | NKX3-1 | 10.622551 | 2.242078e-44 |
2 | Androgen | MBOAT2 | 10.472733 | 4.624285e-44 |
3 | Androgen | KLK2 | 10.176186 | 1.944414e-40 |
4 | Androgen | SARG | 11.386852 | 2.790209e-40 |
[9]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')
We then use the generate_lr_geneset function from LIANA+ to assign the interactions to pathways. This function takes the ligand-receptor interactions and the pathway annotations, and returns a dataframe of annotated interactions, in which each interaction is named as the ligand and receptor joined by lr_sep (here, ^).
[10]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, progeny, lr_sep="^")
lr_progeny.head()
[10]:
| | source | interaction | weight |
---|---|---|---|
1960 | Androgen | HGF^MET | -1.288956 |
3030 | NFkB | SELE^CD44 | 3.332552 |
3075 | TNFa | SELE^CD44 | 3.322682 |
4251 | TNFa | FN1^CD44 | 2.590177 |
6950 | NFkB | LAMB3^CD44 | 4.055408 |
This newly generated dataframe can then be used to perform enrichment analysis on ligand-receptor scores; for an example, see the application with Tensor-cell2cell.
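As a rough illustration of such an enrichment, assume that each interaction has a score keyed by the same ligand^receptor strings. A weighted mean of interaction scores per pathway, normalised by the sum of absolute weights, can then be computed with plain pandas. This simple stand-in is chosen for clarity and is not necessarily the method used in the Tensor-cell2cell application. weighted_pathway_scores is a hypothetical helper, and all scores and weights are made up.

```python
import pandas as pd

# Toy interaction-level scores (e.g. a per-interaction statistic); values are made up
scores = pd.Series({"L1^R1": 2.0, "L2^R2": -1.0, "L3^R3": 0.5})

# Toy annotated geneset in the same long (source, interaction, weight) format as lr_progeny
lr_sets = pd.DataFrame({
    "source": ["PathA", "PathA", "PathB"],
    "interaction": ["L1^R1", "L2^R2", "L3^R3"],
    "weight": [1.0, -1.0, 2.0],
})


def weighted_pathway_scores(scores, lr_sets):
    """Hypothetical helper: sum(weight * score) / sum(|weight|) per pathway."""
    df = (
        lr_sets.assign(score=lr_sets["interaction"].map(scores))
        .dropna(subset=["score"])  # ignore interactions without a score
        .assign(
            contrib=lambda d: d["weight"] * d["score"],
            abs_weight=lambda d: d["weight"].abs(),
        )
    )
    agg = df.groupby("source")[["contrib", "abs_weight"]].sum()
    return agg["contrib"] / agg["abs_weight"]


print(weighted_pathway_scores(scores, lr_sets))
```

Because weights are signed, an interaction with a negative pathway weight and a negative score (L2^R2 in PathA) still contributes positively to that pathway.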
Disease Annotations
As another example, we can annotate ligand-receptor interactions with the diseases in which both the ligand and the receptor are involved.
[11]:
diseases = op.requests.Annotations.get(
resources = ['DisGeNet']
)
Downloading annotations for all proteins from the following resources: `['DisGeNet']`
[12]:
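# keep gene symbols and the label/value pairs of each annotation record
diseases = diseases[['genesymbol', 'label', 'value']]
# reshape to one row per gene and one column per label, joining multiple values with '; '
diseases = diseases.pivot_table(index='genesymbol',
columns='label', values='value',
aggfunc=lambda x: '; '.join(x)).reset_index()
# keep the disease column and expand it to one row per gene-disease pair
diseases = diseases[['genesymbol', 'disease']].copy()
diseases['disease'] = diseases['disease'].str.split('; ')
diseases = diseases.explode('disease')
# assign interactions to diseases in which both the ligand and the receptor are involved (no weights)
lr_diseases = li.rs.generate_lr_geneset(lr_pairs, diseases, source='disease', target='genesymbol', weight=None, lr_sep="^")
lr_diseases.sort_values("interaction").head()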
[12]:
| | disease | interaction |
---|---|---|
693653 | Hypertensive disease | ACE^AGTR2 |
693926 | Malignant neoplasm of stomach | ACE^AGTR2 |
693991 | Neoplasm Metastasis | ACE^AGTR2 |
694293 | Stomach Neoplasms | ACE^AGTR2 |
693759 | Left Ventricular Hypertrophy | ACE^AGTR2 |
Let's check the interactions involving a protein of interest, e.g. SPP1:
[13]:
lr_diseases[lr_diseases['interaction'].str.contains('SPP1')]
[13]:
| | disease | interaction |
---|---|---|
31124 | Acute Kidney Insufficiency | SPP1^CD44 |
31159 | Acute kidney injury | SPP1^CD44 |
32630 | Kidney Failure, Acute | SPP1^CD44 |
33038 | Mammary Neoplasms, Experimental | SPP1^CD44 |
33163 | Neoplasm Metastasis | SPP1^CD44 |
464305 | Cerebral Hemorrhage | SPP1^ITGAV_ITGB3 |
1108109 | Mammary Neoplasms, Experimental | SPP1^S1PR1 |
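Conceptually, an unweighted annotation like this links an interaction to a term only when every member carries it. This includes each complex subunit, e.g. ITGAV and ITGB3 in SPP1^ITGAV_ITGB3. The minimal sketch below shows that intersection logic, assuming complex subunits are joined by an underscore. shared_annotations is a hypothetical helper, not LIANA+'s generate_lr_geneset, and the disease labels D1/D2 are made up.

```python
import pandas as pd

# Toy gene-to-disease annotations in long format; disease labels are made up
annotations = pd.DataFrame({
    "genesymbol": ["SPP1", "SPP1", "ITGAV", "ITGB3", "ITGB3", "CD44"],
    "disease": ["D1", "D2", "D1", "D1", "D2", "D2"],
})
lr_pairs = pd.DataFrame({"ligand": ["SPP1", "SPP1"], "receptor": ["ITGAV_ITGB3", "CD44"]})


def shared_annotations(lr_pairs, annotations, complex_sep="_", lr_sep="^"):
    """Hypothetical helper: annotate each interaction with terms shared by all of its members."""
    # gene -> set of annotation terms
    terms = {}
    for gene, term in annotations[["genesymbol", "disease"]].itertuples(index=False):
        terms.setdefault(gene, set()).add(term)
    rows = []
    for ligand, receptor in lr_pairs[["ligand", "receptor"]].itertuples(index=False):
        members = ligand.split(complex_sep) + receptor.split(complex_sep)
        # intersect the term sets of all members (a missing gene contributes an empty set)
        shared = set.intersection(*(terms.get(m, set()) for m in members))
        rows += [{"disease": d, "interaction": f"{ligand}{lr_sep}{receptor}"} for d in sorted(shared)]
    return pd.DataFrame(rows, columns=["disease", "interaction"])


print(shared_annotations(lr_pairs, annotations))
```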
Following similar procedures, ligand-receptor interactions can be annotated with any of the annotations available via OmniPath; the available annotation resources can be listed with op.requests.Annotations.resources().
Intracellular Signaling
Beyond the pathways associated with each ligand-receptor interaction, we can also obtain the intracellular signaling pathways that are activated in response to it. This is again done using the omnipath client package, this time in combination with decoupler, which enables enrichment analysis of pathways, transcription factors, and other annotations.
One specific scenario that relies heavily on OmniPath knowledge and on enrichment analysis with decoupler is presented in the Differential Analysis Vignette. There, to find putative causal networks linking deregulated cell-cell communication (CCC) interactions to transcription factors (TFs), we use:
1) A protein-protein interaction network
[14]:
ppis = op.interactions.OmniPath().get(genesymbols = True)
ppis.head()
[14]:
| | source | target | source_genesymbol | target_genesymbol | is_directed | is_stimulation | is_inhibition | consensus_direction | consensus_stimulation | consensus_inhibition | curation_effort | references | sources | n_sources | n_primary_sources | n_references | references_stripped |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P0DP23 | P48995 | CALM1 | TRPC1 | True | False | True | True | False | True | 3 | TRIP:11290752;TRIP:11983166;TRIP:12601176 | TRIP | 1 | 1 | 3 | 11290752;11983166;12601176 |
1 | P0DP25 | P48995 | CALM3 | TRPC1 | True | False | True | True | False | True | 3 | TRIP:11290752;TRIP:11983166;TRIP:12601176 | TRIP | 1 | 1 | 3 | 11290752;11983166;12601176 |
2 | P0DP24 | P48995 | CALM2 | TRPC1 | True | False | True | True | False | True | 3 | TRIP:11290752;TRIP:11983166;TRIP:12601176 | TRIP | 1 | 1 | 3 | 11290752;11983166;12601176 |
3 | Q03135 | P48995 | CAV1 | TRPC1 | True | True | False | True | True | False | 13 | DIP:19897728;HPRD:12732636;IntAct:19897728;Lit... | DIP;HPRD;IntAct;Lit-BM-17;TRIP | 5 | 5 | 8 | 10980191;12732636;14551243;16822931;18430726;1... |
4 | P14416 | P48995 | DRD2 | TRPC1 | True | True | False | True | True | False | 1 | TRIP:18261457 | TRIP | 1 | 1 | 1 | 18261457 |
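Before network inference, the PPI table is typically reduced to a signed, directed graph. The sketch below shows one possible way to do this, which is our assumption and not necessarily the exact filtering used in the Differential Analysis Vignette. It keeps interactions with a consensus direction and an unambiguous consensus sign, and encodes the mode of regulation (mor) as +1/-1. The curation-effort threshold is arbitrary, signed_network is a hypothetical helper, and the toy rows reuse column names from the table above with made-up values.

```python
import pandas as pd

# Toy rows with column names from the OmniPath table above; values are made up
ppis = pd.DataFrame({
    "source_genesymbol": ["A", "B", "C", "D"],
    "target_genesymbol": ["X", "X", "Y", "Y"],
    "consensus_direction": [True, True, True, False],
    "consensus_stimulation": [True, False, True, True],
    "consensus_inhibition": [False, True, True, False],
    "curation_effort": [5, 3, 8, 10],
})


def signed_network(ppis, min_curation=3):
    """Hypothetical helper: keep directed, unambiguously signed edges and encode their sign."""
    # +1 for consensus stimulation, -1 for consensus inhibition, 0 if both or neither
    mor = ppis["consensus_stimulation"].astype(int) - ppis["consensus_inhibition"].astype(int)
    keep = ppis["consensus_direction"] & (mor != 0) & (ppis["curation_effort"] >= min_curation)
    net = ppis.loc[keep, ["source_genesymbol", "target_genesymbol"]].copy()
    net.columns = ["source", "target"]
    net["mor"] = mor[keep].to_numpy()
    return net.reset_index(drop=True)


print(signed_network(ppis))
```

Edge C→Y is dropped because it is annotated as both stimulatory and inhibitory, and D→Y because it lacks a consensus direction.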
2) Transcription Factor Regulons
Provided via the CollecTRI resource:
[15]:
dc.get_collectri().head()
[15]:
| | source | target | weight | PMID |
---|---|---|---|---|
0 | MYC | TERT | 1 | 10022128;10491298;10606235;10637317;10723141;1... |
1 | SPI1 | BGLAP | 1 | 10022617 |
2 | SMAD3 | JUN | 1 | 10022869;12374795 |
3 | SMAD4 | JUN | 1 | 10022869;12374795 |
4 | STAT5A | IL2 | 1 | 10022878;11435608;17182565;17911616;22854263;2... |
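decoupler estimates TF activities from such regulons with enrichment methods. One of them, the univariate linear model (ulm), regresses a gene-level statistic on a regulon's weights, with non-targets set to 0, and reports the t-value of the slope as the activity. The NumPy sketch below re-implements that idea on toy data for illustration only. It is not decoupler's code, ulm_tstat is a hypothetical helper, and the TF and gene names are placeholders.

```python
import numpy as np
import pandas as pd


def ulm_tstat(stats, weights):
    """t-value of the slope when regressing gene-level `stats` on regulon `weights` (0 for non-targets)."""
    y = np.asarray(stats, dtype=float)
    x = np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + regulon weights
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)  # residual variance with 2 fitted parameters
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_slope


# Toy gene-level statistics (e.g. t-values from a differential test); names and values are made up
stats = pd.Series([2.0, 1.5, 2.5, 0.1, -0.2, 0.0], index=["G1", "G2", "G3", "G4", "G5", "G6"])
# Toy regulons in the same long (source, target, weight) format as CollecTRI above
regulons = pd.DataFrame({
    "source": ["TF1", "TF1", "TF1", "TF2", "TF2"],
    "target": ["G1", "G2", "G3", "G4", "G5"],
    "weight": [1, 1, 1, 1, -1],
})

activities = {}
for tf, reg in regulons.groupby("source"):
    # align regulon weights to all genes, using 0 for non-targets
    w = reg.set_index("target")["weight"].reindex(stats.index, fill_value=0)
    activities[tf] = ulm_tstat(stats.to_numpy(), w.to_numpy())
print(activities)
```

TF1, whose targets all have high statistics, gets a positive activity. Flipping the sign of its regulon weights would flip the sign of the score.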
These are then linked using a modification of the integer linear programming (ILP) problem proposed in CARNIVAL, solved using CORNETO, a unified omics-driven framework for network inference.
Metabolite-Receptor Interactions
LIANA+ also provides access to the MetalinksDB knowledge graph, a customisable database of metabolite-receptor interactions that is part of the BioCypher ecosystem. For more information, please refer to Farr et al., 2023.
Specifically, to enable lightweight access, we have converted the MetalinksDB knowledge graph into an SQLite database. This database is queried using sqlite3, and we provide basic queries that can be customised according to the user's needs, e.g. by disease, pathway, or location.
We can first check the values in the different tables of the database:
[16]:
li.resource.get_metalinks_values(table_name='disease', column_name='disease')[0:5]
Downloading database...
Database downloaded and saved to /mnt/97efc476-fe88-4281-aa1a-cf9a249ca294/liana-py/docs/source/notebooks/metalinksdb.db.
[16]:
['Menstrual cycle',
'Adrenal hyperplasia',
' congenital',
' due to 3-beta-hydroxysteroid dehydrogenase 2 deficiency',
'Aromatase deficiency']
We can then obtain metabolite-receptor interactions whose metabolites have been reported to be associated with specific locations or diseases:
[17]:
li.resource.get_metalinks(source=['Stich', 'CellPhoneDB', 'NeuronChat'],
tissue_location='Brain',
biospecimen_location='Cerebrospinal Fluid (CSF)',
disease='Schizophrenia',
).head()
[17]:
| | hmdb | uniprot | gene_symbol | metabolite | mor | transport_direction | type | source |
---|---|---|---|---|---|---|---|---|
0 | HMDB0000234 | P10275 | AR | Testosterone | -1 | None | lr | CellPhoneDB |
1 | HMDB0000234 | P10275 | AR | Testosterone | 0 | None | lr | CellPhoneDB |
2 | HMDB0000234 | P10275 | AR | Testosterone | 1 | None | lr | CellPhoneDB |
3 | HMDB0000253 | Q14994 | NR1I3 | Pregnenolone | 1 | None | lr | CellPhoneDB |
4 | HMDB0000216 | P07550 | ADRB2 | Norepinephrine | 0 | None | lr | CellPhoneDB |
This database contains both ligand-receptor (lr) and production-degradation (pd) metabolite-protein interactions, as indicated by the type column. It can be further filtered according to the user's needs and queried like any other standard relational database (RDBMS).
For such cases, we also provide a utility function to print the database schema:
[18]:
li.rs.describe_metalinks()
Schema of table: metabolites
============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: metabolite, Type: TEXT, Primary Key: 0
Column ID: 2, Name: pubchem, Type: TEXT, Primary Key: 0
Column ID: 3, Name: metabolite_subclass, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: proteins
=========================
Column ID: 0, Name: uniprot, Type: TEXT, Primary Key: 0
Column ID: 1, Name: gene_symbol, Type: TEXT, Primary Key: 0
Column ID: 2, Name: protein_type, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: edges
======================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: uniprot, Type: TEXT, Primary Key: 0
Column ID: 2, Name: source, Type: TEXT, Primary Key: 0
Column ID: 3, Name: db_score, Type: REAL, Primary Key: 0
Column ID: 4, Name: experiment_score, Type: REAL, Primary Key: 0
Column ID: 5, Name: combined_score, Type: REAL, Primary Key: 0
Column ID: 6, Name: mor, Type: INTEGER, Primary Key: 0
Column ID: 7, Name: type, Type: TEXT, Primary Key: 0
Column ID: 8, Name: transport_direction, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: cell_location
==============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: cell_location, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: tissue_location
================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: tissue_location, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: biospecimen_location
=====================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: biospecimen_location, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: disease
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: disease, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
Schema of table: pathway
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: pathway, Type: TEXT, Primary Key: 0
No Foreign Keys.
----------------------------------------
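Because the database is plain SQLite, it can also be queried directly with Python's built-in sqlite3 module. The sketch below builds an in-memory toy copy of part of the schema printed above. The table and column names come from that schema; the rows, identifiers, and score threshold are made up. It then selects ligand-receptor (lr) edges for metabolites linked to a given disease. To run a similar query against the real data, one would presumably connect to the downloaded metalinksdb.db file instead.

```python
import sqlite3

# In-memory toy database mirroring a subset of the schema printed above; all rows are made up
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE metabolites (hmdb TEXT, metabolite TEXT, pubchem TEXT, metabolite_subclass TEXT);
CREATE TABLE proteins (uniprot TEXT, gene_symbol TEXT, protein_type TEXT);
CREATE TABLE edges (hmdb TEXT, uniprot TEXT, source TEXT, db_score REAL,
                    experiment_score REAL, combined_score REAL, mor INTEGER,
                    type TEXT, transport_direction TEXT);
CREATE TABLE disease (hmdb TEXT, disease TEXT);
INSERT INTO metabolites VALUES ('HMDB_A', 'MetaboliteA', NULL, NULL), ('HMDB_B', 'MetaboliteB', NULL, NULL);
INSERT INTO proteins VALUES ('P_1', 'GENE1', NULL), ('P_2', 'GENE2', NULL);
INSERT INTO edges VALUES
  ('HMDB_A', 'P_1', 'toy', NULL, NULL, 900, 1, 'lr', NULL),
  ('HMDB_A', 'P_2', 'toy', NULL, NULL, 300, 1, 'pd', NULL),
  ('HMDB_B', 'P_2', 'toy', NULL, NULL, 800, -1, 'lr', NULL);
INSERT INTO disease VALUES ('HMDB_A', 'DiseaseX'), ('HMDB_B', 'DiseaseY');
""")

# ligand-receptor edges whose metabolite is linked to a given disease, above a score cut-off
query = """
SELECT m.metabolite, p.gene_symbol, e.mor, e.combined_score
FROM edges AS e
JOIN metabolites AS m ON m.hmdb = e.hmdb
JOIN proteins AS p ON p.uniprot = e.uniprot
JOIN disease AS d ON d.hmdb = e.hmdb
WHERE e.type = 'lr' AND d.disease = ? AND e.combined_score >= ?
"""
rows = con.execute(query, ("DiseaseX", 500)).fetchall()
print(rows)
con.close()
```

Using ? placeholders keeps the filter values out of the SQL string, which makes it easy to swap diseases or thresholds.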