Prior Knowledge

LIANA+ typically relies on prior knowledge to infer intercellular communication and the intracellular signaling pathways that are activated in response to it. This notebook gives a brief overview of the prior knowledge that LIANA+ commonly uses.

[1]:
import liana as li
import omnipath as op
import decoupler as dc
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`

Ligand-Receptor Interactions

In the simplest case, for reproducibility purposes, LIANA+ provides a frozen set of interactions across resources. These are accessible through the select_resource function in the resource module. The currently supported resources can be listed with show_resources:

[2]:
li.resource.show_resources()
[2]:
['baccin2019',
 'cellcall',
 'cellchatdb',
 'cellinker',
 'cellphonedb',
 'celltalkdb',
 'connectomedb2020',
 'consensus',
 'embrace',
 'guide2pharma',
 'hpmr',
 'icellnet',
 'italk',
 'kirouac2010',
 'lrdb',
 'mouseconsensus',
 'ramilowski2015']

By default, liana uses the consensus resource, which is composed of multiple expert-curated ligand-receptor resources, including CellPhoneDB, CellChat, ICELLNET, connectomeDB2020, and CellTalkDB.

[3]:
resource = li.rs.select_resource('consensus')
resource.head()
[3]:
ligand receptor
0 LGALS9 PTPRC
1 LGALS9 MET
2 LGALS9 CD44
3 LGALS9 LRP1
4 LGALS9 CD47

All of the ligand-receptor resources in LIANA+ were pre-generated using the OmniPath meta-database. However, any custom resource can also be passed, including resources provided by the user or generated using the omnipath client package.

Via this client, in addition to ligand-receptor interactions, users can obtain the PubMed IDs of the publications (the references column) that were used to support each interaction, as well as the database that reported the interaction in the first place.

Users can also modify the resource according to their preferences, for example:

[4]:
ligrec = op.interactions.import_intercell_network(
    interactions_params = {'license':'commercial'},
    transmitter_params = {'database':'CellChatDB'},
    receiver_params = {'database':'CellChatDB'},
    )
ligrec.head()

ligrec = ligrec.rename(columns={'genesymbol_intercell_source':'ligand', 'genesymbol_intercell_target':'receptor'})
ligrec = ligrec[['ligand', 'receptor', 'references'] + [col for col in ligrec.columns if col not in ['ligand', 'receptor', 'references']]]
ligrec.head()
[4]:
ligand receptor references source target is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition ... aspect_intercell_target category_source_intercell_target uniprot_intercell_target entity_type_intercell_target consensus_score_intercell_target transmitter_intercell_target receiver_intercell_target secreted_intercell_target plasma_membrane_transmembrane_intercell_target plasma_membrane_peripheral_intercell_target
0 LCK CTLA4 ProtMapper:9973379;SIGNOR:9973379;SPIKE:939833... P06239 P16410 True True True False True ... functional resource_specific P16410 protein 8 False True False True False
1 CD86 CTLA4 BioGRID:11279501;CellChatDB:23954143;CellTalkD... P42081 P16410 True False True True False ... functional resource_specific P16410 protein 8 False True False True False
2 CD80 CTLA4 BioGRID:11279502;CellChatDB:23954143;ICELLNET:... P33681 P16410 True False True True False ... functional resource_specific P16410 protein 8 False True False True False
3 ICOSLG CTLA4 ICELLNET:21530327;connectomeDB2020:21530327 O75144 P16410 False False False False False ... functional resource_specific P16410 protein 8 False True False True False
4 LCK CD8A NetPath:8814252;SPIKE:16818755;SPIKE_LC:16818755 P06239 P01732 False False False False False ... functional resource_specific P01732 protein 6 False True True True False

5 rows × 45 columns

This function provides a rich list of annotations, such as the mode of action (inhibition or stimulation), the curation effort, and the type of signalling. For a more comprehensive overview of the information that is available, please refer to the OmniPath documentation.
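
The references column shown above packs several `database:PubMed ID` entries into one semicolon-separated string. As a minimal sketch of how such a string could be unpacked, the helper below (parse_references, a hypothetical name that is not part of omnipath or liana) groups the PubMed IDs by the reporting database. It uses plain Python so that it runs without either package.

```python
from collections import defaultdict

def parse_references(references):
    """Group PubMed IDs by the database that reported them (hypothetical helper)."""
    by_database = defaultdict(set)
    for entry in references.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # each entry is assumed to look like "<database>:<PubMed ID>"
        database, _, pmid = entry.partition(":")
        by_database[database].add(pmid)
    return dict(by_database)

# value copied from the ICOSLG-CTLA4 row in the output above
refs = parse_references("ICELLNET:21530327;connectomeDB2020:21530327")
print(refs)
```

On a dataframe, the same helper could be applied row by row, e.g. with ligrec['references'].map(parse_references), assuming the column contains no missing values.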

Homology Mapping

Similarly, LIANA+ provides on-demand homology mapping, including for species other than mouse. It uses the HCOP database to obtain homologous genes across species. Specifically, we download the resource from the frequently updated Bulk Download FTP section of the HCOP database: https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/.

The homology mapping is accessible through the resource module:

[5]:
# let's say we are interested in zebrafish homologs of human genes
map_df = li.rs.get_hcop_orthologs(url='https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/human_zebrafish_hcop_fifteen_column.txt.gz',
                                   columns=['human_symbol', 'zebrafish_symbol'],
                                   # NOTE: HCOP integrates multiple resources, so for confidence we keep only mappings supported by at least 3 of them
                                   min_evidence=3
                                   )
# rename the columns to source and target, respectively for the original organism and the target organism
map_df = map_df.rename(columns={'human_symbol':'source', 'zebrafish_symbol':'target'})
map_df.tail()
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/resource/_orthology.py:199: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
[5]:
source target
132672 ZYG11B zyg11
132673 ZYG11B zyg11l
132674 ZYX zyx
132676 ZZEF1 zzef1
132677 ZZZ3 zzz3

Now that we’ve obtained the homologous genes, let’s convert the resource to those genes:

[6]:
zfish = li.rs.translate_resource(resource,
                                 map_df=map_df,
                                 columns=['ligand', 'receptor'],
                                 replace=True,
                                 # NOTE that we need to define the threshold of redundancies for the mapping
                                 # in this case, we keep mappings as long as a gene does not map to more than 3 zebrafish genes
                                 one_to_many=3
                                 )
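
To make the role of the one_to_many threshold more concrete, here is a minimal plain-Python sketch of the filtering idea. It is an illustration only, not LIANA+'s implementation: filter_one_to_many is a hypothetical helper, the threshold is assumed to be inclusive (a gene is kept if it maps to at most one_to_many targets), and the GENEX rows are invented to show a gene that exceeds the threshold.

```python
from collections import defaultdict

def filter_one_to_many(pairs, one_to_many=1):
    """Keep source genes that map to at most `one_to_many` distinct targets."""
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    # assumption: the threshold is inclusive
    return {s: sorted(t) for s, t in targets.items() if len(t) <= one_to_many}

# The first three pairs appear in the map_df output above;
# the GENEX rows are hypothetical.
pairs = [
    ("ZYG11B", "zyg11"), ("ZYG11B", "zyg11l"), ("ZYX", "zyx"),
    ("GENEX", "genexa"), ("GENEX", "genexb"),
    ("GENEX", "genexc"), ("GENEX", "genexd"),
]

strict = filter_one_to_many(pairs, one_to_many=1)   # only one-to-one mappings
lenient = filter_one_to_many(pairs, one_to_many=3)  # up to three targets per gene
print(strict)
print(lenient)
```

A stricter threshold trades coverage for unambiguous mappings: ZYG11B is dropped under one_to_many=1 but kept, with both of its zebrafish homologs, under one_to_many=3.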

Obtain Mouse Homologs

[7]:
map_df = li.rs.get_hcop_orthologs(url='https://ftp.ebi.ac.uk/pub/databases/genenames/hcop/human_mouse_hcop_fifteen_column.txt.gz',
                                  columns=['human_symbol', 'mouse_symbol'],
                                  # NOTE: HCOP integrates multiple resources, so for confidence we keep only mappings supported by at least 3 of them
                                  min_evidence=3
                                  )
# rename the columns to source and target, respectively for the original organism and the target organism
map_df = map_df.rename(columns={'human_symbol':'source', 'mouse_symbol':'target'})

# We will then translate
mouse = li.rs.translate_resource(resource,
                                 map_df=map_df,
                                 columns=['ligand', 'receptor'],
                                 replace=True,
                                 # Here, we will be harsher and only keep mappings that don't map to more than 1 mouse gene
                                 one_to_many=1
                                 )
mouse
/home/dbdimitrov/anaconda3/envs/spiana/lib/python3.10/site-packages/liana/resource/_orthology.py:199: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
[7]:
ligand receptor
0 Lgals9 Ptprc
1 Lgals9 Met
2 Lgals9 Cd44
3 Lgals9 Lrp1
4 Lgals9 Cd47
... ... ...
4619 Bmp2 Actr2
4620 Bmp15 Actr2
4621 Csf1 Csf3r
4622 Il36g Ifnar1
4623 Il36g Ifnar2

4055 rows × 2 columns

If you use the HCOP functions, please cite the original HCOP papers:

  • Eyre, T.A., Wright, M.W., Lush, M.J. and Bruford, E.A., 2007. HCOP: a searchable database of human orthology predictions. Briefings in bioinformatics, 8(1), pp.2-5.

  • Yates, B., Gray, K.A., Jones, T.E. and Bruford, E.A., 2021. Updates to HCOP: the HGNC comparison of orthology predictions tool. Briefings in Bioinformatics, 22(6), p.bbab155.

All LIANA+ methods accept a resource parameter, which can be used to pass any custom resource, not only those obtained via homology conversion.

Annotating Ligand-Receptors

In addition to ligand-receptors, we can also obtain other annotations via OmniPath. While these can be tissue locations, TF regulons, cytokine signatures, or other types of annotations, the most common use case is to obtain the pathways that are associated with each ligand-receptor interaction.

Pathway Annotations

We commonly use PROGENy pathway weights to assign interactions to canonical pathways: an interaction is assigned to a pathway only if all of its members (including complex subunits) are present in that pathway with the same weight sign. This ensures that the members are not only part of the same pathway, but are also likely to be active in the same direction.

[8]:
# load PROGENy pathways, we use decoupler as a proxy as it formats the data in a more convenient way
progeny = dc.get_progeny(top=2500)
progeny.head()
Downloading annotations for all proteins from the following resources: `['PROGENy']`
[8]:
source target weight p_value
0 Androgen TMPRSS2 11.490631 0.000000e+00
1 Androgen NKX3-1 10.622551 2.242078e-44
2 Androgen MBOAT2 10.472733 4.624285e-44
3 Androgen KLK2 10.176186 1.944414e-40
4 Androgen SARG 11.386852 2.790209e-40
[9]:
# load full list of ligand-receptor pairs
lr_pairs = li.resource.select_resource('consensus')

Then we use the generate_lr_geneset function from liana to assign the interactions to pathways. This function takes the ligand-receptor interactions and the pathway annotations, and returns a dataframe with annotated interactions.

[10]:
# generate ligand-receptor geneset
lr_progeny = li.rs.generate_lr_geneset(lr_pairs, progeny, lr_sep="^")
lr_progeny.head()
[10]:
source interaction weight
1960 Androgen HGF^MET -1.288956
3030 NFkB SELE^CD44 3.332552
3075 TNFa SELE^CD44 3.322682
4251 TNFa FN1^CD44 2.590177
6950 NFkB LAMB3^CD44 4.055408
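
The assignment rule can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code behind generate_lr_geneset: assign_pathways is a hypothetical helper, complex subunits are assumed to be joined by "_" (as in SPP1^ITGAV_ITGB3), the interaction weight is assumed to be the mean of the member weights, and all weights below are made up.

```python
def assign_pathways(interactions, weights, lr_sep="^"):
    """Assign interactions to pathways in which all members share a weight sign.

    interactions: list of (ligand, receptor); complex subunits joined by "_"
    weights: {(pathway, gene): weight}
    """
    pathways = sorted({pathway for pathway, _ in weights})
    rows = []
    for ligand, receptor in interactions:
        members = ligand.split("_") + receptor.split("_")
        for pathway in pathways:
            # every member (incl. complex subunits) must be in the pathway
            if not all((pathway, gene) in weights for gene in members):
                continue
            w = [weights[(pathway, gene)] for gene in members]
            # ... and all weights must point in the same direction
            if all(x > 0 for x in w) or all(x < 0 for x in w):
                # assumption: aggregate member weights by their mean
                rows.append((pathway, lr_sep.join((ligand, receptor)), sum(w) / len(w)))
    return rows

# hypothetical weights, for illustration only
weights = {
    ("TNFa", "SELE"): 4.0, ("TNFa", "CD44"): 2.0,
    ("NFkB", "SELE"): 3.0, ("NFkB", "CD44"): -1.0,
    ("TNFa", "SPP1"): 1.0, ("TNFa", "ITGAV"): 1.0,
}
interactions = [("SELE", "CD44"), ("SPP1", "ITGAV_ITGB3")]

annotated = assign_pathways(interactions, weights)
print(annotated)
```

In this toy input, SELE^CD44 is assigned to TNFa only, because its NFkB weights have opposite signs, and SPP1^ITGAV_ITGB3 is not assigned at all, because the ITGB3 subunit has no weight in any pathway.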

We can additionally perform enrichment analysis of ligand-receptor scores using this newly generated dataframe. For an example, see the application with Tensor-cell2cell.

Disease Annotations

As another example, we can also annotate ligand-receptors to diseases in which both the ligand and the receptor are involved.

[11]:
diseases = op.requests.Annotations.get(
    resources = ['DisGeNet']
    )
Downloading annotations for all proteins from the following resources: `['DisGeNet']`
[12]:
diseases = diseases[['genesymbol', 'label', 'value']]
diseases = diseases.pivot_table(index='genesymbol',
                                columns='label', values='value',
                                aggfunc=lambda x: '; '.join(x)).reset_index()
diseases = diseases[['genesymbol', 'disease']]
diseases['disease'] = diseases['disease'].str.split('; ')
diseases = diseases.explode('disease')
lr_diseases = li.rs.generate_lr_geneset(lr_pairs, diseases, source='disease', target='genesymbol', weight=None, lr_sep="^")
lr_diseases.sort_values("interaction").head()
[12]:
disease interaction
693653 Hypertensive disease ACE^AGTR2
693926 Malignant neoplasm of stomach ACE^AGTR2
693991 Neoplasm Metastasis ACE^AGTR2
694293 Stomach Neoplasms ACE^AGTR2
693759 Left Ventricular Hypertrophy ACE^AGTR2

Let’s check a protein of interest:

[13]:
lr_diseases[lr_diseases['interaction'].str.contains('SPP1')]
[13]:
disease interaction
31124 Acute Kidney Insufficiency SPP1^CD44
31159 Acute kidney injury SPP1^CD44
32630 Kidney Failure, Acute SPP1^CD44
33038 Mammary Neoplasms, Experimental SPP1^CD44
33163 Neoplasm Metastasis SPP1^CD44
464305 Cerebral Hemorrhage SPP1^ITGAV_ITGB3
1108109 Mammary Neoplasms, Experimental SPP1^S1PR1

Following similar procedures, one may annotate ligand-receptors to any of the annotations available via OmniPath.

See op.requests.Annotations.resources()

Intracellular Signaling

While we can obtain the pathways that are associated with each ligand-receptor interaction, we can also obtain the intracellular signaling pathways that are activated in response to the interaction. This is again done using the omnipath client package, but this time in combination with decoupler, which enables the enrichment of pathways, transcription factors, and other annotations.

One specific scenario that relies heavily on OmniPath knowledge and enrichment analysis with decoupler is presented in the Differential Analysis Vignette.

There, to find putative causal networks between deregulated CCC interactions and transcription factors (TFs) we use:

1) A Protein-Protein Interaction Network

[14]:
ppis = op.interactions.OmniPath().get(genesymbols = True)
ppis.head()
[14]:
source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition curation_effort references sources n_sources n_primary_sources n_references references_stripped
0 P0DP23 P48995 CALM1 TRPC1 True False True True False True 3 TRIP:11290752;TRIP:11983166;TRIP:12601176 TRIP 1 1 3 11290752;11983166;12601176
1 P0DP25 P48995 CALM3 TRPC1 True False True True False True 3 TRIP:11290752;TRIP:11983166;TRIP:12601176 TRIP 1 1 3 11290752;11983166;12601176
2 P0DP24 P48995 CALM2 TRPC1 True False True True False True 3 TRIP:11290752;TRIP:11983166;TRIP:12601176 TRIP 1 1 3 11290752;11983166;12601176
3 Q03135 P48995 CAV1 TRPC1 True True False True True False 13 DIP:19897728;HPRD:12732636;IntAct:19897728;Lit... DIP;HPRD;IntAct;Lit-BM-17;TRIP 5 5 8 10980191;12732636;14551243;16822931;18430726;1...
4 P14416 P48995 DRD2 TRPC1 True True False True True False 1 TRIP:18261457 TRIP 1 1 1 18261457

2) Transcription Factor Regulons

Provided via the CollecTRI resource:

[15]:
dc.get_collectri().head()
[15]:
source target weight PMID
0 MYC TERT 1 10022128;10491298;10606235;10637317;10723141;1...
1 SPI1 BGLAP 1 10022617
2 SMAD3 JUN 1 10022869;12374795
3 SMAD4 JUN 1 10022869;12374795
4 STAT5A IL2 1 10022878;11435608;17182565;17911616;22854263;2...

These are then linked using a modification of the ILP problem proposed in CARNIVAL, solved using CORNETO - a Unified Omics-Driven Framework for Network Inference.

Metabolite-Receptor Interactions

Via LIANA+ we also provide access to the MetalinksDB knowledge graph - a customisable database of metabolite-receptor interactions, part of the BioCypher ecosystem. For more information, please refer to Farr et al., 2023.

Specifically, to enable light-weight access, we have converted the MetalinksDB knowledge graph into a database.

This database is queried using sqlite3, and we provide basic queries that can be customized to the user’s needs, e.g. by disease, pathway, or location.

We can check first the values within different tables of the database:

[16]:
li.resource.get_metalinks_values(table_name='disease', column_name='disease')[0:5]
Downloading database...
Database downloaded and saved to /mnt/97efc476-fe88-4281-aa1a-cf9a249ca294/liana-py/docs/source/notebooks/metalinksdb.db.
[16]:
['Menstrual cycle',
 'Adrenal hyperplasia',
 ' congenital',
 ' due to 3-beta-hydroxysteroid dehydrogenase 2 deficiency',
 'Aromatase deficiency']

Then we can obtain metabolite-receptor interactions, the metabolites of which have been reported to be associated with certain locations or diseases:

[17]:
li.resource.get_metalinks(source=['Stich', 'CellPhoneDB', 'NeuronChat'],
                          tissue_location='Brain',
                          biospecimen_location='Cerebrospinal Fluid (CSF)',
                          disease='Schizophrenia',
                          ).head()
[17]:
hmdb uniprot gene_symbol metabolite mor transport_direction type source
0 HMDB0000234 P10275 AR Testosterone -1 None lr CellPhoneDB
1 HMDB0000234 P10275 AR Testosterone 0 None lr CellPhoneDB
2 HMDB0000234 P10275 AR Testosterone 1 None lr CellPhoneDB
3 HMDB0000253 Q14994 NR1I3 Pregnenolone 1 None lr CellPhoneDB
4 HMDB0000216 P07550 ADRB2 Norepinephrine 0 None lr CellPhoneDB

This database contains both ligand-receptor (lr) and production-degradation (pd) metabolite-protein interactions (see the type column). It can be filtered further according to the user’s needs, and can be queried like any other standard RDBMS.

For such cases, we also provide a utility function to print the database schema:

[18]:
li.rs.describe_metalinks()
Schema of table: metabolites
============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: metabolite, Type: TEXT, Primary Key: 0
Column ID: 2, Name: pubchem, Type: TEXT, Primary Key: 0
Column ID: 3, Name: metabolite_subclass, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: proteins
=========================
Column ID: 0, Name: uniprot, Type: TEXT, Primary Key: 0
Column ID: 1, Name: gene_symbol, Type: TEXT, Primary Key: 0
Column ID: 2, Name: protein_type, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: edges
======================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: uniprot, Type: TEXT, Primary Key: 0
Column ID: 2, Name: source, Type: TEXT, Primary Key: 0
Column ID: 3, Name: db_score, Type: REAL, Primary Key: 0
Column ID: 4, Name: experiment_score, Type: REAL, Primary Key: 0
Column ID: 5, Name: combined_score, Type: REAL, Primary Key: 0
Column ID: 6, Name: mor, Type: INTEGER, Primary Key: 0
Column ID: 7, Name: type, Type: TEXT, Primary Key: 0
Column ID: 8, Name: transport_direction, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: cell_location
==============================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: cell_location, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: tissue_location
================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: tissue_location, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: biospecimen_location
=====================================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: biospecimen_location, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: disease
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: disease, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
Schema of table: pathway
========================
Column ID: 0, Name: hmdb, Type: TEXT, Primary Key: 0
Column ID: 1, Name: pathway, Type: TEXT, Primary Key: 0

No Foreign Keys.
----------------------------------------
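
With the schema at hand, a custom query can be written directly with Python's built-in sqlite3 module. The sketch below is illustrative only: it builds an in-memory database with a subset of the tables printed above and fills it with a few toy rows (identifiers taken from the get_metalinks output, everything else placeholder), then joins edges to metabolites, proteins, and disease. To query the real database, one would instead connect to the downloaded metalinksdb.db file, whose location depends on the local setup.

```python
import sqlite3

# In-memory stand-in; for the real data use sqlite3.connect("<path to metalinksdb.db>")
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE metabolites (hmdb TEXT, metabolite TEXT, pubchem TEXT, metabolite_subclass TEXT);
CREATE TABLE proteins (uniprot TEXT, gene_symbol TEXT, protein_type TEXT);
CREATE TABLE edges (hmdb TEXT, uniprot TEXT, source TEXT, db_score REAL,
                    experiment_score REAL, combined_score REAL, mor INTEGER,
                    type TEXT, transport_direction TEXT);
CREATE TABLE disease (hmdb TEXT, disease TEXT);
""")

# Toy rows: identifiers mirror the get_metalinks output, the rest is placeholder
con.executemany("INSERT INTO metabolites (hmdb, metabolite) VALUES (?, ?)",
                [("HMDB0000234", "Testosterone"), ("HMDB0000216", "Norepinephrine")])
con.executemany("INSERT INTO proteins (uniprot, gene_symbol) VALUES (?, ?)",
                [("P10275", "AR"), ("P07550", "ADRB2")])
con.executemany("INSERT INTO edges (hmdb, uniprot, source, mor, type) VALUES (?, ?, ?, ?, ?)",
                [("HMDB0000234", "P10275", "CellPhoneDB", 1, "lr"),
                 ("HMDB0000216", "P07550", "CellPhoneDB", 0, "lr")])
con.executemany("INSERT INTO disease (hmdb, disease) VALUES (?, ?)",
                [("HMDB0000234", "Schizophrenia"),
                 ("HMDB0000216", "Placeholder disease")])

# Ligand-receptor edges whose metabolite is associated with a given disease
query = """
SELECT m.metabolite, p.gene_symbol, e.mor, e.source
FROM edges AS e
JOIN metabolites AS m ON m.hmdb = e.hmdb
JOIN proteins AS p ON p.uniprot = e.uniprot
JOIN disease AS d ON d.hmdb = e.hmdb
WHERE e.type = ? AND d.disease = ?
ORDER BY m.metabolite
"""
rows = con.execute(query, ("lr", "Schizophrenia")).fetchall()
con.close()
print(rows)
```

Passing the filter values as parameters (the ? placeholders) rather than formatting them into the SQL string keeps the query safe to reuse with user-supplied disease or type names.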