Chapter 4 Target retrieval and DEG analysis

In network pharmacology analysis, disease target retrieval and differential gene screening are two essential preparatory steps. This chapter introduces common and free disease target databases and their retrieval strategies, demonstrates volcano plot visualization of DESeq2 results, and performs intersection analysis between DEGs and disease targets using Venn and UpSet plots.

4.1 Programmatic disease target retrieval with TCMDATA

Before manually browsing individual databases, it is worth noting that TCMDATA provides convenient functions for programmatic disease–gene association retrieval. Currently, two complementary routes are supported:

search_disease() and search_gene_disease() retrieve disease-gene associations from the DisGeNET dataset bundled in the DOSE Bioconductor package.
search_disease_efo(), get_disease_targets(), and query_disease_targets() retrieve scored disease-target associations from the Open Targets Platform GraphQL API.

The DOSE/DisGeNET route is useful for fast local lookup, while Open Targets is preferred when target-level association scores are needed for target weighting or PPI-based prioritization.

For the DN case study used in this book, TCMDATA also includes dn_otp_tbl, a built-in Open Targets-derived table that retains the target-level score column, in addition to the legacy symbol vector dn_otp.

4.1.1 `search_disease()`: disease → genes

search_disease() accepts a disease name (fuzzy matching supported) or a UMLS CUI identifier, and returns all associated genes with their Entrez IDs and gene symbols:

library(TCMDATA)

## search by disease name
dn_targets <- search_disease("diabetic nephropathy")
head(dn_targets)

  disease_id   gene_id         disease_name     symbol
1   C0011881        10 Diabetic Nephropathy       NAT2
2   C0011881      1000 Diabetic Nephropathy       CDH2
3   C0011881 100033819 Diabetic Nephropathy     MIR675
4   C0011881 100048912 Diabetic Nephropathy CDKN2B-AS1
5   C0011881 100124700 Diabetic Nephropathy     HOTAIR
6   C0011881 100125288 Diabetic Nephropathy      ZGLP1

You can also query by UMLS CUI for precise matching, or search multiple diseases at once:

## exact match by CUI
sepsis <- search_disease("C0243026")

## multiple diseases
multi <- search_disease(c("sepsis", "asthma"))
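
Because name queries use fuzzy matching, a multi-disease query may return more than one matched disease name. The following base-R sketch shows one way to inspect and split such a result. The toy data frame only mimics the four columns printed above; its disease and gene names are placeholders, not real query output.

```r
## toy table mimicking the columns returned by search_disease();
## all values are placeholders, not real query results
multi_toy <- data.frame(
  disease_id   = c("C0000001", "C0000001", "C0000002", "C0000002", "C0000002"),
  gene_id      = c("1", "2", "2", "3", "4"),
  disease_name = c("Disease A", "Disease A", "Disease B", "Disease B", "Disease B"),
  symbol       = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE4"),
  stringsAsFactors = FALSE
)

## how many rows were returned for each matched disease name?
table(multi_toy$disease_name)

## one character vector of unique gene symbols per disease
targets_by_disease <- lapply(
  split(multi_toy$symbol, multi_toy$disease_name),
  unique
)
lengths(targets_by_disease)
```

Checking the matched names first makes it easier to spot unintended fuzzy matches before the gene lists are merged.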

4.1.2 `search_gene_disease()`: gene → diseases

The reverse lookup function search_gene_disease() finds all diseases associated with a given gene, which is useful for validating whether hub genes from network analysis have known disease relevance:

## which diseases are associated with TNF?
tnf_diseases <- search_gene_disease("TNF")
cat("TNF is associated with", length(unique(tnf_diseases$disease_name)), "diseases\n")
head(tnf_diseases[, c("disease_name", "gene_id", "symbol")])

TNF is associated with 2724 diseases
                        disease_name gene_id symbol
1 (non-specific) purulent meningitis    7124    TNF
2                     AA amyloidosis    7124    TNF
3                  Abdominal Abscess    7124    TNF
4                     Abdominal Pain    7124    TNF
5                  Abdominal symptom    7124    TNF
6    ABLEPHARON-MACROSTOMIA SYNDROME    7124    TNF

## query multiple genes
hub_diseases <- search_gene_disease(c("IL6", "TNF", "PPARG"))
## diseases shared by all three genes
shared <- Reduce(intersect, split(hub_diseases$disease_name, hub_diseases$symbol))
cat("Diseases shared by IL6, TNF, PPARG:", length(shared), "\n")
head(shared)

Diseases shared by IL6, TNF, PPARG: 516
[1] "Abnormal behavior"    "Acanthosis Nigricans" "Acne"
[4] "Acne Vulgaris"        "Acute Chest Syndrome" "Acute colitis"
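
A strict intersection can be too demanding when many hub genes are queried. As a hedged alternative, the sketch below counts how many of the queried genes each disease is linked to, so that a looser "at least k genes" rule can be applied. The toy table reuses the symbol and disease_name columns shown above, but the disease names and links are placeholders.

```r
## toy gene-disease table with the same two columns used above;
## disease names and links are placeholders
hub_toy <- data.frame(
  symbol = c("IL6", "IL6", "IL6", "TNF", "TNF", "PPARG", "PPARG"),
  disease_name = c("Disease A", "Disease B", "Disease C",
                   "Disease A", "Disease B", "Disease A", "Disease C"),
  stringsAsFactors = FALSE
)

## drop duplicated gene-disease pairs, then count genes per disease
pairs   <- unique(hub_toy[, c("symbol", "disease_name")])
n_genes <- table(pairs$disease_name)

## diseases linked to all three genes (equivalent to the strict intersection)
shared_all <- names(n_genes)[n_genes == 3]

## relaxed rule: diseases linked to at least two of the three genes
shared_2plus <- names(n_genes)[n_genes >= 2]
```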

4.1.3 Open Targets API: disease → scored targets

Open Targets provides an overall association score for each disease-target pair. This score is especially useful when disease targets should be treated as weighted evidence rather than a simple unweighted gene set.

The recommended workflow is:

Search the disease ontology identifier.
Retrieve associated targets by EFO/MONDO/Orphanet ID.
Keep the score column for downstream weighting.

library(TCMDATA)

## Step 1: inspect disease ontology matches
search_disease_efo("diabetic nephropathy", size = 5)

## Step 2: retrieve associated targets from the selected disease ID
dn_ot <- get_disease_targets("EFO_0000401", size = 500, score_threshold = 0.05)
head(dn_ot)

  ensembl_id      gene_symbol gene_name       biotype        score
1 ENSG00000159640 ACE         angiotensin... protein_coding 0.7654
2 ENSG00000144891 AGTR1       angiotensin... protein_coding 0.5903
3 ENSG00000148737 TCF7L2      transcription... protein_coding 0.5131
4 ENSG00000254647 INS         insulin         protein_coding 0.4939

Alternatively, query_disease_targets() combines disease-name search and target retrieval in one call:

dn_ot <- query_disease_targets(
  disease_name = "diabetic nephropathy",
  size = 500,
  score_threshold = 0.05
)

dn_ot_targets <- dn_ot$gene_symbol

If the disease name is ambiguous, first run search_disease_efo() and then pass the exact disease ID to get_disease_targets(). This is usually safer for reproducible analysis.

For reproducible examples without online API calls, the built-in dn_otp_tbl object can be used directly:

data(dn_otp_tbl)
head(dn_otp_tbl[, c("symbol", "score", "source")])
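
To illustrate what "keeping the score column for downstream weighting" can look like, the sketch below turns a scored target table into a named weight vector. The four rows are copied from the example output above. The 0.5 cutoff and the rescaling by the maximum score are arbitrary choices for illustration, not a description of how TCMDATA computes weights internally.

```r
## scores copied from the example output above (first four rows only)
ot_toy <- data.frame(
  gene_symbol = c("ACE", "AGTR1", "TCF7L2", "INS"),
  score       = c(0.7654, 0.5903, 0.5131, 0.4939),
  stringsAsFactors = FALSE
)

## keep targets above a chosen score cutoff (0.5 is an arbitrary example)
ot_keep <- ot_toy[ot_toy$score >= 0.5, ]

## named weight vector, rescaled so the strongest target has weight 1
ot_weights <- setNames(ot_keep$score / max(ot_keep$score), ot_keep$gene_symbol)
ot_weights
```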

4.2 Disease target databases

Multiple public databases catalogue gene–disease associations with varying levels of evidence. In practice, combining targets from several sources helps: taking the union improves coverage, whereas taking the intersection improves reliability. Here, we introduce some commonly used and user-friendly databases for disease target retrieval in TCM network pharmacology research.

4.2.1 GeneCards

GeneCards is an integrative database that provides comprehensive information on human genes, including disease associations (Stelzer et al., 2016). Each gene–disease link is assigned a Relevance Score that summarizes evidence from multiple sources.

However, GeneCards does not offer a direct API for bulk retrieval, so the common workflow involves manual search and export.

For example, to retrieve targets for “Diabetic Nephropathy”, search GeneCards manually and export the results as a CSV file. The exported file can then be read into R and filtered by Relevance Score:

## "GeneCards-SearchResults.csv" is the file exported from the GeneCards search
dn_raw <- read.csv("GeneCards-SearchResults.csv")

## subset genes with Relevance Score > 10
dn_targets <- dn_raw$Gene.Symbol[dn_raw$Relevance.Score > 10]
head(dn_targets)

[1] "INS"   "HNF1B" "ACE"   "HNF1A" "GCK"   "NLRP3"
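
The Relevance Score cutoff is a user choice, so it can help to see how many targets survive at several candidate cutoffs before fixing one. The sketch below assumes the same Gene.Symbol and Relevance.Score column names as the code above; the table itself is a made-up stand-in for the exported file.

```r
## toy stand-in for the exported table; scores are made up for illustration
dn_raw_toy <- data.frame(
  Gene.Symbol     = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE6"),
  Relevance.Score = c(42.1, 18.7, 10.0, 6.3, 2.9, 0.8),
  stringsAsFactors = FALSE
)

## number of retained targets at several candidate cutoffs
cutoffs <- c(1, 5, 10, 20)
n_kept <- vapply(
  cutoffs,
  function(k) sum(dn_raw_toy$Relevance.Score > k, na.rm = TRUE),
  integer(1)
)
data.frame(cutoff = cutoffs, n_targets = n_kept)
```

Note that the comparison is strict, so a gene scoring exactly 10 is dropped at a cutoff of 10.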

For the case study, TCMDATA also provides a built-in GeneCards-derived target list for diabetic nephropathy, dn_gcds:

library(TCMDATA)
data(dn_gcds)
head(dn_gcds, 20)

#>  [1] "INS"     "ACE"     "GCK"     "HNF1A"   "HNF1B"   "KCNJ11"  "UMOD"   
#>  [8] "ABCC8"   "IL6"     "PPARG"   "HNF4A"   "WFS1"    "COL4A1"  "NLRP3"  
#> [15] "TCF7L2"  "PDX1"    "INSR"    "TNF"     "NEUROD1" "TGFBR2"

4.2.2 Open Targets Platform

The Open Targets Platform is a comprehensive and robust resource designed for therapeutic target identification and prioritization (Buniello et al., 2025). It systematically aggregates, validates, and scores evidence linking targets to diseases across heterogeneous data sources. By organizing its knowledge base around five core entities—Target, Disease/Phenotype, Variant, Study, and Drug—the platform provides a highly structured framework that facilitates hypothesis generation and evidence-based target selection in drug discovery.

Users can retrieve disease targets through either the API or the web interface. The API allows programmatic access to the data, while the web interface provides an intuitive way to explore target–disease associations.

In the web interface, “Diabetic Nephropathy” can be searched directly, and the resulting target list can be exported in JSON or TSV format for downstream analysis:

ot <- read.delim("OT-EFO_0000401-associated-targets-2_21_2026-v25_12.tsv")
head(ot)

  symbol globalScore   gwasCredibleSets geneBurden                eva
1    ACE   0.7654601            No data    No data 0.8841439697057125
2  AGTR1   0.5902786            No data    No data            No data
3 TCF7L2   0.5130827 0.8296366016990858    No data            No data
4    INS   0.4939346 0.7894596444737526    No data            No data
5   UMOD   0.4840121 0.7620249013066388    No data            No data
6 COL4A3   0.4821762 0.7850753632678814    No data            No data
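
In the export shown above, missing evidence is written as the string "No data", which causes the evidence columns to be read as character. One way to handle this, sketched below on three rows copied from that output, is to declare "No data" as a missing-value string so the columns are parsed as numeric. The same na.strings argument could be passed when reading the real file.

```r
## a few rows copied from the output above, tab-separated as in the export
ot_txt <- paste(
  "symbol\tglobalScore\tgwasCredibleSets\tgeneBurden\teva",
  "ACE\t0.7654601\tNo data\tNo data\t0.8841439697057125",
  "AGTR1\t0.5902786\tNo data\tNo data\tNo data",
  "TCF7L2\t0.5130827\t0.8296366016990858\tNo data\tNo data",
  sep = "\n"
)

## treat "No data" as NA so the evidence columns are read as numbers
ot_num <- read.delim(
  text = ot_txt,
  na.strings = c("NA", "No data"),
  stringsAsFactors = FALSE
)
str(ot_num)

## targets with GWAS credible-set evidence
ot_num$symbol[!is.na(ot_num$gwasCredibleSets)]
```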

For API access, please refer to the Open Targets API documentation for detailed instructions on how to query target–disease associations programmatically.

4.2.3 CTD (Comparative Toxicogenomics Database)

The Comparative Toxicogenomics Database (CTD) is a robust, publicly available resource that aims to advance understanding about how environmental exposures affect human health (Davis et al., 2024). It provides manually curated information about chemical–gene/protein interactions, chemical–disease, and gene–disease relationships. By integrating these data with functional and pathway annotations, CTD helps researchers develop hypotheses about the mechanisms underlying environmentally influenced diseases. It is particularly valuable for network pharmacology studies involving environmental toxins or pharmacological exposures.

CTD provides both a web interface and a RESTful API for data retrieval. The web interface allows users to perform batch queries for diseases, chemicals, or genes, and export the results manually.

For programmatic access, the CTD Batch Query API is highly efficient. You can construct a query URL specifying the input type (disease), the query term (Diabetic Nephropathy), the desired report type (genes_curated or genes_inferred), and the output format (csv or tsv).

The report parameter controls which association types are returned:

`report` value	Description	DN example
`genes_curated`	Manually curated from literature (high confidence)	46 genes
`genes_inferred`	Inferred via chemical–gene–disease links	~26,000 genes
`genes`	All associations (curated + inferred)	~26,300 genes
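
The four query components described above can be assembled with a small helper. This is a sketch only: build_ctd_url() is a hypothetical function written for this example, not part of TCMDATA or any CTD client, and it defaults to the curated report as the more conservative choice.

```r
## build a CTD batch-query URL from its four components
## (build_ctd_url is a hypothetical helper, not part of any package)
build_ctd_url <- function(term,
                          input_type = "disease",
                          report = c("genes_curated", "genes_inferred", "genes"),
                          format = c("tsv", "csv")) {
  report <- match.arg(report)
  format <- match.arg(format)
  paste0(
    "https://ctdbase.org/tools/batchQuery.go?",
    "inputType=", input_type, "&",
    ## percent-encode spaces and other reserved characters in the query term
    "inputTerms=", utils::URLencode(term, reserved = TRUE), "&",
    "report=", report, "&",
    "format=", format
  )
}

build_ctd_url("Diabetic Nephropathy", report = "genes")
```

Using match.arg() restricts report and format to the values listed in the text, so a typo fails early instead of producing an invalid query.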

For example, to retrieve all gene targets associated with “Diabetic Nephropathy” directly into R:

# Construct the CTD Batch Query API URL
# Change report to "genes_curated" for high-confidence curated associations only
options(timeout = 300)
ctd_url <- paste0(
  "https://ctdbase.org/tools/batchQuery.go?",
  "inputType=disease&",
  "inputTerms=Diabetic%20Nephropathy&",
  "report=genes&",
  "format=tsv"
)

lines <- readLines(ctd_url)
lines[1] <- sub("^# ", "", lines[1])
ctd <- read.delim(textConnection(lines))

# Extract unique gene symbols
ctd_targets <- unique(ctd$GeneSymbol)
cat("Total gene targets:", length(ctd_targets), "\n")
head(ctd[, c("GeneSymbol", "GeneID", "DirectEvidence", "InferenceScore")])

Total gene targets: 26330

      GeneSymbol GeneID   DirectEvidence InferenceScore
1 1700001K19RIKL 299330                            3.99
2           1-SF 100049428                            2.47
3 9530082P21RIKL 360487                            3.94
4 9930111J21RIK2 245240                            3.74
5              A  50518 marker/mechanism             NA
6              A  50518                            2.63

The InferenceScore column quantifies the strength of inferred associations — higher values indicate more atypical (and potentially more meaningful) connectivity in the chemical–gene–disease network. Rows with DirectEvidence filled and InferenceScore = NA are curated associations.
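
Following this rule, curated and inferred rows can be separated with a few lines of base R. The sketch below uses a toy table with the same DirectEvidence and InferenceScore columns as the output above; the gene symbols and values are placeholders.

```r
## toy rows modelled on the output above; symbols and values are placeholders
ctd_toy <- data.frame(
  GeneSymbol     = c("GENE1", "GENE2", "GENE2", "GENE3"),
  DirectEvidence = c("", "marker/mechanism", "", ""),
  InferenceScore = c(3.99, NA, 2.63, 3.74),
  stringsAsFactors = FALSE
)

## curated rows carry a non-empty DirectEvidence value
is_curated  <- !is.na(ctd_toy$DirectEvidence) & nzchar(ctd_toy$DirectEvidence)
ctd_curated <- unique(ctd_toy$GeneSymbol[is_curated])

## inferred rows, ranked by InferenceScore (highest first)
ctd_inferred <- ctd_toy[!is_curated, ]
ctd_inferred <- ctd_inferred[order(-ctd_inferred$InferenceScore), ]
```

A gene can appear in both groups, as GENE2 does here, because curated and inferred associations are reported on separate rows.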

Further details about the CTD data retrieval and interpretation can be found in the CTD documentation.

4.2.4 Other databases

In addition to the databases detailed above, several other resources are frequently used in network pharmacology to ensure comprehensive target collection:

DisGeNET: One of the largest publicly available collections of genes and variants associated with human diseases. It integrates data from expert-curated repositories, GWAS catalogues, animal models, and text-mining of the scientific literature.
OMIM (Online Mendelian Inheritance in Man): A comprehensive, authoritative compendium of human genes and genetic phenotypes. It is highly reliable for identifying genes with strong, well-documented genetic links to specific diseases.
TTD (Therapeutic Target Database): Focuses on known and explored therapeutic protein and nucleic acid targets, providing detailed information about the targeted diseases, pathway information, and corresponding drugs.
DrugBank: While primarily a drug database, it provides extensive information on drug targets, making it useful for finding targets of existing drugs used to treat the disease of interest.

Researchers typically query multiple databases and take the intersection of the results to form a robust disease target set.
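
The trade-off between coverage and reliability can be made concrete with a short sketch. The database names below are real, but the gene memberships are placeholders; the "at least two databases" rule is shown as one possible middle ground, not as a recommended standard.

```r
## toy target lists; database names are real, gene memberships are placeholders
db_targets <- list(
  GeneCards   = c("GENE1", "GENE2", "GENE3", "GENE4"),
  OpenTargets = c("GENE2", "GENE3", "GENE5"),
  CTD         = c("GENE3", "GENE4", "GENE5", "GENE6")
)

## union: maximal coverage
targets_union <- Reduce(union, db_targets)

## intersection: supported by every database
targets_all <- Reduce(intersect, db_targets)

## vote counting: supported by at least two databases
votes <- table(unlist(lapply(db_targets, unique)))
targets_2plus <- names(votes)[votes >= 2]
```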

4.3 Compound target prediction

In addition to disease-associated genes, target retrieval on the compound side is also an essential step in network pharmacology analysis. For many herbal ingredients or small-molecule constituents, experimentally validated targets are often incomplete or unavailable in public databases. Therefore, in silico target prediction tools are commonly used to expand the candidate target space before downstream intersection, network construction, and enrichment analysis.

Among the currently most widely used strategies, SwissTargetPrediction and Similarity Ensemble Approach (SEA) are two representative and user-friendly resources for predicting potential targets of small molecules. Although both methods are based on the principle that structurally or chemically similar compounds tend to interact with similar proteins, they differ in their implementation details, scoring systems, and output formats. In practice, researchers often use one or both resources to obtain a broader and more comprehensive target set for compound-level network pharmacology studies.

4.3.1 SwissTargetPrediction

SwissTargetPrediction is one of the most commonly used web-based tools for predicting the potential targets of bioactive small molecules (Daina et al., 2019). It was developed based on the assumption that similar molecules are likely to bind similar targets, and it combines both 2D chemical similarity and 3D molecular similarity to compare a query compound against a curated collection of ligands with known experimentally validated targets.

Users can submit a compound by entering its SMILES string, drawing the chemical structure, or uploading a molecular file. After the query is processed, the platform returns a ranked list of predicted protein targets, usually accompanied by target class annotation, probability scores, and related chemical information. The results are intuitive and convenient for manual browsing, making SwissTargetPrediction particularly suitable for studies involving a limited number of herbal ingredients or representative active compounds.

In network pharmacology studies of TCM, SwissTargetPrediction is often used to supplement missing target annotations for monomeric compounds derived from herbs. Its main advantage lies in its ease of use and clear output, which facilitates quick target collection and downstream standardization of gene symbols. However, users should note that the predictions are still model-based inferences rather than direct experimental evidence. Therefore, the predicted targets are usually filtered further by species, probability, target type, or by taking the intersection with disease targets and DEGs, so as to improve biological plausibility and reduce false-positive results.

To obtain SMILES strings for your query compounds, use the resolve_cid() and getprops() functions in TCMDATA: resolve_cid() converts compound names to CIDs, and getprops() retrieves the canonical SMILES for those CIDs:

## take the first molecule recorded for an herb (Lingzhi example), resolve it to a CID, and fetch its SMILES
library(TCMDATA)
herbs <- c("灵芝")
lz_mol <- search_herb(herb = herbs, type = "Herb_cn_name")$molecule |> unique() |> head(1)
lz_mol_cid <- resolve_cid(lz_mol, from = "name")
lz_mol_smiles <- getprops(lz_mol_cid, properties = "CanonicalSMILES")

print(lz_mol_smiles)

# A tibble: 1 × 3
#  cid   CID   ConnectivitySMILES     
#  <chr> <chr> <chr>                  
#1 72    72    C1=CC(=C(C=C1C(=O)O)O)O

The predicted targets can be downloaded as a CSV file, which can be read into R for further processing and integration with disease targets and DEGs in the network pharmacology workflow using TCMDATA.
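
A minimal sketch of this post-processing step follows. It assumes the export contains a gene-symbol column whose header starts with "Common" and a probability column whose header starts with "Probability"; check these against your own file, since the exact headers are an assumption here. Every row in the toy text is a placeholder, not a real prediction.

```r
## toy CSV text; the header is assumed to resemble a SwissTargetPrediction
## export, and every row is a placeholder rather than a real prediction
stp_txt <- paste(
  "Target,Common name,Uniprot ID,Probability*",
  "Placeholder protein 1,GENE1,P00001,0.62",
  "Placeholder protein 2,GENE2,P00002,0.11",
  "Placeholder protein 3,GENE3,P00003,0",
  sep = "\n"
)
stp <- read.csv(text = stp_txt, check.names = TRUE, stringsAsFactors = FALSE)

## locate the symbol and probability columns by pattern,
## because read.csv() rewrites headers containing spaces or "*"
prob_col <- grep("^Probability", names(stp), value = TRUE)[1]
sym_col  <- grep("^Common", names(stp), value = TRUE)[1]

## keep predictions with a non-zero probability (the cutoff is a user choice)
stp_targets <- unique(stp[[sym_col]][stp[[prob_col]] > 0])
stp_targets
```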

4.3.2 Similarity Ensemble Approach (SEA)

The Similarity Ensemble Approach (SEA) is another widely used ligand-based method for target prediction. Unlike single-pair similarity scoring, SEA evaluates the relationship between a query compound and an entire ligand set known to bind a given protein target. In other words, it compares the chemical similarity between the submitted molecule and the ensemble of known ligands for each target, and then assesses whether the observed similarity is greater than expected by chance.

A key feature of SEA is that it provides a more statistically oriented framework for target prediction. Its output commonly includes predicted targets together with measures such as E-values, significance scores, or confidence-related statistics, which help users judge the relative reliability of the predictions. Because of this set-to-set comparison strategy, SEA is often considered a useful complement to other chemical similarity tools, especially when researchers want to broaden the search space for potential compound–target associations.

In the context of TCM network pharmacology, SEA is frequently applied to predict candidate protein targets for herbal monomers whose direct experimental annotations are sparse. The predicted targets can then be merged with results from SwissTargetPrediction or other databases to form a more comprehensive compound target pool. As with all computational prediction tools, SEA results should be interpreted cautiously and are generally recommended for candidate expansion and prioritization, rather than being treated as definitive evidence. A common practice is to retain overlapping targets supported by multiple prediction resources and then integrate them with disease targets and transcriptomic signals for subsequent network analysis.

As an example, the same SMILES string used in Section 4.3.1, “C1=CC(=C(C=C1C(=O)O)O)O”, can be submitted as SEA input.

The predicted targets can be downloaded and read into R for further integration with disease targets and DEGs in the network pharmacology workflow using TCMDATA.

4.4 DEG visualization

Differential expression analysis identifies genes that are significantly altered between disease and control conditions. In network pharmacology analysis, these DEGs can be integrated with disease targets to prioritize genes that are both statistically significant in expression and biologically linked to the disease phenotype. The intersection of DEGs and disease targets often represents the most promising candidates for further network analysis and experimental validation.

TCMDATA includes a demo DESeq2 result (Love et al., 2014) from GSE142025 (Fan et al., 2019), comparing early DN with control samples, for illustration. Further details about the dataset can be found on the original GEO page.

4.4.1 Load and inspect data

data(deg_earlydn)
str(deg_earlydn)

#> 'data.frame':    27183 obs. of  8 variables:
#>  $ baseMean      : num  31.9 1128.6 61.8 11.7 260.2 ...
#>  $ log2FoldChange: num  0.8175 -0.0325 -0.2766 0.0321 0.0982 ...
#>  $ lfcSE         : num  0.219 0.133 0.266 0.421 0.178 ...
#>  $ stat          : num  3.7246 -0.2439 -1.0382 0.0763 0.5528 ...
#>  $ pvalue        : num  0.000196 0.807271 0.299167 0.939176 0.580368 ...
#>  $ padj          : num  0.00409 0.91812 0.56789 0.97605 0.78759 ...
#>  $ names         : chr  "DDX11L1" "WASH7P" "MIR6859-1" "FAM138A" ...
#>  $ g             : chr  "up" "normal" "normal" "normal" ...

table(deg_earlydn$g)

#> 
#>   down normal     up 
#>    678  25852    653
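
For readers preparing their own DESeq2 table, a label column like g can be derived from padj and log2FoldChange. The sketch below uses the first four rows shown in the str() output above. label_deg() is a hypothetical helper, and its cutoffs are placeholders: the thresholds used to build the bundled g column are not stated in this chapter.

```r
## first four rows of the str() output above
deg_toy <- data.frame(
  names          = c("DDX11L1", "WASH7P", "MIR6859-1", "FAM138A"),
  log2FoldChange = c(0.8175, -0.0325, -0.2766, 0.0321),
  padj           = c(0.00409, 0.91812, 0.56789, 0.97605),
  stringsAsFactors = FALSE
)

## hypothetical helper; cutoffs are placeholders, not those behind the `g` column
label_deg <- function(lfc, padj, lfc_cutoff = 0.5, padj_cutoff = 0.05) {
  ## genes with a missing adjusted P-value are treated as not significant
  sig <- !is.na(padj) & padj < padj_cutoff
  ifelse(sig & lfc >  lfc_cutoff, "up",
  ifelse(sig & lfc < -lfc_cutoff, "down", "normal"))
}

deg_toy$g <- label_deg(deg_toy$log2FoldChange, deg_toy$padj)
table(deg_toy$g)
```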

4.4.2 Volcano plot with `ivolcano`

ivolcano (Yu, 2025) is an R package that provides both static and interactive volcano plots for differential expression results. In interactive mode, users can explore DEGs dynamically: hovering displays gene details, clicking opens the gene's page in an external database, and zooming focuses on specific regions.

By specifying dual thresholds, ivolcano automatically applies a FigureYa-styled color scheme that clearly separates genes into significance tiers.

library(ivolcano)

p <- ivolcano(deg_earlydn,
              logFC_col  = "log2FoldChange",
              pval_col   = "padj",
              gene_col   = "names",
              pval_cutoff  = 0.05,
              logFC_cutoff = 1,
              pval_cutoff2 = 0.01,
              logFC_cutoff2 = 2,
              size_by    = "manual",
              top_n      = 10,
              onclick_fun = onclick_genecards)
print(p)

The interactive plot supports hovering to view gene details (name, logFC, adjusted P-value) and clicking to redirect to external databases. For example, onclick_genecards opens the GeneCards page for any clicked gene. Other built-in redirects include onclick_ncbi, onclick_ensembl, onclick_uniprot, and onclick_pubmed.

To generate a static ggplot2 figure (e.g., for PDF output or manuscript submission), simply set interactive = FALSE:

p_static <- ivolcano(deg_earlydn,
                     logFC_col  = "log2FoldChange",
                     pval_col   = "padj",
                     gene_col   = "names",
                     pval_cutoff  = 0.05,
                     logFC_cutoff = 1,
                     pval_cutoff2 = 0.01,
                     logFC_cutoff2 = 2,
                     size_by    = "manual",
                     top_n      = 10,
                     interactive = FALSE)
print(p_static)

4.5 Intersection analysis

Integrating DEGs with disease targets helps prioritize genes that are both statistically significant in expression and biologically linked to the disease phenotype. Here, we combine DN targets from GeneCards and Open Targets Platform with DEGs for intersection analysis using TCMDATA.

4.5.1 Prepare gene sets

TCMDATA provides three built-in datasets for the DN case study: deg_earlydn (DEGs), dn_gcds (GeneCards targets), and dn_otp (Open Targets targets).

# select DEGs
degs <- deg_earlydn$names[deg_earlydn$g != "normal"]
cat("DEGs:", length(degs), "\n")

#> DEGs: 1331

# GeneCards disease targets
data(dn_gcds)
cat("GeneCards targets:", length(dn_gcds), "\n")

#> GeneCards targets: 4760

# Open Targets Platform targets
data(dn_otp)
cat("Open Targets targets:", length(dn_otp), "\n")

#> Open Targets targets: 3944

4.5.2 Venn diagram

getvenndata() constructs a logical membership matrix for the input vectors (recommended for four or fewer sets), and ggvenn_plot() renders the corresponding Venn diagram.

venn_df <- getvenndata(degs, dn_gcds, dn_otp,
                       set_names = c("DEGs", "GeneCards", "OpenTargets"))

venn_df |> head()

#>    Element DEGs GeneCards OpenTargets
#> 1  DDX11L1 TRUE     FALSE       FALSE
#> 2 MIR12136 TRUE     FALSE       FALSE
#> 3   FAM87B TRUE     FALSE       FALSE
#> 4     HES4 TRUE     FALSE       FALSE
#> 5     TP73 TRUE     FALSE       FALSE
#> 6   GPR153 TRUE     FALSE       FALSE

Then, the Venn diagram can be plotted with ggvenn_plot():

venn1 <- ggvenn_plot(venn_df)

venn2 <- ggvenn_plot(venn_df, set.color = c("#FF8748", "#5BAA56", "#B8BB5B"), stroke.color = "white")

aplot::plot_list(venn1, venn2, ncol = 1)
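
To make the idea of a logical membership matrix concrete, here is a base-R sketch that builds a similar table from a named list of sets. build_membership() is a hypothetical helper written for illustration; it is not the implementation of getvenndata(), and the gene names are placeholders.

```r
## hypothetical helper: one logical column per set, one row per unique element
build_membership <- function(sets) {
  elements <- unique(unlist(sets, use.names = FALSE))
  flags <- lapply(sets, function(s) elements %in% s)
  data.frame(Element = elements, flags, stringsAsFactors = FALSE)
}

toy_sets <- list(
  DEGs        = c("GENE1", "GENE2", "GENE3"),
  GeneCards   = c("GENE2", "GENE3", "GENE4"),
  OpenTargets = c("GENE3", "GENE5")
)
membership <- build_membership(toy_sets)
membership

## genes present in all three sets
membership$Element[rowSums(membership[, -1]) == 3]
```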

4.5.3 UpSet plot

When comparing 4 or more sets, an UpSet plot provides a clearer intersection overview than a Venn diagram. TCMDATA provides an upsetplot() function (a convenience wrapper around aplotExtra::upset_plot()) that takes a named list of character vectors and renders an UpSet-style visualization:

gene_list <- list(
  DEGs = degs,
  GeneCards  = dn_gcds,
  OpenTargets = dn_otp
)

upsetplot(gene_list, color.intersect.by = "Set2", color.set.by = "Dark2")

4.5.4 Extract intersection results

getvennresult() extracts all intersection subsets from the Venn membership matrix:

venn_res <- getvennresult(venn_df)
venn_res[, c("Set_Combination", "Gene_Count")]

#>              Set_Combination Gene_Count
#> 1 DEGs&GeneCards&OpenTargets        149
#> 2      GeneCards&OpenTargets       1900
#> 3           DEGs&OpenTargets         75
#> 4                OpenTargets       1820
#> 5             DEGs&GeneCards        135
#> 6                  GeneCards       2576
#> 7                       DEGs        972
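
As a quick sanity check, the region counts above should add back up to the sizes of the three input sets (1331, 4760, and 3944), which holds only if the regions are mutually exclusive. The sketch below performs this arithmetic with the counts copied from the table; set_total() is a hypothetical helper.

```r
## region sizes copied from the getvennresult() output above
region_n <- c(
  "DEGs&GeneCards&OpenTargets" = 149,
  "GeneCards&OpenTargets"      = 1900,
  "DEGs&OpenTargets"           = 75,
  "OpenTargets"                = 1820,
  "DEGs&GeneCards"             = 135,
  "GeneCards"                  = 2576,
  "DEGs"                       = 972
)

## total size of one set = sum of all regions whose label contains that set
set_total <- function(set, counts) {
  members <- strsplit(names(counts), "&", fixed = TRUE)
  in_set  <- vapply(members, function(m) set %in% m, logical(1))
  sum(counts[in_set])
}

vapply(c("DEGs", "GeneCards", "OpenTargets"), set_total, numeric(1),
       counts = region_n)
## expected: DEGs 1331, GeneCards 4760, OpenTargets 3944
```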

Then, extract the gene lists for each intersection combination:

venn_res$Set_Combination

#> [1] "DEGs&GeneCards&OpenTargets" "GeneCards&OpenTargets"     
#> [3] "DEGs&OpenTargets"           "OpenTargets"               
#> [5] "DEGs&GeneCards"             "GeneCards"                 
#> [7] "DEGs"

For example, to extract the core targets shared by all three gene sets (the first row, DEGs&GeneCards&OpenTargets):

core_genes <- strsplit(venn_res$Genes[1], ",\\s*")[[1]]
cat("Core targets:", length(core_genes), "\n")

#> Core targets: 149

head(core_genes, 20)

#>  [1] "ERRFI1"  "EPHA2"   "LIN28A"  "NR0B2"   "JUN"     "GADD45A" "CCN1"   
#>  [8] "GBP2"    "VCAM1"   "MCL1"    "S100A9"  "S100A8"  "S100A4"  "FCGR3B" 
#> [15] "RXRG"    "SELE"    "PTGS2"   "RGS1"    "BTG2"    "CD55"

The core intersection genes, those shared across multiple databases and DEGs, can then be carried forward to PPI network construction (Chapter 7) and enrichment analysis (Chapter 6).

4.6 PPI-weighted target ranking

The classical network pharmacology workflow often retains only the direct intersection between herb targets and disease targets. This is simple, but it ignores two important issues:

Disease targets from different databases may have different levels of evidence.
A non-overlapping herb target may still be biologically relevant if it is close to the disease module in the human PPI network.

TCMDATA therefore provides rank_tcm_targets_by_ppi(), which ranks herb targets by their proximity to a weighted disease module on a global PPI background. The function uses the built-in human STRING PPI network by default, calculates weighted shortest-path proximity to disease targets, and estimates a degree-matched random background for each candidate target. The final score is the no-self proximity Z-score:

Score_final = z_proximity_no_self

For direct-overlap targets, the no-self score evaluates whether the target is close to other disease-module genes, rather than giving it an automatic advantage simply because its distance to itself is zero.
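
The general idea of scoring an observed value against a random background can be sketched as follows. This is a generic permutation-style calculation, not the code inside rank_tcm_targets_by_ppi(). It assumes that larger proximity values mean "closer to the disease module", and the background values are simulated rather than computed from a PPI network.

```r
set.seed(1)

## placeholder values: an observed proximity and 1000 proximities of
## randomly drawn, degree-matched nodes (simulated here, not computed from a PPI)
obs_proximity  <- 0.31
null_proximity <- rnorm(1000, mean = 0.25, sd = 0.02)

## Z-score of the observed value against the random background
z_score <- (obs_proximity - mean(null_proximity)) / sd(null_proximity)

## one-sided empirical P-value with a +1 correction to avoid P = 0
p_emp <- (1 + sum(null_proximity >= obs_proximity)) / (1 + length(null_proximity))

c(z = z_score, p_empirical = p_emp)
```

With 1000 random draws, the smallest empirical P-value this formula can return is 1/1001, so the number of permutations limits the resolution of the P-value.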

4.6.1 Example: Lingzhi targets against DN disease module

For the DN case study, TCMDATA includes scored GeneCards and Open Targets tables:

dn_gcds_tbl: GeneCards targets with Relevance Score
dn_otp_tbl: Open Targets targets with overall association score

These scored sources can be combined into a weighted disease module using prepare_disease_weights():

library(TCMDATA)

## scored DN disease targets
data(dn_gcds_tbl)
data(dn_otp_tbl)

disease_w <- prepare_disease_weights(
  GeneCards = dn_gcds_tbl,
  OpenTargets = dn_otp_tbl
)

## Lingzhi herb targets from TCMDATA
lingzhi_df <- search_herb("lingzhi", type = "Herb_pinyin_name")
herb_w <- prepare_herb_target_weights(lingzhi_df, method = "binary")

## PPI-weighted target ranking
rank_res <- rank_tcm_targets_by_ppi(
  herb_targets = herb_w,
  disease_targets = disease_w,
  n_perm = 1000,
  seed = 20260525
)

head(rank_res$result[, c(
  "target", "Rank_final", "Score_final", "direct_overlap",
  "disease_weight_if_overlap", "proximity_no_self",
  "z_proximity_no_self", "p_empirical_no_self"
)], 10)

A typical output table has one row per herb target:

   target Rank_final Score_final direct_overlap disease_weight_if_overlap
1    TP53          1        3.24           TRUE                    0.227
2   PTPN1          2        3.10           TRUE                    0.276
3    EGFR          3        2.95           TRUE                    0.176
4     INS          4        2.89           TRUE                    1.000
5     ALB          5        2.87           TRUE                    0.508

  proximity_no_self z_proximity_no_self p_empirical_no_self
1             0.312                3.24               0.001
2             0.262                3.10               0.002
3             0.309                2.95               0.004
4             0.309                2.89               0.003
5             0.295                2.87               0.004

The main columns are interpreted as follows:

Column	Meaning
`target`	Herb target gene symbol
`direct_overlap`	Whether the herb target is also a disease target
`disease_weight_if_overlap`	Disease evidence weight if the target is in the disease module
`proximity_no_self`	Weighted PPI proximity to the disease module after excluding self-contribution for overlap targets
`Score_final`	Final ranking score, currently equal to `z_proximity_no_self`
`p_empirical_no_self`	Empirical P-value from degree-matched random background

A practical interpretation is:

Score_final > 0  closer to the disease module than same-degree random nodes
Score_final > 1  useful exploratory cutoff
Score_final > 2  stronger PPI-proximity evidence
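
These bands can be attached to a result table with cut(). The sketch below uses placeholder scores that span all bands; the band labels are wording chosen for this example.

```r
## example scores (placeholders spanning all four bands)
score_final <- c(GENE1 = 3.24, GENE2 = 1.40, GENE3 = 0.30,
                 GENE4 = -0.80, GENE5 = 1.00)

## right-closed intervals, so a score of exactly 1 is NOT counted as "> 1"
band <- cut(
  score_final,
  breaks = c(-Inf, 0, 1, 2, Inf),
  labels = c("not closer than random", "closer than random",
             "exploratory (> 1)", "stronger (> 2)")
)

data.frame(target = names(score_final), Score_final = score_final,
           band = band, row.names = NULL)
```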

This ranking can be used to select prioritized targets for enrichment, PPI subnetwork construction, or downstream validation:

top_targets <- select_tcm_targets(
  rank_res,
  top_n = 50,
  min_score = 1,
  return = "targets"
)

head(top_targets)

The method should be viewed as a target-prioritization strategy rather than a replacement for biological validation. It keeps direct overlaps visible, but it also allows high-ranking non-overlap targets to be discovered when they are network-proximal to the weighted disease module.

4.7 References

Stelzer G, Rosen R, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, Iny Stein T, Nudel R, Lieder I, Mazor Y, Kaplan S, Dahary D, Warshawsky D, Guan-Golan Y, Kohn A, Rappaport N, Safran M, and Lancet D. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Current Protocols in Bioinformatics (2016), 54:1.30.1–1.30.33. doi: 10.1002/cpbi.5.
Buniello A, et al. Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic Acids Research (2025). doi: 10.1093/nar/gkae1128.
Davis AP, Wiegers TC, Sciaky D, Barkalow F, Strong M, Wyatt B, Wiegers J, McMorran R, Abrar S, Mattingly CJ. Comparative Toxicogenomics Database’s 20th anniversary: update 2025. Nucleic Acids Research (2024). doi: 10.1093/nar/gkae822.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology (2014), 15, 550. doi: 10.1186/s13059-014-0550-8.
Fan Y, Yi Z, D’Agati VD, Sun Z, et al. Comparison of Kidney Transcriptomic Profiles of Early and Advanced Diabetic Nephropathy Reveals Potential New Mechanisms for Disease Progression. Diabetes (2019), 68:2301–2314. doi: 10.2337/db19-0204. PMID: 32086290.
Yu G (2025). ivolcano: Interactive Volcano Plot. R package version 0.1.0. https://CRAN.R-project.org/package=ivolcano.

4.8 Session information

sessionInfo()

#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ivolcano_0.0.5    enrichplot_1.32.0 TCMDATA_0.1.0    
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0               gson_0.1.0              httr2_1.2.2            
#>   [4] rlang_1.2.0             magrittr_2.0.5          DOSE_4.6.0             
#>   [7] compiler_4.6.0          RSQLite_3.53.1          png_0.1-9              
#>  [10] systemfonts_1.3.2       callr_3.7.6             vctrs_0.7.3            
#>  [13] reshape2_1.4.5          stringr_1.6.0           pkgconfig_2.0.3        
#>  [16] crayon_1.5.3            fastmap_1.2.0           XVector_0.52.0         
#>  [19] labeling_0.4.3          rmarkdown_2.31          ps_1.9.3               
#>  [22] purrr_1.2.2             bit_4.6.0               xfun_0.57              
#>  [25] cachem_1.1.0            aplot_0.2.9             jsonlite_2.0.0         
#>  [28] blob_1.3.0              tidydr_0.0.6            tweenr_2.0.3           
#>  [31] parallel_4.6.0          cluster_2.1.8.2         R6_2.6.1               
#>  [34] bslib_0.11.0            stringi_1.8.7           RColorBrewer_1.1-3     
#>  [37] enrichit_0.1.4          jquerylib_0.1.4         GOSemSim_2.38.0        
#>  [40] Rcpp_1.1.1-1.1          Seqinfo_1.2.0           bookdown_0.46          
#>  [43] knitr_1.51              ggtangle_0.1.2          IRanges_2.46.0         
#>  [46] splines_4.6.0           igraph_2.3.1            aisdk_1.4.8            
#>  [49] tidyselect_1.2.1        qvalue_2.44.0           rstudioapi_0.18.0      
#>  [52] yaml_2.3.12             processx_3.9.0          lattice_0.22-9         
#>  [55] tibble_3.3.1            plyr_1.8.9              Biobase_2.72.0         
#>  [58] treeio_1.36.1           withr_3.0.2             KEGGREST_1.52.0        
#>  [61] S7_0.2.2                evaluate_1.0.5          gridGraphics_0.5-1     
#>  [64] scatterpie_0.2.6        polyclip_1.10-7         Biostrings_2.80.1      
#>  [67] pillar_1.11.1           ggtree_4.2.0            stats4_4.6.0           
#>  [70] clusterProfiler_4.20.0  ggfun_0.2.0             generics_0.1.4         
#>  [73] S4Vectors_0.50.1        ggplot2_4.0.3           scales_1.4.0           
#>  [76] tidytree_0.4.7          glue_1.8.1              gdtools_0.5.1          
#>  [79] lazyeval_0.2.3          tools_4.6.0             ggnewscale_0.5.2       
#>  [82] ggvenn_0.1.19           ggiraph_0.9.6           fs_2.1.0               
#>  [85] grid_4.6.0              tidyr_1.3.2             ape_5.8-1              
#>  [88] AnnotationDbi_1.74.0    nlme_3.1-169            patchwork_1.3.2        
#>  [91] ggforce_0.5.0           cli_3.6.6               rappdirs_0.3.4         
#>  [94] fontBitstreamVera_0.1.1 dplyr_1.2.1             gtable_0.3.6           
#>  [97] yulab.utils_0.2.4       sass_0.4.10             digest_0.6.39          
#> [100] fontquiver_0.2.1        BiocGenerics_0.58.1     ggrepel_0.9.8          
#> [103] ggplotify_0.1.3         htmlwidgets_1.6.4       farver_2.1.2           
#> [106] memoise_2.0.1           htmltools_0.5.9         lifecycle_1.0.5        
#> [109] httr_1.4.8              GO.db_3.23.1            fontLiberation_0.1.0   
#> [112] bit64_4.8.2             MASS_7.3-65