Chapter 5 TCM network construction
A core objective of network pharmacology is to reveal the multi-component, multi-target therapeutic mechanisms of Traditional Chinese Medicine (TCM), and network visualization is a primary tool for doing so. TCMDATA provides two complementary approaches for constructing and visualizing herb–molecule–target relationship networks: Sankey diagrams via ggalluvial[1] for depicting the directional flow of interactions, and force-directed network graphs via ggtangle[2] for revealing topological structure and hub nodes.
5.1 Preparing the data
All network visualizations start from the herb–molecule–target data frame returned by search_herb() or search_target().
library(TCMDATA)
library(dplyr)
# Query two herbs and their molecular targets by Chinese name:
# 灵芝 (Lingzhi, Ganoderma) and 黄芪 (Huangqi, Astragalus root)
herbs <- c("灵芝", "黄芪")
df <- search_herb(herb = herbs, type = "Herb_cn_name")
df |> head(10)
#> herb molecule target
#> 1 lingzhi 3,4-Dihydroxybenzoic acid GAA
#> 2 lingzhi 3,4-Dihydroxybenzoic acid POLB
#> 3 lingzhi 3,4-Dihydroxybenzoic acid APP
#> 4 lingzhi 3,4-Dihydroxybenzoic acid CA1
#> 5 lingzhi 3,4-Dihydroxybenzoic acid CA12
#> 6 lingzhi 3,4-Dihydroxybenzoic acid CA14
#> 7 lingzhi 3,4-Dihydroxybenzoic acid CA2
#> 8 lingzhi 3,4-Dihydroxybenzoic acid CA4
#> 9 lingzhi 3,4-Dihydroxybenzoic acid CA6
#> 10 lingzhi 3,4-Dihydroxybenzoic acid CA7
The returned data frame contains three core columns, herb, molecule, and target. Each row records one herb–molecule–target association, so together the rows encode the tripartite relationship.
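To make the tripartite structure concrete, the base-R sketch below builds a small hypothetical table with the same three columns (the name df_toy and all molecule and target values are illustrative, not real query output) and splits it into its two link tables: herb–molecule and molecule–target.

```r
# Hypothetical toy data in the same shape as search_herb() output:
# one row per herb–molecule–target association (values are illustrative only)
df_toy <- data.frame(
  herb     = c("lingzhi", "lingzhi", "lingzhi", "huangqi", "huangqi", "huangqi"),
  molecule = c("mol_A",   "mol_A",   "mol_B",   "mol_B",   "mol_C",   "mol_C"),
  target   = c("T1",      "T2",      "T2",      "T2",      "T3",      "T4")
)

# Herb -> molecule links: drop duplicates created by multiple targets per molecule
herb_mol <- unique(df_toy[, c("herb", "molecule")])

# Molecule -> target links: a molecule shared by two herbs appears only once here
mol_target <- unique(df_toy[, c("molecule", "target")])

# Number of distinct entities in each layer
sapply(df_toy, function(col) length(unique(col)))
```

Note that mol_B is shared by both herbs, which is exactly the kind of overlap the Sankey and network views are designed to expose.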
5.2 Sankey diagram visualization
The Sankey diagram displays herb → molecule → target flows, with the width of each flow proportional to the number of associations it carries. tcm_sankey() is built on ggalluvial[1] and supports extensive color customization.
5.2.1 Basic usage
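library(ggplot2)
## prepare input data: subset so the Sankey stays readable
# keep the 5 molecules with the most rows (herb–target associations)
top_mol <- df %>%
count(molecule, sort = TRUE) %>%
head(5) %>%
pull(molecule)
df_sub <- df %>% filter(molecule %in% top_mol)
# randomly sample at most 30 of their targets (seed fixed for reproducibility)
set.seed(2026)
sampled_targets <- df_sub %>%
distinct(target) %>%
sample_n(min(30, n())) %>%
pull(target)
df_sankey <- df_sub %>% filter(target %in% sampled_targets)
# draw the herb -> molecule -> target Sankey diagram
p_sankey <- tcm_sankey(df_sankey)
print(p_sankey)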
Each vertical column is an axis (herb, molecule, or target). Each block within a column is a stratum representing a single herb, molecule, or target, and the colored flows between columns trace their associations. Wider flows and taller blocks indicate more associations.
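In an alluvial plot built from a table like this, each row is one alluvium, so the width of a flow between two adjacent strata is the number of rows passing through both. The sketch below computes those widths in base R for a hypothetical toy table (df_toy, flow_hm, and flow_mt are illustrative names). It assumes every row carries equal weight; how tcm_sankey() weights rows internally is not stated here.

```r
# Hypothetical toy table: one row per herb–molecule–target association
df_toy <- data.frame(
  herb     = c("lingzhi", "lingzhi", "lingzhi", "huangqi", "huangqi", "huangqi"),
  molecule = c("mol_A",   "mol_A",   "mol_B",   "mol_B",   "mol_C",   "mol_C"),
  target   = c("T1",      "T2",      "T2",      "T2",      "T3",      "T4")
)

# Herb -> molecule flow width = number of rows sharing that herb–molecule pair
flow_hm <- aggregate(target ~ herb + molecule, data = df_toy, FUN = length)
names(flow_hm)[3] <- "width"

# Molecule -> target flow width = number of rows sharing that molecule–target pair
flow_mt <- aggregate(herb ~ molecule + target, data = df_toy, FUN = length)
names(flow_mt)[3] <- "width"

flow_hm[order(-flow_hm$width), ]
flow_mt[order(-flow_mt$width), ]
```

Here the mol_B → T2 flow has width 2 because both herbs reach T2 through mol_B, which is what a "shared" flow looks like in the diagram.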
5.2.2 Customizing colors and appearance
tcm_sankey() offers fine-grained control over the color palette of each layer (herb_cols, mol_cols, target_cols), label size (font_size), flow transparency (alpha), and flow curvature (knot.pos). Below is an example with a publication-ready color scheme:
p_sankey2 <- tcm_sankey(
df_sankey,
herb_cols = c("#C44E52", "#4C72B0"),
mol_cols = c("#55A868", "#DD8452", "#8172B3", "#C44E52",
"#DA8BC3"),
target_cols = c("#1A7A6D", "#BF6D2A", "#6A5ACD", "#C04779",
"#5B8C2A", "#D4A017", "#8B6914", "#607D8B"),
font_size = 3.5,
alpha = 0.4,
knot.pos = 0.25
)
print(p_sankey2)
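The example above passes hand-picked vectors of 2, 5, and 8 colors. When the number of molecules or targets changes, one option is to interpolate a small base palette to the required length with grDevices::colorRampPalette(). The helper make_cols() below is hypothetical, and whether tcm_sankey() expects exactly one color per level or recycles shorter vectors is an assumption to check in ?tcm_sankey.

```r
# Hypothetical helper: interpolate a base palette to exactly n colours
make_cols <- function(n, base = c("#1A7A6D", "#BF6D2A", "#6A5ACD", "#C04779")) {
  stopifnot(n >= 1)
  grDevices::colorRampPalette(base)(n)
}

# e.g. one colour per distinct target in a (hypothetical) target column
targets <- c("T1", "T2", "T2", "T3", "T4", "T5")
target_cols <- make_cols(length(unique(targets)))
target_cols
```

The result could then be passed as target_cols = target_cols in the tcm_sankey() call.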
5.3 Network graph with ggtangle
While the Sankey diagram excels at showing flows, a force-directed network layout can reveal topological structure, hub nodes, and community patterns. prepare_herb_graph() builds an igraph object with pre-computed node metrics, and ggtangle[2] renders it as a ggplot-based network.
5.3.1 Building the graph
library(igraph)
# Build the network (using a moderate subset for clarity)
g <- prepare_herb_graph(df, n = 60, compute_metrics = TRUE)
# Inspect graph summary
cat("Nodes:", vcount(g), "\n")
cat("Edges:", ecount(g), "\n")
cat("Node types:", paste(unique(V(g)$type), collapse = ", "), "\n")
#> Nodes: 55
#> Edges: 120
#> Node types: Herb, Molecule, Target
prepare_herb_graph() automatically:
- Creates directed edges: Herb → Molecule → Target
- Computes per-node metrics: degree, centrality (eigenvector), betweenness, closeness, and pagerank
- Assigns a type label ("Herb", "Molecule", or "Target") to each node
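The edge construction described in this list can be sketched in base R on a hypothetical toy table: herb → molecule and molecule → target pairs are deduplicated and stacked into one directed edge list, each node gets a type label, and degree is counted from the edge list. This mirrors the description only; it is not TCMDATA's actual code, and names such as df_toy, edges, and nodes are illustrative.

```r
# Hypothetical toy table: one row per herb–molecule–target association
df_toy <- data.frame(
  herb     = c("lingzhi", "lingzhi", "lingzhi", "huangqi", "huangqi", "huangqi"),
  molecule = c("mol_A",   "mol_A",   "mol_B",   "mol_B",   "mol_C",   "mol_C"),
  target   = c("T1",      "T2",      "T2",      "T2",      "T3",      "T4")
)

# Directed edges, layer 1: herb -> molecule (deduplicated)
e1 <- unique(data.frame(from = df_toy$herb, to = df_toy$molecule))
# Directed edges, layer 2: molecule -> target (deduplicated)
e2 <- unique(data.frame(from = df_toy$molecule, to = df_toy$target))
edges <- rbind(e1, e2)

# Node table with a type label per node (assumes names are unique across layers)
h <- unique(df_toy$herb); m <- unique(df_toy$molecule); t <- unique(df_toy$target)
nodes <- data.frame(
  name = c(h, m, t),
  type = rep(c("Herb", "Molecule", "Target"), c(length(h), length(m), length(t)))
)

# Total degree = number of edges touching each node (in-degree + out-degree)
nodes$degree <- as.integer(table(factor(c(edges$from, edges$to), levels = nodes$name)))
nodes
```

With igraph installed, igraph::graph_from_data_frame(edges, vertices = nodes) would turn these two tables into a directed graph carrying type and degree as vertex attributes.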
5.3.2 Basic network visualization
ggtangle[2] extends ggplot2 to natively support igraph objects. You can use familiar aesthetics (color, size, shape) and layers (geom_point, geom_text_repel).
library(ggtangle)
library(ggrepel)
# Define a publication-quality color palette
node_colors <- c(
"Herb" = "#C44E52",
"Molecule" = "#4C72B0",
"Target" = "#55A868"
)
# Define node shape mapping
node_shapes <- c(
"Herb" = 18, # diamond
"Molecule" = 16, # filled circle
"Target" = 15 # filled square
)
set.seed(42)
p_net <- ggplot(g, layout = "kk") +
geom_edge(alpha = 0.18, color = "grey60", linewidth = 0.3) +
geom_point(aes(color = type, size = centrality, shape = type), alpha = 0.85) +
scale_color_manual(values = node_colors, name = "Node Type") +
scale_shape_manual(values = node_shapes, name = "Node Type") +
scale_size_continuous(name = "Centrality", range = c(2, 9)) +
guides(
color = guide_legend(order = 1, override.aes = list(size = 4)),
shape = guide_legend(order = 1),
size = guide_legend(order = 2)
) +
theme_void() +
theme(
legend.position = "right",
legend.text = element_text(size = 10),
legend.title = element_text(size = 11, face = "bold"),
plot.margin = margin(5, 5, 5, 5)
)
print(p_net)
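In this plot:
- Diamond nodes (◆) represent herbs, circles (●) represent molecules, and squares (■) represent targets.
- Larger nodes have higher eigenvector centrality, meaning they are connected to other well-connected nodes, which indicates greater topological importance.
- The Kamada–Kawai ("kk") layout places nodes so that their on-screen distances approximate their shortest-path distances in the graph, so densely connected nodes end up close together.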
5.3.3 Adding node labels
For networks with many nodes, ggrepel::geom_text_repel() reduces label overlap by pushing labels away from points and from each other. Here we label only nodes whose centrality is at or above the median (the top 50%):
set.seed(42)
# Extract centrality threshold (top 50%)
cent_thresh <- quantile(V(g)$centrality, 0.5)
p_labeled <- ggplot(g, layout = "kk") +
geom_edge(alpha = 0.15, color = "grey55", linewidth = 0.3) +
geom_point(aes(color = type, size = centrality, shape = type), alpha = 0.85) +
geom_text_repel(
aes(label = ifelse(centrality >= cent_thresh, name, "")),
size = 2.8, fontface = "italic",
max.overlaps = 30, segment.alpha = 0.3,
segment.size = 0.3, segment.color = "grey50",
box.padding = 0.4, point.padding = 0.3
) +
scale_color_manual(values = node_colors, name = "Node Type") +
scale_shape_manual(values = node_shapes, name = "Node Type") +
scale_size_continuous(name = "Centrality", range = c(1.5, 10)) +
guides(
color = guide_legend(order = 1, override.aes = list(size = 4)),
shape = guide_legend(order = 1),
size = guide_legend(order = 2)
) +
theme_void() +
theme(
legend.position = "right",
legend.text = element_text(size = 10),
legend.title = element_text(size = 11, face = "bold"),
plot.margin = margin(5, 5, 5, 5)
)
print(p_labeled)
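The labeling rule in the code above relies on two facts: quantile(x, 0.5) is the median, and an empty string draws no label. The base-R sketch below applies the same rule to hypothetical centrality scores (the node names and values are illustrative).

```r
# Hypothetical centrality scores for eight nodes
cent <- c(HerbA = 1.00, Mol1 = 0.82, Mol2 = 0.40, T1 = 0.35,
          T2 = 0.60, T3 = 0.10, T4 = 0.22, Mol3 = 0.55)

# Median (default type 7 interpolates between the two middle values)
thresh <- quantile(cent, 0.5)

# Nodes at or above the median keep their name; the rest get "" (no label drawn)
labels <- ifelse(cent >= thresh, names(cent), "")
labels
```

With no ties at the median, exactly half of the nodes are labeled; using quantile(..., 0.8) instead would label roughly the top 20%.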
5.3.4 Alternative layouts
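The layout argument of ggplot() selects an igraph layout algorithm. The panels below compare four common choices: Fruchterman–Reingold ("fr"), Kamada–Kawai ("kk"), star ("star"), and igraph's automatic choice ("nicely"), arranged in a grid with plot_list() from aplot: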
library(aplot)
set.seed(42)
base_layers <- list(
geom_edge(alpha = 0.15, color = "grey55", linewidth = 0.3),
geom_point(aes(color = type, size = centrality, shape = type), alpha = 0.85),
scale_color_manual(values = node_colors, name = "Node Type"),
scale_shape_manual(values = node_shapes, name = "Node Type"),
scale_size_continuous(name = "Centrality", range = c(1.5, 7)),
guides(
color = guide_legend(order = 1, override.aes = list(size = 3.5)),
shape = guide_legend(order = 1),
size = guide_legend(order = 2)
),
theme_void(),
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 13),
legend.position = "bottom",
legend.text = element_text(size = 9),
legend.title = element_text(size = 10, face = "bold")
)
)
p_fr <- ggplot(g, layout = "fr") + base_layers + ggtitle("Fruchterman–Reingold")
p_kk <- ggplot(g, layout = "kk") + base_layers + ggtitle("Kamada–Kawai")
p_star <- ggplot(g, layout = "star") + base_layers + ggtitle("Star")
p_nicely <- ggplot(g, layout = "nicely") + base_layers + ggtitle("Nicely (auto)")
plot_list(p_fr, p_kk, p_star, p_nicely, ncol = 2)
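Force-directed layouts such as "fr" treat the graph as a physical system. To illustrate the idea, the base-R sketch below implements the textbook Fruchterman–Reingold scheme: every pair of nodes repels with force k²/d, adjacent nodes attract with force d²/k, and the maximum step per iteration (the "temperature") shrinks linearly. It is a simplified teaching version, not igraph's layout_with_fr(); fr_layout() and the toy edges are hypothetical.

```r
# Simplified Fruchterman–Reingold layout for a symmetric 0/1 adjacency matrix
fr_layout <- function(adj, n_iter = 300, seed = 42) {
  set.seed(seed)
  n <- nrow(adj)
  k <- sqrt(1 / n)                          # ideal edge length for unit area
  pos <- matrix(runif(2 * n), n, 2)         # random start in the unit square
  t0 <- 0.1                                 # initial maximum step
  for (iter in seq_len(n_iter)) {
    disp <- matrix(0, n, 2)
    for (i in seq_len(n)) {
      delta <- matrix(pos[i, ], n, 2, byrow = TRUE) - pos   # rows: pos_i - pos_j
      d <- pmax(sqrt(rowSums(delta^2)), 1e-9)               # avoid division by zero
      f <- k^2 / d - adj[i, ] * d^2 / k     # repulsion minus attraction (neighbours only)
      f[i] <- 0                             # no self-force
      disp[i, ] <- colSums(delta / d * f)   # sum forces along unit vectors
    }
    len <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    temp <- t0 * (1 - (iter - 1) / n_iter)  # linear cooling
    pos <- pos + disp / len * pmin(len, temp)
  }
  dimnames(pos) <- list(rownames(adj), c("x", "y"))
  pos
}

# Hypothetical toy graph (herb -> molecule -> target), treated as undirected
edges <- data.frame(
  from = c("lingzhi", "lingzhi", "huangqi", "huangqi", "mol_A", "mol_A", "mol_B", "mol_C", "mol_C"),
  to   = c("mol_A",   "mol_B",   "mol_B",   "mol_C",   "T1",    "T2",    "T2",    "T3",    "T4")
)
nodes <- unique(c(edges$from, edges$to))
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1
adj <- adj + t(adj)

pos <- fr_layout(adj)
round(pos, 2)
```

Because the starting positions are random, the result depends on the seed, which is why the examples in this chapter call set.seed() before plotting.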
5.3.5 Enriching the network with external data
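Using the %<+% operator from ggfun, you can bind an external data frame to the network, keyed by node name (here the label column), for richer visual encodings. The node_df table below collects per-node metrics in that form; because degree is already stored as a vertex attribute by prepare_herb_graph(), the plot maps it directly. Node size encodes degree (the number of direct interactions), and all node labels are displayed: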
# Create a data frame with node-level metrics
node_df <- data.frame(
label = V(g)$name,
type = V(g)$type,
degree = V(g)$degree,
betweenness = V(g)$betweenness,
pagerank = V(g)$pagerank
)
set.seed(42)
p_enriched <- ggplot(g, layout = "kk") +
geom_edge(alpha = 0.15, color = "grey55", linewidth = 0.3) +
geom_point(aes(color = type, size = degree, shape = type), alpha = 0.85) +
geom_text_repel(
aes(label = name, color = type),
size = 2.5, fontface = "italic",
max.overlaps = 100, segment.alpha = 0.25,
segment.size = 0.2, segment.color = "grey60",
box.padding = 0.3, point.padding = 0.2,
show.legend = FALSE
) +
scale_color_manual(values = node_colors, name = "Node Type") +
scale_shape_manual(values = node_shapes, name = "Node Type") +
scale_size_continuous(name = "Degree", range = c(2, 11)) +
guides(
color = guide_legend(order = 1, override.aes = list(size = 4)),
shape = guide_legend(order = 1),
size = guide_legend(order = 2)
) +
theme_void() +
theme(
legend.position = "right",
legend.text = element_text(size = 10),
legend.title = element_text(size = 11, face = "bold"),
plot.margin = margin(10, 10, 10, 10)
)
print(p_enriched)
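The table-binding idea behind %<+% can be illustrated in base R: rows of an external data frame are aligned to graph nodes by matching its label column against node names, and nodes without a match get NA. This sketch uses match() and does not reproduce ggfun's implementation; node_names, ext, and the expr values are hypothetical.

```r
# Node names in plotting order (as V(g)$name would return)
node_names <- c("lingzhi", "mol_A", "mol_B", "T1", "T2")

# External per-node data; rows need not be in node order or cover every node
ext <- data.frame(
  label = c("T2", "T1", "mol_A", "lingzhi"),
  expr  = c(5.2, 1.3, 0.7, 2.0)       # hypothetical values
)

# Align external rows to nodes by name; the unmatched node (mol_B) gets NA
idx <- match(node_names, ext$label)
node_data <- data.frame(name = node_names, expr = ext$expr[idx])
node_data
```

A column aligned this way can then drive an aesthetic such as size or color, with NA nodes handled by the scale's na.value.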
5.4 References
[1] Brunson JC (2020). “ggalluvial: Layered Grammar for Alluvial Plots.” Journal of Open Source Software, 5(49), 2017. doi: 10.21105/joss.02017.
[2] Yu G (2025). ggtangle: Draw Network with Data. R package version 0.1.1. doi: 10.32614/CRAN.package.ggtangle.
5.5 Session information
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] aplot_0.2.9 ggrepel_0.9.7 ggtangle_0.1.1 igraph_2.2.2
#> [5] ggplot2_4.0.2 dplyr_1.2.0 aplotExtra_0.0.4 ivolcano_0.0.5
#> [9] enrichplot_1.30.5 TCMDATA_0.0.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.3.0 gson_0.1.0 gridExtra_2.3
#> [4] rematch2_2.1.2 rlang_1.1.7 magrittr_2.0.4
#> [7] DOSE_4.4.0 compiler_4.5.2 RSQLite_2.4.6
#> [10] ggalluvial_0.12.6 png_0.1-8 systemfonts_1.3.2
#> [13] vctrs_0.7.1 reshape2_1.4.5 stringr_1.6.0
#> [16] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0
#> [19] XVector_0.50.0 labeling_0.4.3 rmarkdown_2.30
#> [22] purrr_1.2.1 bit_4.6.0 xfun_0.56
#> [25] cachem_1.1.0 jsonlite_2.0.0 blob_1.3.0
#> [28] tidydr_0.0.6 tweenr_2.0.3 BiocParallel_1.44.0
#> [31] cluster_2.1.8.1 parallel_4.5.2 R6_2.6.1
#> [34] bslib_0.10.0 stringi_1.8.7 RColorBrewer_1.1-3
#> [37] DNAcopy_1.84.0 jquerylib_0.1.4 GOSemSim_2.36.0
#> [40] Rcpp_1.1.1 Seqinfo_1.0.0 bookdown_0.46
#> [43] knitr_1.51 R.utils_2.13.0 IRanges_2.44.0
#> [46] Matrix_1.7-4 splines_4.5.2 tidyselect_1.2.1
#> [49] qvalue_2.42.0 rstudioapi_0.18.0 yaml_2.3.12
#> [52] codetools_0.2-20 lattice_0.22-7 tibble_3.3.1
#> [55] plyr_1.8.9 withr_3.0.2 Biobase_2.70.0
#> [58] treeio_1.34.0 KEGGREST_1.50.0 S7_0.2.1
#> [61] evaluate_1.0.5 survival_3.8-3 gridGraphics_0.5-1
#> [64] polyclip_1.10-7 scatterpie_0.2.6 Biostrings_2.78.0
#> [67] pillar_1.11.1 ggtree_4.0.4 stats4_4.5.2
#> [70] clusterProfiler_4.18.4 ggfun_0.2.0 generics_0.1.4
#> [73] paletteer_1.7.0 S4Vectors_0.48.0 scales_1.4.0
#> [76] tidytree_0.4.7 glue_1.8.0 gdtools_0.5.0
#> [79] lazyeval_0.2.2 tools_4.5.2 ggnewscale_0.5.2
#> [82] data.table_1.18.2.1 fgsea_1.36.2 ggvenn_0.1.19
#> [85] forcats_1.0.1 ggiraph_0.9.6 maftools_2.26.0
#> [88] fs_1.6.7 fastmatch_1.1-8 cowplot_1.2.0
#> [91] grid_4.5.2 tidyr_1.3.2 ape_5.8-1
#> [94] ggstar_1.0.6 AnnotationDbi_1.72.0 nlme_3.1-168
#> [97] patchwork_1.3.2 ggforce_0.5.0 cli_3.6.5
#> [100] rappdirs_0.3.4 fontBitstreamVera_0.1.1 gtable_0.3.6
#> [103] R.methodsS3_1.8.2 yulab.utils_0.2.4 sass_0.4.10
#> [106] digest_0.6.39 fontquiver_0.2.1 BiocGenerics_0.56.0
#> [109] ggplotify_0.1.3 htmlwidgets_1.6.4 farver_2.1.2
#> [112] memoise_2.0.1 htmltools_0.5.9 R.oo_1.27.1
#> [115] lifecycle_1.0.5 httr_1.4.8 GO.db_3.22.0
#> [118] fontLiberation_0.1.0 bit64_4.6.0-1 MASS_7.3-65