Chapter 5 TCM network construction

A core objective of network pharmacology is to reveal the multi-component, multi-target therapeutic mechanism of Traditional Chinese Medicine through network visualization. TCMDATA provides two complementary approaches for constructing and visualizing herb–molecule–target relationship networks: Sankey diagrams via ggalluvial[1] for depicting the directional flow of interactions, and force-directed network graphs via ggtangle[2] for revealing topological structure and hub nodes.


5.1 Preparing the data

All network visualizations start from the herb–molecule–target data frame returned by search_herb() or search_target().

library(TCMDATA)
library(dplyr)

# Query herbs and their molecular targets
herbs <- c("灵芝", "黄芪")
df <- search_herb(herb = herbs, type = "Herb_cn_name")

df |> head(10)
#>       herb                  molecule target
#> 1  lingzhi 3,4-Dihydroxybenzoic acid    GAA
#> 2  lingzhi 3,4-Dihydroxybenzoic acid   POLB
#> 3  lingzhi 3,4-Dihydroxybenzoic acid    APP
#> 4  lingzhi 3,4-Dihydroxybenzoic acid    CA1
#> 5  lingzhi 3,4-Dihydroxybenzoic acid   CA12
#> 6  lingzhi 3,4-Dihydroxybenzoic acid   CA14
#> 7  lingzhi 3,4-Dihydroxybenzoic acid    CA2
#> 8  lingzhi 3,4-Dihydroxybenzoic acid    CA4
#> 9  lingzhi 3,4-Dihydroxybenzoic acid    CA6
#> 10 lingzhi 3,4-Dihydroxybenzoic acid    CA7

The returned data frame contains three core columns: herb, molecule, and target, representing the tripartite relationship.
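
Before plotting, it can help to check how large each layer of the tripartite table is. The following is a minimal sketch on a small hand-made data frame (toy_df and all of its values are hypothetical stand-ins, not output of search_herb()); the same dplyr calls should work on any data frame that has the herb, molecule, and target columns.

```r
library(dplyr)

# Hypothetical toy table with the same three columns as the search_herb() result
toy_df <- data.frame(
  herb     = c("herbA", "herbA", "herbA", "herbB", "herbB"),
  molecule = c("mol1",  "mol1",  "mol2",  "mol2",  "mol3"),
  target   = c("T1",    "T2",    "T2",    "T3",    "T1"),
  stringsAsFactors = FALSE
)

# Number of distinct entities in each layer
layer_sizes <- toy_df |>
  summarise(
    n_herbs     = n_distinct(herb),
    n_molecules = n_distinct(molecule),
    n_targets   = n_distinct(target)
  )

# Number of distinct targets reached by each herb (through any molecule)
targets_per_herb <- toy_df |>
  distinct(herb, target) |>
  count(herb, name = "n_targets")

print(layer_sizes)
print(targets_per_herb)
```

A summary like this gives a quick sense of whether the table needs subsetting before it is passed to a plotting function.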


5.2 Sankey diagram visualization

The Sankey diagram displays herb → molecule → target flows, with the width of each band proportional to the number of associations it carries. tcm_sankey() is powered by ggalluvial[1] and supports rich color customization.

5.2.1 Basic usage

library(ggplot2)

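## Prepare input data
# Keep the five molecules that appear in the most rows of df
top_mol <- df %>%
  count(molecule, sort = TRUE) %>%
  head(5) %>%
  pull(molecule)

df_sub <- df %>% filter(molecule %in% top_mol)

# Randomly sample up to 30 distinct targets so the diagram stays readable
set.seed(2026)
sampled_targets <- df_sub %>%
  distinct(target) %>%
  sample_n(min(30, n())) %>%
  pull(target)

df_sankey <- df_sub %>% filter(target %in% sampled_targets)

# Draw the Sankey diagram with default settings
p_sankey <- tcm_sankey(df_sankey)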
print(p_sankey)

Each vertical axis holds one layer (herb, molecule, or target), drawn as a stack of blocks (strata) with one block per entity, and the colored flows trace the associations between them. Wider bands indicate higher interaction frequency.
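
To make the axis, stratum, and flow vocabulary concrete, here is a minimal sketch of a three-axis alluvial plot written directly with ggalluvial. It is not the implementation of tcm_sankey(); the toy data, the freq column, and the styling values are assumptions chosen for illustration.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy table; each row is one herb-molecule-target association
toy_df <- data.frame(
  herb     = c("herbA", "herbA", "herbA", "herbB", "herbB"),
  molecule = c("mol1",  "mol1",  "mol2",  "mol2",  "mol3"),
  target   = c("T1",    "T2",    "T2",    "T3",    "T1")
)
toy_df$freq <- 1  # every association contributes one unit of band width

p_toy <- ggplot(
  toy_df,
  aes(axis1 = herb, axis2 = molecule, axis3 = target, y = freq)
) +
  # Flows between axes, colored by the herb they originate from
  geom_alluvium(aes(fill = herb), alpha = 0.4, knot.pos = 0.25) +
  # One block (stratum) per distinct value on each axis
  geom_stratum(width = 1/3, fill = "grey95", color = "grey40") +
  # Label each stratum with its value
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3.5) +
  scale_x_discrete(limits = c("Herb", "Molecule", "Target")) +
  theme_minimal()

print(p_toy)
```

In this sketch a stratum's height is the sum of freq over the rows that pass through it, which is why entities with more associations appear taller.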

5.2.2 Customizing colors and appearance

tcm_sankey() offers fine-grained control over color palettes, font style, flow transparency, and curvature. Below is an example with a publication-ready color scheme:

p_sankey2 <- tcm_sankey(
  df_sankey,
  herb_cols   = c("#C44E52", "#4C72B0"),
  mol_cols    = c("#55A868", "#DD8452", "#8172B3", "#C44E52",
                  "#DA8BC3"),
  target_cols = c("#1A7A6D", "#BF6D2A", "#6A5ACD", "#C04779",
                  "#5B8C2A", "#D4A017", "#8B6914", "#607D8B"),
  font_size   = 3.5,
  alpha       = 0.4,
  knot.pos    = 0.25
)
print(p_sankey2)
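
The example above passes eight target colors while the sampled data may contain up to 30 targets. How tcm_sankey() handles a palette shorter than the number of strata is not stated here, so the following sketch only shows one way to prepare a palette of the exact length in base R. Whether tcm_sankey() accepts a vector of this length is an assumption, and n_targets is a hypothetical value.

```r
# Eight anchor colors, taken from the target_cols example above
base_cols <- c("#1A7A6D", "#BF6D2A", "#6A5ACD", "#C04779",
               "#5B8C2A", "#D4A017", "#8B6914", "#607D8B")

# Hypothetical number of distinct targets in the plotted data
n_targets <- 30

# Interpolate between the anchor colors to obtain one color per target
target_cols_30 <- grDevices::colorRampPalette(base_cols)(n_targets)

print(head(target_cols_30))
```

Interpolated palettes keep the overall look of the anchor colors, but neighboring entries become similar, so a handful of clearly distinct colors is often easier to read when the number of strata is small.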


5.3 Network graph with ggtangle

While the Sankey diagram excels at showing flows, a force-directed network layout can reveal topological structure, hub nodes, and community patterns. prepare_herb_graph() builds an igraph object with pre-computed node metrics, and ggtangle[2] renders it as a ggplot-based network.

5.3.1 Building the graph

library(igraph)

# Build the network (using a moderate subset for clarity)
g <- prepare_herb_graph(df, n = 60, compute_metrics = TRUE)

# Inspect graph summary
cat("Nodes:", vcount(g), "\n")
#> Nodes: 55
cat("Edges:", ecount(g), "\n")
#> Edges: 120
cat("Node types:", paste(unique(V(g)$type), collapse = ", "), "\n")
#> Node types: Herb, Molecule, Target

prepare_herb_graph() automatically:

  • Creates directed edges: Herb → Molecule → Target
  • Computes per-node metrics: degree, centrality (eigenvector), betweenness, closeness, pagerank
  • Assigns type labels ("Herb", "Molecule", "Target") to each node
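
The three steps listed above can be sketched with plain igraph calls. This is an illustration under stated assumptions, not the source of prepare_herb_graph(): the toy data, the choice to compute betweenness, closeness, and eigenvector centrality on the undirected view of the graph, and the attribute names are all assumptions.

```r
library(igraph)

# Hypothetical toy table with herb, molecule, and target columns
toy_df <- data.frame(
  herb     = c("herbA", "herbA", "herbA", "herbB", "herbB"),
  molecule = c("mol1",  "mol1",  "mol2",  "mol2",  "mol3"),
  target   = c("T1",    "T2",    "T2",    "T3",    "T1"),
  stringsAsFactors = FALSE
)

# Two edge layers: herb -> molecule and molecule -> target (duplicates removed)
edges <- unique(rbind(
  data.frame(from = toy_df$herb,     to = toy_df$molecule),
  data.frame(from = toy_df$molecule, to = toy_df$target)
))
g_toy <- graph_from_data_frame(edges, directed = TRUE)

# Type label for every node
V(g_toy)$type <- ifelse(
  V(g_toy)$name %in% toy_df$herb, "Herb",
  ifelse(V(g_toy)$name %in% toy_df$molecule, "Molecule", "Target")
)

# Per-node metrics stored as vertex attributes
V(g_toy)$degree      <- degree(g_toy, mode = "all")
V(g_toy)$betweenness <- betweenness(g_toy, directed = FALSE)
V(g_toy)$closeness   <- closeness(g_toy, mode = "all")
V(g_toy)$pagerank    <- page_rank(g_toy)$vector
V(g_toy)$centrality  <- eigen_centrality(g_toy, directed = FALSE)$vector

print(as_data_frame(g_toy, what = "vertices"))
```

Treating the graph as undirected for eigenvector centrality is a deliberate choice in this sketch: in a strictly layered directed graph, herbs have no incoming edges, so the directed variant would give them no score.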

5.3.2 Basic network visualization

ggtangle[2] extends ggplot2 to natively support igraph objects. You can use familiar aesthetics (color, size, shape) and layers such as geom_point() from ggplot2 and geom_text_repel() from ggrepel.

library(ggtangle)
library(ggrepel)

# Define a publication-quality color palette
node_colors <- c(
  "Herb"     = "#C44E52",
  "Molecule" = "#4C72B0",
  "Target"   = "#55A868"
)

# Define node shape mapping
node_shapes <- c(
  "Herb"     = 18,   # diamond
  "Molecule" = 16,   # filled circle
  "Target"   = 15    # filled square
)

set.seed(42)
p_net <- ggplot(g, layout = "kk") +
  geom_edge(alpha = 0.18, color = "grey60", linewidth = 0.3) +
  geom_point(aes(color = type, size = centrality, shape = type), alpha = 0.85) +
  scale_color_manual(values = node_colors, name = "Node Type") +
  scale_shape_manual(values = node_shapes, name = "Node Type") +
  scale_size_continuous(name = "Centrality", range = c(2, 9)) +
  guides(
    color = guide_legend(order = 1, override.aes = list(size = 4)),
    shape = guide_legend(order = 1),
    size  = guide_legend(order = 2)
  ) +
  theme_void() +
  theme(
    legend.position  = "right",
    legend.text      = element_text(size = 10),
    legend.title     = element_text(size = 11, face = "bold"),
    plot.margin      = margin(5, 5, 5, 5)
  )

print(p_net)

In this plot:

  • Diamond nodes represent herbs, circles represent molecules, and squares represent targets.
  • Larger nodes have higher eigenvector centrality, indicating greater topological importance.
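  • The Kamada-Kawai ("kk") layout places nodes so that their distances on the page approximate their shortest-path distances in the graph, so closely linked nodes end up near each other.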

5.3.3 Adding node labels

For networks with many nodes, ggrepel::geom_text_repel() prevents label overlap. Here we label only high-centrality nodes (top 50%):

set.seed(42)

# Extract centrality threshold (top 50%)
cent_thresh <- quantile(V(g)$centrality, 0.5)

p_labeled <- ggplot(g, layout = "kk") +
  geom_edge(alpha = 0.15, color = "grey55", linewidth = 0.3) +
  geom_point(aes(color = type, size = centrality, shape = type), alpha = 0.85) +
  geom_text_repel(
    aes(label = ifelse(centrality >= cent_thresh, name, "")),
    size = 2.8, fontface = "italic",
    max.overlaps = 30, segment.alpha = 0.3,
    segment.size = 0.3, segment.color = "grey50",
    box.padding = 0.4, point.padding = 0.3
  ) +
  scale_color_manual(values = node_colors, name = "Node Type") +
  scale_shape_manual(values = node_shapes, name = "Node Type") +
  scale_size_continuous(name = "Centrality", range = c(1.5, 10)) +
  guides(
    color = guide_legend(order = 1, override.aes = list(size = 4)),
    shape = guide_legend(order = 1),
    size  = guide_legend(order = 2)
  ) +
  theme_void() +
  theme(
    legend.position  = "right",
    legend.text      = element_text(size = 10),
    legend.title     = element_text(size = 11, face = "bold"),
    plot.margin      = margin(5, 5, 5, 5)
  )

print(p_labeled)
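
The labelling rule used above can be checked in isolation. The sketch below applies the same quantile() and ifelse() logic to a small named vector; the node names and centrality values are hypothetical.

```r
# Hypothetical centrality scores for six nodes
centrality <- c(herbA = 0.9, mol1 = 0.2, mol2 = 1.0,
                T1 = 0.1, T2 = 0.5, T3 = 0.05)

# Median threshold: nodes at or above it keep their label
cent_thresh <- quantile(centrality, 0.5)
labels_top50 <- unname(ifelse(centrality >= cent_thresh, names(centrality), ""))

# A stricter threshold labels fewer nodes (here the top 10%)
strict_thresh <- quantile(centrality, 0.9)
labels_top10 <- unname(ifelse(centrality >= strict_thresh, names(centrality), ""))

print(labels_top50)
print(labels_top10)
```

Raising the quantile is a simple way to thin out labels on dense networks without changing the rest of the plotting code.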

5.3.4 Alternative layouts

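ggtangle supports all igraph layout algorithms, selected by name through the layout argument. The example below draws the same network with four common choices and arranges the panels with aplot::plot_list():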

library(aplot)

set.seed(42)
base_layers <- list(
  geom_edge(alpha = 0.15, color = "grey55", linewidth = 0.3),
  geom_point(aes(color = type, size = centrality, shape = type), alpha = 0.85),
  scale_color_manual(values = node_colors, name = "Node Type"),
  scale_shape_manual(values = node_shapes, name = "Node Type"),
  scale_size_continuous(name = "Centrality", range = c(1.5, 7)),
  guides(
    color = guide_legend(order = 1, override.aes = list(size = 3.5)),
    shape = guide_legend(order = 1),
    size  = guide_legend(order = 2)
  ),
  theme_void(),
  theme(
    plot.title       = element_text(hjust = 0.5, face = "bold", size = 13),
    legend.position  = "bottom",
    legend.text      = element_text(size = 9),
    legend.title     = element_text(size = 10, face = "bold")
  )
)

p_fr     <- ggplot(g, layout = "fr")     + base_layers + ggtitle("Fruchterman–Reingold")
p_kk     <- ggplot(g, layout = "kk")     + base_layers + ggtitle("Kamada–Kawai")
p_star   <- ggplot(g, layout = "star")   + base_layers + ggtitle("Star")
p_nicely <- ggplot(g, layout = "nicely") + base_layers + ggtitle("Nicely (auto)")

plot_list(p_fr, p_kk, p_star, p_nicely, ncol = 2)
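
To see what a layout algorithm actually produces, the following sketch calls four igraph layout functions directly on a built-in example graph. That the layout names "fr", "kk", "star", and "nicely" correspond to these igraph functions is an assumption; the sketch itself uses only igraph.

```r
library(igraph)

# Built-in example graph (Zachary's karate club, 34 nodes)
g_demo <- make_graph("Zachary")

# Fix the random seed before calling layouts that use random starting positions
set.seed(42)
xy_fr     <- layout_with_fr(g_demo)   # Fruchterman-Reingold, force-directed
xy_kk     <- layout_with_kk(g_demo)   # Kamada-Kawai, distance-based
xy_star   <- layout_as_star(g_demo)   # first vertex in the center, rest on a circle
xy_nicely <- layout_nicely(g_demo)    # igraph picks an algorithm automatically

# Each layout is a matrix with one row per node and x/y columns
print(dim(xy_fr))
print(head(xy_kk, 3))
```

Because every layout returns the same kind of coordinate matrix, switching algorithms only changes where nodes are placed, not which nodes or edges are drawn.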

5.3.5 Enriching the network with external data

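Using the %<+% operator from ggfun, you can bind additional node-level data to the network for richer visual encodings. The example below collects the node metrics into a data frame, node_df, of the kind suited to such binding. Because prepare_herb_graph() already stores degree as a vertex attribute, the plot maps node size to degree (the number of direct interactions) directly, and all node labels are displayed: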

# Create a data frame with node-level metrics
node_df <- data.frame(
  label       = V(g)$name,
  type        = V(g)$type,
  degree      = V(g)$degree,
  betweenness = V(g)$betweenness,
  pagerank    = V(g)$pagerank
)

set.seed(42)
p_enriched <- ggplot(g, layout = "kk") +
  geom_edge(alpha = 0.15, color = "grey55", linewidth = 0.3) +
  geom_point(aes(color = type, size = degree, shape = type), alpha = 0.85) +
  geom_text_repel(
    aes(label = name, color = type),
    size = 2.5, fontface = "italic",
    max.overlaps = 100, segment.alpha = 0.25,
    segment.size = 0.2, segment.color = "grey60",
    box.padding = 0.3, point.padding = 0.2,
    show.legend = FALSE
  ) +
  scale_color_manual(values = node_colors, name = "Node Type") +
  scale_shape_manual(values = node_shapes, name = "Node Type") +
  scale_size_continuous(name = "Degree", range = c(2, 11)) +
  guides(
    color = guide_legend(order = 1, override.aes = list(size = 4)),
    shape = guide_legend(order = 1),
    size  = guide_legend(order = 2)
  ) +
  theme_void() +
  theme(
    legend.position  = "right",
    legend.text      = element_text(size = 10),
    legend.title     = element_text(size = 11, face = "bold"),
    plot.margin      = margin(10, 10, 10, 10)
  )

print(p_enriched)
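
When the extra data lives in a separate table, it first has to be matched to nodes by name. The sketch below shows that key-based matching with plain igraph and base R; it illustrates the idea behind binding external data and does not use the %<+% operator. The graph, the ext table, and the score column are hypothetical.

```r
library(igraph)

# Hypothetical small herb -> molecule -> target graph
g_toy <- graph_from_data_frame(
  data.frame(
    from = c("herbA", "herbA", "mol1", "mol2"),
    to   = c("mol1",  "mol2",  "T1",   "T1")
  ),
  directed = TRUE
)

# Hypothetical external annotation; rows are in a different order than the
# graph's nodes, and mol2 has no entry
ext <- data.frame(
  label = c("T1", "mol1", "herbA"),
  score = c(2.5, 1.0, 0.3)
)

# Look up each node's row in the external table by name
idx <- match(V(g_toy)$name, ext$label)

# Store the matched values as a new vertex attribute (NA where no match exists)
V(g_toy)$score <- ext$score[idx]

print(data.frame(name = V(g_toy)$name, score = V(g_toy)$score))
```

Nodes without a matching row receive NA, so it is worth checking for missing values before mapping the new attribute to size or color.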


5.4 References

  1. Brunson JC (2020). “ggalluvial: Layered Grammar for Alluvial Plots.” Journal of Open Source Software, 5(49), 2017. doi: 10.21105/joss.02017.

  2. Yu G (2025). ggtangle: Draw Network with Data. R package version 0.1.1. doi: 10.32614/CRAN.package.ggtangle.


5.5 Session information

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] aplot_0.2.9        ggrepel_0.9.7      ggtangle_0.1.1     igraph_2.2.2      
#>  [5] ggplot2_4.0.2      dplyr_1.2.0        aplotExtra_0.0.4   ivolcano_0.0.5    
#>  [9] enrichplot_1.30.5  TCMDATA_0.0.0.9000
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.3.0               gson_0.1.0              gridExtra_2.3          
#>   [4] rematch2_2.1.2          rlang_1.1.7             magrittr_2.0.4         
#>   [7] DOSE_4.4.0              compiler_4.5.2          RSQLite_2.4.6          
#>  [10] ggalluvial_0.12.6       png_0.1-8               systemfonts_1.3.2      
#>  [13] vctrs_0.7.1             reshape2_1.4.5          stringr_1.6.0          
#>  [16] pkgconfig_2.0.3         crayon_1.5.3            fastmap_1.2.0          
#>  [19] XVector_0.50.0          labeling_0.4.3          rmarkdown_2.30         
#>  [22] purrr_1.2.1             bit_4.6.0               xfun_0.56              
#>  [25] cachem_1.1.0            jsonlite_2.0.0          blob_1.3.0             
#>  [28] tidydr_0.0.6            tweenr_2.0.3            BiocParallel_1.44.0    
#>  [31] cluster_2.1.8.1         parallel_4.5.2          R6_2.6.1               
#>  [34] bslib_0.10.0            stringi_1.8.7           RColorBrewer_1.1-3     
#>  [37] DNAcopy_1.84.0          jquerylib_0.1.4         GOSemSim_2.36.0        
#>  [40] Rcpp_1.1.1              Seqinfo_1.0.0           bookdown_0.46          
#>  [43] knitr_1.51              R.utils_2.13.0          IRanges_2.44.0         
#>  [46] Matrix_1.7-4            splines_4.5.2           tidyselect_1.2.1       
#>  [49] qvalue_2.42.0           rstudioapi_0.18.0       yaml_2.3.12            
#>  [52] codetools_0.2-20        lattice_0.22-7          tibble_3.3.1           
#>  [55] plyr_1.8.9              withr_3.0.2             Biobase_2.70.0         
#>  [58] treeio_1.34.0           KEGGREST_1.50.0         S7_0.2.1               
#>  [61] evaluate_1.0.5          survival_3.8-3          gridGraphics_0.5-1     
#>  [64] polyclip_1.10-7         scatterpie_0.2.6        Biostrings_2.78.0      
#>  [67] pillar_1.11.1           ggtree_4.0.4            stats4_4.5.2           
#>  [70] clusterProfiler_4.18.4  ggfun_0.2.0             generics_0.1.4         
#>  [73] paletteer_1.7.0         S4Vectors_0.48.0        scales_1.4.0           
#>  [76] tidytree_0.4.7          glue_1.8.0              gdtools_0.5.0          
#>  [79] lazyeval_0.2.2          tools_4.5.2             ggnewscale_0.5.2       
#>  [82] data.table_1.18.2.1     fgsea_1.36.2            ggvenn_0.1.19          
#>  [85] forcats_1.0.1           ggiraph_0.9.6           maftools_2.26.0        
#>  [88] fs_1.6.7                fastmatch_1.1-8         cowplot_1.2.0          
#>  [91] grid_4.5.2              tidyr_1.3.2             ape_5.8-1              
#>  [94] ggstar_1.0.6            AnnotationDbi_1.72.0    nlme_3.1-168           
#>  [97] patchwork_1.3.2         ggforce_0.5.0           cli_3.6.5              
#> [100] rappdirs_0.3.4          fontBitstreamVera_0.1.1 gtable_0.3.6           
#> [103] R.methodsS3_1.8.2       yulab.utils_0.2.4       sass_0.4.10            
#> [106] digest_0.6.39           fontquiver_0.2.1        BiocGenerics_0.56.0    
#> [109] ggplotify_0.1.3         htmlwidgets_1.6.4       farver_2.1.2           
#> [112] memoise_2.0.1           htmltools_0.5.9         R.oo_1.27.1            
#> [115] lifecycle_1.0.5         httr_1.4.8              GO.db_3.22.0           
#> [118] fontLiberation_0.1.0    bit64_4.6.0-1           MASS_7.3-65