Chapter 7 PPI network analysis
Protein–protein interaction (PPI) networks are essential for deciphering the functional relationships among disease-related targets. TCMDATA provides a comprehensive PPI analysis pipeline that integrates data retrieval from STRING, network filtering, topological metric computation, community detection, and publication-ready visualization.
7.1 Retrieving PPI data from STRING
clusterProfiler provides the getPPI() function, which retrieves protein–protein interaction (PPI) information from the STRING database and returns an igraph object containing known and predicted interactions, with edge weights representing combined confidence scores (range 0–1).
Users can retrieve the PPI network for a given set of genes with getPPI() by specifying the taxonomy ID (e.g., 9606 for human, 10090 for mouse):
library(TCMDATA)
library(clusterProfiler)
data("dn_gcds")
ppi <- getPPI(dn_gcds[1:50], taxID = 9606)
In this example, we use a demo PPI network built from a diabetic nephropathy (DN) gene set to illustrate the workflow:
#> IGRAPH 3f74811 UN-- 58 611 --
#> + attr: name (v/c), score (e/n)
#> + edges from 3f74811 (vertex names):
#> [1] SLC2A3 --SERPINE1 SLC2A3 --MYC RELB --CCL3 RELB --ATF3
#> [5] RELB --CXCR4 RELB --SELE RELB --KLF6 RELB --CXCL1
#> [9] RELB --CEBPB RELB --VCAM1 RELB --CCL20 RELB --FOS
#> [13] RELB --PTGS2 RELB --IL1RN RELB --MYC RELB --CCL2
#> [17] RELB --JUN RELB --CXCL8 RELB --ICAM1 RELB --TRAF1
#> [21] RELB --IL6 RELB --IL1B RELB --BIRC3 RELB --NFKB2
#> [25] SERPINE1--CCL4 SERPINE1--MCL1 SERPINE1--S100A8 SERPINE1--IL1RN
#> [29] SERPINE1--EGR1 SERPINE1--FOS SERPINE1--IL18 SERPINE1--CCL3
#> + ... omitted several edges
The returned igraph object contains a score edge attribute representing STRING combined confidence scores, where higher values indicate stronger evidence for the interaction.
7.2 Network filtering
Raw PPI networks often contain low-confidence edges that introduce noise. ppi_subset() provides a two-stage filter:
- Edge score cutoff — removes edges below a confidence threshold.
- Top-n degree filter — retains only the n most connected nodes.
In this case, we keep edges with a STRING score ≥ 0.7 and then retain the top 100 nodes by degree:
ppi_filtered <- ppi_subset(ppi, score_cutoff = 0.7, n = 100)
cat("Before filtering:", vcount(ppi), "nodes,", ecount(ppi), "edges\n")
#> Before filtering: 58 nodes, 611 edges
cat("After filtering:", vcount(ppi_filtered), "nodes,", ecount(ppi_filtered), "edges\n")
#> After filtering: 49 nodes, 196 edges
| Parameter | Description | Default |
|---|---|---|
| score_cutoff | Minimum edge confidence to retain | 0.7 |
| n | Top-n nodes by degree (NULL = all) | NULL |
| rm_isolates | Remove nodes with degree 0 after filtering | TRUE |
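The two filtering stages can be sketched with plain igraph calls. This is only an illustration of the logic described above, not the ppi_subset() source: the helper name filter_ppi_sketch and the toy scores are hypothetical.

```r
library(igraph)

# Hypothetical helper illustrating the two-stage filter; not TCMDATA code.
filter_ppi_sketch <- function(g, score_cutoff = 0.7, n = NULL, rm_isolates = TRUE) {
  # Stage 1: drop edges whose confidence score is below the cutoff
  g <- delete_edges(g, which(E(g)$score < score_cutoff))
  # Stage 2: keep only the n highest-degree nodes (skipped when n is NULL)
  if (!is.null(n) && n < vcount(g)) {
    keep <- order(degree(g), decreasing = TRUE)[seq_len(n)]
    g <- induced_subgraph(g, keep)
  }
  # Optionally remove nodes left without any edge
  if (rm_isolates) {
    g <- delete_vertices(g, which(degree(g) == 0))
  }
  g
}

# Toy network with made-up scores: a triangle A-B-C plus a weak edge C-D
toy <- graph_from_data_frame(
  data.frame(from  = c("A", "A", "B", "C"),
             to    = c("B", "C", "C", "D"),
             score = c(0.9, 0.8, 0.75, 0.4)),
  directed = FALSE
)
toy_filtered <- filter_ppi_sketch(toy, score_cutoff = 0.7)
```

In this toy case the C-D edge falls below the cutoff, so D becomes an isolate and is dropped, leaving the triangle.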
7.3 Topological metrics
Topological analysis of PPI networks is fundamental for identifying disease-critical nodes. By quantifying centrality metrics such as degree, betweenness, and closeness, researchers can pinpoint hub proteins that serve as key regulators, bottleneck nodes that control information flow, and bridge proteins that connect functional modules. These topologically important nodes often represent promising therapeutic targets.
Traditionally, most network pharmacology studies rely on Cytoscape, which is a widely adopted desktop application, and its CytoHubba plugin[1] to compute centrality metrics for target prioritization. While effective, this workflow requires exporting data from other software, manual operation in a GUI environment, and re-importing results, which disrupts the analytical pipeline and reduces reproducibility.
TCMDATA addresses these limitations with compute_nodeinfo(), which calculates a comprehensive set of centrality measures directly within the R ecosystem. This approach offers two key advantages:
- Workflow integration — All analyses remain in R, eliminating context-switching and ensuring full reproducibility through scripted pipelines.
- Extended metric coverage — Beyond the 11 metrics provided by CytoHubba, compute_nodeinfo() includes additional measures, enabling more comprehensive topological characterization.
In this example, we calculate a panel of 19 topological metrics for the filtered PPI network, using the score edge attribute as weights where applicable:
#> Class 'igraph' hidden list of 10
#> $ : num 49
#> $ : logi FALSE
#> $ : num [1:196] 11 31 37 14 17 12 10 26 38 2 ...
#> $ : num [1:196] 0 0 1 1 1 1 1 1 1 1 ...
#> $ : NULL
#> $ : NULL
#> $ : NULL
#> $ : NULL
#> $ :List of 4
#> ..$ : num [1:3] 1 0 1
#> ..$ : Named list()
#> ..$ :List of 20
#> .. ..$ name : chr [1:49] "RELB" "SERPINE1" "CCL2" "EGR1" ...
#> .. ..$ degree : num [1:49] 2 9 20 12 6 5 2 7 7 6 ...
#> .. ..$ strength : num [1:49] 1.72 7.22 17.41 10.35 5.46 ...
#> .. ..$ betweenness : num [1:49] 0 0.012866 0.052748 0.018597 0.000148 ...
#> .. ..$ betweenness_w : num [1:49] 0 0.00709 0.01418 0.00621 0 ...
#> .. ..$ closeness : num [1:49] 0.319 0.465 0.59 0.511 0.397 ...
#> .. ..$ closeness_w : num [1:49] 0.28 0.395 0.511 0.441 0.359 ...
#> .. ..$ eigen_centrality: num [1:49] 0.0198 0.3747 0.8258 0.3252 0.196 ...
#> .. ..$ pagerank : num [1:49] 0.0105 0.0198 0.0416 0.0287 0.0158 ...
#> .. ..$ coreness : num [1:49] 2 7 8 6 6 4 2 6 6 6 ...
#> .. ..$ clustering_coef : num [1:49] 1 0.611 0.453 0.545 0.933 ...
#> .. ..$ eccentricity : num [1:49] 5 4 4 4 5 4 5 5 4 4 ...
#> .. ..$ is_articulation : logi [1:49] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> .. ..$ MCC : num [1:49] 2 5043 28584 1254 240 ...
#> .. ..$ MNC : num [1:49] 2 8 20 12 6 5 2 7 7 6 ...
#> .. ..$ DMNC : num [1:49] 0.308 0.641 0.528 0.527 0.666 ...
#> .. ..$ BN : num [1:49] 1 3 23 5 1 2 1 1 2 1 ...
#> .. ..$ radiality : num [1:49] 3.85 4.79 5.23 4.98 4.44 ...
#> .. ..$ Stress : num [1:49] 0 176 836 278 2 52 0 20 180 2 ...
#> .. ..$ EPC : num [1:49] 25.7 40.4 40.6 40.5 39.9 ...
#> ..$ :List of 1
#> .. ..$ score: num [1:196] 0.717 0.999 0.701 0.719 0.736 0.742 0.746 0.815 0.849 0.918 ...
#> $ :<environment: 0x555f92c95130>
The following metrics are computed and stored as vertex attributes:
| Category | Metric | Description |
|---|---|---|
| Local | degree | Number of direct neighbors |
| Local | strength | Sum of incident edge weights |
| Local | clustering_coef | Local clustering coefficient (transitivity) |
| Local | coreness | k-core decomposition level |
| Global | betweenness | Fraction of all-pairs shortest paths passing through the node |
| Global | closeness | Inverse of average shortest-path distance to all other nodes |
| Global | eccentricity | Maximum shortest-path distance to any reachable node |
| Global | eigen_centrality | Influence score based on connections to high-scoring neighbors |
| Global | pagerank | Importance based on random-walk visitation probability |
| CytoHubba[1] | MCC | Maximal Clique Centrality — sum of \((|C| - 1)!\) over all maximal cliques \(C\) containing the node |
| CytoHubba[1] | MNC | Maximum Neighborhood Component — size of the largest connected component among neighbors |
| CytoHubba[1] | DMNC | Density of MNC — edge density of the MNC, penalized by component size (α = 1.7) |
| CytoHubba[1] | BN | BottleNeck — frequency of being a high-flow node in shortest-path trees |
| CytoHubba[1] | EPC | Edge Percolated Component — average component size under random edge removal |
| CytoHubba[1] | radiality | Radiality — average gain in reachability relative to graph diameter |
| CytoHubba[1] | Stress | Stress centrality — total count of shortest paths passing through the node |
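Several of the metrics in the table map directly onto igraph functions. The sketch below computes a few of them on a toy graph and adds a small MCC helper following the cytoHubba definition (sum of (|C| - 1)! over maximal cliques containing the node). The helper name mcc_sketch, the toy graph, and its scores are assumptions made for illustration; this is not the compute_nodeinfo() implementation.

```r
library(igraph)

# Toy graph: a triangle (A, B, C) with a pendant node D attached to C
g <- graph_from_literal(A - B, A - C, B - C, C - D)
E(g)$score <- c(0.9, 0.8, 0.7, 0.6)  # made-up confidence scores

metrics <- data.frame(
  name            = V(g)$name,
  degree          = degree(g),
  strength        = strength(g, weights = E(g)$score),
  betweenness     = betweenness(g, normalized = TRUE),
  closeness       = closeness(g, normalized = TRUE),
  coreness        = coreness(g),
  clustering_coef = transitivity(g, type = "local"),  # NaN for degree-1 nodes
  pagerank        = page_rank(g)$vector
)

# Hypothetical MCC helper: each maximal clique C adds (|C| - 1)! to its members
mcc_sketch <- function(g) {
  score <- numeric(vcount(g))
  for (cl in max_cliques(g)) {
    idx <- as.integer(cl)
    score[idx] <- score[idx] + factorial(length(idx) - 1)
  }
  setNames(score, V(g)$name)
}

metrics$MCC <- mcc_sketch(g)[metrics$name]
metrics
```

Here the maximal cliques are {A, B, C} and {C, D}, so C collects 2! + 1! = 3 while A and B each receive 2.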
7.3.1 Integrated ranking
rank_ppi_nodes() normalizes all selected metrics to [0, 1], applies user-defined weights (equal by default), and produces a composite score for target prioritization:
rank_res <- rank_ppi_nodes(ppi_scored, use_weight = TRUE)
# Extract ranked table
ppi_ranked <- rank_res$graph
rank_df <- rank_res$table
rank_df[1:10, c("name", "degree", "betweenness_w", "closeness_w",
                "MCC", "MNC", "EPC", "Score_network", "Rank_network")]
#> name degree betweenness_w closeness_w MCC MNC EPC Score_network
#> 39 IL6 27 0.26684397 0.5901176 46520 27 40.575 0.8073885
#> 18 CXCL8 22 0.19414894 0.5440404 45146 20 40.575 0.7192179
#> 11 IL1B 23 0.07180851 0.5582129 45534 23 40.575 0.6802181
#> 3 CCL2 20 0.01418440 0.5112998 28584 20 40.575 0.6530773
#> 33 JUN 20 0.21808511 0.5305687 7618 20 40.575 0.6171510
#> 17 FOS 18 0.11968085 0.4945311 2563 17 40.575 0.5574060
#> 37 CXCL1 14 0.00000000 0.4682921 37584 14 40.575 0.5174125
#> 13 ICAM1 14 0.01063830 0.4368779 18144 14 40.575 0.4709080
#> 27 PTGS2 14 0.00000000 0.4691629 13320 14 40.575 0.4682695
#> 48 CCL4 10 0.00000000 0.4082640 30240 10 40.575 0.4675394
#> Rank_network
#> 39 1
#> 18 2
#> 11 3
#> 3 4
#> 33 5
#> 17 6
#> 37 7
#> 13 8
#> 27 9
#> 48 10
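The normalize-weight-sum idea behind the composite score can be sketched in base R. The table, column choice, and helper name minmax below are made up for illustration, and the exact scaling used inside rank_ppi_nodes() may differ.

```r
# Hypothetical metric table for four nodes (values are made up)
toy <- data.frame(
  name        = c("A", "B", "C", "D"),
  degree      = c(10, 4, 7, 1),
  betweenness = c(0.30, 0.05, 0.20, 0.00),
  MCC         = c(120, 6, 24, 1)
)

# Min-max scaling to [0, 1]; constant columns map to 0 to avoid dividing by zero
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0, length(x)))
  (x - rng[1]) / diff(rng)
}

metric_cols <- c("degree", "betweenness", "MCC")
weights <- rep(1 / length(metric_cols), length(metric_cols))  # equal weights

scaled <- sapply(toy[metric_cols], minmax)    # nodes x metrics matrix
toy$score <- as.numeric(scaled %*% weights)   # weighted sum per node
toy$rank  <- rank(-toy$score, ties.method = "min")
toy[order(toy$rank), ]
```

Node A tops every metric and therefore scores 1, while D is lowest everywhere and scores 0. Unequal weights would shift the ordering of the nodes in between.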
7.3.2 Radar plot of topological metrics
For a target of interest, combining get_node_profile() with radar_plot() produces a radar chart showing its normalized centrality fingerprint across multiple dimensions:
library(aplot)
# Pick the top 2 ranked nodes
top_nodes <- rank_df$name[1:2]
p_radar1 <- radar_plot(
  get_node_profile(rank_df, top_nodes[1]),
  fill_color = "#A3BEDD", line_color = "#4A7FB5",
  title = top_nodes[1]
)
p_radar2 <- radar_plot(
  get_node_profile(rank_df, top_nodes[2]),
  fill_color = "#D59390", line_color = "#B5524A",
  title = top_nodes[2]
)
plot_list(p_radar1, p_radar2, ncol = 2)
7.3.3 Heatmap of topological metrics
The heatmap provides a global overview of selected nodes across multiple topological metrics simultaneously. plot_node_heatmap() generates a Z-score normalized heatmap using ComplexHeatmap, with hierarchical clustering to reveal nodes with similar centrality profiles:
# Select key metrics for visualization
selected_cols <- c("degree", "betweenness", "closeness", "MCC", "MNC",
                   "DMNC", "EPC", "radiality", "Stress")
plot_node_heatmap(rank_df, select_cols = selected_cols)
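The standardization and clustering behind such a heatmap can be sketched in base R. The sketch assumes that Z-scores are computed per metric (column-wise), which the text does not state explicitly, and the values are made up.

```r
# Made-up metric values for four nodes
toy <- data.frame(
  degree      = c(10, 4, 7, 1),
  betweenness = c(0.30, 0.05, 0.20, 0.00),
  row.names   = c("A", "B", "C", "D")
)

# Column-wise Z-score: subtract the column mean, divide by the column SD
z <- scale(as.matrix(toy))

# Hierarchical clustering of nodes on their standardized profiles
hc <- hclust(dist(z), method = "complete")
hc$labels[hc$order]  # node order along the dendrogram
```

Standardizing first keeps large-valued metrics such as MCC or Stress from dominating the distances between nodes.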
7.4 Community detection
Identifying densely connected subnetworks (modules) helps reveal functional protein complexes and signaling cascades. TCMDATA supports three complementary algorithms.
7.4.1 Louvain modularity optimization
The Louvain algorithm[3] is a fast and scalable community detection method that optimizes network modularity through a hierarchical, greedy approach. It iteratively assigns nodes to communities that maximize the modularity gain, then aggregates communities into super-nodes and repeats the process. Due to its computational efficiency and ability to handle large-scale networks, Louvain has become one of the most popular algorithms for detecting functional modules in biological networks.
ppi_louvain <- run_louvain(ppi_scored, resolution = 1.0)
louvain_scores <- add_cluster_score(ppi_louvain, cluster_attr = "louvain_cluster", min_size = 3)
head(louvain_scores)
#> Cluster_ID Score Nodes Edges Density Gene_List
#> 1 2 9.810 22 103 0.446 SERPINE1, CCL2, LIF, IL1RN, IL1B
#> 2 3 6.111 19 55 0.322 EGR1, EGR2, SNAI1, SOX9, FOSB
#> 3 1 2.667 4 4 0.667 RELB, BIRC3, NFKB2, TRAF1
#> Full_Genes
#> 1 SERPINE1,CCL2,LIF,IL1RN,IL1B,ICAM1,IL18,VCAM1,CXCL8,SOCS3,SELE,KLRD1,PLAUR,C5AR1,CCL20,PTGS2,CXCL1,IL6,CXCR4,FCGR3B,CCL3,CCL4
#> 2 EGR1,EGR2,SNAI1,SOX9,FOSB,CEBPB,FOS,PMAIP1,ATF3,MCL1,DUSP5,JUN,KLF4,CDKN1A,CCN1,KLF6,ZFP36,MYC,MAFF
#> 3 RELB,BIRC3,NFKB2,TRAF1
The resolution parameter controls clustering granularity: values > 1 produce more, smaller clusters; values < 1 yield fewer, larger clusters.
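The effect of the resolution parameter can be explored with igraph's own cluster_louvain(), which also takes a resolution argument. The toy graph below (four 5-node cliques joined in a ring) is an assumption made for illustration, and whether run_louvain() wraps this function is not stated in the text.

```r
library(igraph)

set.seed(1)
# Toy graph: four 5-node cliques, neighboring cliques linked by a single edge
g <- disjoint_union(make_full_graph(5), make_full_graph(5),
                    make_full_graph(5), make_full_graph(5))
g <- add_edges(g, c(5, 6, 10, 11, 15, 16, 20, 1))

coarse <- cluster_louvain(g, resolution = 0.1)  # favors fewer, larger modules
fine   <- cluster_louvain(g, resolution = 1.0)  # default granularity

c(coarse = length(coarse), fine = length(fine))  # number of communities
table(membership(fine))                          # community sizes
```

Louvain is randomized, so exact partitions can vary between runs and seeds. On a graph like this, a low resolution is expected to merge neighboring cliques, while the default tends to keep each clique separate.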
7.4.2 MCODE (Molecular Complex Detection)
MCODE (Molecular Complex Detection)[2] is a classic graph-clustering algorithm specifically designed for identifying densely connected regions in PPI networks. Originally implemented as a Cytoscape plugin, MCODE has become one of the most widely adopted methods for detecting protein complexes and functional modules in network pharmacology studies.
The algorithm operates in three phases: (1) vertex weighting based on local neighborhood density (k-core × density), (2) seeded growth from high-scoring seed nodes, and (3) post-processing via haircut (removing singly-connected peripheral nodes) or fluff (expanding with dense neighbors).
TCMDATA provides run_mcode(), a native R implementation of the MCODE algorithm, enabling seamless integration into scripted workflows without requiring external software. This eliminates the need to export networks to Cytoscape and manually run the plugin, significantly improving reproducibility and efficiency in PPI analysis pipelines.
ppi_mcode <- run_mcode(ppi_scored, vwp = 0.2, haircut = TRUE, fluff = FALSE)
mcode_res <- get_mcode_res(ppi_mcode, only_clusters = TRUE)
head(mcode_res)
#> name mcode_score cluster module_score is_seed
#> 1 IL18 6.236364 Module_1 8.6 FALSE
#> 2 CCL3 6.236364 Module_1 8.6 FALSE
#> 3 SERPINE1 7.000000 Module_1 8.6 FALSE
#> 4 VCAM1 7.000000 Module_1 8.6 FALSE
#> 5 IL1B 7.151515 Module_1 8.6 FALSE
#> 6 CXCL8 7.151515 Module_1 8.6 FALSE
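The vertex-weighting phase described above can be sketched with igraph. This is a simplified illustration under stated assumptions (unweighted graph, no self-loops, the hypothetical helper name mcode_weight_sketch). It is not the run_mcode() source and it omits the seeded growth and post-processing phases.

```r
library(igraph)

# For each node: take its closed neighborhood (the node plus its neighbors),
# find the highest k-core inside it, and score the node as k * density(core).
mcode_weight_sketch <- function(g) {
  sapply(seq_len(vcount(g)), function(v) {
    nb <- induced_subgraph(g, c(v, as.integer(neighbors(g, v))))
    if (ecount(nb) == 0) return(0)   # isolated node
    core  <- coreness(nb)
    k     <- max(core)
    kcore <- induced_subgraph(nb, which(core == k))
    k * edge_density(kcore)
  })
}

# Toy graph: a 4-clique (nodes 1-4) with a pendant node 5 attached to node 4
g <- make_full_graph(4)
g <- add_vertices(g, 1)
g <- add_edges(g, c(4, 5))

w <- mcode_weight_sketch(g)
w
```

The clique members sit in a 3-core of density 1 and score 3, whereas the pendant node only sees a single edge and scores 1. This is the contrast that lets seeded growth start from dense regions.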
7.4.3 MCL (Markov Clustering)
The Markov Clustering (MCL) algorithm[4] detects community structure by simulating stochastic flow (random walks) on the network. It alternates between two operations: expansion (matrix squaring to allow flow to spread) and inflation (element-wise exponentiation to strengthen strong connections and weaken weak ones). This process naturally separates the network into well-defined clusters based on flow patterns. MCL is particularly effective for PPI networks because protein complexes tend to form densely connected regions that trap random walks.
ppi_mcl <- run_MCL(ppi_scored, inflation = 2.5)
mcl_scores <- add_cluster_score(ppi_mcl, cluster_attr = "MCL_cluster", min_size = 3)
head(mcl_scores)
#> Cluster_ID Score Nodes Edges Density Gene_List
#> 1 7 10.211 20 97 0.511 SERPINE1, CCL2, LIF, IL1RN, IL1B
#> 2 2 5.000 11 25 0.455 EGR1, EGR2, FOSB, CEBPB, FOS
#> 3 5 3.200 6 8 0.533 SNAI1, SOX9, ATF3, JUN, KLF6
#> 4 1 2.667 4 4 0.667 RELB, BIRC3, NFKB2, TRAF1
#> Full_Genes
#> 1 SERPINE1,CCL2,LIF,IL1RN,IL1B,ICAM1,IL18,VCAM1,CXCL8,SOCS3,SELE,PLAUR,C5AR1,CCL20,PTGS2,CXCL1,IL6,CXCR4,CCL3,CCL4
#> 2 EGR1,EGR2,FOSB,CEBPB,FOS,DUSP5,KLF4,CDKN1A,CCN1,ZFP36,MYC
#> 3 SNAI1,SOX9,ATF3,JUN,KLF6,MAFF
#> 4 RELB,BIRC3,NFKB2,TRAF1
The inflation parameter controls cluster granularity: higher values (e.g., 3–5) yield tighter, more granular clusters; lower values (e.g., 1.5–2) produce larger, more inclusive modules.
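The expansion and inflation steps can be written in a few lines of base R. This is a minimal sketch on a dense adjacency matrix, assuming self-loops are added and each node is assigned to the row holding most of its flow at convergence. The helper name mcl_sketch is hypothetical and this is not the run_MCL() implementation.

```r
# Minimal MCL sketch (base R only)
mcl_sketch <- function(adj, inflation = 2, max_iter = 100, tol = 1e-8) {
  m <- adj + diag(nrow(adj))               # add self-loops
  m <- sweep(m, 2, colSums(m), "/")        # make columns sum to 1
  for (i in seq_len(max_iter)) {
    expanded <- m %*% m                    # expansion: two-step random walk
    inflated <- expanded^inflation         # inflation: element-wise power
    new_m <- sweep(inflated, 2, colSums(inflated), "/")
    converged <- max(abs(new_m - m)) < tol
    m <- new_m
    if (converged) break
  }
  # Assign each node (column) to the row that attracts most of its flow;
  # rounding makes ties resolve to the same row within a cluster
  attractor <- apply(round(m, 6), 2, which.max)
  as.integer(factor(attractor))
}

# Deliberately trivial input: two disconnected triangles
tri <- matrix(1, 3, 3) - diag(3)
adj <- rbind(cbind(tri, matrix(0, 3, 3)),
             cbind(matrix(0, 3, 3), tri))
mcl_sketch(adj, inflation = 2)
#> [1] 1 1 1 2 2 2
```

Flow cannot cross between disconnected components, so each triangle forms its own cluster. On connected graphs, raising inflation sharpens the contrast between strong and weak flow and tends to split the network into more clusters.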
7.5 PPI network robustness analysis
Assessing the robustness of PPI networks to node perturbation is crucial for understanding the resilience of biological systems and identifying critical vulnerabilities. The ppi_knock() function simulates targeted node removal (knockout) and evaluates the impact on network integrity by comparing against randomized null models, following the drug attack model proposed by Xi et al. (2022)[6].
7.5.1 Algorithm overview
The algorithm tracks four network-level topological metrics — ASPL (Average Shortest Path Length), AD (Average Degree), DC (Degree Centralization), and CC (Closeness Centrality) — and proceeds in four stages:
- Baseline: Compute the four metrics on the intact network.
- Real attack: Remove the target nodes and re-compute metrics. The Robustness Index (RI) quantifies relative change: \(RI = (M_{after} - M_{before}) / M_{before}\).
- Permutation null: Rewire the network \(n\) times (preserving degree sequence), shuffle edge weights, randomly knock out the same number of nodes, and compute the random RI distribution.
- Z-score normalization: \(Z = (RI_{real} - \mu_{null}) / \sigma_{null}\), with p-values derived via normal approximation: \(p = 2\Phi(-|Z|)\).
The Total Score integrates all metrics: \(Total = Z_{ASPL} - Z_{AD} - Z_{DC} - Z_{CC}\). A large positive value indicates the knocked-out targets are structurally critical. The Total P-value is derived by comparing the real combined RI against the permutation null distribution of combined RIs.
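The stages above can be sketched with igraph on an unweighted random graph. This is a simplified illustration rather than the ppi_knock() source: the helper names (net_metrics, robustness_index, null_ri) are hypothetical, edge-weight shuffling is omitted because the toy graph has no weights, and closeness centrality is summarized as the mean over nodes, which is an assumption.

```r
library(igraph)

# Four network-level metrics tracked by the sketch
net_metrics <- function(g) {
  c(ASPL = mean_distance(g, directed = FALSE),
    AD   = mean(degree(g)),
    DC   = centr_degree(g)$centralization,
    CC   = mean(closeness(g, normalized = TRUE), na.rm = TRUE))
}

# Robustness index: relative change of each metric after removing `nodes`
robustness_index <- function(g, nodes) {
  before <- net_metrics(g)
  after  <- net_metrics(delete_vertices(g, nodes))
  (after - before) / before
}

# Null distribution: degree-preserving rewiring, then k random knockouts
null_ri <- function(g, k, n_perm = 100) {
  t(replicate(n_perm, {
    g_rand <- rewire(g, with = keeping_degseq(niter = 10 * ecount(g)))
    robustness_index(g_rand, sample(vcount(g_rand), k))
  }))
}

set.seed(42)
g   <- sample_gnp(30, 0.2)        # toy random graph
hub <- which.max(degree(g))       # knock out the highest-degree node

ri_real <- robustness_index(g, hub)
ri_null <- null_ri(g, k = 1, n_perm = 50)

z     <- (ri_real - colMeans(ri_null)) / apply(ri_null, 2, sd)
p     <- 2 * pnorm(-abs(z))
total <- z[["ASPL"]] - z[["AD"]] - z[["DC"]] - z[["CC"]]
```

The sign pattern in the Total Score reflects that a damaging knockout is expected to raise ASPL and lower AD, DC, and CC, so all four terms contribute positively when the target is structurally important.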
7.5.2 Example: knocking out IL6
#> Metric Baseline Post_KO Raw_RI Mu_Random Sd_Random Normalized_RI
#> 1 ASPL 2.7306271 2.8533420 0.04494018 0.001669616 0.010447079 4.141881
#> 2 AD 8.0000000 7.0416667 -0.11979167 -0.022552083 0.038223524 -2.543972
#> 3 DC 0.3958333 0.3182624 -0.19596865 -0.012564390 0.046719451 -3.925651
#> 4 CC 0.4673347 0.4485911 -0.04010743 -0.001194393 0.009824485 -3.960822
#> Pvalue
#> 1 3.444689e-05
#> 2 1.095998e-02
#> 3 8.649551e-05
#> 4 7.469231e-05
#> Total Score: 14.57233
#> Total P-value: 5.273783e-05
The results show that removing IL6 causes ASPL to increase (network paths become longer) while AD, DC, and CC decrease (connectivity drops), all deviating significantly from the random null (|Z| > 1.96, p < 0.05 for all metrics). The high Total Score (with a highly significant p-value) confirms IL6 as a structurally critical hub in this PPI network.
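As a quick arithmetic check, the Total Score and the per-metric p-values can be recomputed from the Normalized_RI column of the output above, using the formulas given in the algorithm overview:

```r
# Normalized RI (Z) values copied from the knockout output above
z <- c(ASPL = 4.141881, AD = -2.543972, DC = -3.925651, CC = -3.960822)

# Total = Z_ASPL - Z_AD - Z_DC - Z_CC
total <- z[["ASPL"]] - z[["AD"]] - z[["DC"]] - z[["CC"]]
total
#> [1] 14.57233

# Two-sided p-values from the normal approximation, p = 2 * pnorm(-|Z|)
p <- 2 * pnorm(-abs(z))
```

The recomputed total matches the reported Total Score, and the values in p agree with the Pvalue column up to rounding of the Z values.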
7.5.3 Visualization: before vs. after knockout
To visually compare network topology before and after knockout, we plot the intact network (with the target highlighted in red) alongside the post-knockout network:
library(ggtangle)
library(igraph)
library(ggplot2)
library(ggrepel)
library(aplot)
target_node <- "IL6"
## prepare before / after igraph objects
g_before <- ppi_ranked
V(g_before)$is_target <- ifelse(V(g_before)$name == target_node, "Target", "Other")
g_after <- delete_vertices(g_before, target_node)
## network before knockout
set.seed(2025)
p_before <- ggplot(g_before, layout = "fr") +
  geom_edge(alpha = 0.10, color = "grey65") +
  geom_point(aes(color = is_target, size = degree), alpha = 0.85) +
  geom_text_repel(aes(label = name), size = 2.5, max.overlaps = 25,
                  segment.alpha = 0.3, fontface = "italic") +
  scale_color_manual(values = c(Other = "#56B4E9", Target = "#D55E00"),
                     guide = "none") +
  scale_size_continuous(range = c(2, 8), guide = "none") +
  ggtitle("Before knockout") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
## network after knockout
set.seed(2025)
p_after <- ggplot(g_after, layout = "fr") +
  geom_edge(alpha = 0.10, color = "grey65") +
  geom_point(aes(color = degree, size = Score_network), alpha = 0.85) +
  geom_text_repel(aes(label = name), size = 2.5, max.overlaps = 25,
                  segment.alpha = 0.3, fontface = "italic") +
  scale_color_gradient(low = "#56B4E9", high = "#D55E00", name = "Degree") +
  scale_size_continuous(range = c(2, 8), name = "Score") +
  ggtitle(paste0("After knockout (", target_node, " removed)")) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 12))
plot_list(p_before, p_after, ncol = 2, labels = c("A", "B"))
- Panel A shows the intact network with IL6 highlighted in red, revealing its central position and high degree.
- Panel B shows the network after removing IL6 — the loss of its connections visibly fragments local connectivity.
7.6 PPI network visualization
High-quality network visualization is essential for interpreting PPI analysis results and preparing publication-ready figures. While Cytoscape remains a popular choice for network visualization, it requires manual GUI operations that are difficult to reproduce and integrate into automated workflows.
ggtangle[5] addresses these limitations by providing a ggplot2-native framework for network rendering. Key advantages include:
- Full ggplot2 compatibility — Seamless integration with the grammar of graphics; supports all ggplot2 themes, scales, and annotations.
- Reproducibility — All visualizations are generated programmatically, ensuring consistent results across runs.
- Publication-ready output — Fine-grained control over node aesthetics, edge styling, and layout algorithms for SCI-standard figures.
- Direct igraph support — Accepts igraph objects directly without format conversion.
7.6.1 Basic network with topological mapping
The following example maps node degree to color and composite score to size, using the Fruchterman-Reingold force-directed layout:
library(ggtangle)
library(ggplot2)
library(ggrepel)
set.seed(2025)
ggplot(ppi_ranked, layout = "fr") +
  geom_edge(alpha = 0.12, color = "grey60") +
  geom_point(aes(color = degree, size = Score_network), alpha = 0.85) +
  geom_text_repel(aes(label = name), size = 2.5, max.overlaps = 30, segment.alpha = 0.3) +
  scale_color_gradient(low = "#56B4E9", high = "#D55E00", name = "Degree") +
  scale_size_continuous(range = c(2, 8), name = "Score") +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold", size = 10)
  )
7.6.2 Community-colored network
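Overlaying cluster assignments reveals modular organization. Node labels are placed with ggrepel to avoid text overlap; the code below labels every node, so for crowded plots consider restricting the label layer to hub nodes (e.g., the top 15% by degree):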
library(ggrepel)
set.seed(2025)
ggplot(ppi_louvain, layout = "fr") +
  geom_edge(alpha = 0.10, color = "grey65") +
  geom_point(aes(color = louvain_cluster, size = degree), alpha = 0.85) +
  geom_text_repel(aes(label = name), size = 2.5, max.overlaps = 30, segment.alpha = 0.3) +
  scale_color_brewer(palette = "Set2", name = "Module") +
  scale_size_continuous(range = c(2, 9), name = "Degree") +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold", size = 10)
  )
7.6.3 Hub-centric star layout
The star layout positions the highest-degree node at the center, emphasizing hub–spoke relationships:
set.seed(2025)
ggplot(ppi_ranked, layout = "star") +
  geom_edge(alpha = 0.12, color = "grey60") +
  geom_point(aes(color = Score_network, size = degree), alpha = 0.85) +
  geom_text_repel(aes(label = name), size = 2.5, max.overlaps = 30, segment.alpha = 0.3) +
  scale_color_viridis_c(option = "C", name = "Score") +
  scale_size_continuous(range = c(2, 10), name = "Degree") +
  theme_void() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold", size = 10)
  )
7.6.4 Layout algorithm comparison
Different layouts suit different analytical purposes. The following comparison illustrates four common algorithms:
library(aplot)
set.seed(2025)
base_layers <- list(
  geom_edge(alpha = 0.10, color = "grey65"),
  geom_point(aes(color = degree, size = Score_network), alpha = 0.85),
  geom_text_repel(aes(label = name), size = 2, max.overlaps = 25, segment.alpha = 0.3),
  scale_color_gradient(low = "#56B4E9", high = "#D55E00", guide = "none"),
  scale_size_continuous(range = c(1.5, 6), guide = "none"),
  theme_void()
)
p_fr <- ggplot(ppi_ranked, layout = "fr") + base_layers
p_kk <- ggplot(ppi_ranked, layout = "kk") + base_layers
p_nicely <- ggplot(ppi_ranked, layout = "nicely") + base_layers
p_star <- ggplot(ppi_ranked, layout = "star") + base_layers
plot_list(p_fr, p_kk, p_nicely, p_star, ncol = 2, labels = c("A", "B", "C", "D"))
| Panel | Layout | Description | Best for |
|---|---|---|---|
| A | Fruchterman-Reingold | Force-directed; balances attraction along edges against repulsion between nodes | Dense networks with many edges |
| B | Kamada-Kawai | Energy-based; preserves graph distances | Medium networks with clear clusters |
| C | Nicely | Automatic selection by igraph | General-purpose visualization |
| D | Star | Hub at center, others radially arranged | Hub-centric topology analysis |
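The four layouts correspond to igraph layout functions, which can also be called directly to inspect or precompute coordinates. The sketch below uses a toy star graph and plain igraph. How ggtangle forwards its layout argument is not shown here, so treat the mapping as an assumption. In igraph itself, layout_as_star() places the first vertex at the center unless center is supplied.

```r
library(igraph)

set.seed(2025)
# Toy graph: node 3 is connected to every other node
g <- make_star(6, mode = "undirected", center = 3)

xy_fr   <- layout_with_fr(g)   # Fruchterman-Reingold
xy_kk   <- layout_with_kk(g)   # Kamada-Kawai
xy_auto <- layout_nicely(g)    # igraph chooses an algorithm

# Pass the highest-degree node explicitly so that it sits at the center
hub     <- which.max(degree(g))
xy_star <- layout_as_star(g, center = V(g)[hub])
xy_star[hub, ]                 # coordinates of the hub
```

Each call returns a matrix with one row of x/y coordinates per node, which makes it straightforward to reuse the same coordinates across several plots.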
7.7 References
1. Chin CH, Chen SH, Wu HH, Ho CW, Ko MT, Lin CY. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Systems Biology (2014), 8(Suppl 4), S11. doi: 10.1186/1752-0509-8-S4-S11.
2. Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics (2003), 4, 2. doi: 10.1186/1471-2105-4-2.
3. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment (2008), 2008(10), P10008. doi: 10.1088/1742-5468/2008/10/P10008.
4. van Dongen S. Graph clustering via a discrete uncoupling process. SIAM Journal on Matrix Analysis and Applications (2008), 30(1), 121–141. doi: 10.1137/040608635.
5. Yu G. ggtangle: Draw Network and Phylogenetic Tree Using Grammar of Graphics. R package. doi: 10.32614/CRAN.package.ggtangle.
6. Xi Y, et al. Exploration of the Specific Pathology of HXMM Tablet Against Retinal Injury Based on Drug Attack Model to Network Robustness. Frontiers in Pharmacology (2022), 13, 826535. doi: 10.3389/fphar.2022.826535.
7.8 Session information
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] org.Hs.eg.db_3.22.0 AnnotationDbi_1.72.0 IRanges_2.44.0
#> [4] S4Vectors_0.48.0 Biobase_2.70.0 BiocGenerics_0.56.0
#> [7] generics_0.1.4 clusterProfiler_4.18.4 aplot_0.2.9
#> [10] ggrepel_0.9.7 ggtangle_0.1.1 igraph_2.2.2
#> [13] ggplot2_4.0.2 dplyr_1.2.0 aplotExtra_0.0.4
#> [16] ivolcano_0.0.5 enrichplot_1.30.5 TCMDATA_0.0.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 shape_1.4.6.1 rstudioapi_0.18.0
#> [4] jsonlite_2.0.0 tidydr_0.0.6 magrittr_2.0.4
#> [7] farver_2.1.2 rmarkdown_2.30 GlobalOptions_0.1.3
#> [10] fs_1.6.7 vctrs_0.7.1 memoise_2.0.1
#> [13] paletteer_1.7.0 ggtree_4.0.4 htmltools_0.5.9
#> [16] forcats_1.0.1 gridGraphics_0.5-1 sass_0.4.10
#> [19] bslib_0.10.0 htmlwidgets_1.6.4 plyr_1.8.9
#> [22] cachem_1.1.0 iterators_1.0.14 lifecycle_1.0.5
#> [25] pkgconfig_2.0.3 Matrix_1.7-4 R6_2.6.1
#> [28] fastmap_1.2.0 gson_0.1.0 clue_0.3-67
#> [31] digest_0.6.39 colorspace_2.1-2 ggnewscale_0.5.2
#> [34] rematch2_2.1.2 patchwork_1.3.2 maftools_2.26.0
#> [37] RSQLite_2.4.6 labeling_0.4.3 httr_1.4.8
#> [40] polyclip_1.10-7 compiler_4.5.2 bit64_4.6.0-1
#> [43] fontquiver_0.2.1 withr_3.0.2 doParallel_1.0.17
#> [46] S7_0.2.1 BiocParallel_1.44.0 DBI_1.3.0
#> [49] ggforce_0.5.0 R.utils_2.13.0 MASS_7.3-65
#> [52] rappdirs_0.3.4 rjson_0.2.23 DNAcopy_1.84.0
#> [55] tools_4.5.2 ape_5.8-1 scatterpie_0.2.6
#> [58] R.oo_1.27.1 glue_1.8.0 nlme_3.1-168
#> [61] GOSemSim_2.36.0 grid_4.5.2 ggvenn_0.1.19
#> [64] cluster_2.1.8.1 reshape2_1.4.5 fgsea_1.36.2
#> [67] gtable_0.3.6 R.methodsS3_1.8.2 tidyr_1.3.2
#> [70] data.table_1.18.2.1 XVector_0.50.0 foreach_1.5.2
#> [73] pillar_1.11.1 stringr_1.6.0 yulab.utils_0.2.4
#> [76] circlize_0.4.17 splines_4.5.2 tweenr_2.0.3
#> [79] treeio_1.34.0 lattice_0.22-7 survival_3.8-3
#> [82] bit_4.6.0 tidyselect_1.2.1 fontLiberation_0.1.0
#> [85] GO.db_3.22.0 ComplexHeatmap_2.26.1 Biostrings_2.78.0
#> [88] knitr_1.51 fontBitstreamVera_0.1.1 gridExtra_2.3
#> [91] bookdown_0.46 Seqinfo_1.0.0 xfun_0.56
#> [94] matrixStats_1.5.0 stringi_1.8.7 lazyeval_0.2.2
#> [97] ggfun_0.2.0 yaml_2.3.12 evaluate_1.0.5
#> [100] codetools_0.2-20 gdtools_0.5.0 tibble_3.3.1
#> [103] qvalue_2.42.0 ggplotify_0.1.3 cli_3.6.5
#> [106] systemfonts_1.3.2 jquerylib_0.1.4 Rcpp_1.1.1
#> [109] png_0.1-8 parallel_4.5.2 blob_1.3.0
#> [112] ggalluvial_0.12.6 DOSE_4.4.0 ggstar_1.0.6
#> [115] viridisLite_0.4.3 tidytree_0.4.7 ggiraph_0.9.6
#> [118] ggridges_0.5.7 scales_1.4.0 purrr_1.2.1
#> [121] crayon_1.5.3 GetoptLong_1.1.0 rlang_1.1.7
#> [124] cowplot_1.2.0 fastmatch_1.1-8 KEGGREST_1.50.0