Chapter 9 AI Module

TCMDATA includes an AI module built on top of aisdk, providing three layers of AI capability: interpretation (one-shot analysis of R objects), agent (multi-turn interactive analysis with tool use), and skill (domain-specific workflow knowledge). This chapter demonstrates the key features of each layer.

9.1 Prerequisites

The AI module requires both TCMDATA and aisdk:

devtools::install_github("YuLab-SMU/aisdk")
library(aisdk)
library(TCMDATA)

Model configuration is handled through tcm_setup(), which only needs to be called once per session. Users must supply their own provider and API key to use the AI module.

tcm_setup(
  provider = "openai",
  api_key  = "sk-xxx",           # placeholder; replace with your own key
  model    = "gpt-4o",
  base_url = "https://xxx/v1",   # if necessary
  save     = TRUE,
  test     = TRUE)
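
A literal key in a script is easy to leak through version control. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The helper get_api_key() and the variable name OPENAI_API_KEY are illustrative and not part of TCMDATA or aisdk.

```r
# Hypothetical helper: read an API key from an environment variable
# and fail early with a clear message if it is not set.
get_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable '", var, "' is not set.", call. = FALSE)
  }
  key
}

# Usage (not run): pass the key to tcm_setup() instead of a literal string.
# tcm_setup(provider = "openai", api_key = get_api_key(), model = "gpt-4o")
```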

9.2 Interpretation layer

The interpretation layer provides one-shot AI analysis of R objects and free-text queries, with no tool use or multi-turn conversation. The primary interface is tcm_interpret().

9.2.1 Free-text interpretation

txt <- tcm_interpret(
  "Please introduce the major pharmacological functions of Huangqi (Astragalus membranaceus), focusing on immunoregulation.",
  language = "en")
cat(txt)

9.2.2 Interpreting enrichment objects

tcm_interpret() can directly accept a clusterProfiler enrichment object. The package automatically compresses the relevant terms and genes into a compact representation before sending it to the model:

library(clusterProfiler)
library(org.Hs.eg.db)

lz_targets <- search_herb("lingzhi", type = "Herb_pinyin_name")$target
set.seed(123)  # any fixed seed makes the random subset of targets reproducible
lz_targets <- sample(unique(na.omit(lz_targets)), 100)

bp <- enrichGO(
  gene    = lz_targets,
  ont     = "BP",
  OrgDb   = org.Hs.eg.db,
  keyType = "SYMBOL")

tcm_interpret(
  bp,
  prompt   = "Summarise the main biological implications of this Lingzhi GO enrichment.",
  language = "en")
=== TCM AI Analysis: enrichment ===
  Model: openai:gpt-4o | Language: en | Audience: researcher

-- Summary --
The enrichment profile indicates that Lingzhi's putative targets cluster in innate
immune and inflammatory signaling, particularly TLR/IL-1/TNF-NF-κB pathways,
leukocyte adhesion, and matrix remodeling...

-- Key Findings --
  * Strong enrichment for cellular response to stress and external stimuli
  * Activation of innate immune and inflammatory pathways (IRAK1, NFKBIA, TNF, ADAM17)
  * Leukocyte adhesion and endothelial activation (VCAM1, MMP9)
  ...
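
To give a feel for what "compressing terms and genes into a compact representation" can mean, here is a rough sketch that condenses an enrichment table into a few lines of text. It is not the package's actual implementation. The function compact_enrichment() and its parameters are hypothetical, and the column names are those of the data frame obtained with as.data.frame() on a clusterProfiler result.

```r
# Illustrative only: condense an enrichment table into short text lines.
# Expects columns Description, p.adjust and geneID ("/"-separated genes).
compact_enrichment <- function(df, top_n = 5, max_genes = 5) {
  df <- df[order(df$p.adjust), , drop = FALSE]
  df <- head(df, top_n)
  genes <- vapply(strsplit(df$geneID, "/", fixed = TRUE), function(g) {
    paste(head(g, max_genes), collapse = ",")
  }, character(1))
  sprintf("%s (p.adj=%.2e): %s", df$Description, df$p.adjust, genes)
}

# Toy input with made-up values, used only to show the output shape.
toy <- data.frame(
  Description = c("inflammatory response", "cell adhesion"),
  p.adjust    = c(1e-8, 2e-5),
  geneID      = c("TNF/NFKBIA/IRAK1", "VCAM1/MMP9"),
  stringsAsFactors = FALSE
)
cat(compact_enrichment(toy), sep = "\n")
```

Keeping only the top terms and a capped gene list per term is one simple way to limit prompt size while preserving the strongest signals.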

9.2.3 Interpreting PPI objects

PPI network objects with topological metrics computed by compute_nodeinfo() can also be directly interpreted:

data("demo_ppi")
ppi_graph <- compute_nodeinfo(demo_ppi)

tcm_interpret(
  ppi_graph,
  prompt   = "Characterise the network architecture and identify the most important hub nodes.",
  language = "en")
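
As a point of reference for the phrase "topological metrics", the sketch below ranks nodes by degree, the simplest such metric, using base R only. It assumes an undirected edge list with columns from and to. The function rank_by_degree() is hypothetical, and the metrics actually computed by compute_nodeinfo() may differ.

```r
# Illustrative only: rank nodes by degree from an undirected edge list.
rank_by_degree <- function(edges) {
  deg <- table(c(edges$from, edges$to))
  out <- data.frame(node = names(deg), degree = as.integer(deg),
                    stringsAsFactors = FALSE)
  # Highest degree first; ties broken alphabetically by node name.
  out[order(-out$degree, out$node), , drop = FALSE]
}

# Toy edge list with arbitrary gene pairs.
edges <- data.frame(
  from = c("TNF", "TNF", "TNF", "IL6"),
  to   = c("IL6", "MMP9", "VCAM1", "MMP9"),
  stringsAsFactors = FALSE
)
hubs <- rank_by_degree(edges)
print(hubs)
```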

9.2.4 Drafting result paragraphs

draft_result_paragraph() transforms an interpretation object into a publication-ready paragraph:

ai_res <- tcm_interpret(bp, language = "en")
draft  <- draft_result_paragraph(ai_res, language = "en")
cat(as.character(draft))

9.2.5 Custom structured output

tcm_interpret_schema() allows user-defined output schemas for integration into downstream pipelines:

my_schema <- tcm_schema(
  summary     = tcm_field_string("A concise 2-3 sentence summary"),
  mechanism   = tcm_field_string("Core mechanistic interpretation"),
  key_targets = tcm_field_array("Most important targets"),
  confidence  = tcm_field_enum(c("high", "medium", "low"),
                               "Confidence level")
)

custom_res <- tcm_interpret_schema(
  bp,
  schema = my_schema,
  type   = "enrichment",
  prompt = "Focus on inflammation-related processes.")

print(custom_res)

Available field types: tcm_field_string(), tcm_field_number(), tcm_field_boolean(), tcm_field_array(), tcm_field_enum().
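
When structured output feeds a downstream pipeline, it can help to check the fields before using them. The sketch below assumes the result can be treated as a named list whose fields match the schema above. The actual class returned by tcm_interpret_schema() is not documented here, so validate_result() and the mock object are purely illustrative.

```r
# Illustrative only: check a structured result before passing it downstream.
validate_result <- function(res) {
  stopifnot(
    is.character(res$summary), length(res$summary) == 1,
    is.character(res$mechanism), length(res$mechanism) == 1,
    is.character(res$confidence), length(res$confidence) == 1,
    res$confidence %in% c("high", "medium", "low")
  )
  # key_targets may arrive as a list after JSON parsing; flatten it.
  res$key_targets <- as.character(unlist(res$key_targets))
  res
}

# Mock result with made-up content, standing in for custom_res.
mock <- list(
  summary     = "Targets cluster in inflammatory signaling.",
  mechanism   = "NF-kB related processes dominate.",
  key_targets = list("TNF", "NFKBIA"),
  confidence  = "medium"
)
clean <- validate_result(mock)
clean$key_targets
```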

9.3 Agent layer

The agent layer adds tool use and multi-turn conversation on top of the interpretation engine. TCMDATA provides 30+ analysis tools (target search, enrichment, PPI, ML screening, compound lookup, visualization, etc.) that the agent can call autonomously.

9.3.1 One-shot task: tcm_agent()

For a single analysis request, tcm_agent() automatically routes the query to the appropriate tools and returns the result:

# Simple herb target lookup — agent selects the right tool automatically
result <- tcm_agent("Search the targets of Huangqi (Astragalus)")
cat(result$text)

# The result also contains any generated artifacts
result$artifacts

The built-in router matches user queries to relevant tools using keyword patterns. For example, “enrichment” routes to GO/KEGG tools, “ppi” routes to network tools, and “machine learning” routes to ML screening tools. When several patterns match, the corresponding tool sets are merged.
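
The routing idea can be sketched in a few lines of base R. The patterns below are invented for illustration and the tool names are borrowed from the session transcript later in this chapter. The package's real routing table is not shown here, and route_query() is a hypothetical name.

```r
# Illustrative only: map keyword patterns to tool groups and merge matches.
routes <- list(
  enrichment     = list(pattern = "enrichment|kegg",
                        tools   = c("run_go_enrichment", "run_kegg_enrichment")),
  target_lookup  = list(pattern = "herb|target",
                        tools   = "search_herb_records"),
  disease_lookup = list(pattern = "disease",
                        tools   = "search_disease_targets")
)

route_query <- function(query, routes) {
  # One logical flag per route: does its pattern occur in the query?
  hit <- vapply(routes, function(r) {
    grepl(r$pattern, query, ignore.case = TRUE)
  }, logical(1))
  # Merge the tools of all matching routes, dropping duplicates.
  tools <- unlist(lapply(routes[hit], `[[`, "tools"), use.names = FALSE)
  as.character(unique(tools))
}

route_query("Run GO enrichment on the herb targets", routes)
```

A query that matches no pattern yields an empty character vector, which a caller could treat as "fall back to a default tool set".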

9.3.2 Interactive session: tcm_chat()

tcm_chat() opens an interactive REPL for multi-turn exploratory analysis:

tcm_chat()
╔══════════════════════════════════════════════════╗
║  TCM-Pharmacist · Interactive Session            ║
║  Type /help for commands · /quit to exit         ║
╚══════════════════════════════════════════════════╝

[1] You > Search targets of Huangqi and sepsis, then compute intersection
 ✓ Route: target_lookup + disease_lookup (high)
 ✓ Tools called: search_herb_records → search_disease_targets → compute_target_intersection
 ✓ New artifacts: intersect_001

 Agent > Found 121 intersection targets between Huangqi and sepsis...

[2] You > Run GO and KEGG enrichment on the intersection
 ✓ Route: enrichment (high)
 ✓ Tools called: run_go_enrichment → run_kegg_enrichment
 ✓ New artifacts: enrich_001, enrich_002

 Agent > GO enrichment identified 245 significant BP terms...

[3] You > /artifacts
  intersect_001  gene_list    character[121]  2026-04-10 14:32:01
  enrich_001     enrichment   enrichResult    2026-04-10 14:32:15
  enrich_002     enrichment   enrichResult    2026-04-10 14:32:16

[4] You > /save 10x8
  ✓ Exported 3 artifacts to tcm_output/

[5] You > /quit

Key commands: /help, /artifacts, /save [WxH], /history, /stats, /quit.

On exit, tcm_chat() returns a list containing the conversation history and the generated artifacts for programmatic access:

res <- tcm_chat()
# After /quit:
res$artifacts  # all generated artifacts
res$history    # full conversation history

9.3.3 Programmatic agent: create_tcm_task_agent()

For scripted workflows, create a reusable agent and execute tasks programmatically:

agent <- create_tcm_task_agent()

r1 <- run_tcm_task(agent, "Search Huangqi targets and sepsis targets, compute intersection")
r2 <- run_tcm_task(agent, "Run GO enrichment on the intersection genes")
r3 <- run_tcm_task(agent, "Build PPI network and rank hub genes")

# Each result contains: $text, $artifacts, $tool_calls
r3$text
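
Because later tasks build on artifacts produced by earlier ones, a scripted workflow may want to stop at the first failure. The sketch below shows one way to do this with a pluggable runner function. run_pipeline() is a hypothetical helper, not a TCMDATA function, and the stub stands in for a call such as function(task) run_tcm_task(agent, task).

```r
# Illustrative only: run tasks in order and stop at the first failure.
# `runner` is any function(task) that returns a result or throws an error.
run_pipeline <- function(tasks, runner) {
  results <- list()
  for (task in tasks) {
    res <- tryCatch(runner(task), error = function(e) e)
    if (inherits(res, "error")) {
      message("Stopped at: ", task, " (", conditionMessage(res), ")")
      break
    }
    results[[task]] <- res
  }
  results
}

# Stub runner so the sketch runs without an API key.
stub <- function(task) list(text = paste("done:", task))
out <- run_pipeline(c("step one", "step two"), stub)
names(out)
```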

9.3.4 Artifact management

All analysis results are automatically stored as artifacts with generated IDs (e.g., enrich_001, ppi_002). The agent can reference artifacts in subsequent turns, and artifacts can be managed via dedicated functions:

list_tcm_artifacts()     # list all artifacts
load_tcm_artifact("enrich_001")  # retrieve the R object
clear_tcm_artifacts()    # clear all
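
To illustrate how IDs such as enrich_001 could be generated, here is a small in-memory store with one counter per artifact type. This is a sketch under that assumption only. new_artifact_store() is hypothetical, and the package's own storage mechanism may work differently.

```r
# Illustrative only: an in-memory store that assigns IDs such as enrich_001.
new_artifact_store <- function() {
  items  <- list()
  counts <- integer(0)
  add <- function(prefix, object) {
    # Increment the counter for this prefix, starting at 1.
    n <- if (prefix %in% names(counts)) counts[[prefix]] + 1L else 1L
    counts[[prefix]] <<- n
    id <- sprintf("%s_%03d", prefix, n)
    items[[id]] <<- object
    id
  }
  clear <- function() {
    items  <<- list()
    counts <<- integer(0)
    invisible(NULL)
  }
  list(
    add   = add,
    list  = function() names(items),
    load  = function(id) items[[id]],
    clear = clear
  )
}

store <- new_artifact_store()
store$add("enrich", c("TNF", "IL6"))
store$add("enrich", "MMP9")
store$add("ppi", "toy graph")
store$list()
```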

9.4 Skill layer

Skills are domain-knowledge packages that guide the agent through complex multi-step workflows. Unlike tools (which perform specific operations), skills provide strategic context — what to do, in what order, and what to watch out for.

9.4.1 Built-in skills

TCMDATA ships with two package skills, and also loads aisdk’s skill-creator by default:

| Skill | Purpose |
| --- | --- |
| tcm-network-pharmacology | Guides the full network pharmacology workflow (target retrieval → intersection → PPI → enrichment → validation → report). Activates only when the user explicitly requests a systematic analysis. |
| analysis-preferences | Background constraint layer. Sets default parameters (e.g., p-value cutoffs, PPI score thresholds, visualization defaults) and quality standards. Active on every turn. |
| skill-creator | Meta-skill from aisdk for creating new custom skills. |

9.4.2 How skills work

When the agent detects a request that matches a skill’s trigger condition, the skill’s instructions are injected into the conversation context. For example, asking “Help me do a network pharmacology analysis of Huangqi treating sepsis” activates the tcm-network-pharmacology skill, which guides the agent through:

  • Phase 1: Target collection (herb targets + disease targets → intersection)
  • Phase 2: Network & enrichment (PPI construction, GO/KEGG analysis, hub gene ranking)
  • Phase 3: Expression validation (WGCNA, ML screening, DEG integration — if data available)
  • Phase 4: Single-cell validation (if scRNA-seq data available)
  • Phase 5: Literature validation, cross-database verification, and report generation

Importantly, the skill is scope-aware: asking “Just search the targets of Huangqi” will only run the relevant step, not the full pipeline.
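
The activation logic described above can be sketched as a filter over skill definitions: always-on skills are kept on every turn, while the others are kept only when their trigger matches the query. The trigger pattern, the instruction text and the function active_skills() are placeholders and do not reflect how aisdk actually evaluates skills.

```r
# Illustrative only: choose which skill instructions to add to the context.
skills <- list(
  list(name = "analysis-preferences", always_on = TRUE, trigger = NULL,
       instructions = "Apply default cutoffs and quality standards."),
  list(name = "tcm-network-pharmacology", always_on = FALSE,
       trigger = "network pharmacology",
       instructions = "Follow the phased network pharmacology workflow.")
)

active_skills <- function(query, skills) {
  keep <- vapply(skills, function(s) {
    s$always_on ||
      (!is.null(s$trigger) && grepl(s$trigger, query, ignore.case = TRUE))
  }, logical(1))
  vapply(skills[keep], function(s) s$name, character(1))
}

active_skills("Help me do a network pharmacology analysis of Huangqi treating sepsis", skills)
active_skills("Just search the targets of Huangqi", skills)
```

With these placeholder triggers, the first query activates both skills and the second activates only the background preferences, which mirrors the scope-aware behaviour described in the text.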

9.4.3 Creating custom skills

A minimal skill needs only a SKILL.md file with YAML frontmatter. To create one, ask the agent to use the skill-creator skill:

# Example: ask the agent to create a new skill
tcm_chat()
# > Help me create a skill for molecular docking analysis

A skill directory can optionally contain references/ (detailed documentation), scripts/ (executable scripts), and assets/ (templates).
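
For readers who prefer to write the file by hand, the sketch below creates a minimal SKILL.md in a temporary folder. The frontmatter fields name and description, the skill name and the body text are assumptions made for illustration. Consult the aisdk documentation for the fields it actually requires.

```r
# Illustrative only: write a minimal SKILL.md into a temporary skills folder.
skill_dir <- file.path(tempdir(), "tcm_skills", "molecular-docking")
dir.create(skill_dir, recursive = TRUE, showWarnings = FALSE)

skill_md <- c(
  "---",
  "name: molecular-docking",
  "description: Guidance for docking herb compounds against protein targets.",
  "---",
  "",
  "# Molecular docking",
  "",
  "Describe the workflow steps and the checks the agent should perform here."
)
writeLines(skill_md, file.path(skill_dir, "SKILL.md"))

# Show the frontmatter that was written.
readLines(file.path(skill_dir, "SKILL.md"), n = 4)
```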

9.4.4 Managing skills

By default, the agent uses the TCMDATA skill directory plus aisdk’s skill-creator. To customize preferences or add new skills, initialize a local skills directory:

# Copy all bundled skills to ./tcm_skills/ for customization
tcm_init_skills()

This creates a local tcm_skills/ directory containing the TCMDATA package skills. You can then:

# Edit analysis preferences (e.g., change default p-value cutoff)
file.edit("tcm_skills/analysis-preferences/SKILL.md")

# Add a new skill created by skill-creator
# Just place the skill directory under tcm_skills/:
#   tcm_skills/my-new-skill/SKILL.md

# Check which skills directory is currently active
tcm_skill_dir()

# Switch to a different skills directory
tcm_use_skills("path/to/other/skills")

# Reset to package defaults
tcm_reset_skills()

Once a local skills directory is active, all agent functions (tcm_agent(), tcm_chat(), create_tcm_task_agent()) will automatically use it.
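
Before switching directories, it can be useful to check which folders would be recognised as skills. The sketch below treats any sub-folder containing a SKILL.md file as a skill, which follows the layout described in this section. find_skills() is a hypothetical helper, and the toy directory is built in tempdir() so the example runs anywhere.

```r
# Illustrative only: list sub-folders that contain a SKILL.md file.
find_skills <- function(dir) {
  subdirs <- list.dirs(dir, full.names = TRUE, recursive = FALSE)
  has_md  <- file.exists(file.path(subdirs, "SKILL.md"))
  basename(subdirs[has_md])
}

# Toy directory: one valid skill and one unrelated folder.
root <- file.path(tempdir(), "skills_demo")
dir.create(file.path(root, "my-new-skill"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "notes"), showWarnings = FALSE)
writeLines(c("---", "name: my-new-skill", "---"),
           file.path(root, "my-new-skill", "SKILL.md"))

find_skills(root)
```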

9.5 Session information

sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] caret_7.0-1            lattice_0.22-9         org.Hs.eg.db_3.22.0   
#>  [4] AnnotationDbi_1.72.0   IRanges_2.44.0         S4Vectors_0.48.1      
#>  [7] Biobase_2.70.0         BiocGenerics_0.56.0    generics_0.1.4        
#> [10] clusterProfiler_4.18.4 aplot_0.2.9            ggrepel_0.9.8         
#> [13] ggtangle_0.1.1         igraph_2.2.3           ggplot2_4.0.2         
#> [16] dplyr_1.2.1            ivolcano_0.0.5         enrichplot_1.30.5     
#> [19] TCMDATA_0.1.0         
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.5.3           ggplotify_0.1.3         tibble_3.3.1           
#>   [4] R.oo_1.27.1             polyclip_1.10-7         hardhat_1.4.3          
#>   [7] pROC_1.19.0.1           rpart_4.1.24            lifecycle_1.0.5        
#>  [10] doParallel_1.0.17       globals_0.19.1          MASS_7.3-65            
#>  [13] magrittr_2.0.5          sass_0.4.10             rmarkdown_2.31         
#>  [16] jquerylib_0.1.4         yaml_2.3.12             ggvenn_0.1.19          
#>  [19] cowplot_1.2.0           DBI_1.3.0               RColorBrewer_1.1-3     
#>  [22] lubridate_1.9.5         purrr_1.2.2             R.utils_2.13.0         
#>  [25] yulab.utils_0.2.4       nnet_7.3-20             tweenr_2.0.3           
#>  [28] rappdirs_0.3.4          ipred_0.9-15            gdtools_0.5.0          
#>  [31] circlize_0.4.18         lava_1.9.0              listenv_0.10.1         
#>  [34] tidytree_0.4.7          parallelly_1.46.1       codetools_0.2-20       
#>  [37] DOSE_4.4.0              ggforce_0.5.0           tidyselect_1.2.1       
#>  [40] shape_1.4.6.1           farver_2.1.2            matrixStats_1.5.0      
#>  [43] Seqinfo_1.0.0           jsonlite_2.0.0          GetoptLong_1.1.1       
#>  [46] e1071_1.7-17            ggridges_0.5.7          ggalluvial_0.12.6      
#>  [49] survival_3.8-6          iterators_1.0.14        systemfonts_1.3.2      
#>  [52] foreach_1.5.2           tools_4.5.3             ggnewscale_0.5.2       
#>  [55] treeio_1.34.0           Rcpp_1.1.1              glue_1.8.0             
#>  [58] prodlim_2026.03.11      gridExtra_2.3           xfun_0.57              
#>  [61] ranger_0.18.0           qvalue_2.42.0           withr_3.0.2            
#>  [64] fastmap_1.2.0           digest_0.6.39           timechange_0.4.0       
#>  [67] R6_2.6.1                gridGraphics_0.5-1      colorspace_2.1-2       
#>  [70] GO.db_3.22.0            RSQLite_2.4.6           R.methodsS3_1.8.2      
#>  [73] tidyr_1.3.2             fontLiberation_0.1.0    data.table_1.18.2.1    
#>  [76] recipes_1.3.2           class_7.3-23            httr_1.4.8             
#>  [79] htmlwidgets_1.6.4       scatterpie_0.2.6        ModelMetrics_1.2.2.2   
#>  [82] pkgconfig_2.0.3         gtable_0.3.6            timeDate_4052.112      
#>  [85] blob_1.3.0              ComplexHeatmap_2.26.1   S7_0.2.1               
#>  [88] XVector_0.50.0          htmltools_0.5.9         fontBitstreamVera_0.1.1
#>  [91] bookdown_0.46           fgsea_1.36.2            clue_0.3-68            
#>  [94] scales_1.4.0            png_0.1-9               gower_1.0.2            
#>  [97] Boruta_9.0.0            ggfun_0.2.0             knitr_1.51             
#> [100] rstudioapi_0.18.0       reshape2_1.4.5          rjson_0.2.23           
#> [103] nlme_3.1-168            proxy_0.4-29            cachem_1.1.0           
#> [106] GlobalOptions_0.1.4     stringr_1.6.0           parallel_4.5.3         
#> [109] pillar_1.11.1           grid_4.5.3              vctrs_0.7.3            
#> [112] randomForest_4.7-1.2    tidydr_0.0.6            cluster_2.1.8.2        
#> [115] evaluate_1.0.5          cli_3.6.6               compiler_4.5.3         
#> [118] rlang_1.2.0             crayon_1.5.3            future.apply_1.20.2    
#> [121] labeling_0.4.3          plyr_1.8.9              fs_2.0.1               
#> [124] ggiraph_0.9.6           stringi_1.8.7           viridisLite_0.4.3      
#> [127] BiocParallel_1.44.0     Biostrings_2.78.0       lazyeval_0.2.3         
#> [130] glmnet_4.1-10           GOSemSim_2.36.0         fontquiver_0.2.1       
#> [133] Matrix_1.7-5            patchwork_1.3.2         bit64_4.6.0-1          
#> [136] future_1.70.0           KEGGREST_1.50.0         kernlab_0.9-33         
#> [139] memoise_2.0.1           bslib_0.10.0            ggtree_4.0.5           
#> [142] fastmatch_1.1-8         bit_4.6.0               xgboost_3.2.1.1        
#> [145] ape_5.8-1               gson_0.1.0