Chapter 9 AI Module

TCMDATA includes an AI module built on top of aisdk, providing three layers of AI capability: interpretation (one-shot analysis of R objects), agent (multi-turn interactive analysis with tool use), and skill (domain-specific workflow knowledge). This chapter demonstrates the key features of each layer.

9.1 Prerequisites

The AI module requires both TCMDATA and aisdk:

install.packages("aisdk")
library(aisdk)
library(TCMDATA)

Model configuration is handled through tcm_setup() and only needs to be called once per session. Users must supply their own API key and provider to use the AI module in this package.

tcm_setup(
  provider = "openai",
  api_key  = "sk-xxx",
  model    = "gpt-4o",
  base_url = "https://xxx/v1",  # if necessary
  save     = TRUE,
  test     = TRUE
)
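
A key written directly into a script is easy to leak through version control. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name OPENAI_API_KEY and the helper get_api_key() are illustrative and not part of TCMDATA or aisdk.

```r
# Hypothetical helper (not part of TCMDATA): read an API key from an
# environment variable and fail early with a clear message if it is unset.
get_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(sprintf("Environment variable %s is not set", var), call. = FALSE)
  }
  key
}

# Intended usage (not run here, requires TCMDATA and a valid key):
# tcm_setup(provider = "openai", api_key = get_api_key(), model = "gpt-4o")
```

The variable can be defined in ~/.Renviron or exported in the shell before R starts, so the key never appears in the analysis script itself.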

9.2 Interpretation layer

The interpretation layer provides one-shot AI analysis of R objects and free-text queries, with no tool use or multi-turn conversation. The primary interface is tcm_interpret().

9.2.1 Free-text interpretation

txt <- tcm_interpret(
  "Please introduce the major pharmacological functions of Huangqi (Astragalus membranaceus), focusing on immunoregulation.",
  language = "en")
cat(txt)

9.2.2 Interpreting enrichment objects

tcm_interpret() can directly accept a clusterProfiler enrichment object. The package automatically compresses the relevant terms and genes into a compact representation before sending it to the model:

library(clusterProfiler)
library(org.Hs.eg.db)

lz_targets <- search_herb("lingzhi", type = "Herb_pinyin_name")$target
set.seed(123)  # fix the seed so the sampled subset is reproducible
lz_targets <- sample(unique(na.omit(lz_targets)), 100)

bp <- enrichGO(
  gene    = lz_targets,
  ont     = "BP",
  OrgDb   = org.Hs.eg.db,
  keyType = "SYMBOL")

tcm_interpret(
  bp,
  prompt   = "Summarise the main biological implications of this Lingzhi GO enrichment.",
  language = "en")

=== TCM AI Analysis: enrichment ===
  Model: openai:gpt-4o | Language: en | Audience: researcher

-- Summary --
The enrichment profile indicates that Lingzhi's putative targets cluster in innate
immune and inflammatory signaling, particularly TLR/IL-1/TNF-NF-κB pathways,
leukocyte adhesion, and matrix remodeling...

-- Key Findings --
  * Strong enrichment for cellular response to stress and external stimuli
  * Activation of innate immune and inflammatory pathways (IRAK1, NFKBIA, TNF, ADAM17)
  * Leukocyte adhesion and endothelial activation (VCAM1, MMP9)
  ...
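
To give a rough idea of what "compressing" an enrichment result could mean, the sketch below keeps the top terms by adjusted p-value and a few genes per term. It is an illustration under stated assumptions, not TCMDATA's actual implementation: compress_enrichment() is a hypothetical helper, and the toy data frame only mimics the Description, p.adjust, and geneID columns of a clusterProfiler result table with made-up values.

```r
# Hypothetical helper: reduce an enrichment table to a few short text lines.
compress_enrichment <- function(res, top_n = 5, max_genes = 5) {
  res <- res[order(res$p.adjust), , drop = FALSE]   # most significant first
  res <- head(res, top_n)                           # keep only the top terms
  genes <- strsplit(res$geneID, "/", fixed = TRUE)  # geneID is "/"-separated
  gene_txt <- vapply(
    genes,
    function(g) paste(head(g, max_genes), collapse = ", "),
    character(1)
  )
  sprintf("%s (p.adj = %.2g): %s", res$Description, res$p.adjust, gene_txt)
}

# Toy table with made-up values, for illustration only
toy <- data.frame(
  Description = c("inflammatory response", "leukocyte adhesion", "response to stress"),
  p.adjust    = c(1e-8, 3e-5, 2e-6),
  geneID      = c("TNF/IRAK1/NFKBIA", "VCAM1/MMP9", "TNF/ADAM17"),
  stringsAsFactors = FALSE
)

compact <- compress_enrichment(toy, top_n = 2)
cat(compact, sep = "\n")
```

Trimming the table this way keeps the prompt short, which matters because a full enrichment result can contain hundreds of terms.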

9.2.3 Interpreting PPI objects

PPI network objects with topological metrics computed by compute_nodeinfo() can also be directly interpreted:

data("demo_ppi")
ppi_graph <- compute_nodeinfo(demo_ppi)

tcm_interpret(
  ppi_graph,
  prompt   = "Characterise the network architecture and identify the most important hub nodes.",
  language = "en")
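
As a reminder of what the simplest topological metric looks like, the sketch below computes node degree from a toy edge list in base R and picks the highest-degree nodes as hubs. The edge list is invented for illustration, and this is not a description of what compute_nodeinfo() does internally, which may compute other or richer metrics.

```r
# Toy undirected edge list (invented node names, not real interactions)
edges <- data.frame(
  from = c("A", "A", "A", "B", "B"),
  to   = c("B", "C", "D", "E", "C"),
  stringsAsFactors = FALSE
)

# Degree = number of edges touching each node
tab    <- table(c(edges$from, edges$to))
degree <- setNames(as.integer(tab), names(tab))
degree <- sort(degree, decreasing = TRUE)

# Nodes sharing the maximum degree are treated as hubs in this sketch
hubs <- names(degree)[degree == max(degree)]
print(degree)
print(hubs)
```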

9.2.4 Drafting result paragraphs

draft_result_paragraph() transforms an interpretation object into a publication-ready paragraph:

ai_res <- tcm_interpret(bp, language = "en")
draft  <- draft_result_paragraph(ai_res, language = "en")
cat(as.character(draft))

9.2.5 Custom structured output

tcm_interpret_schema() allows user-defined output schemas for integration into downstream pipelines:

my_schema <- tcm_schema(
  summary     = tcm_field_string("A concise 2-3 sentence summary"),
  mechanism   = tcm_field_string("Core mechanistic interpretation"),
  key_targets = tcm_field_array("Most important targets"),
  confidence  = tcm_field_enum(c("high", "medium", "low"),
                               "Confidence level")
)

custom_res <- tcm_interpret_schema(
  bp,
  schema = my_schema,
  type   = "enrichment",
  prompt = "Focus on inflammation-related processes.")

print(custom_res)

Available field types: tcm_field_string(), tcm_field_number(), tcm_field_boolean(), tcm_field_array(), tcm_field_enum().
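
Before passing structured output to a downstream pipeline, it can help to check that the expected fields are present. The sketch below is a minimal example, assuming the result can be treated as a named list whose elements match the schema fields defined above. The actual class returned by tcm_interpret_schema() may differ; validate_result() and mock_res are hypothetical.

```r
# Hypothetical validator for a result shaped like my_schema above.
validate_result <- function(res) {
  required <- c("summary", "mechanism", "key_targets", "confidence")
  missing_fields <- setdiff(required, names(res))
  if (length(missing_fields) > 0) {
    stop("Missing fields: ", paste(missing_fields, collapse = ", "), call. = FALSE)
  }
  if (!res$confidence %in% c("high", "medium", "low")) {
    stop("Unexpected confidence value: ", res$confidence, call. = FALSE)
  }
  invisible(TRUE)
}

# Mock result with placeholder content, standing in for a real model response
mock_res <- list(
  summary     = "Placeholder summary.",
  mechanism   = "Placeholder mechanism.",
  key_targets = c("TNF", "NFKBIA"),
  confidence  = "medium"
)

validate_result(mock_res)
```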

9.3 Agent layer

The agent layer adds tool use and multi-turn conversation on top of the interpretation engine. TCMDATA provides more than 30 analysis tools (target search, enrichment, PPI, ML screening, compound lookup, visualization, etc.) that the agent can call autonomously.

9.3.1 One-shot task: `tcm_agent()`

For a single analysis request, tcm_agent() automatically routes the query to the appropriate tools and returns the result:

# Simple herb target lookup: the agent selects the right tool automatically
result <- tcm_agent("Search the targets of Huangqi (Astragalus)")
cat(result$text)

# The result also contains any generated artifacts
result$artifacts

The built-in router matches user queries to relevant tools using keyword patterns. For example, “enrichment” routes to GO/KEGG tools, “ppi” routes to network tools, and “machine learning” routes to ML screening tools. When multiple patterns match, the matched tool sets are merged.
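
The routing idea can be sketched in a few lines of base R. This is an illustration of keyword routing with merging, not the package's router: the patterns are guesses, and build_ppi_network and run_ml_screening are hypothetical tool names (run_go_enrichment and run_kegg_enrichment appear in the session transcript in Section 9.3.2).

```r
# Illustrative route table: each route has a regex and a set of tool names.
route_patterns <- list(
  enrichment = list(pattern = "enrich|kegg",
                    tools   = c("run_go_enrichment", "run_kegg_enrichment")),
  ppi        = list(pattern = "\\bppi\\b|network",
                    tools   = c("build_ppi_network")),   # hypothetical name
  ml         = list(pattern = "machine learning",
                    tools   = c("run_ml_screening"))     # hypothetical name
)

# Return the union of tools for every route whose pattern matches the query.
route_query <- function(query, routes = route_patterns) {
  hit <- vapply(
    routes,
    function(r) grepl(r$pattern, query, ignore.case = TRUE, perl = TRUE),
    logical(1)
  )
  as.character(unique(unlist(lapply(routes[hit], `[[`, "tools"), use.names = FALSE)))
}

route_query("Run enrichment and build a PPI network")
```

A query that matches no pattern yields an empty character vector, which a caller could use as the signal to fall back to a default tool set.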

9.3.2 Interactive session: `tcm_chat()`

tcm_chat() opens an interactive REPL for multi-turn exploratory analysis:

tcm_chat()

╔══════════════════════════════════════════════════╗
║  TCM-Pharmacist · Interactive Session            ║
║  Type /help for commands · /quit to exit         ║
╚══════════════════════════════════════════════════╝

[1] You > Search targets of Huangqi and sepsis, then compute intersection
 ✓ Route: target_lookup + disease_lookup (high)
 ✓ Tools called: search_herb_records → search_disease_targets → compute_target_intersection
 ✓ New artifacts: intersect_001

 Agent > Found 121 intersection targets between Huangqi and sepsis...

[2] You > Run GO and KEGG enrichment on the intersection
 ✓ Route: enrichment (high)
 ✓ Tools called: run_go_enrichment → run_kegg_enrichment
 ✓ New artifacts: enrich_001, enrich_002

 Agent > GO enrichment identified 245 significant BP terms...

[3] You > /artifacts
  intersect_001  gene_list    character[121]  2026-04-10 14:32:01
  enrich_001     enrichment   enrichResult    2026-04-10 14:32:15
  enrich_002     enrichment   enrichResult    2026-04-10 14:32:16

[4] You > /save 10x8
  ✓ Exported 3 artifacts to tcm_output/

[5] You > /quit

Key commands: /help, /artifacts, /save [WxH], /history, /stats, /quit.

The session returns a list with history and artifacts for programmatic access:

res <- tcm_chat()
# After /quit:
res$artifacts  # all generated artifacts
res$history    # full conversation history

9.3.3 Programmatic agent: `create_tcm_task_agent()`

For scripted workflows, create a reusable agent and execute tasks programmatically:

agent <- create_tcm_task_agent()

r1 <- run_tcm_task(agent, "Search Huangqi targets and sepsis targets, compute intersection")
r2 <- run_tcm_task(agent, "Run GO enrichment on the intersection genes")
r3 <- run_tcm_task(agent, "Build PPI network and rank hub genes")

# Each result contains: $text, $artifacts, $tool_calls
r3$text
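
In longer scripted workflows, one failed step should not necessarily abort the whole run. The sketch below wraps each task in tryCatch() so that errors are recorded instead of thrown. run_tasks() and stub_runner() are hypothetical helpers; the stub replaces the real agent call so the example runs without an API key.

```r
# Hypothetical helper: run each task through `runner`, capturing errors.
run_tasks <- function(tasks, runner) {
  lapply(tasks, function(task) {
    tryCatch(
      runner(task),
      error = function(e) list(text = NA_character_, error = conditionMessage(e))
    )
  })
}

# Stub standing in for the agent; it fails on an empty task string.
stub_runner <- function(task) {
  if (!nzchar(task)) stop("empty task")
  list(text = paste("done:", task))
}

out <- run_tasks(c("Search Huangqi targets", ""), stub_runner)

# With a real agent, the runner would be (not run here):
# run_tasks(tasks, function(task) run_tcm_task(agent, task))
```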

9.3.4 Artifact management

All analysis results are automatically stored as artifacts with generated IDs (e.g., enrich_001, ppi_002). The agent can reference artifacts in subsequent turns, and artifacts can be managed via dedicated functions:

list_tcm_artifacts()     # list all artifacts
load_tcm_artifact("enrich_001")  # retrieve the R object
clear_tcm_artifacts()    # clear all
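
The ID scheme can be pictured as a small registry that keeps one counter per artifact type. The sketch below is a toy model only, assuming per-type counters and zero-padded three-digit suffixes as seen in the IDs above; new_registry() is hypothetical and does not reflect how TCMDATA stores artifacts.

```r
# Toy artifact registry: per-prefix counters and an environment as the store.
new_registry <- function() {
  store  <- new.env(parent = emptyenv())
  counts <- list()

  add <- function(prefix, object) {
    n <- if (is.null(counts[[prefix]])) 1L else counts[[prefix]] + 1L
    counts[[prefix]] <<- n                  # update the counter in the closure
    id <- sprintf("%s_%03d", prefix, n)     # e.g. "enrich_001"
    assign(id, object, envir = store)
    id
  }

  list(
    add   = add,
    fetch = function(id) base::get(id, envir = store, inherits = FALSE),
    ids   = function() sort(ls(store))
  )
}

reg <- new_registry()
id1 <- reg$add("enrich", 1:3)
id2 <- reg$add("enrich", "second result")
id3 <- reg$add("ppi", TRUE)
reg$ids()
```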

9.4 Skill layer

Skills are domain-knowledge packages that guide the agent through complex multi-step workflows. Unlike tools (which perform specific operations), skills provide strategic context: what to do, in what order, and what to watch out for.

9.4.1 Built-in skills

TCMDATA ships with two package skills, and also loads aisdk’s skill-creator by default:

Skill	Purpose
`tcm-network-pharmacology`	Guides the full network pharmacology workflow (target retrieval → intersection → PPI → enrichment → validation → report). Activates only when the user explicitly requests a systematic analysis.
`analysis-preferences`	Background constraint layer. Sets default parameters (e.g., p-value cutoffs, PPI score thresholds, visualization defaults) and quality standards. Active on every turn.
`skill-creator`	Meta-skill from aisdk for creating new custom skills

9.4.2 How skills work

When the agent detects a request that matches a skill’s trigger condition, the skill’s instructions are injected into the conversation context. For example, asking “Help me do a network pharmacology analysis of Huangqi treating sepsis” activates the tcm-network-pharmacology skill, which guides the agent through:

Phase 1: Target collection (herb targets + disease targets → intersection)
Phase 2: Network & enrichment (PPI construction, GO/KEGG analysis, hub gene ranking)
Phase 3: Expression validation (WGCNA, ML screening, DEG integration — if data available)
Phase 4: Single-cell validation (if scRNA-seq data available)
Phase 5: Literature validation, cross-database verification, and report generation

Importantly, the skill is scope-aware: asking “Just search the targets of Huangqi” will only run the relevant step, not the full pipeline.

9.4.3 Creating custom skills

A minimal skill needs only a SKILL.md file with YAML frontmatter. To create one, ask the agent to invoke the skill-creator skill:

# Example: ask the agent to create a new skill
tcm_chat()
# > Help me create a skill for molecular docking analysis

A skill directory can optionally contain references/ (detailed documentation), scripts/ (executable scripts), and assets/ (templates).
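
For readers who prefer to scaffold a skill by hand, the sketch below writes a skeleton with the directory layout described above into a temporary directory. The frontmatter fields name and description are an assumption about what the skill loader expects, and create_skill_skeleton() is a hypothetical helper; check the aisdk documentation for the required fields.

```r
# Hypothetical helper: create <root>/<name>/ with SKILL.md and optional folders.
create_skill_skeleton <- function(root, name, description) {
  skill_dir <- file.path(root, name)
  for (sub in c("references", "scripts", "assets")) {
    dir.create(file.path(skill_dir, sub), recursive = TRUE, showWarnings = FALSE)
  }
  skill_md <- file.path(skill_dir, "SKILL.md")
  writeLines(c(
    "---",
    paste0("name: ", name),                 # assumed frontmatter field
    paste0("description: ", description),   # assumed frontmatter field
    "---",
    "",
    "# Instructions",
    "",
    "Describe the workflow steps here."
  ), skill_md)
  invisible(skill_md)
}

path <- create_skill_skeleton(
  tempdir(),
  "molecular-docking",
  "Guides molecular docking analysis."
)
readLines(path)
```

Copying the resulting directory under tcm_skills/ would make it available in the same way as a skill produced by skill-creator, as described in the next section.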

9.4.4 Managing skills

By default, the agent uses the TCMDATA skill directory plus aisdk’s skill-creator. To customize preferences or add new skills, initialize a local skills directory:

# Copy all bundled skills to ./tcm_skills/ for customization
tcm_init_skills()

This creates a local tcm_skills/ directory containing the TCMDATA package skills. You can then:

# Edit analysis preferences (e.g., change default p-value cutoff)
file.edit("tcm_skills/analysis-preferences/SKILL.md")

# Add a new skill created by skill-creator
# Just place the skill directory under tcm_skills/:
#   tcm_skills/my-new-skill/SKILL.md

# Check which skills directory is currently active
tcm_skill_dir()

# Switch to a different skills directory
tcm_use_skills("path/to/other/skills")

# Reset to package defaults
tcm_reset_skills()

Once a local skills directory is active, all agent functions (tcm_agent(), tcm_chat(), create_tcm_task_agent()) will automatically use it.

9.5 Session information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] caret_7.0-1            lattice_0.22-9         org.Hs.eg.db_3.23.1   
#>  [4] AnnotationDbi_1.74.0   IRanges_2.46.0         S4Vectors_0.50.1      
#>  [7] Biobase_2.72.0         BiocGenerics_0.58.1    generics_0.1.4        
#> [10] clusterProfiler_4.20.0 aplot_0.2.9            ggrepel_0.9.8         
#> [13] ggtangle_0.1.2         igraph_2.3.1           ggplot2_4.0.3         
#> [16] dplyr_1.2.1            ivolcano_0.0.5         enrichplot_1.32.0     
#> [19] TCMDATA_0.1.0         
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.6.0           ggplotify_0.1.3         tibble_3.3.1           
#>   [4] polyclip_1.10-7         hardhat_1.4.3           enrichit_0.1.4         
#>   [7] pROC_1.19.0.1           rpart_4.1.27            lifecycle_1.0.5        
#>  [10] httr2_1.2.2             doParallel_1.0.17       globals_0.19.1         
#>  [13] processx_3.9.0          MASS_7.3-65             magrittr_2.0.5         
#>  [16] sass_0.4.10             rmarkdown_2.31          jquerylib_0.1.4        
#>  [19] yaml_2.3.12             ggvenn_0.1.19           DBI_1.3.0              
#>  [22] RColorBrewer_1.1-3      lubridate_1.9.5         purrr_1.2.2            
#>  [25] yulab.utils_0.2.4       nnet_7.3-20             tweenr_2.0.3           
#>  [28] rappdirs_0.3.4          ipred_0.9-15            aisdk_1.4.8            
#>  [31] gdtools_0.5.1           circlize_0.4.18         lava_1.9.1             
#>  [34] listenv_0.10.1          tidytree_0.4.7          fru_0.0.7              
#>  [37] parallelly_1.47.0       codetools_0.2-20        DOSE_4.6.0             
#>  [40] ggforce_0.5.0           tidyselect_1.2.1        shape_1.4.6.1          
#>  [43] farver_2.1.2            matrixStats_1.5.0       Seqinfo_1.2.0          
#>  [46] jsonlite_2.0.0          GetoptLong_1.1.1        e1071_1.7-17           
#>  [49] ggridges_0.5.7          ggalluvial_0.12.6       survival_3.8-6         
#>  [52] iterators_1.0.14        systemfonts_1.3.2       foreach_1.5.2          
#>  [55] tools_4.6.0             ggnewscale_0.5.2        treeio_1.36.1          
#>  [58] Rcpp_1.1.1-1.1          glue_1.8.1              prodlim_2026.03.11     
#>  [61] gridExtra_2.3           xfun_0.57               qvalue_2.44.0          
#>  [64] withr_3.0.2             fastmap_1.2.0           callr_3.7.6            
#>  [67] digest_0.6.39           timechange_0.4.0        R6_2.6.1               
#>  [70] gridGraphics_0.5-1      colorspace_2.1-2        GO.db_3.23.1           
#>  [73] RSQLite_3.53.1          tidyr_1.3.2             fontLiberation_0.1.0   
#>  [76] data.table_1.18.4       recipes_1.3.2           class_7.3-23           
#>  [79] httr_1.4.8              htmlwidgets_1.6.4       scatterpie_0.2.6       
#>  [82] ModelMetrics_1.2.2.2    pkgconfig_2.0.3         gtable_0.3.6           
#>  [85] timeDate_4052.112       blob_1.3.0              ComplexHeatmap_2.28.0  
#>  [88] S7_0.2.2                XVector_0.52.0          htmltools_0.5.9        
#>  [91] fontBitstreamVera_0.1.1 bookdown_0.46           clue_0.3-68            
#>  [94] scales_1.4.0            png_0.1-9               gower_1.0.2            
#>  [97] Boruta_10.0.0           ggfun_0.2.0             knitr_1.51             
#> [100] rstudioapi_0.18.0       reshape2_1.4.5          rjson_0.2.23           
#> [103] nlme_3.1-169            proxy_0.4-29            cachem_1.1.0           
#> [106] GlobalOptions_0.1.4     stringr_1.6.0           parallel_4.6.0         
#> [109] pillar_1.11.1           grid_4.6.0              vctrs_0.7.3            
#> [112] randomForest_4.7-1.2    tidydr_0.0.6            cluster_2.1.8.2        
#> [115] evaluate_1.0.5          cli_3.6.6               compiler_4.6.0         
#> [118] rlang_1.2.0             crayon_1.5.3            future.apply_1.20.2    
#> [121] labeling_0.4.3          ps_1.9.3                plyr_1.8.9             
#> [124] fs_2.1.0                ggiraph_0.9.6           stringi_1.8.7          
#> [127] viridisLite_0.4.3       Biostrings_2.80.1       lazyeval_0.2.3         
#> [130] glmnet_5.0              GOSemSim_2.38.0         fontquiver_0.2.1       
#> [133] Matrix_1.7-5            patchwork_1.3.2         bit64_4.8.2            
#> [136] future_1.70.0           KEGGREST_1.52.0         kernlab_0.9-33         
#> [139] memoise_2.0.1           bslib_0.11.0            ggtree_4.2.0           
#> [142] bit_4.6.0               xgboost_3.2.1.1         ape_5.8-1              
#> [145] gson_0.1.0