Chapter 2 Downloading Data from RAP
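The UKBAnalytica package ships Python helper scripts for downloading data from the
UK Biobank Research Analysis Platform (RAP). These scripts live in inst/python/
in the package source. R copies the contents of inst/ to the top level of the
installed package, so after installation they are in the package's python/
directory, which can be located with system.file("python", package = "UKBAnalytica").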
2.1 File structure
inst/
  python/
    ukb_data_loader.py            # Demographics & metabolites (Spark)
    protein_loader.py             # Proteomics (dx commands)
    field_ids_demographic.txt     # Example demographic field IDs
  extdata/
    metabolites_non_ratio.txt     # Non-ratio metabolite reference (170 fields)
2.2 Demographic data
Download any combination of UKB fields by specifying their field IDs.
The loader uses Spark via dxdata under the hood.
# Pass IDs directly
python ukb_data_loader.py demographic \
--ids 31,53,21022,21001 \
-o population.csv
# Or read IDs from a file (recommended for many fields)
python ukb_data_loader.py demographic \
--id-file field_ids_demographic.txt \
-o population.csv

The ID file supports comments (#) and accepts IDs in comma-separated or space-separated form.
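The accepted ID file format can be illustrated with a small parser. This is a minimal sketch, not the code in ukb_data_loader.py: the function name parse_field_ids and the exact rules (everything after # is ignored, commas and whitespace both act as separators, duplicates are dropped) are assumptions made for illustration.

```python
import re


def parse_field_ids(text: str) -> list[int]:
    """Extract field IDs from the contents of an ID file (hypothetical helper)."""
    ids: list[int] = []
    seen: set[int] = set()
    for line in text.splitlines():
        # Drop everything after '#', so full-line and trailing comments both work.
        line = line.split("#", 1)[0]
        # Commas and whitespace are both treated as separators.
        for token in re.split(r"[,\s]+", line.strip()):
            if not token:
                continue
            if not token.isdigit():
                raise ValueError(f"not a numeric field ID: {token!r}")
            field_id = int(token)
            if field_id not in seen:  # keep the first occurrence, preserve order
                seen.add(field_id)
                ids.append(field_id)
    return ids


example = """\
# Basic demographics
31, 53
21022 21001   # age, BMI
"""
print(parse_field_ids(example))
# → [31, 53, 21022, 21001]
```

Rejecting non-numeric tokens early is a design choice in this sketch: a typo in the ID file then fails before any download starts.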
2.3 Metabolomics data (NMR)
Download NMR metabolomics data. You can retrieve all 251 metabolite fields or restrict to the curated non-ratio subset of 170 fields.
2.4 Proteomics data (Olink)
Download Olink protein expression data via dx commands.
The loader handles batching, merging, and progress tracking automatically.
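To make the batching and merging steps concrete, the sketch below splits a list of column names into fixed-size batches, fetches each batch, reports progress, and merges the per-batch tables on participant ID. It is not the implementation in protein_loader.py: the batch size, the use of eid as the join key, the column names, and the fake_fetch stand-in (which returns placeholder zeros instead of calling dx) are all assumptions.

```python
from typing import Callable, Iterator

# One table maps participant ID -> {column name -> value}.
Table = dict[str, dict[str, float]]


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def merge_on_eid(tables: list[Table]) -> Table:
    """Merge per-batch tables into one table keyed by participant ID."""
    merged: Table = {}
    for table in tables:
        for eid, row in table.items():
            merged.setdefault(eid, {}).update(row)
    return merged


def load_in_batches(columns: list[str],
                    fetch_batch: Callable[[list[str]], Table],
                    batch_size: int = 2) -> Table:
    """Fetch `columns` batch by batch, print progress, and merge the results."""
    batches = list(batched(columns, batch_size))
    tables = []
    for i, batch in enumerate(batches, start=1):
        tables.append(fetch_batch(batch))
        print(f"batch {i}/{len(batches)} done ({len(batch)} columns)")
    return merge_on_eid(tables)


def fake_fetch(batch: list[str]) -> Table:
    """Hypothetical stand-in for the download step; values are placeholders."""
    return {eid: {col: 0.0 for col in batch} for eid in ("id_1", "id_2")}


result = load_in_batches(["col_a", "col_b", "col_c"], fake_fetch, batch_size=2)
print(sorted(result["id_1"]))
# → ['col_a', 'col_b', 'col_c']
```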
2.5 Common UKB field IDs
The table below lists commonly used field IDs for reference:
| Category | Field IDs | Description |
|---|---|---|
| Basic demographics | 31, 53, 21022, 21001 | Sex, assessment date, age, BMI |
| Lifestyle | 20116, 20117, 1160 | Smoking, alcohol, sleep |
| Blood pressure | 93, 94, 4079, 4080 | Systolic/diastolic BP (manual: 93, 94; automated: 4080, 4079) |
| Biomarkers | 30870, 30780, 30760, 30750 | Triglycerides, LDL, HDL, HbA1c (glucose is field 30740) |
| Hospital records | 41270, 41280, 41271, 41281 | ICD-10/9 diagnoses + dates |
| Death registry | 40000, 40001, 40002 | Death date, causes |
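
To avoid retyping IDs, the groups in the table can be kept in a small mapping and joined into the comma-separated string that the --ids option expects. This is a minimal sketch; the dictionary keys and the ids_argument helper are illustrative names, not part of the package.

```python
# Field ID groups copied from the table above; the keys are illustrative names.
FIELD_GROUPS: dict[str, list[int]] = {
    "demographics": [31, 53, 21022, 21001],
    "lifestyle": [20116, 20117, 1160],
    "blood_pressure": [93, 94, 4079, 4080],
    "biomarkers": [30870, 30780, 30760, 30750],
    "hospital_records": [41270, 41280, 41271, 41281],
    "death_registry": [40000, 40001, 40002],
}


def ids_argument(*groups: str) -> str:
    """Return the comma-separated ID string for the named groups."""
    ids: list[int] = []
    for group in groups:
        for field_id in FIELD_GROUPS[group]:
            if field_id not in ids:  # skip duplicates, keep order
                ids.append(field_id)
    return ",".join(str(field_id) for field_id in ids)


print(ids_argument("demographics", "lifestyle"))
# → 31,53,21022,21001,20116,20117,1160
```

The resulting string can be passed to the demographic subcommand from Section 2.2 as the value of --ids.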