# Configuration Reference
OneEHR experiments are driven by a single TOML config file. This page documents the public configuration contract for preprocessing, modeling, testing, and analysis.
See `examples/tjh/mortality_patient.toml` for a complete working example.
## `[dataset]`
File paths for the three-table input spec. At minimum, `dynamic` is required.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `dynamic` | str | None | Path to dynamic event CSV |
| `static` | str | None | Path to static patient CSV |
| `label` | str | None | Path to label event CSV |
```toml
[dataset]
dynamic = "data/dynamic.csv"
static = "data/static.csv"
label = "data/label.csv"
```
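Since only `dynamic` is required, a minimal config can omit the other two tables (a sketch; the file path is illustrative):

```toml
# Minimal [dataset]: static and label tables are optional
[dataset]
dynamic = "data/dynamic.csv"
```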
## `[preprocess]`
Controls how irregular events are binned and features are built.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `bin_size` | str | `"1d"` | Time bin width, for example `"1h"`, `"6h"`, or `"1d"` |
| `numeric_strategy` | str | `"mean"` | Aggregation for numeric values: `mean`, `last`, `median`, `min`, `max`, `std`, or `count` |
| `categorical_strategy` | str | `"onehot"` | Encoding for categorical values: `onehot` or `count` |
| `code_selection` | str | `"frequency"` | Code vocabulary strategy: `frequency`, `all`, or `list` |
| `top_k_codes` | int | 100 | Number of top codes for `frequency` selection |
| `min_code_count` | int | 1 | Minimum event count for a code to be included in the vocabulary |
| `max_seq_length` | int | None | Truncate sequences to the most recent N time bins |
| `min_events_per_patient` | int | 1 | Exclude patients with fewer events |
| `pipeline` | list[dict] | `[]` | Ordered list of preprocessing ops applied after binning (see below) |
```toml
[preprocess]
bin_size = "1d"
numeric_strategy = "mean"
categorical_strategy = "onehot"
code_selection = "frequency"
top_k_codes = 100
min_code_count = 1
```
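The sequence-shaping parameters from the table can be combined to trim long stays and drop sparse patients; a sketch with illustrative values:

```toml
# Keep only the most recent 14 daily bins per patient
# and exclude patients with fewer than 5 recorded events
[preprocess]
bin_size = "1d"
max_seq_length = 14
min_events_per_patient = 5
```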
### Preprocessing Pipeline
The `pipeline` field defines an ordered sequence of preprocessing operations fitted on the train split and applied to all splits. When `pipeline` is empty (the default), numeric features are filled with 0 at train/test time as a safety net.
Each step is a TOML table with an `op` key and operation-specific parameters. The `cols` parameter supports glob patterns (`"num__*"`, `"cat__*"`), explicit lists, or null (all columns).
Supported operations:
| Op | Description | Key params |
|---|---|---|
| `impute` | Fill NaN with a statistic | `strategy` (`mean`, `median`, `mode`, `constant`), `value` |
| `forward_fill` | LOCF within patient + fallback | `group_key`, `order_key`, `fallback.strategy` |
| `standardize` | Z-score normalization | (none) |
| `zscore_filter` | Replace outliers beyond threshold with NaN | `threshold` (default 3.0) |
| `normalize_label` | Z-score the label column (regression) | `col` (default `"label"`) |
| `winsorize` | Quantile-based outlier clipping | `lower_q`, `upper_q` |
| `clip` | Hard value clipping | `lower`, `upper` |
| `knn_impute` | KNN imputation | `n_neighbors` |
| `iterative_impute` | MICE imputation | `max_iter` |
| `robust_scale` | Median/IQR scaling | (none) |
| `quantile_norm` | Quantile normalization | `output_distribution`, `n_quantiles` |
Example: LOCF with mean fallback (recommended for time-series EHR):
```toml
[preprocess]
pipeline = [
    { op = "forward_fill", cols = "num__*", group_key = "patient_id", order_key = "bin_time", fallback = { strategy = "mean" } },
]
```
Example: Outlier handling + imputation + normalization:
```toml
[preprocess]
pipeline = [
    { op = "zscore_filter", cols = "num__*", threshold = 3.0 },
    { op = "impute", cols = "num__*", strategy = "median" },
    { op = "standardize", cols = "num__*" },
]
```
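For heavy-tailed lab values, winsorizing before scaling may be preferable to discarding outliers; a sketch using the `winsorize` and `robust_scale` ops from the table (quantile values are illustrative):

```toml
# Clip to the 1st/99th percentiles, impute, then scale by median/IQR
[preprocess]
pipeline = [
    { op = "winsorize", cols = "num__*", lower_q = 0.01, upper_q = 0.99 },
    { op = "impute", cols = "num__*", strategy = "median" },
    { op = "robust_scale", cols = "num__*" },
]
```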
## `[task]`
Defines the prediction task.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `kind` | str | `"binary"` | Task type: `binary`, `regression`, `multiclass`, `survival`, or `multilabel` |
| `prediction_mode` | str | `"patient"` | Prediction granularity: `patient` or `time` |
| `num_classes` | int | None | Number of classes (required when `kind = "multiclass"`) |
| `loss` | str | `"default"` | Loss function: `default` or `focal` |
| `focal_gamma` | float | 2.0 | Gamma for focal loss |
```toml
# Binary classification
[task]
kind = "binary"
prediction_mode = "patient"

# Multiclass classification
[task]
kind = "multiclass"
prediction_mode = "patient"
num_classes = 5

# Survival analysis (time-to-event with censoring)
[task]
kind = "survival"
prediction_mode = "patient"

# Multi-label classification (e.g., ICD coding)
[task]
kind = "multilabel"
prediction_mode = "patient"
```
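For imbalanced outcomes such as mortality, the `loss` and `focal_gamma` parameters from the table can be combined with a binary task; a sketch using the documented default gamma:

```toml
# Binary classification with focal loss for class imbalance
[task]
kind = "binary"
prediction_mode = "patient"
loss = "focal"
focal_gamma = 2.0
```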
## `[split]`
Patient-level group split configuration. All strategies guarantee no patient appears in multiple splits.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `kind` | str | `"random"` | Split strategy: `random` or `time` |
| `seed` | int | 42 | Random seed |
| `val_size` | float | 0.1 | Validation fraction |
| `test_size` | float | 0.2 | Test fraction for `random` splits |
| `time_boundary` | str | None | Datetime string for `time` splits |
```toml
[split]
kind = "random"
seed = 42
val_size = 0.1
test_size = 0.2
```

```toml
[split]
kind = "time"
time_boundary = "2012-01-01"
```
## `[[models]]`
Model selection and per-model hyperparameters. Use `[[models]]` to train multiple models in one experiment. Each entry has a `name` and an optional `params` dict for model-specific hyperparameters.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | str | `"xgboost"` | Model name, see Models |
| `params` | dict | `{}` | Model-specific hyperparameters |
```toml
[[models]]
name = "xgboost"

[models.params]
n_estimators = 100
max_depth = 4
learning_rate = 0.1

[[models]]
name = "gru"

[models.params]
hidden_dim = 64
num_layers = 1
```
## `[trainer]`
Training loop configuration for deep learning models.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `device` | str | `"auto"` | `auto`, `cpu`, or `cuda` |
| `seed` | int | 42 | Random seed |
| `max_epochs` | int | 30 | Maximum training epochs |
| `batch_size` | int | 64 | Batch size |
| `lr` | float | 1e-3 | Learning rate |
| `weight_decay` | float | 0.0 | AdamW weight decay |
| `grad_clip` | float | 1.0 | Gradient clipping max norm |
| `num_workers` | int | 0 | DataLoader workers |
| `precision` | str | `"fp32"` | `fp32`, `fp16`, or `bf16` |
| `scheduler` | str | `"none"` | LR scheduler: `none`, `cosine`, `step`, or `plateau` |
| `scheduler_params` | dict | `{}` | Scheduler-specific params (e.g., `T_max`, `step_size`, `gamma`) |
| `class_weight` | str | `"none"` | Class weighting: `none` or `balanced` |
| `early_stopping` | bool | true | Enable early stopping |
| `patience` | int | 5 | Epochs without improvement before stopping |
| `monitor` | str | `"val_loss"` | Metric for early stopping: `val_loss`, `val_auroc`, `val_auprc`, `val_rmse`, `val_mae` |
| `monitor_mode` | str | `"min"` | `min` (lower is better) or `max` (higher is better) |
```toml
[trainer]
device = "auto"
seed = 42
max_epochs = 30
batch_size = 64
lr = 1e-3
early_stopping = true
patience = 5
```
Monitor AUROC instead of loss for binary tasks:
```toml
[trainer]
monitor = "val_auroc"
monitor_mode = "max"
early_stopping = true
patience = 10
```
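The scheduler and class-weighting options from the table can also be combined in one trainer block; a sketch using the `T_max` parameter mentioned above (all values are illustrative):

```toml
# Cosine LR decay over the full run, with balanced class weights
[trainer]
max_epochs = 30
lr = 1e-3
scheduler = "cosine"
scheduler_params = { T_max = 30 }
class_weight = "balanced"
```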
The trainer tracks a per-epoch history of `train_loss`, `val_loss`, and the monitored metric (if not `val_loss`). This history is saved in `meta.json` under `train_metrics.history`.
## `[[systems]]`
LLM system definitions for cross-system comparison via `oneehr test`.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | str | `""` | Unique system name |
| `kind` | str | `"llm"` | System kind: `llm` or `agent` |
| `framework` | str | `"single_llm"` | Framework type |
| `backend` | str | `"openai"` | Backend provider |
| `model` | str | `"gpt-4o"` | Provider model identifier |
| `api_key_env` | str | `"OPENAI_API_KEY"` | Environment variable containing the API key |
| `params` | dict | `{}` | System-specific parameters |
```toml
[[systems]]
name = "gpt4o_eval"
kind = "llm"
framework = "single_llm"
backend = "openai"
model = "gpt-4o"
api_key_env = "OPENAI_API_KEY"
```
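Since `[[systems]]` is an array of tables, a cross-system comparison can list several entries; a sketch comparing two models (the second entry relies on the documented defaults for `kind` and `framework`, and the `gpt-4o-mini` model id assumes the `openai` backend accepts it):

```toml
[[systems]]
name = "gpt4o_eval"
backend = "openai"
model = "gpt-4o"
api_key_env = "OPENAI_API_KEY"

[[systems]]
name = "gpt4o_mini_eval"
backend = "openai"
model = "gpt-4o-mini"
api_key_env = "OPENAI_API_KEY"
```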
## `[output]`
Run directory configuration.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `root` | str | `"runs"` | Root directory for all runs |
| `run_name` | str | `"exp001"` | Name of this experiment run |
Artifacts are written to `{root}/{run_name}/`. See Artifacts for the full directory layout.
```toml
[output]
root = "runs"
run_name = "tjh"
```