Core Workflows

This guide covers the standard OneEHR operating path: prepare standardized tables, materialize features, train and test models, and write structured analysis outputs.

Use this page for workflow decisions. Use the Configuration Reference, CLI Reference, and Artifacts Reference for full option tables and on-disk details.

Each command reads from and writes to the same run directory under a shared contract, so persisted state stays aligned across preprocessing, training, testing, and analysis.

Workflow Shape

For a typical experiment, the command sequence is:

oneehr preprocess --config experiment.toml
oneehr train      --config experiment.toml
oneehr test       --config experiment.toml
oneehr analyze    --config experiment.toml
oneehr plot       --config experiment.toml    # optional

All of these commands operate on the same run root, usually {output.root}/{output.run_name}.

Preprocess

oneehr preprocess is the first required step for every run. It reads the standardized dataset tables, materializes the binned feature views, saves the split contract, and writes the run manifest used by downstream commands.

uv run oneehr preprocess --config experiment.toml

What preprocessing decides:

Inputs come from [dataset]. The required raw shape is:

Train

oneehr train fits one or more configured models against the saved preprocess artifacts and split contract.

uv run oneehr train --config experiment.toml
uv run oneehr train --config experiment.toml --force

Key behaviors:

In OneEHR, the TOML file is the experiment contract: if the config changes, the experiment changed.
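
One way to make the "config is the contract" rule concrete is to fingerprint the TOML content and compare fingerprints between runs. The sketch below is an assumption for illustration only: it does not describe how OneEHR itself tracks config changes, and `config_fingerprint` is a hypothetical helper.

```python
import hashlib


def config_fingerprint(config_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the raw config bytes."""
    return hashlib.sha256(config_bytes).hexdigest()


if __name__ == "__main__":
    before = b'[split]\nkind = "random"\nseed = 42\n'
    after = b'[split]\nkind = "random"\nseed = 43\n'
    # Any byte-level difference yields a different fingerprint.
    print(config_fingerprint(before) == config_fingerprint(after))
```

Hashing raw bytes is deliberately strict: even a comment or whitespace edit produces a new fingerprint, which matches the reading that any config change means a different experiment.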

Test

oneehr test evaluates all trained models and configured [[systems]] on the held-out test split.

uv run oneehr test --config experiment.toml
uv run oneehr test --config experiment.toml --force

Outputs:

Use --force to overwrite existing test outputs.

Analyze

oneehr analyze reads test/predictions.parquet and writes structured analysis outputs under analyze/.

uv run oneehr analyze --config experiment.toml
uv run oneehr analyze --config experiment.toml --module comparison

Available modules:

When --module is not specified, all available modules are run.

Plot

oneehr plot --config experiment.toml --style nature

oneehr plot renders publication-quality figures from test and analyze results. Supported figures: ROC curves, PR curves, confusion matrices, calibration plots, decision curve analysis, forest plots, fairness plots, training curves, significance plots, missing data heatmaps, cohort flow diagrams, Kaplan-Meier curves, and attribution heatmaps.

Style presets: default, nature, lancet, wide.

Split Strategies

All supported split strategies are patient-level group splits. A patient never appears in more than one of train, validation, or test.

Supported strategies:

[split]
kind = "random"
seed = 42
val_size = 0.1
test_size = 0.2

[split]
kind = "time"
time_boundary = "2012-01-01"
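
To illustrate what a patient-level random split means in practice, here is a minimal standard-library sketch. It is not OneEHR's implementation: `patient_level_split` is a hypothetical function, and the rounding of split sizes is an assumption. The defaults mirror the `seed`, `val_size`, and `test_size` values from the random split example.

```python
import random


def patient_level_split(patient_ids, val_size=0.1, test_size=0.2, seed=42):
    """Assign each unique patient to exactly one of train, val, or test."""
    # Deduplicate first so a patient with many rows is split as one unit,
    # and sort so the shuffle is reproducible for a given seed.
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)

    n_test = round(len(unique_ids) * test_size)
    n_val = round(len(unique_ids) * val_size)

    test = set(unique_ids[:n_test])
    val = set(unique_ids[n_test:n_test + n_val])
    train = set(unique_ids[n_test + n_val:])
    return {"train": train, "val": val, "test": test}


if __name__ == "__main__":
    # Repeated IDs model several rows per patient.
    rows = [f"p{i % 10}" for i in range(50)]
    split = patient_level_split(rows)
    print({name: len(ids) for name, ids in split.items()})
```

Because the split is made over unique patient IDs rather than rows, every row for a given patient follows that patient into a single partition, which is the property the group split guarantees.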