EHR AI platform
OneEHR
From standardized EHR tables to reproducible runs, structured analysis, and cross-system comparison across ML/DL models and LLM systems.
OneEHR is a Python platform for longitudinal EHR experiments. It provides common infrastructure for preprocessing, modeling, testing, and analysis, built on a single shared run contract so the CLI and notebooks all read the same saved artifacts.
Why OneEHR¶
Most EHR projects do not fail because a model cannot be trained. They fail because preprocessing, splits, and analysis all drift into different formats owned by different scripts. OneEHR keeps those stages on one shared run contract so that a run remains reproducible and inspectable long after training finishes.
Standardize first
Event-table in, not dataset magic
Prepare normalized EHR tables once, then reuse the same inputs across preprocess, training, testing, and analysis.
Shared contract
One shared run contract across every interface
The CLI and notebooks both read the same run directory instead of maintaining parallel export formats.
Comparable outputs
Unified predictions and structured analysis
A single predictions.parquet with a system column enables cross-system comparison. Analysis modules produce JSON artifacts that stay explorable after the experiment is over.
Cross-system comparison
Unified scoring across systems
ML/DL models and LLM systems are tested on the same split with the same metrics, so comparisons stay fair and reproducible.
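Because every system writes into one predictions table, cross-system scoring reduces to a single group-by. The sketch below assumes column names `y_true` and `y_pred` alongside the documented `system` column; only `system` is confirmed by this page, and the metric choice is illustrative.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical rows in the shape of predictions.parquet: one row per
# (system, sample). Column names other than `system` are assumptions.
preds = pd.DataFrame({
    "system": ["lstm"] * 4 + ["gpt-4"] * 4,
    "y_true": [0, 1, 1, 0, 0, 1, 1, 0],
    "y_pred": [0.2, 0.9, 0.7, 0.3, 0.4, 0.3, 0.8, 0.1],
})

# One unified table means every system is scored with the same metric
# on the same held-out rows.
auc_by_system = preds.groupby("system").apply(
    lambda g: roc_auc_score(g["y_true"], g["y_pred"])
)
print(auc_by_system)
```

The same pattern extends to any per-system metric: swap the lambda, keep the table.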
Workflow At A Glance¶
Preprocess
Materialize binned features and labels from standardized EHR tables.
Train
Fit tabular and deep learning models from a TOML experiment contract.
Test
Evaluate all trained models and configured systems on the held-out test split.
Analyze
Write structured analysis outputs for cross-system comparison and feature importance.
Choose Your Entry Point¶
Quickstart
Use the bundled TJH example config for the shortest path from raw tables to a complete run directory.
Core Workflows
Understand the standard preprocess, train, test, and analyze path in detail.
Configuration Reference
Full TOML option tables for dataset, preprocessing, split, models, trainer, systems, and output.
Design Principles¶
TOML is the experiment contract
Configuration is versionable, reviewable, and explicit. If the TOML changes, the experiment changed.
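As a sketch of what such a contract looks like, the fragment below uses the section names listed in the Configuration Reference (dataset, split, models, output); every key and value is illustrative, not OneEHR's actual schema.

```toml
# Hypothetical experiment contract. Section names follow the
# Configuration Reference; keys are illustrative only.
[dataset]
name = "tjh"

[split]
strategy = "patient_group"
test_size = 0.2
seed = 42

[[models]]
type = "lstm"

[output]
run_dir = "runs/tjh-baseline"
```

Because the file is plain TOML, a diff in code review is a diff in the experiment itself.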
Patient-level leakage prevention
Supported split strategies are patient-group aware so that evaluation defaults to safer behavior.
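The idea behind a patient-group-aware split can be illustrated with scikit-learn's `GroupShuffleSplit` (shown here as an analogy, not necessarily OneEHR's internal implementation): grouping on a patient identifier guarantees that no patient's visits land on both sides of the split.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy visit-level rows: several visits per patient. Splitting on the
# patient ID keeps all of a patient's visits on one side of the split.
patient_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4])
X = np.arange(len(patient_ids)).reshape(-1, 1)  # stand-in features

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=patient_ids))

# No patient straddles the split, so leakage across a patient's own
# visits is impossible by construction.
overlap = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
print(sorted(set(patient_ids[test_idx])), overlap)
```

A row-level random split on the same data would routinely place one visit of a patient in train and another in test, which inflates evaluation scores.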
Structured outputs over notebook state
Saved artifacts are machine-readable (Parquet + JSON), so downstream automation does not depend on hidden cells.
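The principle can be shown in a few lines: an analysis step persists a JSON artifact, and any later tool reloads it from disk rather than from a live notebook kernel. The file name and keys below are illustrative, not OneEHR's actual on-disk layout.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical run-directory layout; names and keys are illustrative.
with TemporaryDirectory() as run_dir:
    artifact = Path(run_dir) / "analysis" / "metrics.json"
    artifact.parent.mkdir(parents=True)
    artifact.write_text(json.dumps({"system": "lstm", "auroc": 0.87}))

    # Downstream automation depends only on the saved file,
    # never on hidden notebook state.
    loaded = json.loads(artifact.read_text())
    print(loaded["auroc"])
```

The same holds for tabular artifacts: Parquet files are self-describing, so a scoring script and a notebook read identical bytes.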
Cross-system comparison is built in
ML/DL models and LLM systems produce predictions in the same format, enabling fair comparison via the test and analyze commands.
Start Here¶
- Use Installation to set up Python 3.12+ and install OneEHR.
- Use Quickstart for a runnable end-to-end example.
- Use Tutorials for step-by-step Jupyter notebooks covering all features.
- Use Dataset Converters to convert MIMIC-III/IV or eICU data.
- Use Configuration Reference if you are authoring experiment TOML files.
- Use Models Reference for all 25 model architectures and their hyperparameters.
- Use Artifacts Reference if you need the precise on-disk contract for tooling.