EHR AI platform

OneEHR

From standardized EHR tables to reproducible runs, structured analysis, and cross-system comparison across ML/DL models and LLM systems.

OneEHR is a Python platform for longitudinal EHR experiments. It provides common infrastructure for preprocessing, modeling, testing, and analysis, built on one shared run contract so the CLI and notebooks all read the same saved artifacts.

Python 3.12+ · TOML config · MIMIC / eICU · ICD / CCS / ATC · Reproducible

Input contract: 3-table EHR schema (dynamic.csv, static.csv, label.csv)
Run outputs: structured artifacts (Parquet + JSON)
Models: 25 built-in (tabular ML, recurrent, transformer, Mamba, EHR-specialized, survival)
System layer: cross-system comparison (same samples, same scoring contract)

Why OneEHR

Most EHR projects do not fail because a model cannot be trained. They fail because preprocessing, splits, and analysis all drift into different formats owned by different scripts. OneEHR keeps those stages on one shared run contract so that a run remains reproducible and inspectable long after training finishes.

Standardize first

Event-table in, not dataset magic

Prepare normalized EHR tables once, then reuse the same inputs across preprocess, training, testing, and analysis.
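As a sketch, the three input tables (dynamic.csv, static.csv, label.csv) might be prepared like this with pandas; the column names here are illustrative assumptions, not OneEHR's exact schema:

```python
import pandas as pd

# Static per-patient attributes (one row per patient); columns are illustrative.
static = pd.DataFrame({
    "patient_id": ["p1", "p2"],
    "age": [67, 54],
    "sex": ["F", "M"],
})

# Dynamic time-stamped events (long format: one row per observation).
dynamic = pd.DataFrame({
    "patient_id": ["p1", "p1", "p2"],
    "time": [0, 6, 0],  # e.g. hours since admission
    "feature": ["heart_rate", "heart_rate", "creatinine"],
    "value": [88.0, 92.0, 1.4],
})

# Labels (one row per patient); the task column is a hypothetical example.
label = pd.DataFrame({
    "patient_id": ["p1", "p2"],
    "mortality": [0, 1],
})

for name, df in [("static", static), ("dynamic", dynamic), ("label", label)]:
    df.to_csv(f"{name}.csv", index=False)
```

Once these three files exist, the same inputs feed every later stage instead of each script re-deriving its own view of the raw data.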

Shared contract

One shared run contract across every interface

The CLI and notebooks all read the same run directory instead of parallel export formats.

Comparable outputs

Unified predictions and structured analysis

A single predictions.parquet with a system column enables cross-system comparison. Analysis modules produce JSON artifacts that stay explorable after the experiment is over.
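A minimal sketch of what that enables, using an in-memory stand-in for predictions.parquet; the system names and column layout are illustrative assumptions, not OneEHR's exact schema:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# One row per (system, sample), so every system is scored on identical
# samples with the identical metric. Column names are illustrative.
preds = pd.DataFrame({
    "system":    ["xgboost"] * 4 + ["llm_baseline"] * 4,
    "sample_id": [1, 2, 3, 4] * 2,
    "y_true":    [0, 1, 1, 0] * 2,
    "y_score":   [0.2, 0.9, 0.7, 0.4, 0.5, 0.4, 0.8, 0.3],
})

# One scoring pass per system over the shared samples.
auc_by_system = {
    system: roc_auc_score(group["y_true"], group["y_score"])
    for system, group in preds.groupby("system")
}
print(auc_by_system)  # {'llm_baseline': 0.75, 'xgboost': 1.0}
```

Because the `system` column is just another field in one table, any downstream tool can slice or aggregate across systems without a per-model export format.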

Cross-system comparison

Unified scoring across systems

ML/DL models and LLM systems are tested on the same split with the same metrics, so comparisons stay fair and reproducible.

Workflow at a Glance

01

Preprocess

Materialize binned features and labels from standardized EHR tables.
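A rough sketch of what time-binning means here, in pandas; the 6-hour bin width and mean aggregation are assumptions for illustration, not OneEHR defaults:

```python
import pandas as pd

# Long-format dynamic events for one patient (illustrative columns).
events = pd.DataFrame({
    "patient_id": ["p1"] * 4,
    "time": [1.0, 5.0, 7.0, 13.0],  # hours since admission
    "feature": ["hr", "hr", "hr", "hr"],
    "value": [80.0, 90.0, 100.0, 110.0],
})

# Assign each event to a fixed-width time bin, then aggregate per bin.
events["bin"] = (events["time"] // 6).astype(int)  # 6-hour bins (assumed)
binned = (
    events.pivot_table(index=["patient_id", "bin"],
                       columns="feature", values="value", aggfunc="mean")
    .reset_index()
)
print(binned)  # bin 0 averages the two events at hours 1 and 5
```

Materializing this once, rather than inside each model's training script, is what keeps the later stages comparable.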

02

Train

Fit tabular and deep learning models from a TOML experiment contract.

03

Test

Evaluate all trained models and configured systems on the held-out test split.

04

Analyze

Write structured analysis outputs for cross-system comparison and feature importance.

Choose Your Entry Point

Quickstart

Use the bundled TJH example config for the shortest path from raw tables to a complete run directory.

Core Workflows

Understand the standard preprocess, train, test, and analyze path in detail.

Configuration Reference

Full TOML option tables for dataset, preprocessing, split, models, trainer, systems, and output.

Design Principles

TOML is the experiment contract

Configuration is versionable, reviewable, and explicit. If the TOML changes, the experiment changed.
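For illustration only, a contract could look like the fragment below; every section and key name is a hypothetical sketch, not OneEHR's actual schema (the Configuration Reference documents the real options):

```toml
# Hypothetical experiment contract: section and key names are illustrative.
[dataset]
name = "tjh"

[split]
strategy = "patient_group"
test_size = 0.2
seed = 42

[[models]]
type = "xgboost"

[[models]]
type = "transformer"
hidden_size = 128
```

Because the file is plain TOML, a diff of the config is a complete, reviewable description of how the experiment changed.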

Patient-level leakage prevention

Supported split strategies are patient-group aware, so records from the same patient never cross the train/test boundary and evaluation defaults to leakage-safe behavior.
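A minimal sketch of what patient-group awareness means, using scikit-learn's GroupShuffleSplit (shown as an illustration, not necessarily OneEHR's internal splitter):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Several records per patient; grouping by patient_id keeps all of a
# patient's records on one side of the split.
patient_ids = np.array(["p1", "p1", "p2", "p3", "p3", "p4"])
X = np.arange(len(patient_ids)).reshape(-1, 1)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=patient_ids))

train_patients = set(patient_ids[train_idx])
test_patients = set(patient_ids[test_idx])
assert train_patients.isdisjoint(test_patients)  # no patient-level leakage
```

A record-level random split would let the same patient appear on both sides, silently inflating test metrics; grouping by patient removes that failure mode by construction.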

Structured outputs over notebook state

Saved artifacts are machine-readable (Parquet + JSON), so downstream automation does not depend on hidden cells.
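A small sketch of the idea, with a JSON analysis artifact; the run-directory layout, file name, and keys below are illustrative assumptions:

```python
import json
from pathlib import Path

# Hypothetical run directory and artifact name (illustrative, not
# OneEHR's actual layout).
run_dir = Path("runs/example")
run_dir.mkdir(parents=True, exist_ok=True)

metrics = {"system": "xgboost", "auroc": 0.87, "auprc": 0.42}
(run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))

# Downstream automation reloads the saved artifact directly;
# no notebook kernel or hidden cell state is involved.
loaded = json.loads((run_dir / "metrics.json").read_text())
print(loaded["auroc"])  # 0.87
```

The same principle applies to the Parquet prediction tables: anything a notebook can show, a script can reload from disk.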

Cross-system comparison is built in

ML/DL models and LLM systems produce predictions in the same format, enabling fair comparison via the test and analyze commands.

Start Here