Dataset Converters¶
OneEHR includes built-in converters for standard clinical datasets. Each converter transforms raw source tables into OneEHR's three-table format (dynamic.csv, static.csv, label.csv).
Usage¶
CLI¶
oneehr convert --dataset mimic3 --raw-dir /path/to/mimic3 --output-dir data/mimic3/ --task mortality
Python API¶
from oneehr.datasets import MIMIC3Converter
converter = MIMIC3Converter("/path/to/mimic3")
result = converter.convert()
# Access DataFrames directly
print(result.dynamic.shape)
print(result.static.shape)
print(result.labels.keys()) # ['mortality', 'readmission', 'los_3day', 'los_7day']
# Or save to disk
converter.save("data/mimic3/", task="mortality")
MIMIC-III¶
Class: oneehr.datasets.MIMIC3Converter
Expected files: ADMISSIONS.csv, PATIENTS.csv, ICUSTAYS.csv, LABEVENTS.csv, CHARTEVENTS.csv, DIAGNOSES_ICD.csv, PROCEDURES_ICD.csv, PRESCRIPTIONS.csv
Event sources:
| Source | Code prefix | Value |
|---|---|---|
| Lab events | LAB_{itemid} |
Numeric result |
| Chart events | CHART_{itemid} |
Numeric/text value |
| Diagnoses | DX_{icd9_code} |
1 (presence) |
| Procedures | PROC_{icd9_code} |
1 (presence) |
| Prescriptions | RX_{drug} |
1 (presence) |
Static features: age, sex, ethnicity, insurance
Label tasks:
| Task | Description |
|---|---|
mortality |
In-hospital mortality per admission |
readmission |
30-day unplanned readmission |
los_3day |
Length of stay > 3 days |
los_7day |
Length of stay > 7 days |
Options:
| Parameter | Default | Description |
|---|---|---|
use_chartevents |
True |
Include chart events (large file) |
use_prescriptions |
True |
Include prescriptions |
max_chartevents_rows |
None |
Limit rows for memory control |
MIMIC-IV¶
Class: oneehr.datasets.MIMIC4Converter
Expected layout: hosp/ and icu/ subdirectories (or flat directory fallback).
Key differences from MIMIC-III:
- Lowercase column names
- ICD version tracking (icd_version column: 9 or 10)
- Diagnosis codes prefixed with ICD version: DX_ICD9_{code} or DX_ICD10_{code}
- anchor_age field for direct age access
- race column in admissions (instead of ethnicity)
Label tasks: Same as MIMIC-III (mortality, readmission, los_3day, los_7day).
eICU¶
Class: oneehr.datasets.EICUConverter
Expected files: patient.csv, lab.csv, vitalPeriodic.csv, vitalAperiodic.csv, diagnosis.csv, medication.csv
Event sources:
| Source | Code prefix | Value |
|---|---|---|
| Lab results | LAB_{labname} |
Numeric result |
| Vital signs (periodic) | VITAL_{column} |
Numeric value |
| Vital signs (aperiodic) | VITAL_{column} |
Numeric value |
| Diagnoses | DX_{icd9code} |
1 |
| Medications | RX_{drugname} |
1 |
Note: eICU uses minute-based offsets from hospital admission. The converter creates synthetic timestamps for compatibility with OneEHR's time-based binning.
Options:
| Parameter | Default | Description |
|---|---|---|
use_vitals |
True |
Include vital sign events |
use_medication |
True |
Include medication events |
Custom Datasets¶
To convert any dataset, produce three CSVs matching the OneEHR data model:
dynamic.csv: patient_id, event_time, code, value
static.csv: patient_id, <covariates...>
label.csv: patient_id, label_time, label_code, label_value
See the Data Model reference for column specifications.
You can also extend BaseConverter:
from oneehr.datasets._base import BaseConverter, ConvertedDataset
class MyConverter(BaseConverter):
def convert(self) -> ConvertedDataset:
# Load and transform your data
return ConvertedDataset(dynamic=..., static=..., labels={"mortality": ...})