
Why I Built PyHEOR
Most people doing Health Economics and Outcomes Research (HEOR) use R (heemod, hesim, DARTH) or TreeAge. The R ecosystem is mature, but I kept running into friction in real projects:
- Multiple R packages with inconsistent APIs — a complete analysis workflow means jumping between packages constantly
- TreeAge is expensive commercial software and isn’t very flexible
- Python has a much stronger ecosystem for data processing and machine learning, but there’s no complete HEOR framework for it
So I built PyHEOR — a pure Python framework for health economics modeling, designed around a single, unified API that covers the full HEOR workflow.
Project: https://github.com/lenardar/PyHEOR
What It Does
PyHEOR currently supports:
Modeling Engines
- Markov cohort model (discrete-time state transitions)
- Partitioned Survival Model (PSM)
- Microsimulation (individual-level state transitions)
- Discrete Event Simulation (DES, continuous time)
Evidence Synthesis
- IPD survival curve fitting (6 distributions, AIC/BIC comparison)
- KM curve reconstruction (Guyot method — reconstruct IPD from published KM figures)
- NMA posterior sample integration (import MCMC samples from R packages)
Analysis and Decision
- Base case, one-way sensitivity analysis (OWSA), probabilistic sensitivity analysis (PSA)
- Multi-strategy comparison: efficiency frontier, ICER, NMB, CEAC, CEAF, EVPI
- Budget impact analysis (BIA)
- Model calibration (Nelder-Mead / random search)
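For readers less familiar with the decision metrics listed above, here is a minimal plain-Python sketch of three of them: ICER, NMB, and a single point on a CEAC. It does not use PyHEOR; the function names (`icer`, `nmb`, `ceac_point`) and all numbers are made up for illustration.

```python
def icer(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost per QALY gained versus the reference strategy."""
    d_cost = cost_new - cost_ref
    d_qaly = qaly_new - qaly_ref
    if d_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return d_cost / d_qaly


def nmb(cost, qaly, wtp):
    """Net monetary benefit at willingness-to-pay threshold `wtp`."""
    return qaly * wtp - cost


def ceac_point(psa_draws, wtp):
    """Share of PSA draws in which each strategy has the highest NMB."""
    wins = {name: 0 for name in psa_draws[0]}
    for draw in psa_draws:
        best = max(draw, key=lambda name: nmb(*draw[name], wtp))
        wins[best] += 1
    return {name: n / len(psa_draws) for name, n in wins.items()}


# Illustrative (cost, QALYs) pairs per strategy, not real results
base = {"SOC": (20000.0, 5.5), "New": (30000.0, 6.0)}
base_icer = icer(*base["New"], *base["SOC"])

# Four hypothetical PSA draws; a real PSA would use thousands
draws = [
    {"SOC": (19000.0, 5.4), "New": (31000.0, 6.1)},
    {"SOC": (21000.0, 5.6), "New": (29000.0, 5.7)},
    {"SOC": (20500.0, 5.5), "New": (30500.0, 6.0)},
    {"SOC": (19500.0, 5.5), "New": (32000.0, 5.6)},
]
ceac_at_50k = ceac_point(draws, wtp=50000)
```

Repeating `ceac_point` over a grid of thresholds gives the full acceptability curve.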
Export and Visualization
- 28 specialized chart types
- Multi-sheet Excel export + Excel formula verification model (for cross-validating Python results)
- One-command Markdown report (generate_report() runs all analyses and produces a report with figures)
Design
One Model Object, All Analyses
Once you define a model, run_base_case(), run_owsa(), and run_psa() flow naturally from the same object — no restructuring code for different analysis types. Deterministic, sensitivity, and probabilistic analyses all live on the same model:
result = model.run_base_case()  # deterministic base case
In R, doing the same thing typically means manually restructuring parameters or passing data between packages. PyHEOR encapsulates all of that inside the model object.
ph.C: The Complement Sentinel
The most annoying part of writing transition matrices is the diagonal element — you have to manually compute 1 - sum(others), and with many parameters this is error-prone. PyHEOR uses a ph.C placeholder that fills in the complement automatically:
model.set_transitions("SOC", lambda p, t: [
This is inspired by heemod’s C constant — familiar if you’ve used it before.
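To show what complement filling amounts to, here is a minimal sketch in plain Python. It is not PyHEOR's implementation: the sentinel `C`, the helpers `fill_complement` and `resolve_matrix`, and the probabilities are all hypothetical stand-ins.

```python
# Hypothetical sentinel object; ph.C itself may be implemented differently.
C = object()


def fill_complement(row, tol=1e-9):
    """Replace the single C placeholder in a row with 1 - sum(other entries)."""
    slots = [i for i, v in enumerate(row) if v is C]
    if len(slots) != 1:
        raise ValueError("a row needs exactly one complement placeholder")
    others = sum(v for v in row if v is not C)
    if others < -tol or others > 1.0 + tol:
        raise ValueError(f"other probabilities sum to {others}, outside [0, 1]")
    filled = list(row)
    # Clamp tiny negative values caused by floating-point rounding
    filled[slots[0]] = max(0.0, 1.0 - others)
    return filled


def resolve_matrix(matrix):
    """Fill the complement in every row that contains the placeholder."""
    return [
        fill_complement(row) if any(v is C for v in row) else list(row)
        for row in matrix
    ]


# States: Healthy, Sick, Dead (illustrative probabilities)
matrix = resolve_matrix([
    [C,   0.15, 0.02],
    [0.0, C,    0.10],
    [0.0, 0.0,  1.0],
])
```

The useful side effect is validation: a row whose explicit probabilities already exceed 1 fails loudly instead of silently producing a negative diagonal.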
Lambda for Everything
Costs, utilities, and transition probabilities are all defined with lambda functions. This naturally supports time-varying logic and parameter dependencies without any special configuration:
# drug cost 5000 for first 24 months, then 2000
Note that the microsimulation lambda takes three arguments (p, t, attrs) — the third is patient attributes. The engine infers cohort vs. individual mode from the number of lambda parameters.
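One way such arity-based dispatch can work is with `inspect.signature` from the standard library. The sketch below is an assumption about the mechanism, not PyHEOR's code; `lambda_mode`, `evaluate`, the parameter names, and the 0-based monthly cycle index are all hypothetical.

```python
import inspect


def lambda_mode(fn):
    """Return 'cohort' for (p, t) callables and 'individual' for (p, t, attrs)."""
    n_params = len(inspect.signature(fn).parameters)
    if n_params == 2:
        return "cohort"
    if n_params == 3:
        return "individual"
    raise TypeError(f"expected 2 or 3 parameters, got {n_params}")


def evaluate(fn, p, t, attrs=None):
    """Call fn with the argument list that matches its signature."""
    if lambda_mode(fn) == "cohort":
        return fn(p, t)
    return fn(p, t, attrs)


params = {"c_early": 5000, "c_late": 2000, "c_per_kg": 40}

# Cohort style: cost depends on the parameters and the cycle only
# (assumes t counts monthly cycles from 0)
drug_cost = lambda p, t: p["c_early"] if t < 24 else p["c_late"]

# Individual style: cost also depends on patient attributes
dosed_cost = lambda p, t, attrs: p["c_per_kg"] * attrs["weight_kg"]

early = evaluate(drug_cost, params, t=0)
late = evaluate(drug_cost, params, t=24)
per_patient = evaluate(dosed_cost, params, t=0, attrs={"weight_kg": 70})
```

The trade-off of inferring mode from arity is that a typo in the argument list changes behavior rather than raising an error, so an explicit check like the `TypeError` above is worth having.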
Unified Parameter System
One add_param() call defines the base value, OWSA range, and PSA distribution together. Each analysis type automatically uses the appropriate value:
model.add_param("p_HS",
Built-in distributions: Beta, Gamma, Normal, LogNormal, Uniform, Triangular, Dirichlet, Fixed. Parameterized by mean/sd — no need to hand-calculate alpha/beta.
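For reference, the mean/sd parameterization corresponds to the standard method-of-moments conversion. The sketch below shows it for Beta and Gamma using only the standard library; the helper names are hypothetical, and PyHEOR's own conversion may differ in details such as validation.

```python
import random


def beta_from_mean_sd(mean, sd):
    """Method-of-moments Beta(alpha, beta) for a probability-like parameter."""
    var = sd ** 2
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if var >= mean * (1.0 - mean):
        raise ValueError("sd is too large for a Beta distribution with this mean")
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common


def gamma_from_mean_sd(mean, sd):
    """Method-of-moments Gamma(shape, scale) for a positive parameter such as a cost."""
    if mean <= 0.0 or sd <= 0.0:
        raise ValueError("mean and sd must be positive")
    var = sd ** 2
    return mean ** 2 / var, var / mean


alpha, beta = beta_from_mean_sd(0.15, 0.03)    # illustrative probability
shape, scale = gamma_from_mean_sd(3000, 600)   # illustrative cost

# Drawing PSA samples; random.gammavariate takes (shape, scale)
rng = random.Random(42)
p_draws = [rng.betavariate(alpha, beta) for _ in range(5)]
c_draws = [rng.gammavariate(shape, scale) for _ in range(5)]
```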
Discount rates are also part of the parameter system — pass a Param object and they automatically participate in OWSA and PSA:
model = ph.MarkovModel(
Flexible Cost System
Beyond basic state costs, several special scenarios are supported:
# one-time cost in first cycle (e.g., enrollment screening)
Transition costs are calculated from per-cycle transition flows and are unaffected by half-cycle correction. Scheduled costs handle multi-cohort cost accumulation through convolution.
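The two ideas can be sketched in plain Python. This is an interpretation, not PyHEOR's code: it assumes a scheduled cost is a cost profile indexed by cycles since a sub-cohort entered a state, and the function names and numbers are hypothetical.

```python
def transition_flows(occupancy_from, p_from_to):
    """Share of the cohort moving along one transition in each cycle."""
    return [occ * p_from_to for occ in occupancy_from]


def transition_costs(flows, unit_cost):
    """Cost charged per transition event, applied to the flow in each cycle."""
    return [flow * unit_cost for flow in flows]


def scheduled_costs(entrants, schedule):
    """Discrete convolution: cost[t] = sum over k of entrants[t - k] * schedule[k].

    entrants[t] is the share of the cohort newly entering the state in cycle t;
    schedule[k] is the cost incurred k cycles after entry.
    """
    out = [0.0] * len(entrants)
    for t in range(len(entrants)):
        for k, cost in enumerate(schedule):
            if t - k >= 0:
                out[t] += entrants[t - k] * cost
    return out


# Illustrative numbers: Healthy occupancy over two cycles, 15% progress per cycle
flows = transition_flows([1.0, 0.83], 0.15)
event_costs = transition_costs(flows, unit_cost=2000.0)

# Each entering sub-cohort pays 1000 on entry and 500 one cycle later
per_cycle = scheduled_costs([0.10, 0.08, 0.06, 0.0], [1000.0, 500.0])
```

Because several sub-cohorts entered at different times, cycle 1 in this example collects both the entry cost of the second sub-cohort and the follow-up cost of the first.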
Excel Formula Verification
Reviewers often require an Excel verification model. PyHEOR can export a standalone Excel file that replicates the model using native Excel formulas:
ph.export_excel_model(result, "verification.xlsx")
The exported file has a yellow input area for the transition matrix, and SUMPRODUCT-based formulas for Trace, costs, and QALYs. A Summary sheet shows the difference between Excel and Python results (should be ~0). Reviewers can validate the model logic directly in Excel.
Example: Markov Cohort Model
A three-state (Healthy → Sick → Dead) cost-effectiveness analysis:
import pyheor as ph
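For readers who want to see the arithmetic behind a three-state cohort model, here is a minimal plain-Python sketch. It does not use PyHEOR and is not its algorithm: `run_cohort` is a hypothetical helper, cycles are assumed annual, no half-cycle correction is applied, and every probability, cost, and utility is illustrative.

```python
def run_cohort(P, costs, utilities, n_cycles, discount=0.03):
    """Propagate a cohort through transition matrix P and accumulate
    discounted costs and QALYs. States: Healthy, Sick, Dead."""
    state = [1.0, 0.0, 0.0]  # everyone starts Healthy
    trace = [list(state)]
    total_cost = 0.0
    total_qaly = 0.0
    n = len(state)
    for t in range(n_cycles):
        df = 1.0 / (1.0 + discount) ** t  # discount factor for cycle t
        total_cost += df * sum(s * c for s, c in zip(state, costs))
        total_qaly += df * sum(s * u for s, u in zip(state, utilities))
        # Row vector times matrix: next[j] = sum_i state[i] * P[i][j]
        state = [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]
        trace.append(list(state))
    return trace, total_cost, total_qaly


P_SOC = [[0.83, 0.15, 0.02], [0.0, 0.90, 0.10], [0.0, 0.0, 1.0]]
P_NEW = [[0.88, 0.10, 0.02], [0.0, 0.90, 0.10], [0.0, 0.0, 1.0]]
UTILITIES = [0.95, 0.60, 0.0]

trace_soc, cost_soc, qaly_soc = run_cohort(P_SOC, [500.0, 3000.0, 0.0], UTILITIES, 40)
trace_new, cost_new, qaly_new = run_cohort(P_NEW, [2500.0, 3000.0, 0.0], UTILITIES, 40)

icer = (cost_new - cost_soc) / (qaly_new - qaly_soc)
print(f"Incremental cost per QALY: {icer:,.0f}")
```

This is also essentially what a SUMPRODUCT-based spreadsheet trace computes row by row, which is why the two can be cross-checked.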
Example: Partitioned Survival Model (PSM)
PSM is the standard approach for oncology economic evaluations. It uses two parametric survival curves (OS and PFS) to partition the population across three states — no transition matrix needed:
import pyheor as ph
10 parametric distributions are built in (Exponential, Weibull, LogLogistic, LogNormal, Gompertz, Generalized Gamma, etc.), plus ProportionalHazards and AcceleratedFailureTime wrappers for applying HRs or acceleration factors to a baseline curve.
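The partitioning logic itself is short. The sketch below uses Weibull curves and a proportional-hazards adjustment in plain Python; it is not PyHEOR's API, the function names and curve parameters are hypothetical, and capping PFS at OS is an assumption added here to keep state shares non-negative.

```python
import math


def weibull_survival(shape, scale):
    """Return S(t) = exp(-(t / scale) ** shape) as a function of t."""
    return lambda t: math.exp(-((t / scale) ** shape))


def proportional_hazards(baseline, hr):
    """Apply a hazard ratio to a baseline curve: S_new(t) = S_base(t) ** hr."""
    return lambda t: baseline(t) ** hr


def partition(os_curve, pfs_curve, times):
    """Split the cohort into (progression-free, progressed, dead) at each time."""
    rows = []
    for t in times:
        s_os = os_curve(t)
        s_pfs = min(pfs_curve(t), s_os)  # assumption: PFS cannot exceed OS
        rows.append((s_pfs, s_os - s_pfs, 1.0 - s_os))
    return rows


# Illustrative curves, time in months
os_soc = weibull_survival(shape=1.2, scale=30.0)
pfs_soc = weibull_survival(shape=1.1, scale=12.0)

# Hypothetical treatment effect expressed as hazard ratios
os_new = proportional_hazards(os_soc, hr=0.7)
pfs_new = proportional_hazards(pfs_soc, hr=0.6)

months = range(0, 61)
occupancy_soc = partition(os_soc, pfs_soc, months)
occupancy_new = partition(os_new, pfs_new, months)
```

Costs and QALYs then follow by weighting each state's share by its cost and utility per cycle, exactly as in a cohort trace.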
From Published KM Figures to Modeling
A common HEOR problem: the paper only shows a KM curve — no raw data. PyHEOR implements the Guyot method (Guyot et al. 2012) to reconstruct IPD from digitized KM coordinates, which can then be fitted directly to parametric distributions for modeling:
# 1. KM coordinates from WebPlotDigitizer or similar
Manually digitized coordinates inevitably contain noise (non-monotone points, duplicate x values). guyot_reconstruct handles preprocessing internally: outlier detection, duplicate-time handling, and enforcement of a monotone non-increasing curve, following strategies from the R package IPDfromKM.
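To give a feel for what this kind of cleanup involves, here is a deliberately simplified sketch in plain Python. It is not what guyot_reconstruct or IPDfromKM actually do (there is no outlier detection, for example); `clean_km` and its rules are hypothetical stand-ins.

```python
def clean_km(points):
    """Tidy digitized (time, survival) pairs before reconstruction.

    Steps: clip values to valid ranges, collapse duplicate times to the
    lowest survival seen, anchor the curve at (0, 1), and force the curve
    to be monotone non-increasing with a running minimum.
    """
    lowest_at = {}
    for t, s in points:
        t = max(float(t), 0.0)               # no negative times
        s = min(max(float(s), 0.0), 1.0)     # survival stays in [0, 1]
        lowest_at[t] = min(s, lowest_at.get(t, 1.0))
    lowest_at[0.0] = 1.0                     # curve starts at S(0) = 1

    cleaned = []
    running_min = 1.0
    for t in sorted(lowest_at):
        running_min = min(running_min, lowest_at[t])
        cleaned.append((t, running_min))
    return cleaned


# Illustrative digitized points with typical noise
raw = [
    (-0.1, 1.02),  # slightly outside the plot area
    (1.0, 0.90),
    (2.0, 0.92),   # duplicate time, and above the previous point
    (2.0, 0.85),
    (3.0, 0.88),   # non-monotone
    (4.0, 0.70),
]
km = clean_km(raw)
```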
This “published KM figure → IPD → parametric fit → modeling” pipeline means a model can be built from a published figure alone when no patient-level data is available.
Roadmap
PyHEOR’s current feature set covers most common HEOR workflows. Planned work:
- More examples and tutorials
- Cross-validation against R ecosystem results
- Performance improvements (especially for microsimulation and DES)
Looking further ahead, I’m planning to improve PyHEOR’s AI compatibility:
- Structured output (to_dict/to_json) so analysis results can be directly read and understood by LLMs
- Auto-interpretation (interpret(wtp)) to generate standardized conclusion text with one call
- Natural language modeling interface: define models via JSON Schema so LLMs can build them without writing Python
If you’re working on health economics evaluations, give it a try: PyHEOR on GitHub (https://github.com/lenardar/PyHEOR)