PyHEOR: Health Economics Modeling in Python

Why I Built PyHEOR

Most people doing Health Economics and Outcomes Research (HEOR) use R (heemod, hesim, DARTH) or TreeAge. The R ecosystem is mature, but I kept running into friction in real projects:

  • Multiple R packages with inconsistent APIs — a complete analysis workflow means jumping between packages constantly
  • TreeAge is expensive commercial software and isn’t very flexible
  • Python has a much stronger ecosystem for data processing and machine learning, but there’s no complete HEOR framework for it

So I built PyHEOR — a pure Python framework for health economics modeling, designed around a single, unified API that covers the full HEOR workflow.

Project: https://github.com/lenardar/PyHEOR

What It Does

PyHEOR currently supports:

Modeling Engines

  • Markov cohort model (discrete-time state transitions)
  • Partitioned Survival Model (PSM)
  • Microsimulation (individual-level state transitions)
  • Discrete Event Simulation (DES, continuous time)

Evidence Synthesis

  • IPD survival curve fitting (6 distributions, AIC/BIC comparison)
  • KM curve reconstruction (Guyot method — reconstruct IPD from published KM figures)
  • NMA posterior sample integration (import MCMC samples from R packages)

Analysis and Decision

  • Base case, one-way sensitivity analysis (OWSA), probabilistic sensitivity analysis (PSA)
  • Multi-strategy comparison: efficiency frontier, ICER, NMB, CEAC, CEAF, EVPI
  • Budget impact analysis (BIA)
  • Model calibration (Nelder-Mead / random search)

Export and Visualization

  • 28 specialized chart types
  • Multi-sheet Excel export + Excel formula verification model (for cross-validating Python results)
  • One-command Markdown report (generate_report() — runs all analyses and produces a report with figures)

Design

One Model Object, All Analyses

Once you define a model, run_base_case(), run_owsa(), and run_psa() flow naturally from the same object — no restructuring code for different analysis types. Deterministic, sensitivity, and probabilistic analyses all live on the same model:

1
2
3
4
5
6
result = model.run_base_case()      # deterministic base case
owsa = model.run_owsa() # one-way sensitivity
psa = model.run_psa(n_sim=1000) # probabilistic sensitivity

# one-command Markdown report (tornado, CE plane, CEAC, etc.)
ph.generate_report(model, "report.md")

In R, doing the same thing typically means manually restructuring parameters or passing data between packages. PyHEOR encapsulates all of that inside the model object.

ph.C: The Complement Sentinel

The most annoying part of writing transition matrices is the diagonal element — you have to manually compute 1 - sum(others), and with many parameters this is error-prone. PyHEOR uses a ph.C placeholder that fills in the complement automatically:

1
2
3
4
5
model.set_transitions("SOC", lambda p, t: [
[ph.C, p["p_HS"], p["p_HD"]], # ph.C = 1 - p_HS - p_HD
[0, ph.C, p["p_SD"]], # ph.C = 1 - p_SD
[0, 0, 1 ],
])

This is inspired by heemod’s C constant — familiar if you’ve used it before.

Lambda for Everything

Costs, utilities, and transition probabilities are all defined with lambda functions. This naturally supports time-varying logic and parameter dependencies without any special configuration:

1
2
3
4
5
6
7
8
9
# drug cost 5000 for first 24 months, then 2000
model.set_state_cost("Sick", "Trt", lambda p, t: 5000 if t < 24 else 2000)

# age-dependent mortality (microsimulation can access individual attributes)
model.set_transitions("SOC", lambda p, t, attrs: [
[ph.C, p["p_HS"] * (1 + (attrs["age"] - 55) * 0.02), 0.01],
[0, ph.C, p["p_SD"]],
[0, 0, 1],
])

Note that the microsimulation lambda takes three arguments (p, t, attrs) — the third is patient attributes. The engine infers cohort vs. individual mode from the number of lambda parameters.

Unified Parameter System

One add_param() call defines the base value, OWSA range, and PSA distribution together. Each analysis type automatically uses the appropriate value:

1
2
3
4
5
6
model.add_param("p_HS",
base=0.15, # used in deterministic analysis
low=0.10, high=0.20, # OWSA range
dist=ph.Beta(mean=0.15, sd=0.03), # PSA sampling distribution
label="Healthy → Sick probability",
)

Built-in distributions: Beta, Gamma, Normal, LogNormal, Uniform, Triangular, Dirichlet, Fixed. Parameterized by mean/sd — no need to hand-calculate alpha/beta.

Discount rates are also part of the parameter system — pass a Param object and they automatically participate in OWSA and PSA:

1
2
3
4
5
model = ph.MarkovModel(
...,
dr_cost=ph.Param(0.03, low=0.0, high=0.08, label="Cost discount rate"),
dr_qaly=ph.Param(0.03, low=0.0, high=0.05, label="QALY discount rate"),
)

Flexible Cost System

Beyond basic state costs, several special scenarios are supported:

1
2
3
4
5
6
7
8
9
10
11
# one-time cost in first cycle (e.g., enrollment screening)
model.set_state_cost("PFS", "Trt", lambda p, t: 50000, first_cycle_only=True)

# cost limited to specific cycles (e.g., 24-month treatment)
model.set_state_cost("PFS", "Trt", lambda p, t: p["c_drug"], apply_cycles=(0, 24))

# transition cost: surgery when moving from PFS to Progressed
model.set_transition_cost("surgery", "PFS", "Progressed", 50000)

# transition cost schedule: surgery + two follow-up cycles
model.set_transition_cost("surgery", "PFS", "Progressed", [50000, 10000, 10000])

Transition costs are calculated from per-cycle transition flows and are unaffected by half-cycle correction. Scheduled costs handle multi-cohort cost accumulation through convolution.

Excel Formula Verification

Reviewers often require an Excel verification model. PyHEOR can export a standalone Excel file that replicates the model using native Excel formulas:

1
ph.export_excel_model(result, "verification.xlsx")

The exported file has a yellow input area for the transition matrix, and SUMPRODUCT-based formulas for Trace, costs, and QALYs. A Summary sheet shows the difference between Excel and Python results (should be ~0). Reviewers can validate the model logic directly in Excel.

Example: Markov Cohort Model

A three-state (Healthy → Sick → Dead) cost-effectiveness analysis:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
import pyheor as ph

model = ph.MarkovModel(
states=["Healthy", "Sick", "Dead"],
strategies=["Standard", "New Treatment"],
n_cycles=40,
cycle_length=1,
dr_cost=ph.Param(0.03, low=0.0, high=0.08, label="Cost discount rate"),
dr_qaly=ph.Param(0.03, low=0.0, high=0.05, label="QALY discount rate"),
half_cycle_correction=True,
)

model.add_param("p_HS", base=0.15, dist=ph.Beta(mean=0.15, sd=0.03))
model.add_param("p_SD", base=0.10, dist=ph.Beta(mean=0.10, sd=0.02))

model.set_transitions("Standard", lambda p, t: [
[ph.C, p["p_HS"], 0.02],
[0, ph.C, p["p_SD"]],
[0, 0, 1],
])

model.set_transitions("New Treatment", lambda p, t: [
[ph.C, p["p_HS"] * 0.7, 0.02],
[0, ph.C, p["p_SD"] * 0.8],
[0, 0, 1],
])

model.set_state_cost("medical", {"Healthy": 500, "Sick": 3000, "Dead": 0})
model.set_state_cost("drug", {
"Standard": {"Healthy": 0, "Sick": 0, "Dead": 0},
"New Treatment": {"Healthy": 2000, "Sick": 2000, "Dead": 0},
})
model.set_utility({"Healthy": 0.95, "Sick": 0.60, "Dead": 0.0})

result = model.run_base_case()
print(result.summary())
print(result.icer())

owsa = model.run_owsa()
owsa.plot_tornado()

psa = model.run_psa(n_sim=1000)
psa.plot_scatter(wtp=50000)
psa.plot_ceac(wtp_range=(0, 150000))

ph.generate_report(model, "report.md")

Example: Partitioned Survival Model (PSM)

PSM is the standard approach for oncology economic evaluations. It uses two parametric survival curves (OS and PFS) to partition the population across three states — no transition matrix needed:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import pyheor as ph

psm = ph.PSMModel(
states=["PFS", "Progressed", "Dead"],
survival_endpoints=["PFS", "OS"],
strategies=["SOC", "New Drug"],
n_cycles=120,
cycle_length=1/12, # monthly cycles
dr_cost=0.03,
dr_qaly=0.03,
)

baseline_pfs = ph.LogLogistic(shape=1.5, scale=18)
baseline_os = ph.Weibull(shape=1.2, scale=36)

psm.set_survival("SOC", "PFS", baseline_pfs)
psm.set_survival("SOC", "OS", baseline_os)

psm.set_survival("New Drug", "OS",
lambda p: ph.ProportionalHazards(baseline_os, hr=0.7))
psm.set_survival("New Drug", "PFS",
lambda p: ph.AcceleratedFailureTime(baseline_pfs, af=1.3))

psm.set_state_cost("treatment", {
"SOC": {"PFS": 1000, "Progressed": 2500, "Dead": 0},
"New Drug": {"PFS": 6000, "Progressed": 2500, "Dead": 0},
})
psm.set_utility({"PFS": 0.80, "Progressed": 0.55, "Dead": 0.0})

result = psm.run_base_case()
print(result.summary())
print(result.icer())

result.plot_survival()
result.plot_state_area()

10 parametric distributions are built in (Exponential, Weibull, LogLogistic, LogNormal, Gompertz, Generalized Gamma, etc.), plus ProportionalHazards and AcceleratedFailureTime wrappers for applying HRs or acceleration factors to a baseline curve.

From Published KM Figures to Modeling

A common HEOR problem: the paper only shows a KM curve — no raw data. PyHEOR implements the Guyot method (Guyot et al. 2012) to reconstruct IPD from digitized KM coordinates, which can then be fitted directly to parametric distributions for modeling:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# 1. KM coordinates from WebPlotDigitizer or similar
t_digitized = [0, 2, 4, 6, 8, 10, 12, 15, 18, 21, 24]
s_digitized = [1.0, 0.92, 0.83, 0.74, 0.66, 0.58, 0.50, 0.40, 0.32, 0.25, 0.20]

# 2. number-at-risk table from the paper
t_risk = [0, 6, 12, 18, 24]
n_risk = [120, 88, 60, 38, 22]

# 3. reconstruct IPD (with built-in preprocessing: denoising, sorting, monotonization)
ipd_time, ipd_event = ph.guyot_reconstruct(
t_digitized, s_digitized,
t_risk, n_risk,
tot_events=96, # optional: total events reported in the paper
)

# 4. fit 6 distributions with automatic AIC/BIC comparison
fitter = ph.SurvivalFitter(ipd_time, ipd_event, label="OS")
fitter.fit()
print(fitter.summary())
best = fitter.best_model()

# 5. use directly in PSM
psm.set_survival("SOC", "OS", best.distribution)

Manually digitized coordinates inevitably have noise (non-monotone points, duplicate x values). guyot_reconstruct handles preprocessing internally — outlier detection, duplicate time handling, and enforced monotone non-increasing, following strategies from the R package IPDfromKM.

This “published KM figure → IPD → parametric fit → modeling” pipeline is genuinely useful in practice.

Roadmap

PyHEOR’s current feature set covers most common HEOR workflows. Planned work:

  • More examples and tutorials
  • Cross-validation against R ecosystem results
  • Performance improvements (especially for microsimulation and DES)

Looking further ahead, I’m planning to improve PyHEOR’s AI compatibility:

  • Structured output (to_dict / to_json) so analysis results can be directly read and understood by LLMs
  • Auto-interpretation (interpret(wtp)) to generate standardized conclusion text with one call
  • Natural language modeling interface: define models via JSON Schema so LLMs can build them without writing Python

If you’re working on health economics evaluations, give it a try: PyHEOR on GitHub