Docs, parameters, references.
Everything you need to reason about what EpiChat will and won't do — architecture, parameters, disease defaults, and the seven-phase data roadmap.
Architecture
EpiChat is a five-layer pipeline. The LLM handles language; templates handle the simulation. Free-form code is never executed.
User NL query
│
▼
Layer 1 LLM parameter parser (Claude API + extraction prompt)
│ Structured JSON (SimParams)
▼
Layer 2 Data-informed resolver (country data — proposed)
│ Calibrated params
▼
Layer 3 Code generator (Jinja2 → Starsim Python)
│ Python script
▼
Layer 4 Execution engine (subprocess sandbox, 90s timeout)
│ Stats JSON + plot file
▼
Layer 5 Results narrator (Claude API + narration prompt)
│ Plain-language summary
▼
User: epidemic curve + interpretation
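As an illustration of Layer 4, the sketch below runs a generated script in a child process under the 90-second limit. It is a minimal sketch, not EpiChat's actual engine: the function name `run_script`, the file name `sim.py`, and the shape of the returned dictionary are all assumptions, and a plain subprocess gives process isolation and a timeout only, not a full sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

TIMEOUT_S = 90  # the 90-second limit named in Layer 4


def run_script(source: str, timeout_s: float = TIMEOUT_S) -> dict:
    """Run generated Python source in a child process and report the outcome.

    Hypothetical helper: the name and the return shape are illustrative.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "sim.py"  # hypothetical file name
        script.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,          # keep any output files inside the temp dir
                capture_output=True,  # collect stdout/stderr instead of printing
                text=True,
                timeout=timeout_s,    # child is killed when this elapses
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "error": "timeout", "stdout": "", "stderr": ""}
    succeeded = proc.returncode == 0
    return {
        "ok": succeeded,
        "error": None if succeeded else "nonzero exit",
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

A caller would pass the rendered template text as `source` and read the stats JSON from `stdout` or from a file written before the temporary directory is cleaned up.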
Disease models
Six compartmental models, each specified directly or inferred from a disease name.
| Model | Compartments | Use case | Required extras |
|---|---|---|---|
| SIR | S → I → R | Standard acute (flu, COVID acute) | — |
| SEIR | S → E → I → R | Latent period (measles, SARS) | dur_exp |
| SIS | S → I → S | No lasting immunity (gonorrhea) | — |
| SIRS | S → I → R → S | Waning immunity (COVID endemic) | dur_immune |
| SEIRS | S → E → I → R → S | Latent + waning (COVID full) | dur_exp, dur_immune |
| SEIAR | S → E → I\|A → R | Asymptomatic transmission (flu, COVID) | dur_exp, p_asymp, rel_trans_asymp |
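The "Required extras" column can be read as a lookup from model to parameter names. The sketch below encodes the table and reports which extras are still missing; the names `REQUIRED_EXTRAS` and `missing_extras` are hypothetical and not taken from EpiChat's code.

```python
# Required extras per model, copied from the table above.
REQUIRED_EXTRAS = {
    "sir": (),
    "seir": ("dur_exp",),
    "sis": (),
    "sirs": ("dur_immune",),
    "seirs": ("dur_exp", "dur_immune"),
    "seiar": ("dur_exp", "p_asymp", "rel_trans_asymp"),
}


def missing_extras(disease_type: str, params: dict) -> list:
    """Return the required extras that are absent or None in params."""
    required = REQUIRED_EXTRAS[disease_type.lower()]
    return [name for name in required if params.get(name) is None]


# A SEIRS request that only supplies the latent period still needs dur_immune.
print(missing_extras("seirs", {"dur_exp": 5.0}))
```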
Built-in disease defaults
Name a disease without full parameters and EpiChat fills these in.
| Disease | Model | dur_inf | dur_exp | n_contacts | p_death | Notes |
|---|---|---|---|---|---|---|
| Generic SIR | SIR | 10 d | — | 4 | 0.0 | R₀ = 2.5 |
| COVID (acute) | SIR | 8 d | — | 6 | 0.01 | |
| COVID (endemic) | SIRS | 8 d | — | 6 | 0.005 | immune = 180 d |
| COVID (full) | SEIRS | 8 d | 5 d | 6 | 0.005 | immune = 180 d |
| Influenza | SIR | 5 d | — | 6 | 0.001 | |
| Measles | SEIR | 8 d | 12 d | 10 | 0.001 | R₀ ≈ 15 |
| SARS-like | SEIR | 10 d | 5 d | 5 | 0.05 | |
| Ebola | SIR | 10 d | — | 3 | 0.5 | |
| Gonorrhea | SIS | 90 d | — | 2 | 0.0 | R₀ ≈ 2 |
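One way to picture the fill-in behaviour is a dictionary merge in which anything the user states wins over the built-in row. This is a sketch under assumptions: the lookup keys, the `resolve_defaults` name, and the rule that explicit values override defaults are illustrative, and only three rows of the table are reproduced.

```python
from typing import Optional

# Three rows copied from the table above; keys are hypothetical lookup names.
DISEASE_DEFAULTS = {
    "measles": {"disease_type": "seir", "dur_inf": 8.0, "dur_exp": 12.0,
                "n_contacts": 10, "p_death": 0.001},
    "influenza": {"disease_type": "sir", "dur_inf": 5.0,
                  "n_contacts": 6, "p_death": 0.001},
    "gonorrhea": {"disease_type": "sis", "dur_inf": 90.0,
                  "n_contacts": 2, "p_death": 0.0},
}


def resolve_defaults(disease: str, overrides: Optional[dict] = None) -> dict:
    """Start from the built-in row, then apply user-supplied values on top."""
    key = disease.strip().lower()
    if key not in DISEASE_DEFAULTS:
        raise KeyError(f"no built-in defaults for {disease!r}")
    merged = dict(DISEASE_DEFAULTS[key])  # copy so the table is not mutated
    for name, value in (overrides or {}).items():
        if value is not None:  # None means "not specified", so keep the default
            merged[name] = value
    return merged


# "Measles with 12 contacts per day": only n_contacts changes.
print(resolve_defaults("measles", {"n_contacts": 12}))
```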
Parameters
Every parameter is validated by a Pydantic schema before the template layer sees it.
| Parameter | Type | Default | Description |
|---|---|---|---|
| disease_type | string | sir | sir · seir · sis · sirs · seirs · seiar |
| n_agents | int | 10,000 | Population size · allowed range 10 to 1,000,000 |
| beta | float | computed | β = R₀ × 365 / (n_contacts × dur_inf) |
| init_prev | float | 0.01 | Fraction of agents infected at the start |
| dur_inf | float | 10.0 | Infectious period (days) |
| dur_exp | float? | null | Latent period · required for SEIR/SEIRS/SEIAR |
| dur_immune | float? | null | Duration of immunity before waning (days) · required for SIRS/SEIRS |
| p_death | float | 0.0 | Infection fatality rate |
| p_asymp | float | 0.3 | Asymptomatic fraction · SEIAR only |
| sim_dur_years | float | 1.0 | Simulation horizon |
| network_type | string | random | random · age_structured |
| n_contacts | int | 4 | Avg daily contacts |
| network_beta | float | 1.0 | Transmission multiplier per contact |
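The `beta` row is the one computed value in the table, so a worked example may help. The sketch below applies the formula as written, β = R₀ × 365 / (n_contacts × dur_inf), to the Generic SIR and Measles defaults. The function name `beta_from_r0` is hypothetical, and reading the factor 365 as a per-day to per-year conversion is an assumption.

```python
def beta_from_r0(r0: float, n_contacts: float, dur_inf_days: float) -> float:
    """Apply beta = R0 * 365 / (n_contacts * dur_inf), as given in the table."""
    if n_contacts <= 0 or dur_inf_days <= 0:
        raise ValueError("n_contacts and dur_inf must be positive")
    return r0 * 365 / (n_contacts * dur_inf_days)


# Generic SIR defaults: R0 = 2.5, 4 contacts, 10-day infectious period.
print(beta_from_r0(2.5, 4, 10))   # 22.8125
# Measles defaults: R0 of about 15, 10 contacts, 8-day infectious period.
print(beta_from_r0(15, 10, 8))    # 68.4375
```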
Interventions
Up to three intervention types can combine in a single simulation.
Vaccine
Removes a fraction of agents from the susceptible pool. With start_day=0 the fraction is treated as pre-existing immunity; with start_day>0 it is an ongoing campaign that begins on that day.
Treatment
Reduces infectious duration or mortality for a fraction of infected agents. Optional daily capacity cap.
Seasonality
Modulates transmission rate sinusoidally over the year — scale is variation strength, shift is phase (0.0 = winter peak, 0.5 = summer peak).
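To make scale and shift concrete, here is one sinusoid that matches the description. The exact functional form is an assumption, not EpiChat's documented formula: it treats day 0 as the winter peak, treats shift as a fraction of a year, and uses a cosine so that shift=0.0 peaks at day 0 and shift=0.5 peaks half a year later.

```python
import math


def seasonal_factor(day: float, scale: float, shift: float) -> float:
    """Multiplier on the transmission rate for a given day of the year.

    Assumed form: 1 + scale * cos(2*pi * (day/365 - shift)).
    """
    phase = day / 365.0 - shift
    return 1.0 + scale * math.cos(2 * math.pi * phase)


# With scale=0.3 the multiplier swings between 0.7 and 1.3 over the year.
for day in (0, 91.25, 182.5, 273.75):
    print(day, round(seasonal_factor(day, scale=0.3, shift=0.0), 3))
```

Under this form, scale=0 switches seasonality off, and any scale below 1 keeps the multiplier positive.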
NL query tips
EpiChat's parser understands a wide range of phrasing. A few useful patterns:
| You say | EpiChat does |
|---|---|
| "R0 = 2.5" | Converts to β via R₀ × 365 / (n_contacts × dur_inf) |
| "COVID", "flu", "measles" | Applies built-in disease defaults |
| "80% vaccinated" | Vaccine intervention, start_day=0 |
| "campaign starting month 3" | Vaccine intervention, start_day=90 |
| "winter peak" / "seasonal" | Seasonality · scale=0.3, shift=0.0 |
| "endemic" / "long-run" | Enables demographics |
| "age-structured" / "school-age" | network_type=age_structured |
| "masks" / "50% mask uptake" | network_beta ≈ 0.75 |
| "seed 42" / "reproducible" | rand_seed=42 |
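The table gives input and output pairs but not the rules behind them. The sketch below shows one possible arithmetic that reproduces two rows: "month 3" to start_day=90 and "50% mask uptake" to network_beta ≈ 0.75. Both rules are guesses. The 30-day month and the 50% per-wearer reduction are hypothetical values chosen only because they match the table.

```python
DAYS_PER_MONTH = 30  # assumption: reproduces "month 3" -> start_day=90


def campaign_start_day(month: int) -> int:
    """Convert a month number from the query into a simulation day."""
    return month * DAYS_PER_MONTH


def mask_network_beta(uptake: float, per_wearer_reduction: float = 0.5) -> float:
    """Scale transmission by the share of contacts assumed to be protected.

    per_wearer_reduction is a hypothetical value, not an EpiChat constant.
    """
    return 1.0 - uptake * per_wearer_reduction


print(campaign_start_day(3))   # 90
print(mask_network_beta(0.5))  # 0.75
```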
Roadmap
Seven phases in priority order. Complexity is implementation effort; impact is how much the phase is expected to improve the scientific value of the results.
| # | Phase | Data | Complexity | Impact |
|---|---|---|---|---|
| 1 | Country demographics | UN WPP · World Bank · WHO GHO | Low–Medium | High |
| 2 | Country contact matrices | Prem et al. 2021 · SOCRATES · CoMix | Medium | High |
| 3 | Age-specific severity | O'Driscoll 2021 · CDC COVID-NET · FluView | Medium | High |
| 4 | HouseholdNet | DHS · IPUMS · UN household data · ACS | High | Medium |
| 5 | Calibration vs surveillance | OWID · CDC Tracker · WHO FluNet · Tycho | High | Very High |
| 6 | STI networks | DHS sexual module · NATSAL · CDC NHBS | Very High | Medium |
| 7 | Geospatial metapopulation | WorldPop · OAG · OSM | Very High | High |
Risks · mitigations
| Risk | Mitigation |
|---|---|
| Invalid Starsim code | Template-based generation — no free-form LLM code |
| Parameter hallucination | Pydantic validation + plausibility checks |
| Execution failure | 3-attempt recovery loop with LLM re-parameterization |
| Timeout | 90-second subprocess limit |
| API instability | Starsim version pinned in requirements.txt |
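The 3-attempt recovery loop in the "Execution failure" row can be sketched as a bounded retry in which the error text from a failed run is fed back to produce new parameters. This is an illustration only: `execute` and `reparameterize` are hypothetical callables standing in for the execution engine and the LLM re-parameterization step, and the result shapes are assumptions.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # the attempt count named in the table


def run_with_recovery(
    params: dict,
    execute: Callable[[dict], dict],
    reparameterize: Callable[[dict, str], dict],
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Try a run; on failure, ask for revised parameters and try again."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        result = execute(params)
        if result.get("ok"):
            return {"ok": True, "attempts": attempt, "result": result}
        last_error = str(result.get("error", "unknown error"))
        if attempt < max_attempts:
            # Hand the error text back so the next attempt can differ.
            params = reparameterize(params, last_error)
    return {"ok": False, "attempts": max_attempts, "error": last_error}
```

Passing the two steps in as callables keeps the loop testable without a live API or a Starsim install.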