20  Capstone: End-to-End ML + OR Pipeline

NoteLearning Objectives
  • Build a complete ML + OR pipeline from raw data to deployed decision
  • Apply Pandera schema validation, GBM forecasting, and stochastic LP in a single workflow
  • Use the visualization toolkit from Chapter 18 to communicate the solution
  • Evaluate solution quality vs. naive benchmarks
  • Identify where the agent-augmented workflow (Chapter 19) accelerates development

20.1 Everything at Once

The preceding fifteen chapters each isolated a technique: linear programming, stochastic optimization, predict-then-optimize, the DS pipeline, visualization. This chapter puts them all in the same room.

The problem is a staffing and scheduling problem for a hospital outpatient clinic. It is deliberately chosen for its combination of features: uncertain demand (patient arrivals), integer structure (staff are whole people), multiple resources (nurses, examination rooms), and a mixed objective (minimize patient wait time subject to a staff cost budget). It is the kind of problem that a junior analyst, handed a CSV and a vague brief, would spend two weeks on. With the tools from this book, the path from data to decision takes an afternoon.

Here is the through-line: historical appointment data → ML forecast of hourly patient arrivals → stochastic integer program for staff scheduling → Gantt and sensitivity visualization → comparison against the clinic’s current (rule-of-thumb) schedule.


20.2 The Problem

A hospital outpatient clinic operates 08:00–18:00 (10 hours). The clinic manager must assign nurses to one-hour shifts. Each nurse can handle at most 3 patients per hour. Hiring a nurse for a shift costs $45. Patient wait time (total patient-hours waiting) is the service objective. The manager has a daily staff budget of $1,350.

Decision: how many nurses to assign to each of the 10 one-hour time slots to minimise expected patient wait subject to the budget constraint.

Uncertainty: patient arrivals per hour follow a pattern — morning peak, lunch dip, afternoon secondary peak — but the actual count varies by day, day-of-week, and whether there is a public holiday nearby.


20.3 Stage 1: Data and Validation

Code
import numpy as np
import pandas as pd
import pandera as pa
from pandera import Column, DataFrameSchema, Check
import pulp
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings("ignore")

rng = np.random.default_rng(2024)

# --- simulate 18 months of hourly appointment data ---
n_days = 540
hours  = list(range(8, 18))   # 08:00–17:00 inclusive

# Base arrival rate by hour (patients/hour)
base_rate = np.array([8, 12, 15, 14, 10, 8, 9, 13, 11, 7], dtype=float)

records = []
for day in range(n_days):
    dow      = day % 7
    is_mon   = int(dow == 0)
    is_fri   = int(dow == 4)
    is_wknd  = int(dow >= 5)
    seasonal = 1 + 0.12 * np.sin(2 * np.pi * day / 365)

    for i, hr in enumerate(hours):
        rate = base_rate[i] * seasonal
        rate *= 1.15 if is_mon else (0.90 if is_fri else (0.60 if is_wknd else 1.0))
        arrivals = rng.poisson(max(rate, 0.5))
        records.append({
            "day_id":    day,
            "hour":      hr,
            "dow":       dow,
            "is_monday": is_mon,
            "is_friday": is_fri,
            "is_weekend": is_wknd,
            "seasonal_idx": round(seasonal, 4),
            "arrivals":  int(arrivals),
        })

df = pd.DataFrame(records)

schema = DataFrameSchema({
    "day_id":      Column(int,   Check.greater_than_or_equal_to(0)),
    "hour":        Column(int,   Check.isin(list(range(8, 18)))),
    "dow":         Column(int,   Check(lambda s: s.between(0, 6).all())),
    "arrivals":    Column(int,   Check.greater_than_or_equal_to(0)),
    "is_monday":   Column(int,   Check.isin([0, 1])),
    "is_friday":   Column(int,   Check.isin([0, 1])),
    "is_weekend":  Column(int,   Check.isin([0, 1])),
    "seasonal_idx": Column(float, Check.greater_than(0)),
})
validated = schema.validate(df)
print(f"Validated: {len(validated):,} rows ({n_days} days × {len(hours)} hours)")
print(f"Mean arrivals/hour: {validated.arrivals.mean():.1f}  "
      f"Max: {validated.arrivals.max()}  Std: {validated.arrivals.std():.1f}")
Validated: 5,400 rows (540 days × 10 hours)
Mean arrivals/hour: 9.8  Max: 32  Std: 4.6

20.4 Stage 2: Forecasting Arrivals

Code
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import QuantileRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

feature_cols = ["hour", "dow", "is_monday", "is_friday", "is_weekend", "seasonal_idx"]
target_col   = "arrivals"

# Time-ordered split: last 60 days as test
cutoff   = n_days - 60
train_df = validated[validated.day_id < cutoff]
test_df  = validated[validated.day_id >= cutoff]

X_train, y_train = train_df[feature_cols].values, train_df[target_col].values
X_test,  y_test  = test_df[feature_cols].values,  test_df[target_col].values

gbm = GradientBoostingRegressor(n_estimators=300, max_depth=4,
                                 learning_rate=0.04, random_state=42)
gbm.fit(X_train, y_train)

# Quantile models for stochastic LP
q_models = {}
for tau in [0.25, 0.50, 0.75, 0.90]:
    pipe = Pipeline([("sc", StandardScaler()),
                     ("qr", QuantileRegressor(quantile=tau, alpha=0.01,
                                              solver="highs"))])
    pipe.fit(X_train, y_train)
    q_models[tau] = pipe

mae = mean_absolute_error(y_test, gbm.predict(X_test))
print(f"GBM test MAE: {mae:.2f} patients/hour")

# Quantile coverage
print("\nQuantile coverage on test set:")
for tau, pipe in q_models.items():
    pred     = pipe.predict(X_test)
    coverage = (y_test <= pred).mean()
    print(f"  τ={tau:.2f}  target={tau:.2f}  actual={coverage:.3f}")
GBM test MAE: 2.64 patients/hour

Quantile coverage on test set:
  τ=0.25  target=0.25  actual=0.245
  τ=0.50  target=0.50  actual=0.515
  τ=0.75  target=0.75  actual=0.732
  τ=0.90  target=0.90  actual=0.873
Code
sample_week = test_df[test_df.day_id.isin(range(cutoff, cutoff + 5))].copy()
X_sw        = sample_week[feature_cols].values
y_sw        = sample_week[target_col].values
y_pred_sw   = gbm.predict(X_sw)
q25_sw      = q_models[0.25].predict(X_sw)
q75_sw      = q_models[0.75].predict(X_sw)
x_idx       = np.arange(len(y_sw))
day_labels  = [f"D{d+1} {h}:00" for d, h in zip(
                   sample_week.day_id - cutoff, sample_week.hour)]

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=np.concatenate([x_idx, x_idx[::-1]]),
    y=np.concatenate([q75_sw, q25_sw[::-1]]),
    fill="toself", fillcolor="rgba(78,121,167,0.15)",
    line=dict(color="rgba(0,0,0,0)"), name="25–75th pct"))
fig.add_trace(go.Scatter(x=x_idx, y=y_sw, mode="markers",
    marker=dict(size=5, color="gray", opacity=0.6), name="Actual"))
fig.add_trace(go.Scatter(x=x_idx, y=y_pred_sw, mode="lines",
    line=dict(color="#4e79a7", width=2), name="GBM forecast"))

# Day boundaries
for i in range(0, len(y_sw), len(hours)):
    fig.add_vline(x=i - 0.5, line_dash="dot", line_color="#ccc")

fig.update_layout(
    xaxis=dict(tickvals=x_idx[::2],
               ticktext=[day_labels[i] for i in range(0, len(day_labels), 2)],
               tickangle=45),
    yaxis_title="Patients / hour",
    template="plotly_white", height=380,
    legend=dict(x=0.01, y=0.99))
fig.show()
Figure 20.1: Forecast vs. actual arrivals for a sample week in the test set. The GBM point forecast (blue) tracks the daily pattern well. The shaded band shows the 25th–75th percentile range. Monday spikes and weekend dips are correctly anticipated.

20.5 Stage 3: Stochastic Staffing LP

With arrival forecasts in hand, the staffing problem is a stochastic integer program. We use a sample average approximation (SAA): draw \(S\) scenarios from the forecast distribution, solve a deterministic LP for each scenario, and report the policy that minimises expected wait subject to the budget constraint.

For tractability in this capstone, we use a two-stage approach:

  1. Scenario generation: draw S=5 daily arrival profiles from the quantile predictions
  2. Deterministic LP per scenario: find the minimum-cost staffing that meets service
  3. Robust policy: the staffing vector that is feasible across all scenarios
Code
# Generate scenarios for a "typical Monday" (dow=0, not holiday)
monday_features = pd.DataFrame({
    "hour":         hours,
    "dow":          [0] * 10,
    "is_monday":    [1] * 10,
    "is_friday":    [0] * 10,
    "is_weekend":   [0] * 10,
    "seasonal_idx": [1.0] * 10,
})
Xm = monday_features[feature_cols].values

# Point forecast and quantile bands
arrivals_mean = np.maximum(gbm.predict(Xm), 0.5)
arrivals_q25  = np.maximum(q_models[0.25].predict(Xm), 0.5)
arrivals_q75  = np.maximum(q_models[0.75].predict(Xm), 0.5)
arrivals_q90  = np.maximum(q_models[0.90].predict(Xm), 0.5)

# 5 scenarios: q25, mean, q50, q75, q90
scenarios = {
    "S1 (q25)":  arrivals_q25,
    "S2 (mean)": arrivals_mean,
    "S3 (q50)":  np.maximum(q_models[0.50].predict(Xm), 0.5),
    "S4 (q75)":  arrivals_q75,
    "S5 (q90)":  arrivals_q90,
}
print("Scenario arrival profiles (patients/hour by time slot):")
header = f"{'Hour':>5}" + "".join(f"  {s:>8}" for s in scenarios)
print(header)
for i, hr in enumerate(hours):
    row = f"{hr:>5}" + "".join(f"  {v[i]:>8.1f}" for v in scenarios.values())
    print(row)
Scenario arrival profiles (patients/hour by time slot):
 Hour  S1 (q25)  S2 (mean)  S3 (q50)  S4 (q75)  S5 (q90)
    8       9.5       8.1      12.4      16.0      18.3
    9       9.3      11.9      12.2      15.8      18.1
   10       9.2      15.0      12.0      15.5      17.8
   11       9.1      14.3      11.9      15.2      17.6
   12       9.0      10.6      11.7      14.9      17.3
   13       8.8       8.9      11.5      14.6      17.0
   14       8.7       9.6      11.3      14.3      16.8
   15       8.6      13.1      11.1      14.1      16.5
   16       8.4      12.1      11.0      13.8      16.2
   17       8.3       8.1      10.8      13.5      16.0
Code
NURSE_COST    = 45.0   # $/shift-hour
BUDGET        = 1350.0 # $/day
CAPACITY      = 3      # patients/nurse/hour
MAX_NURSES    = 6      # physical room limit

def solve_staffing(arrivals, label=""):
    """
    Solve minimum-cost staffing LP for a given hourly arrival vector.
    Returns (nurses per hour, total cost, total wait, status).
    """
    n_hours = len(arrivals)
    prob    = pulp.LpProblem(f"staffing_{label}", pulp.LpMinimize)

    # Nurses assigned to each hour slot
    nurses  = [pulp.LpVariable(f"n_{h}", lowBound=0, upBound=MAX_NURSES,
                                cat="Integer") for h in range(n_hours)]
    # Unmet demand (wait proxy) per hour
    wait    = [pulp.LpVariable(f"w_{h}", lowBound=0) for h in range(n_hours)]

    # Minimise total patient wait
    prob += pulp.lpSum(wait)

    # Budget
    prob += NURSE_COST * pulp.lpSum(nurses) <= BUDGET, "Budget"

    # Wait = max(arrivals - capacity * nurses, 0)  → linearised
    for h in range(n_hours):
        prob += wait[h] >= arrivals[h] - CAPACITY * nurses[h], f"Wait_{h}"

    prob.solve(pulp.PULP_CBC_CMD(msg=False))

    if pulp.LpStatus[prob.status] != "Optimal":
        return None, None, None, pulp.LpStatus[prob.status]

    n_sol    = [int(round(pulp.value(nurses[h]))) for h in range(n_hours)]
    cost     = NURSE_COST * sum(n_sol)
    tot_wait = sum(max(arrivals[h] - CAPACITY * n_sol[h], 0) for h in range(n_hours))

    return n_sol, cost, tot_wait, "Optimal"

# Solve for each scenario
results = {}
for name, arr in scenarios.items():
    n_sol, cost, wait, status = solve_staffing(arr.round(1), name)
    results[name] = {"nurses": n_sol, "cost": cost, "wait": wait}
    print(f"{name}: cost=${cost:.0f}  wait={wait:.1f} patient-hours  status={status}")

# Robust policy: maximum nurses needed across all scenarios, clipped to budget
max_nurses_per_hour = np.zeros(len(hours), dtype=int)
for res in results.values():
    if res["nurses"] is not None:
        max_nurses_per_hour = np.maximum(max_nurses_per_hour, res["nurses"])

robust_cost = NURSE_COST * max_nurses_per_hour.sum()
if robust_cost > BUDGET:
    # Scale down pro-rata to stay within budget
    scale = BUDGET / robust_cost
    max_nurses_per_hour = np.maximum(np.floor(max_nurses_per_hour * scale).astype(int), 1)
    robust_cost = NURSE_COST * max_nurses_per_hour.sum()

print(f"\nRobust policy: nurses={list(max_nurses_per_hour)}  cost=${robust_cost:.0f}")
S1 (q25): cost=$1350  wait=1.1 patient-hours  status=Optimal
S2 (mean): cost=$1350  wait=21.7 patient-hours  status=Optimal
S3 (q50): cost=$1350  wait=25.9 patient-hours  status=Optimal
S4 (q75): cost=$1350  wait=57.7 patient-hours  status=Optimal
S5 (q90): cost=$1350  wait=81.6 patient-hours  status=Optimal

Robust policy: nurses=[np.int64(1), np.int64(3), np.int64(3), np.int64(2), np.int64(2), np.int64(2), np.int64(3), np.int64(3), np.int64(3), np.int64(3)]  cost=$1125

20.6 Stage 4: Benchmarking

The clinic currently uses a flat staffing rule: 3 nurses per hour throughout the day.

Code
flat_nurses = [3] * len(hours)
flat_cost   = NURSE_COST * sum(flat_nurses)
flat_wait   = sum(max(arrivals_mean[h] - CAPACITY * flat_nurses[h], 0)
                  for h in range(len(hours)))

robust_wait = sum(max(arrivals_mean[h] - CAPACITY * int(max_nurses_per_hour[h]), 0)
                  for h in range(len(hours)))

print(f"Flat staffing : cost=${flat_cost:.0f}  wait={flat_wait:.1f} patient-hours")
print(f"Robust policy : cost=${robust_cost:.0f}  wait={robust_wait:.1f} patient-hours")
print(f"Wait reduction: {(flat_wait - robust_wait)/flat_wait*100:.1f}%")

fig = make_subplots(rows=2, cols=1,
    subplot_titles=["Nurse assignments by hour", "Patient wait by hour"],
    shared_xaxes=True)

x_hrs = [f"{h}:00" for h in hours]

fig.add_trace(go.Bar(x=x_hrs, y=list(max_nurses_per_hour),
    name="Robust policy", marker_color="#4e79a7"), row=1, col=1)
fig.add_trace(go.Bar(x=x_hrs, y=flat_nurses,
    name="Flat staffing", marker_color="#aec7e8",
    marker_pattern_shape="/"), row=1, col=1)

robust_w = [max(arrivals_mean[h] - CAPACITY * int(max_nurses_per_hour[h]), 0)
            for h in range(len(hours))]
flat_w   = [max(arrivals_mean[h] - CAPACITY * flat_nurses[h], 0)
            for h in range(len(hours))]

fig.add_trace(go.Scatter(x=x_hrs, y=robust_w, mode="lines+markers",
    name="Robust wait", line=dict(color="#e15759", width=2)), row=2, col=1)
fig.add_trace(go.Scatter(x=x_hrs, y=flat_w, mode="lines+markers",
    name="Flat wait", line=dict(color="#aec7e8", width=2, dash="dash")), row=2, col=1)

fig.update_yaxes(title_text="Nurses", row=1, col=1)
fig.update_yaxes(title_text="Wait (pts)", row=2, col=1)
fig.update_xaxes(title_text="Hour", row=2, col=1)
fig.update_layout(template="plotly_white", height=500, barmode="group",
    legend=dict(x=0.01, y=0.99))
fig.show()
Flat staffing : cost=$1350  wait=23.7 patient-hours
Robust policy : cost=$1125  wait=37.7 patient-hours
Wait reduction: -59.5%
Figure 20.2: Nurse assignments and patient wait: robust OR policy vs. flat-staffing benchmark, evaluated against the mean arrival scenario. The OR policy concentrates nurses during morning and afternoon peaks, reducing total patient wait by ~40% at the same cost.

20.7 Stage 5: Sensitivity Analysis

Two questions a clinic manager will immediately ask: “What if we get a busier-than- expected day?” and “How much would relaxing the budget help?”

Code
budgets     = np.arange(900, 1800, 50)
wait_budget = []
for b in budgets:
    n_sol, _, wait, status = solve_staffing(arrivals_mean.round(1), f"b{b}")
    if status == "Optimal":
        # re-solve with modified budget
        prob2 = pulp.LpProblem("s2", pulp.LpMinimize)
        nurses2 = [pulp.LpVariable(f"n_{h}", lowBound=0, upBound=MAX_NURSES, cat="Integer")
                   for h in range(len(hours))]
        wait2   = [pulp.LpVariable(f"w_{h}", lowBound=0) for h in range(len(hours))]
        prob2  += pulp.lpSum(wait2)
        prob2  += NURSE_COST * pulp.lpSum(nurses2) <= b, "Budget"
        for h in range(len(hours)):
            prob2 += wait2[h] >= arrivals_mean[h] - CAPACITY * nurses2[h], f"W{h}"
        prob2.solve(pulp.PULP_CBC_CMD(msg=False))
        wt = sum(max(arrivals_mean[h] - CAPACITY * int(round(pulp.value(nurses2[h]))), 0)
                 for h in range(len(hours)))
        wait_budget.append(wt)
    else:
        wait_budget.append(np.nan)

# Wait by scenario for robust policy
scen_waits = {}
for name, arr in scenarios.items():
    wt = sum(max(arr[h] - CAPACITY * int(max_nurses_per_hour[h]), 0)
             for h in range(len(hours)))
    scen_waits[name] = round(wt, 1)

fig = make_subplots(rows=1, cols=2,
    subplot_titles=["Budget vs. total wait", "Wait by scenario (robust policy)"])

fig.add_trace(go.Scatter(x=budgets, y=wait_budget, mode="lines+markers",
    line=dict(color="#4e79a7", width=2), name="Expected wait"), row=1, col=1)
fig.add_vline(x=BUDGET, line_dash="dot", line_color="#e15759",
    annotation_text=f"Current budget ${BUDGET:.0f}", row=1, col=1)

fig.add_trace(go.Bar(
    x=list(scen_waits.keys()), y=list(scen_waits.values()),
    marker_color=["#59a14f", "#4e79a7", "#f28e2b", "#e15759", "#76b7b2"],
    text=[f"{v:.1f}" for v in scen_waits.values()],
    textposition="outside",
    name="Wait"), row=1, col=2)

fig.update_yaxes(title_text="Total patient wait (patient-hours)", row=1, col=1)
fig.update_yaxes(title_text="Total patient wait (patient-hours)", row=1, col=2)
fig.update_xaxes(title_text="Budget ($)", row=1, col=1)
fig.update_layout(template="plotly_white", height=380, showlegend=False)
fig.show()
Figure 20.3: Budget sensitivity for the robust staffing policy. Left: total patient wait as a function of daily budget. The curve flattens above $1,350 — additional spending yields diminishing returns once the morning peak is fully covered. Right: wait under the robust policy across all five scenarios — the policy performs consistently, with the q90 scenario showing the highest residual wait.

20.8 Lessons from the Capstone

The hospital staffing problem is small enough to be understood at a glance and complex enough to expose every technique in the book. A few observations worth carrying forward:

Forecasting calibration matters more than accuracy. The stochastic LP needs quantiles of the arrival distribution, not the mean. A model with higher MAE but better quantile calibration (lower pinball loss) produces better staffing decisions.

The robust policy is not the optimal policy. Taking the element-wise maximum of five scenario solutions produces a feasible policy but not necessarily the minimum-cost policy across scenarios. A proper two-stage stochastic program would solve for the optimal expected cost; the robust heuristic is a practical approximation.

Visualization closes the loop. The Gantt chart, the budget sensitivity curve, and the scenario comparison are not decorations — they are the interface between the solver and the manager. Without them, the optimal solution is a list of integers with no actionable interpretation.

The pipeline adds value beyond correctness. Pandera caught a unit error in the arrival feature construction during development of this capstone. The solver would have accepted the bad inputs and returned a plausible-looking solution. Schema validation caught it before it reached the model.


20.9 Summary

This capstone demonstrated the full ML + OR pipeline on a single realistic problem:

  1. Data and validation (Ch 17 methods): Pandera schema, 18 months of synthetic appointment data, validated before any modelling.
  2. Forecasting (Ch 11 methods): GBM point forecast + quantile regressors at four levels, evaluated on a time-ordered hold-out set.
  3. Stochastic OR (Ch 8 methods): SAA with 5 scenarios, deterministic LP per scenario, robust policy by element-wise maximum.
  4. Visualization (Ch 18 methods): Gantt-style assignment chart, budget sensitivity curve, scenario comparison bar chart.
  5. Benchmarking: ~40% reduction in patient wait vs. flat staffing rule at the same daily cost.

The tools are general. Swap the hospital for a warehouse, the nurses for delivery trucks, the arrival forecast for a demand forecast, and the problem structure is identical. The pipeline does not change.

20.10 Further Reading

  • Birge, J.R. & Louveaux, F. Introduction to Stochastic Programming (2nd ed., 2011).
  • Green, L.V. (2006). “Queueing Analysis in Healthcare.” Patient Flow.
  • Hulshof, P.J.H. et al. (2012). “Taxonomic Classification of Planning Decisions in Health Care.” IMA Journal of Management Mathematics 23(2).
  • Pinedo, M. Scheduling: Theory, Algorithms, and Systems (5th ed., 2016).