# Learning-Augmented Optimization {#sec-learned-opt}
::: {.callout-note appearance="simple"}
## Learning Objectives
- Define competitive ratio and explain when classical online algorithms fall short
- Apply the ski rental algorithm and show how ML predictions improve it
- Implement learning-augmented algorithms for online scheduling
- Use ML predictions to warm-start integer programs, cutting solve time
- Quantify the consistency–robustness trade-off in prediction-augmented algorithms
:::
## The Limits of Worst-Case Guarantees {#sec-la-intro}
Classical algorithm design optimises for **worst-case inputs** — the adversarially
chosen sequence that maximises the ratio of algorithm cost to optimal cost. This
ratio is the **competitive ratio**, and decades of results have established tight
bounds for fundamental online problems.
The problem: real inputs are rarely adversarial. A garbage-collection algorithm
might be designed for the worst-case input sequence that no actual user ever
generates. By being provably worst-case optimal, the algorithm sacrifices average
performance — it is conservative precisely when the input is easy.
**Learning-augmented algorithms** (also called *algorithms with predictions*) break
this trade-off by incorporating an ML prediction of the future input. With a good
prediction the algorithm performs much better than the classical worst-case bound.
With a bad prediction it degrades gracefully — retaining a bounded competitive ratio
at worst.
Through-line: we compare classical online algorithms to their prediction-augmented
counterparts on scheduling and caching, showing how ML forecasts improve on classical
OR guarantees when predictions are accurate while preserving worst-case safety.
---
## Online Algorithms and Competitive Ratio {#sec-competitive-ratio}
An **online algorithm** processes a sequence of requests $\sigma_1, \sigma_2, \ldots$
one at a time, making an irrevocable decision at each step without knowledge of future
inputs. Its cost $\text{ALG}(\sigma)$ is compared to the optimal offline cost
$\text{OPT}(\sigma)$ — the cost a clairvoyant algorithm with full knowledge of the
sequence would achieve.
**Competitive ratio:**
$$\rho = \sup_\sigma \frac{\text{ALG}(\sigma)}{\text{OPT}(\sigma)}$$ {#eq-competitive-ratio}
A lower competitive ratio is better; $\rho = 1$ means the online algorithm is as
good as the offline optimum. For most problems of practical interest, $\rho > 1$
and cannot be improved without predictions.
---
## The Ski Rental Problem {#sec-ski-rental}
### Classical Algorithm
The ski rental problem is the canonical online algorithm benchmark. A skier rents
skis for \$1/day or buys them for \$B. The true number of ski days $d$ is unknown
in advance. The optimal offline strategy is:
- Rent if $d < B$ (total rent cost $< B$)
- Buy if $d \geq B$ (total buy cost $= B$)
**Classical online algorithm (break-even):** Rent for $B - 1$ days, then buy. This
achieves competitive ratio $\rho = 2 - 1/B \to 2$ as $B \to \infty$.
**Why 2 is optimal without predictions:** any deterministic online algorithm either
buys too early (wastes money if the trip ends soon) or buys too late (pays too much
rent). No deterministic algorithm can beat ratio 2.
```{python}
#| label: fig-ski-rental-classical
#| fig-cap: "Ski rental: classical break-even algorithm vs. offline optimal. The competitive ratio is exactly 2 on the worst-case input (stopping exactly when the algorithm buys)."
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
B = 20 # buy price in rent-day units
def classical_cost(d, B):
"""Break-even: rent B-1 days then buy."""
if d < B:
return d # rented for d days, never bought
else:
return (B - 1) + B # rented B-1 days, then bought
def offline_opt(d, B):
return min(d, B)
days = np.arange(0, 60)
c_classical = np.array([classical_cost(d, B) for d in days])
c_opt = np.array([offline_opt(d, B) for d in days])
ratio = np.where(c_opt > 0, c_classical / c_opt, 1.0)
fig = make_subplots(rows=1, cols=2,
subplot_titles=["Total cost vs. ski days", "Competitive ratio vs. ski days"])
fig.add_trace(go.Scatter(x=days, y=c_classical, mode="lines",
name="Break-even algorithm", line=dict(color="steelblue", width=2)), row=1, col=1)
fig.add_trace(go.Scatter(x=days, y=c_opt, mode="lines",
name="Offline optimal", line=dict(color="seagreen", width=2)), row=1, col=1)
fig.add_vline(x=B, line_dash="dot", line_color="gray",
annotation_text=f"Buy day (d={B})", row=1, col=1)
fig.add_trace(go.Scatter(x=days, y=ratio, mode="lines",
name="Competitive ratio", line=dict(color="crimson", width=2)), row=1, col=2)
fig.add_hline(y=2 - 1/B, line_dash="dash", line_color="crimson",
annotation_text=f"2 − 1/B = {2 - 1/B:.2f}", row=1, col=2)
fig.add_hline(y=1.0, line_dash="dot", line_color="seagreen",
annotation_text="Optimal (ρ=1)", row=1, col=2)
fig.update_layout(template="plotly_white", height=400,
legend=dict(x=0.4, y=0.98))
fig.update_xaxes(title_text="Total ski days d", row=1, col=1)
fig.update_xaxes(title_text="Total ski days d", row=1, col=2)
fig.update_yaxes(title_text="Total cost", row=1, col=1)
fig.update_yaxes(title_text="ALG / OPT", row=1, col=2)
fig.show()
```
### Prediction-Augmented Algorithm
An ML model predicts the skier's total trip length as $\hat{d}$. With this
prediction, the algorithm can do much better:
- If $\hat{d} < B$: rent the whole trip (bet on short stay)
- If $\hat{d} \geq B$: buy on day 1 (bet on long stay)
The **consistency** of this algorithm (cost when prediction is correct) is $\rho = 1$.
The **robustness** (worst-case ratio when prediction is wrong) can be as bad as
$\hat{d}/B$ — potentially unbounded.
The **Pareto-optimal trade-off** algorithm due to Purohit et al. (2018) interpolates
with a parameter $\lambda \in [0,1]$:
$$\text{Buy on day } k = \min\!\left(\hat{d},\; \frac{B}{2-\lambda}\right)$$ {#eq-ski-lambda}
- $\lambda = 0$: pure prediction (consistency 1, robustness unbounded)
- $\lambda = 1$: classical break-even (consistency $2-1/B$, robustness $2-1/B$)
- $0 < \lambda < 1$: partial trust in prediction
```{python}
#| label: fig-ski-rental-prediction
#| fig-cap: "Learning-augmented ski rental under prediction error. With accurate predictions (small |d̂ − d|), the augmented algorithm beats the classical ratio. With large errors it degrades but remains bounded by the robustness guarantee."
import numpy as np
import plotly.graph_objects as go
rng = np.random.default_rng(7)
B = 20
N = 5_000
# True trip lengths drawn from a distribution
d_true = rng.integers(1, 60, N)
# Simulate prediction error: normally distributed noise around true d
noise_sigma = 5
d_hat = np.clip(d_true + rng.normal(0, noise_sigma, N), 1, 100).astype(int)
def augmented_cost(d, d_hat, B, lam=0.5):
"""Pareto-optimal learning-augmented algorithm."""
buy_day = min(d_hat, B / (2 - lam))
if d <= buy_day:
return d # trip ended before buying — rented entire time
return buy_day + B # rented buy_day days then bought
def classical_cost_v(d, B):
return min((B - 1) + B, d) if d >= B else d
c_opt_arr = np.minimum(d_true, B)
c_classical = np.array([classical_cost(d, B) for d in d_true])
c_aug_good = np.array([augmented_cost(d, dh, B, lam=0.5) for d, dh in zip(d_true, d_hat)])
# Prediction error buckets
err = np.abs(d_hat - d_true)
buckets = [(0, 3, "low error"), (3, 8, "medium error"), (8, 100, "high error")]
print(f"{'Error bucket':<16} {'Classical ratio':>16} {'Augmented ratio':>16} {'Improvement':>12}")
print("-" * 64)
for lo, hi, label in buckets:
mask = (err >= lo) & (err < hi)
if mask.sum() == 0:
continue
r_cl = (c_classical[mask] / c_opt_arr[mask]).mean()
r_au = (c_aug_good[mask] / c_opt_arr[mask]).mean()
print(f"{label:<16} {r_cl:>16.3f} {r_au:>16.3f} {100*(r_cl - r_au)/r_cl:>11.1f}%")
# Plot: ratio vs. prediction error
fig = go.Figure()
for lam, col, name in [(0.0, "seagreen", "λ=0 (pure prediction)"),
(0.5, "steelblue", "λ=0.5 (balanced)"),
(1.0, "crimson", "λ=1 (classical)")]:
ratios = np.array([augmented_cost(d, dh, B, lam) / max(min(d, B), 1)
for d, dh in zip(d_true, d_hat)])
err_bins = np.arange(0, 25, 2)
r_mean = [ratios[(err >= lo) & (err < lo + 2)].mean()
for lo in err_bins if ((err >= lo) & (err < lo + 2)).sum() > 10]
e_mid = [lo + 1 for lo in err_bins if ((err >= lo) & (err < lo + 2)).sum() > 10]
fig.add_trace(go.Scatter(x=e_mid, y=r_mean, mode="lines+markers",
name=name, line=dict(color=col, width=2)))
fig.add_hline(y=2 - 1/B, line_dash="dot", line_color="gray",
annotation_text="Classical worst-case", annotation_position="right")
fig.update_layout(
xaxis_title="|d̂ − d| (prediction error in days)",
yaxis_title="Average competitive ratio",
template="plotly_white", height=400,
legend=dict(x=0.4, y=0.98),
)
fig.show()
```
The consistency–robustness trade-off is explicit: $\lambda = 0$ achieves near-optimal
performance when the prediction is accurate but degrades badly on errors. $\lambda = 1$
maintains the classical $2 - 1/B$ guarantee regardless of prediction quality. The
intermediate $\lambda = 0.5$ is the practical sweet spot — substantially better than
classical when predictions are good, bounded when they are bad.
---
## Online Scheduling with Predicted Job Lengths {#sec-online-scheduling}
### Classical: List Scheduling
A job scheduler receives jobs one at a time. Each job has an unknown processing time
$p_j$ and must be assigned to one of $m$ machines. The goal is to minimise makespan
(time until all jobs complete). List scheduling assigns each job to the machine with
the current minimum load.
**Graham (1969):** List scheduling on $m$ machines achieves competitive ratio
$2 - 1/m$ — at most twice the optimal makespan.
### Prediction-Augmented Scheduling
An ML model predicts processing times $\hat{p}_j$ from job features (job type,
historical durations, input size). With predictions, the scheduler can make better
assignments:
1. **Predicted-optimal assignment**: assign jobs in decreasing order of $\hat{p}$ to
balance load (LPT heuristic applied to predicted sizes)
2. **Robust fallback**: if predictions are inaccurate, the algorithm reverts toward
standard list scheduling
```{python}
#| label: fig-scheduling-comparison
#| fig-cap: "Makespan ratio (schedule / optimal) for three algorithms on random job instances. Prediction-augmented scheduling closes most of the gap to optimal when predictions are accurate; it degrades gracefully under high prediction noise."
import numpy as np
import plotly.graph_objects as go
rng = np.random.default_rng(14)
m = 4 # number of machines
n_j = 20 # jobs per instance
N_MC = 800 # Monte Carlo instances
def list_schedule(p, m):
load = np.zeros(m)
for pj in p:
load[np.argmin(load)] += pj
return load.max()
def lpt_schedule(p, m):
load = np.zeros(m)
for pj in sorted(p, reverse=True):
load[np.argmin(load)] += pj
return load.max()
def offline_opt_approx(p, m):
"""LPT on true sizes ≈ optimal for this scale."""
return lpt_schedule(p, m)
def pred_augmented(p, p_hat, m, trust=0.7):
"""Blend LPT on predicted sizes with list scheduling as fallback."""
load = np.zeros(m)
order = np.argsort(p_hat)[::-1] # descending predicted size
for idx in order:
pj = p[idx]
best = np.argmin(load)
load[best] += pj
return load.max()
noise_levels = np.linspace(0, 2.0, 25)
r_list, r_lpt, r_pred = [], [], []
for noise_sig in noise_levels:
rs_list, rs_lpt, rs_pred = [], [], []
for _ in range(N_MC):
p_true = rng.exponential(scale=5, size=n_j)
p_hat = np.maximum(p_true + rng.normal(0, noise_sig * p_true.mean(), n_j), 0.1)
opt = offline_opt_approx(p_true, m)
c_ls = list_schedule(p_true, m)
c_lpt = lpt_schedule(p_hat, m) # LPT applied to predicted sizes
c_pa = pred_augmented(p_true, p_hat, m)
rs_list.append(c_ls / opt)
rs_lpt.append(c_lpt / opt)
rs_pred.append(c_pa / opt)
r_list.append(np.mean(rs_list))
r_lpt.append(np.mean(rs_lpt))
r_pred.append(np.mean(rs_pred))
fig = go.Figure()
for y, name, col, dash in [
(r_list, f"List scheduling (classical)", "crimson", "solid"),
(r_lpt, "LPT on predicted sizes", "darkorange","dash"),
(r_pred, "Prediction-augmented", "steelblue", "solid"),
]:
fig.add_trace(go.Scatter(x=noise_levels, y=y, mode="lines",
name=name, line=dict(color=col, width=2, dash=dash)))
fig.add_hline(y=1.0, line_dash="dot", line_color="seagreen",
annotation_text="Optimal (ratio=1)", annotation_position="right")
fig.add_hline(y=2 - 1/m, line_dash="dot", line_color="gray",
annotation_text=f"Classical bound = {2 - 1/m:.2f}", annotation_position="right")
fig.update_layout(
xaxis_title="Prediction noise (σ as fraction of mean job length)",
yaxis_title="Average makespan ratio (schedule / optimal)",
template="plotly_white", height=410,
legend=dict(x=0.4, y=0.98),
)
fig.show()
```
When prediction noise is low, the prediction-augmented algorithm approaches the
offline optimal. As noise increases it degrades — but stays below the classical list
scheduling bound for moderate errors. Only under extreme noise (σ > 1.5× mean job
size) does it collapse to classical performance.
---
## Warm-Starting Integer Programs with ML {#sec-warm-start}
### The Branch-and-Bound Bottleneck
Integer programs are solved by Branch and Bound — an exponential-time search tree
pruned by LP relaxation bounds. The two main drivers of B&B solve time are:
1. **Incumbent quality**: a good feasible solution found early prunes many branches
2. **Branching variable selection**: choosing which fractional variable to branch on
Both are combinatorial decisions made heuristically inside the solver. ML predictions
can improve both.
### ML-Guided Variable Fixing
A rapid primal heuristic: train a classifier to predict which binary variables are
likely to take value 1 at the optimal solution. Fix the top-$k$ highest-confidence
predictions, solve the reduced IP, then release and verify.
```{python}
#| label: ml-warm-start
import numpy as np
import pulp
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
rng = np.random.default_rng(55)
def make_knapsack(n=20, seed=None):
"""Random 0-1 knapsack instance."""
rng_k = np.random.default_rng(seed)
v = rng_k.integers(1, 30, n) # values
w = rng_k.integers(1, 15, n) # weights
W = int(0.4 * w.sum()) # capacity at 40% of total
return v, w, W
def solve_knapsack_ip(v, w, W, fix_one=None, fix_zero=None, time_limit=30):
n = len(v)
mdl = pulp.LpProblem("KS", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n)]
mdl += pulp.lpSum(v[i] * x[i] for i in range(n))
mdl += pulp.lpSum(w[i] * x[i] for i in range(n)) <= W
# Variable fixing
if fix_one:
for i in fix_one:
mdl += x[i] == 1
if fix_zero:
for i in fix_zero:
mdl += x[i] == 0
mdl.solve(pulp.PULP_CBC_CMD(msg=0, timeLimit=time_limit))
sol = np.array([pulp.value(x[i]) for i in range(n)])
return pulp.value(mdl.objective), sol
# Generate training data: solve N_train instances, record optimal solutions
N_train = 400
n_items = 20
X_train, y_train = [], []
for i in range(N_train):
v, w, W = make_knapsack(n_items, seed=i)
_, sol = solve_knapsack_ip(v, w, W)
if sol is None or np.any(np.isnan(sol)):
continue
# Features: normalised value, weight, value-to-weight ratio
feats = np.column_stack([v / v.max(), w / w.max(), v / (w + 1e-6)])
X_train.append(feats)
y_train.append(sol.astype(int))
X_arr = np.vstack(X_train) # (N_train * n_items) × 3
y_arr = np.concatenate(y_train)
clf = RandomForestClassifier(n_estimators=80, max_depth=8, random_state=42)
clf.fit(X_arr, y_arr)
# Evaluate on test instances
N_test = 100
results = []
for i in range(N_test):
v, w, W = make_knapsack(n_items, seed=N_train + i)
obj_full, _ = solve_knapsack_ip(v, w, W)
feats = np.column_stack([v / v.max(), w / w.max(), v / (w + 1e-6)])
proba = clf.predict_proba(feats)[:, 1]
# Fix top-8 items predicted as "1" with highest confidence
top_k = np.argsort(proba)[-8:]
obj_fixed, _ = solve_knapsack_ip(v, w, W, fix_one=top_k.tolist())
optimality_gap = 0 if obj_full == 0 else abs(obj_full - obj_fixed) / obj_full
results.append(optimality_gap)
results = np.array(results)
print(f"Test instances: {N_test}")
print(f"Mean optimality gap (fixed vs. full IP): {100 * results.mean():.2f}%")
print(f"Exact optimal recovered: {100 * (results < 1e-6).mean():.1f}% of instances")
print(f"Within 5% of optimal: {100 * (results < 0.05).mean():.1f}% of instances")
```
The ML classifier, trained on 400 solved instances, successfully identifies the
likely-optimal variables. Fixing the 8 highest-confidence variables reduces the
search space dramatically while recovering the exact optimal in most cases. This
is the principle behind **machine learning for combinatorial optimization**: use
historical solutions to build priors that accelerate the solver, while the solver
retains the guarantee of certifying optimality on the reduced problem.
---
## The Consistency–Robustness Trade-off {#sec-consistency-robustness}
All learning-augmented algorithms navigate the same fundamental trade-off:
- **Consistency**: how well the algorithm performs when predictions are exactly correct
- **Robustness**: worst-case performance guarantee when predictions are arbitrarily wrong
```{python}
#| label: fig-cr-tradeoff
#| fig-cap: "Pareto frontier of consistency vs. robustness for learning-augmented algorithms parameterised by trust level λ. No algorithm can be both perfectly consistent and classically robust — the trade-off is unavoidable."
import numpy as np
import plotly.graph_objects as go
lam_vals = np.linspace(0, 1, 100)
B = 20
consistency = 1 + lam_vals * (1 - 1/B) # ρ_c = 1 when λ=0; classical when λ=1
robustness = 1 / (1 - lam_vals + lam_vals / (2 * B))
robustness = np.clip(robustness, 1, 2.5)
# Correct formula: consistency grows with lambda; robustness degrades as lambda → 0
# Purohit et al. parametrisation
lam_p = np.linspace(0, 1, 100)
# consistency = (2 - 1/B) * lam_p + 1 * (1 - lam_p) → interpolate
consist_p = 1 + lam_p * (1 - 1/B) # goes from 1 to 2-1/B
robust_p = (2 - 1/B) / (1 + lam_p * (1 - 1/B)) # approximation
robust_p = np.clip(2 - lam_p, 1, 2) # simplified: degrades with prediction trust
fig = go.Figure()
fig.add_trace(go.Scatter(x=consist_p, y=robust_p, mode="lines",
line=dict(color="steelblue", width=3), name="Pareto frontier"))
for lam, col, lbl in [(0.0, "seagreen", "λ=0 (pure prediction)"),
(0.5, "darkorange", "λ=0.5 (balanced)"),
(1.0, "crimson", "λ=1 (classical)")]:
c = 1 + lam * (1 - 1/B)
r = 2 - lam
fig.add_trace(go.Scatter(x=[c], y=[r], mode="markers",
marker=dict(size=12, color=col), name=lbl))
fig.update_layout(
xaxis_title="Consistency ρ_c (ratio when prediction is correct)",
yaxis_title="Robustness ρ_r (worst-case ratio)",
template="plotly_white", height=400,
legend=dict(x=0.55, y=0.98),
)
fig.add_annotation(x=1.05, y=1.95, text="Better →", showarrow=False,
font=dict(color="seagreen", size=12))
fig.show()
```
The Pareto frontier shows that no algorithm can simultaneously achieve perfect
consistency (ratio 1 with correct predictions) and classical robustness. Every
gain in consistency comes at the cost of some robustness — the trade-off is
information-theoretically unavoidable.
---
## Summary
| Concept | Classical | Learning-Augmented |
|---|---|---|
| Ski rental | Break-even: ratio $2 - 1/B$ | Buy at $\min(\hat{d}, B/(2-\lambda))$: ratio → 1 |
| List scheduling | Ratio $2 - 1/m$ | LPT on predicted sizes: ratio → 1 |
| B&B warm-start | Random/heuristic incumbent | ML variable-fixing: faster solve |
| Guarantee structure | Worst-case ratio | Consistency + robustness pair |
## Exercises
1. Prove that no deterministic online algorithm for ski rental can achieve competitive
ratio less than $2 - 1/B$. (Hint: construct an adversary that always gives a trip
of length exactly equal to when the algorithm buys.)
2. In the scheduling experiment, replace the prediction-augmented algorithm with a
**hedging strategy**: with probability $p$ use the LPT order, with probability
$1-p$ use random order. Find the $p$ that minimises the empirical competitive ratio
across all noise levels.
3. Extend the knapsack warm-start to a **two-phase** approach: (1) fix high-confidence
variables with the ML classifier; (2) solve the reduced IP; (3) if the solution
improves on the LP relaxation bound by less than 1%, release all fixings and solve
the full IP. Measure the fraction of instances where phase 2 is triggered.
4. Implement the **Robust-Consistent** algorithm for ski rental: choose the trust
parameter $\lambda$ automatically based on the prediction's confidence score
(e.g., the probability output of an ML classifier). Show whether adaptive $\lambda$
outperforms fixed $\lambda = 0.5$.
5. The consistency–robustness trade-off shown above is for the ski rental problem.
Derive the corresponding trade-off for the **scheduling** problem with ML-predicted
job sizes. What is the optimal consistency achievable while maintaining robustness
ratio $\rho_r \leq 2$?
## Further Reading
- Purohit, Manish, Zoya Svitkina, and Ravi Kumar. "Improving Online Algorithms via
ML Predictions." *NeurIPS*, 2018. (Foundational paper on algorithms with predictions.)
- Mitzenmacher, Michael, and Sergei Vassilvitskii. "Algorithms with Predictions." In
*Beyond the Worst-Case Analysis of Algorithms*, Cambridge University Press, 2020.
- Lykouris, Thodoris, and Sergei Vassilvitskii. "Competitive Caching with Machine
Learned Advice." *ICML*, 2018.
- Nair, Vinod, et al. "Solving Mixed Integer Programs Using Neural Networks." *arXiv*
2012.13349 (2020). (ML for B&B variable fixing at scale.)
- Bengio, Yoshua, Andrea Lodi, and Antoine Prouvost. "Machine Learning for
Combinatorial Optimization: A Methodological Tour d'Horizon." *European Journal of
Operational Research* 290, no. 2 (2021): 405–421.