17  Agent-Augmented OR Workflows

Note

Learning Objectives
  • Describe where LLM agents add value in an OR workflow and where they do not
  • Use structured prompting to auto-generate PuLP model skeletons from problem descriptions
  • Apply an agent loop to diagnose and repair LP infeasibility
  • Build a retrieval-augmented OR assistant that answers sensitivity questions from solution artifacts
  • Evaluate agent output critically: verify generated models before trusting their solutions
  • Identify failure modes of agent-assisted OR and mitigation strategies

17.1 The OR Practitioner’s New Colleague

Operations research has always been a craft that rewards experience. The junior analyst who has formulated a hundred scheduling problems writes the next one in an hour; the one who has not spends a day working through constraint indexing and objective sign conventions. The knowledge is largely tacit — not written down, acquired through repetition.

Large language models are, at their core, a very large compression of that tacit knowledge. They have read the textbooks, the INFORMS journals, the Stack Overflow threads where someone asked why their LP was returning an unbounded objective, the GitHub repositories of PuLP models written by people who had clearly just learned about integer programming. When prompted well, they can scaffold a model formulation faster than most human analysts.

But a large language model is not a solver. It cannot guarantee that a generated constraint is correct; it cannot detect that a variable bound is missing; it cannot tell you that the LP it just wrote is infeasible because it confused the direction of a ≤ constraint. The practitioner’s job is to use the agent’s output as a starting point, not as a finished product.

This chapter builds three agent-augmented workflows: model generation, infeasibility diagnosis, and solution interrogation. Each is implemented with the Anthropic API — the same Claude models that power this ebook’s development environment.

Important

Running the agent examples: The code blocks in this chapter call the Anthropic API and require ANTHROPIC_API_KEY in your environment. Set it before rendering:

export ANTHROPIC_API_KEY="your-key-here"

If the key is absent, the cells fall back to stored responses where they exist and print a placeholder otherwise, so the book renders without network access.


17.2 Anatomy of an Agent-Augmented OR Workflow

An agent is not a single API call — it is a loop: generate, observe, decide, repeat. In an OR context the loop has a natural structure:

Code
import plotly.graph_objects as go

steps = [
    "Problem\ndescription",
    "Generate\nmodel",
    "Validate\n& solve",
    "Diagnose\nfailure",
    "Repair\nmodel",
    "Solution\nreview",
]
x = [0, 1, 2, 3, 4, 5]
colors = ["#4e79a7", "#f28e2b", "#59a14f", "#e15759", "#f28e2b", "#76b7b2"]

fig = go.Figure()
for i, (step, xi, col) in enumerate(zip(steps, x, colors)):
    fig.add_shape(type="rect",
        x0=xi-0.38, x1=xi+0.38, y0=0.25, y1=0.75,
        fillcolor=col, opacity=0.85, line_color="white", line_width=2)
    fig.add_annotation(x=xi, y=0.5, text=step,
        showarrow=False, font=dict(color="white", size=11), align="center")

# Forward arrows: in Plotly the arrowhead sits at (x, y) and the tail at (ax, ay)
for i in range(5):
    fig.add_annotation(x=x[i]+0.58, y=0.5, ax=x[i]+0.42, ay=0.5,
        xref="x", yref="y", axref="x", ayref="y",
        showarrow=True, arrowhead=2, arrowsize=1.3,
        arrowcolor="#555", arrowwidth=2)

# Repair loop back arrow
fig.add_annotation(x=1.0, y=0.2, ax=4.0, ay=0.2,
    xref="x", yref="y", axref="x", ayref="y",
    showarrow=True, arrowhead=2, arrowsize=1.3,
    arrowcolor="#e15759", arrowwidth=2)
fig.add_annotation(x=2.5, y=0.1, text="repair loop",
    showarrow=False, font=dict(color="#e15759", size=10))

fig.update_layout(
    xaxis=dict(visible=False, range=[-0.6, 5.6]),
    yaxis=dict(visible=False, range=[-0.05, 1.0]),
    height=200, margin=dict(l=10, r=10, t=10, b=10),
    plot_bgcolor="white", paper_bgcolor="white")
fig.show()
Figure 17.1: Agent-augmented OR workflow. The practitioner provides a problem description; the agent generates a model skeleton; the practitioner (or a validation harness) tests it; the agent repairs failures. The loop exits when the model is feasible and the solution passes a sanity check.
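
The loop in Figure 17.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used later in the chapter: `run_agent_loop` and the three `fake_*` callables are hypothetical names, and the stubs stand in for an API call, a solve-and-check step, and a repair prompt.

```python
from typing import Callable, List, Tuple

def run_agent_loop(
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    problem: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str], int]:
    """Generate a model, then validate and repair until clean or out of rounds.

    Returns (model_text, remaining_issues, repair_rounds_used).
    """
    model = generate(problem)              # generate
    issues = validate(model)               # observe
    rounds_used = 0
    while issues and rounds_used < max_rounds:   # decide
        model = repair(model, issues)      # feed the failures back to the agent
        rounds_used += 1
        issues = validate(model)           # observe again
    return model, issues, rounds_used

# Hypothetical stand-ins so the loop can run without an API key or a solver.
def fake_generate(problem: str) -> str:
    return "oven >= 8"                     # deliberately wrong constraint direction

def fake_validate(model: str) -> List[str]:
    return ["OvenCapacity should be <="] if ">=" in model else []

def fake_repair(model: str, issues: List[str]) -> str:
    return model.replace(">=", "<=")

model, issues, rounds = run_agent_loop(
    fake_generate, fake_validate, fake_repair, "bakery")
print(model, issues, rounds)
```

The `max_rounds` cap is a design choice: if the issues list is still non-empty when the loop returns, the model goes back to a human rather than into another repair attempt.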

17.3 Workflow 1: Model Generation from Problem Description

The most time-consuming part of OR modelling is transcribing a business problem into mathematical notation. An agent can draft the first version in seconds.

17.3.1 The Prompt Pattern

A good model-generation prompt has four parts:

  1. Role: “You are an operations research expert. Generate a PuLP model.”
  2. Problem description: plain-language statement of the problem, including the decision variables, constraints, and objective.
  3. Output format: specify that the output should be runnable Python, using PuLP, with variable names that match the problem description.
  4. Verification instruction: ask the model to check its own constraint directions and variable bounds before outputting.
Code
import os
import re
import warnings
warnings.filterwarnings("ignore")

# Graceful fallback if API key absent
ANTHROPIC_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
USE_API = bool(ANTHROPIC_KEY)

if USE_API:
    import anthropic
    client = anthropic.Anthropic(api_key=ANTHROPIC_KEY)

# Optional canned responses keyed by the first 40 characters of the user prompt.
# Left empty here, so cells without a stored answer print a placeholder.
_STORED_RESPONSES: dict = {}

def call_claude(system: str, user: str, max_tokens: int = 1500) -> str:
    """Call Claude API with fallback to stored response."""
    if not USE_API:
        return _STORED_RESPONSES.get(user[:40], "[API key not set — stored response unavailable]")
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return msg.content[0].text
Code
SYSTEM_OR = """You are an operations research expert specializing in linear and integer
programming. When asked to generate a model:
1. Write clean, runnable Python using PuLP.
2. Use variable names that match the problem description.
3. Include a comment for every constraint explaining what it enforces.
4. After writing, verify: are all constraint directions correct? Are all bounds set?
5. Output ONLY the Python code block, no explanation."""

problem_description = """
A bakery makes three products: croissants (C), muffins (M), and scones (S).
- Profit per unit: C=$2.50, M=$1.80, S=$1.20
- Each product requires oven time (hours): C=0.05, M=0.03, S=0.02
- Each product requires labour (hours): C=0.10, M=0.08, S=0.05
- Daily oven capacity: 8 hours. Daily labour capacity: 16 hours.
- Demand upper bounds: C<=80, M<=120, S<=200 units.
- At least 20 croissants must be made (contractual minimum).
Maximize daily profit.
"""

generated_code = call_claude(SYSTEM_OR, problem_description)
print(generated_code)
[API key not set — stored response unavailable]

17.3.2 Executing and Validating the Generated Model

Never trust agent-generated OR code without running it and checking the solution.

Code
# Fallback model: a hand-written reference formulation of this problem
import pulp

prob_bakery = pulp.LpProblem("bakery", pulp.LpMaximize)

C = pulp.LpVariable("C", lowBound=20, upBound=80)
M = pulp.LpVariable("M", lowBound=0,  upBound=120)
S = pulp.LpVariable("S", lowBound=0,  upBound=200)

prob_bakery += 2.50 * C + 1.80 * M + 1.20 * S, "TotalProfit"
prob_bakery += 0.05 * C + 0.03 * M + 0.02 * S <= 8,  "OvenCapacity"
prob_bakery += 0.10 * C + 0.08 * M + 0.05 * S <= 16, "LabourCapacity"

prob_bakery.solve(pulp.PULP_CBC_CMD(msg=False))

print(f"Status : {pulp.LpStatus[prob_bakery.status]}")
print(f"Profit : ${pulp.value(prob_bakery.objective):.2f}")
print(f"C={pulp.value(C):.0f}  M={pulp.value(M):.0f}  S={pulp.value(S):.0f}")

# Sanity checks
oven_used   = 0.05*pulp.value(C) + 0.03*pulp.value(M) + 0.02*pulp.value(S)
labour_used = 0.10*pulp.value(C) + 0.08*pulp.value(M) + 0.05*pulp.value(S)
print(f"\nOven used  : {oven_used:.2f} / 8.00 hrs  ({'BINDING' if oven_used >= 7.99 else 'slack'})")
print(f"Labour used: {labour_used:.2f} / 16.00 hrs ({'BINDING' if labour_used >= 15.99 else 'slack'})")
assert pulp.value(C) >= 20, "Contractual minimum violated!"
print("Sanity checks passed.")
Status : Optimal
Profit : $392.00
C=80  M=0  S=160

Oven used  : 7.20 / 8.00 hrs  (slack)
Labour used: 16.00 / 16.00 hrs (BINDING)
Sanity checks passed.
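
When a live response is available, the generated text has to be turned into something executable before it can be checked. The sketch below shows one way to do that, assuming the response wraps its code in a fenced block. `extract_python_block` and `run_generated_code` are hypothetical helpers, not part of PuLP or the Anthropic SDK, and the sample response is made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the listing readable

def extract_python_block(response: str) -> str:
    """Return the first fenced code block in a response, or the whole text if none."""
    pattern = FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE
    match = re.search(pattern, response, flags=re.DOTALL)
    return (match.group(1) if match else response).strip()

def run_generated_code(code: str) -> dict:
    """Execute code in a fresh namespace and return that namespace."""
    namespace: dict = {}
    # exec runs arbitrary code: read it first, or run it in a sandbox.
    exec(code, namespace)
    return namespace

# Made-up response, standing in for what call_claude might return.
sample_response = (
    "Here is the model:\n"
    + FENCE + "python\n"
    + "profit = 2.50 * 80 + 1.20 * 160\n"
    + FENCE + "\n"
)
code = extract_python_block(sample_response)
namespace = run_generated_code(code)
print(code)
print(namespace["profit"])
```

Running the code in its own namespace keeps generated variable names from silently overwriting the notebook's own, and the returned dictionary gives the sanity checks something to inspect.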

17.4 Workflow 2: Infeasibility Diagnosis

Infeasible models are common in practice: a constraint was added with the wrong sign, a demand requirement exceeds capacity, a lower bound exceeds an upper bound. Diagnosing infeasibility by hand — reading through twenty constraints looking for the contradiction — is tedious. An agent can do it faster.

17.4.1 Introducing an Infeasible Model

Code
# Deliberately infeasible: the lower bounds alone need more oven and labour time than is available
prob_inf = pulp.LpProblem("bakery_infeasible", pulp.LpMaximize)
Ci = pulp.LpVariable("C", lowBound=80, upBound=80)   # fixed at 80
Mi = pulp.LpVariable("M", lowBound=100, upBound=120)  # minimum 100
Si = pulp.LpVariable("S", lowBound=150, upBound=200)  # minimum 150

prob_inf += 2.50 * Ci + 1.80 * Mi + 1.20 * Si
prob_inf += 0.05 * Ci + 0.03 * Mi + 0.02 * Si <= 8,  "OvenCapacity"
prob_inf += 0.10 * Ci + 0.08 * Mi + 0.05 * Si <= 16, "LabourCapacity"

prob_inf.solve(pulp.PULP_CBC_CMD(msg=False))
print(f"Status: {pulp.LpStatus[prob_inf.status]}")

# Compute minimum resource demand at lower bounds
min_oven   = 0.05*80 + 0.03*100 + 0.02*150
min_labour = 0.10*80 + 0.08*100 + 0.05*150
print(f"\nMinimum oven demand at lower bounds  : {min_oven:.2f} hrs (capacity 8.00)")
print(f"Minimum labour demand at lower bounds: {min_labour:.2f} hrs (capacity 16.00)")
Status: Infeasible

Minimum oven demand at lower bounds  : 10.00 hrs (capacity 8.00)
Minimum labour demand at lower bounds: 23.50 hrs (capacity 16.00)

17.4.2 Agent Diagnosis Loop

Code
SYSTEM_DIAG = """You are an operations research expert diagnosing LP/IP infeasibility.
Given a model description and its constraint matrix, identify which constraints are
mutually contradictory and suggest the minimal repair. Be specific: name the
constraints and the values that create the contradiction."""

infeasible_description = f"""
PuLP model is INFEASIBLE. Here are the constraints and variable bounds:

Variables:
  C: lb=80, ub=80
  M: lb=100, ub=120
  S: lb=150, ub=200

Constraints:
  OvenCapacity   : 0.05*C + 0.03*M + 0.02*S <= 8.0
  LabourCapacity : 0.10*C + 0.08*M + 0.05*S <= 16.0

At lower bounds: oven demand = {min_oven:.2f}, labour demand = {min_labour:.2f}

Diagnose the infeasibility and suggest a repair.
"""

diagnosis = call_claude(SYSTEM_DIAG, infeasible_description, max_tokens=600)
print("Agent diagnosis:")
print("-" * 50)
print(diagnosis)
Agent diagnosis:
--------------------------------------------------
[API key not set — stored response unavailable]
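
Whatever the agent reports, its diagnosis can be cross-checked mechanically. The sketch below generalises the minimum-demand calculation from the previous cell: for each `<=` constraint it computes the smallest left-hand side the variable bounds allow and flags the constraint if even that exceeds the right-hand side. The function names and the dictionary layout are assumptions made for this illustration. The screen only catches constraints that conflict with the bounds on their own; contradictions that arise from several constraints together need a solver-based method.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]   # variable -> (lower, upper)
Row = Tuple[Dict[str, float], float]      # (coefficients, rhs) of a <= constraint

def min_activity(coeffs: Dict[str, float], bounds: Bounds) -> float:
    """Smallest possible left-hand side given the variable bounds."""
    total = 0.0
    for var, coef in coeffs.items():
        lower, upper = bounds[var]
        # A positive coefficient is minimised at the lower bound, a negative one at the upper.
        total += coef * (lower if coef >= 0 else upper)
    return total

def screen_le_constraints(rows: Dict[str, Row], bounds: Bounds,
                          tol: float = 1e-9) -> List[str]:
    """Report each <= constraint that no point within the bounds can satisfy."""
    findings = []
    for name, (coeffs, rhs) in rows.items():
        least = min_activity(coeffs, bounds)
        if least > rhs + tol:
            findings.append(
                f"{name}: minimum LHS {least:.2f} exceeds RHS {rhs:.2f} "
                f"by {least - rhs:.2f}")
    return findings

bounds = {"C": (80, 80), "M": (100, 120), "S": (150, 200)}
rows = {
    "OvenCapacity":   ({"C": 0.05, "M": 0.03, "S": 0.02}, 8.0),
    "LabourCapacity": ({"C": 0.10, "M": 0.08, "S": 0.05}, 16.0),
}
findings = screen_le_constraints(rows, bounds)
for line in findings:
    print(line)
```

For the infeasible bakery model this flags both constraints, with shortfalls of 2.00 oven hours and 7.50 labour hours. A diagnosis that names only one of them is therefore incomplete.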
Code
# Implement the repair: relax lower bounds to feasible values
print("Applying repair: relax lower bounds to feasible values")
print()

prob_fixed = pulp.LpProblem("bakery_fixed", pulp.LpMaximize)
Cf = pulp.LpVariable("C", lowBound=20, upBound=80)
Mf = pulp.LpVariable("M", lowBound=0,  upBound=120)
Sf = pulp.LpVariable("S", lowBound=0,  upBound=200)

prob_fixed += 2.50 * Cf + 1.80 * Mf + 1.20 * Sf
prob_fixed += 0.05 * Cf + 0.03 * Mf + 0.02 * Sf <= 8,  "OvenCapacity"
prob_fixed += 0.10 * Cf + 0.08 * Mf + 0.05 * Sf <= 16, "LabourCapacity"

prob_fixed.solve(pulp.PULP_CBC_CMD(msg=False))
print(f"Status : {pulp.LpStatus[prob_fixed.status]}")
print(f"Profit : ${pulp.value(prob_fixed.objective):.2f}")
print(f"C={pulp.value(Cf):.0f}  M={pulp.value(Mf):.0f}  S={pulp.value(Sf):.0f}")
Applying repair: relax lower bounds to feasible values

Status : Optimal
Profit : $392.00
C=80  M=0  S=160

17.5 Workflow 3: Solution Interrogation

Once a model is solved, the decision-maker has questions: “What if I add 10 hours of oven capacity?” “Why are muffins not being produced at their maximum?” “How much would I need to improve the muffin margin to make it worth producing more?”

These are sensitivity questions. An agent with the solution artifact in its context can answer them in natural language.

17.5.1 Building the Solution Context

Code
# Use the original bakery solution
sol_C = pulp.value(C)
sol_M = pulp.value(M)
sol_S = pulp.value(S)
profit = pulp.value(prob_bakery.objective)

# Extract dual values and slack
c_oven   = prob_bakery.constraints["OvenCapacity"]
c_labour = prob_bakery.constraints["LabourCapacity"]

solution_context = f"""
BAKERY PRODUCTION MIX — OPTIMAL SOLUTION
=========================================
Decision variables:
  Croissants (C) = {sol_C:.0f} units  [bound: 20–80]
  Muffins    (M) = {sol_M:.0f} units  [bound: 0–120]
  Scones     (S) = {sol_S:.0f} units  [bound: 0–200]

Objective: Total profit = ${profit:.2f}

Resource utilisation:
  Oven   : {0.05*sol_C + 0.03*sol_M + 0.02*sol_S:.2f} / 8.00 hrs used
           Dual value = {c_oven.pi:.4f} (shadow price: $/hr added)
  Labour : {0.10*sol_C + 0.08*sol_M + 0.05*sol_S:.2f} / 16.00 hrs used
           Dual value = {c_labour.pi:.4f} (shadow price: $/hr added)

Profit per unit: C=$2.50, M=$1.80, S=$1.20
Resource usage per unit:
  Oven  : C=0.05h, M=0.03h, S=0.02h
  Labour: C=0.10h, M=0.08h, S=0.05h
"""
print(solution_context)

BAKERY PRODUCTION MIX — OPTIMAL SOLUTION
=========================================
Decision variables:
  Croissants (C) = 80 units  [bound: 20–80]
  Muffins    (M) = 0 units  [bound: 0–120]
  Scones     (S) = 160 units  [bound: 0–200]

Objective: Total profit = $392.00

Resource utilisation:
  Oven   : 7.20 / 8.00 hrs used
           Dual value = -0.0000 (shadow price: $/hr added)
  Labour : 16.00 / 16.00 hrs used
           Dual value = 24.0000 (shadow price: $/hr added)

Profit per unit: C=$2.50, M=$1.80, S=$1.20
Resource usage per unit:
  Oven  : C=0.05h, M=0.03h, S=0.02h
  Labour: C=0.10h, M=0.08h, S=0.05h

17.5.2 Natural-Language Q&A on the Solution

Code
SYSTEM_QA = f"""You are an operations research analyst explaining LP solutions to a
bakery manager. You have access to the following solution artifact:

{solution_context}

Answer questions concisely. When asked about sensitivity, compute the answer
from the dual values and resource usage above. Do not make up numbers not in
the solution artifact."""

questions = [
    "Why are we not making the maximum 120 muffins?",
    "If I hire an extra worker giving 2 more labour hours, how much more profit do I make?",
    "Which resource is the bottleneck?",
]

STORED_ANSWERS = {
    questions[0]: (
        "Muffins are not at their maximum because labour is the binding resource: "
        "all 16.00 labour hours are used. Muffins earn $22.50 per labour hour "
        "($1.80 / 0.08 h), less than croissants ($25.00) or scones ($24.00), so "
        "labour goes to croissants and scones first."
    ),
    questions[1]: (
        f"2 extra labour hours × shadow price ${c_labour.pi:.4f}/hr ≈ "
        f"${2 * c_labour.pi:.2f} additional profit, but only if the oven constraint "
        "doesn't become binding first. Check oven slack before committing to this hire."
    ),
    questions[2]: (
        "Labour is the bottleneck. All 16.00 labour hours are used and its shadow "
        f"price is ${c_labour.pi:.2f}/hr, while the oven has 0.80 hrs of slack and a "
        "shadow price of zero, so an extra oven hour adds nothing to profit."
    ),
}

for q in questions:
    print(f"Q: {q}")
    if USE_API:
        answer = call_claude(SYSTEM_QA, q, max_tokens=300)
    else:
        answer = STORED_ANSWERS[q]
    print(f"A: {answer}")
    print()
Q: Why are we not making the maximum 120 muffins?
A: Muffins are not at their maximum because labour is the binding resource: all 16.00 labour hours are used. Muffins earn $22.50 per labour hour ($1.80 / 0.08 h), less than croissants ($25.00) or scones ($24.00), so labour goes to croissants and scones first.

Q: If I hire an extra worker giving 2 more labour hours, how much more profit do I make?
A: 2 extra labour hours × shadow price $24.0000/hr ≈ $48.00 additional profit, but only if the oven constraint doesn't become binding first. Check oven slack before committing to this hire.

Q: Which resource is the bottleneck?
A: Labour is the bottleneck. All 16.00 labour hours are used and its shadow price is $24.00/hr, while the oven has 0.80 hrs of slack and a shadow price of zero, so an extra oven hour adds nothing to profit.
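
Stored or generated answers like these can be checked against the artifact with a few lines of arithmetic. The sketch below ranks the products by profit per labour hour and works through the two-hour labour question by hand. It assumes the extra labour goes entirely to scones, the only product strictly between its bounds in the optimal solution. The variable names are illustrative, and the numbers are copied from the solution artifact.

```python
unit_profit  = {"C": 2.50, "M": 1.80, "S": 1.20}
labour_hours = {"C": 0.10, "M": 0.08, "S": 0.05}
oven_hours   = {"C": 0.05, "M": 0.03, "S": 0.02}
solution     = {"C": 80, "M": 0, "S": 160}
demand_cap   = {"C": 80, "M": 120, "S": 200}
OVEN_CAPACITY = 8.0

# Profit earned per hour of the scarce resource (labour)
profit_per_labour_hour = {p: unit_profit[p] / labour_hours[p] for p in unit_profit}
ranking = sorted(profit_per_labour_hour, key=profit_per_labour_hour.get, reverse=True)
for p in ranking:
    print(f"{p}: ${profit_per_labour_hour[p]:.2f} per labour hour")

# What-if: 2 extra labour hours, all spent on scones (assumption stated above)
extra_hours = 2.0
extra_scones = extra_hours / labour_hours["S"]
new_scones = solution["S"] + extra_scones
oven_after = (sum(oven_hours[p] * solution[p] for p in solution)
              + oven_hours["S"] * extra_scones)
gain = unit_profit["S"] * extra_scones

within_demand = new_scones <= demand_cap["S"] + 1e-9
within_oven = oven_after <= OVEN_CAPACITY + 1e-9
print(f"Extra scones: {extra_scones:.0f}, profit gain: ${gain:.2f}")
print(f"Scones after: {new_scones:.0f} (cap {demand_cap['S']}), "
      f"oven after: {oven_after:.2f} hrs (cap {OVEN_CAPACITY:.2f})")
print(f"Within demand cap: {within_demand}, within oven capacity: {within_oven}")
```

The ranking puts croissants first, scones second and muffins last, which is why muffins stay at zero. The $48.00 gain agrees with the shadow-price estimate, but after two extra hours scones reach their demand cap of 200 and the oven reaches 8.00 hours, so the $24.00 figure should not be extrapolated further without re-solving.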

17.6 Failure Modes and Mitigation

Agent-augmented OR is powerful but brittle in specific ways:

Warning

Known failure modes

| Failure | Example | Mitigation |
|---|---|---|
| Wrong constraint direction | Agent writes >= when <= intended | Always run the model; check solution sanity |
| Missing non-negativity | Agent omits lowBound=0 | Verify variable bounds explicitly |
| Hallucinated constraint | Agent adds a constraint not in the description | Diff generated model against problem description |
| Confident wrong diagnosis | Agent says “R1 is infeasible” when R2 is the problem | Verify by computing minimum resource demand by hand |
| Stale context in Q&A | Agent answers a sensitivity question using the wrong dual value | Always pass the solution artifact verbatim in the prompt |
| Model compiles but is wrong | Objective sense reversed (min instead of max) | Sanity-check: is the solution obviously suboptimal? |

17.6.1 The Verification Protocol

Code
def verify_lp_solution(prob, variables, constraints_to_check):
    """
    Minimal sanity-check harness for agent-generated LP solutions.
    Returns a list of issues found.
    """
    issues = []

    if pulp.LpStatus[prob.status] != "Optimal":
        issues.append(f"Non-optimal status: {pulp.LpStatus[prob.status]}")
        return issues

    for var in variables:
        val = pulp.value(var)
        if val is None:
            issues.append(f"Variable {var.name} has no value after solve")
            continue
        if var.lowBound is not None and val < var.lowBound - 1e-6:
            issues.append(f"{var.name} = {val:.4f} violates lb={var.lowBound}")
        if var.upBound is not None and val > var.upBound + 1e-6:
            issues.append(f"{var.name} = {val:.4f} violates ub={var.upBound}")

    for name, (lhs_val, rhs, sense) in constraints_to_check.items():
        if sense == "<=" and lhs_val > rhs + 1e-6:
            issues.append(f"Constraint {name} violated: {lhs_val:.4f} > {rhs}")
        if sense == ">=" and lhs_val < rhs - 1e-6:
            issues.append(f"Constraint {name} violated: {lhs_val:.4f} < {rhs}")

    return issues

cv = pulp.value(C); mv = pulp.value(M); sv = pulp.value(S)
issues = verify_lp_solution(
    prob_bakery,
    [C, M, S],
    {
        "OvenCapacity":   (0.05*cv + 0.03*mv + 0.02*sv, 8.0,  "<="),
        "LabourCapacity": (0.10*cv + 0.08*mv + 0.05*sv, 16.0, "<="),
        "MinCroissants":  (cv, 20.0, ">="),
    }
)

if issues:
    print("Issues found:")
    for issue in issues:
        print(f"  ⚠ {issue}")
else:
    print("All verification checks passed.")
All verification checks passed.
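
The harness above confirms that a solution satisfies the constraints it was given, but not that those constraints are the right ones. Catching a hallucinated constraint or a flipped inequality means comparing the generated model with a specification written from the problem description. The sketch below does this over plain dictionaries. `diff_constraints` and the data layout are assumptions made for this example; populating the dictionaries from a PuLP model is left out, and the comparison relies on constraint names matching on both sides.

```python
from typing import Dict, List, Tuple

Constraint = Tuple[Dict[str, float], str, float]   # (coefficients, sense, rhs)

def diff_constraints(expected: Dict[str, Constraint],
                     generated: Dict[str, Constraint],
                     tol: float = 1e-9) -> List[str]:
    """Compare generated constraints with a hand-written specification."""
    report = []
    for name in sorted(expected.keys() - generated.keys()):
        report.append(f"missing: {name}")
    for name in sorted(generated.keys() - expected.keys()):
        report.append(f"unexpected: {name}")
    for name in sorted(expected.keys() & generated.keys()):
        exp_coef, exp_sense, exp_rhs = expected[name]
        gen_coef, gen_sense, gen_rhs = generated[name]
        if exp_sense != gen_sense:
            report.append(f"{name}: sense is {gen_sense}, expected {exp_sense}")
        if abs(exp_rhs - gen_rhs) > tol:
            report.append(f"{name}: rhs is {gen_rhs}, expected {exp_rhs}")
        for var in sorted(exp_coef.keys() | gen_coef.keys()):
            exp_c, gen_c = exp_coef.get(var, 0.0), gen_coef.get(var, 0.0)
            if abs(exp_c - gen_c) > tol:
                report.append(f"{name}: coefficient of {var} is {gen_c}, expected {exp_c}")
    return report

# Specification transcribed by hand from the bakery description
expected = {
    "OvenCapacity":   ({"C": 0.05, "M": 0.03, "S": 0.02}, "<=", 8.0),
    "LabourCapacity": ({"C": 0.10, "M": 0.08, "S": 0.05}, "<=", 16.0),
}
# Made-up agent output with two planted errors
generated = {
    "OvenCapacity":   ({"C": 0.05, "M": 0.03, "S": 0.02}, ">=", 8.0),   # wrong direction
    "LabourCapacity": ({"C": 0.10, "M": 0.08, "S": 0.05}, "<=", 16.0),
    "MinScones":      ({"S": 1.0}, ">=", 50.0),                         # not in the description
}
report = diff_constraints(expected, generated)
for line in report:
    print(line)
```

Here the report lists the unexpected `MinScones` constraint and the reversed sense on `OvenCapacity`. Both errors would still yield an optimal status and would therefore pass a status-only check.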

17.7 When Not to Use an Agent

Agents accelerate certain tasks and add noise to others. A clear-eyed view of both:

Use an agent when:

  • Translating a well-specified problem description into a PuLP skeleton
  • Explaining a solution or dual values to a non-technical stakeholder
  • Generating alternative formulations for comparison (“is there a flow-based formulation of this scheduling problem?”)
  • Drafting sensitivity narratives for a report

Do not use an agent when:

  • The model formulation involves novel constraints with no standard analogues
  • Numerical precision matters (agents reason about floating-point loosely)
  • The model is large enough that the constraint list doesn’t fit in context
  • The correctness of the output cannot be verified (no solver to run against)
  • The solution will be acted on without human review

The practitioner who understands OR well enough to verify agent output gets the full benefit. The one who cannot verify the output gets its errors too.


17.8 Summary

Agent-augmented OR workflows accelerate the tedious parts of modelling — initial formulation, infeasibility diagnosis, solution interpretation — while leaving the correctness-critical work to the practitioner. The three workflows in this chapter follow a consistent pattern: prompt with context, generate output, verify programmatically, repair if needed.

The tools are the Anthropic API, a verification harness, and the judgment to know when the agent’s output is trustworthy. The capstone chapter (Chapter 16) uses all three workflows in an end-to-end example where the agent drafts the model, the pipeline validates it, and the visualization layer communicates the result.

17.9 Further Reading

  • Anthropic API documentation — Messages API, system prompts, and prompt caching.
  • Cheng et al. (2024). “Can LLMs Solve Operations Research Problems?” arXiv preprint.
  • AhmadiTeshnizi et al. (2023). “OptiMUS: Optimization Modeling Using MIP Solvers and Large Language Models.” arXiv:2310.06116.
  • Liu et al. (2023). “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency.” arXiv:2304.11477.
  • Wei et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS.