8 The Conditions
- Derive the Karush-Kuhn-Tucker conditions for constrained nonlinear optimization
- Understand primal feasibility, dual feasibility, and complementary slackness
- Appreciate Karush’s overlooked 1939 master’s thesis
- Connect KKT theory to modern NLP solvers
8.1 Three Names, One Theorem
The Karush-Kuhn-Tucker conditions — the fundamental optimality conditions for constrained nonlinear programming — have a complicated naming history that reflects the sociology of mid-twentieth-century mathematics.
William Karush derived the conditions in his master’s thesis at the University of Chicago in 1939. The thesis was not published in a journal. It was not widely circulated. Karush went on to a career in applied mathematics at RAND and System Development Corporation, and the thesis remained obscure.
Harold Kuhn and Albert Tucker, working at Princeton, independently derived the same conditions in 1950 and presented them at the Second Berkeley Symposium on Mathematical Statistics and Probability in 1951. Their paper was published in the conference proceedings. It became one of the most influential papers in the history of applied mathematics.
For decades, the conditions were called the Kuhn-Tucker conditions. The historical correction — acknowledging Karush’s priority — was slow to propagate. The triple name “KKT” became standard in the 1990s, fifty years after Karush’s original derivation. Karush died in 1997, having lived long enough to see his contribution acknowledged, if not quite in the form he might have hoped.
8.2 The KKT Conditions
Consider the nonlinear program: \[\min_\mathbf{x} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \leq 0,\; i=1,\ldots,m;\quad h_j(\mathbf{x}) = 0,\; j=1,\ldots,p\]
Under regularity conditions (constraint qualification), necessary conditions for a local minimum \(\mathbf{x}^*\) are the existence of multipliers \(\boldsymbol{\mu} \geq \mathbf{0}\) and \(\boldsymbol{\lambda}\) such that:
Stationarity: \[\nabla f(\mathbf{x}^*) + \sum_{i=1}^m \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^p \lambda_j \nabla h_j(\mathbf{x}^*) = \mathbf{0}\]
Primal feasibility: \[g_i(\mathbf{x}^*) \leq 0, \quad h_j(\mathbf{x}^*) = 0\]
Dual feasibility: \[\mu_i \geq 0 \quad \forall i\]
Complementary slackness: \[\mu_i g_i(\mathbf{x}^*) = 0 \quad \forall i\]
The last condition is the key one. Either the constraint is active (\(g_i = 0\)) and the multiplier can be positive, or the constraint is inactive (\(g_i < 0\)) and the multiplier must be zero. The constraint and its multiplier cannot both be positive.
Show Python
import numpy as np
from scipy.optimize import minimize
# NLP: min x1^2 + x2^2 - x1*x2
# s.t. x1 + x2 >= 1 (=> -x1 - x2 + 1 <= 0)
# x2 >= 0.2 (=> -x2 + 0.2 <= 0)
def f(x): return x[0]**2 + x[1]**2 - x[0]*x[1]
def g1(x): return x[0] + x[1] - 1 # >= 0
def g2(x): return x[1] - 0.2 # >= 0
constraints = [
{'type': 'ineq', 'fun': g1},
{'type': 'ineq', 'fun': g2},
]
res = minimize(f, [0.5, 0.5], constraints=constraints, method='SLSQP')
x_opt = res.x
# Compute Lagrangian gradient numerically to verify KKT
eps = 1e-6
grad_f = np.array([(f(x_opt+[eps,0])-f(x_opt))/eps,
(f(x_opt+[0,eps])-f(x_opt))/eps])
grad_g1 = np.array([1., 1.])
grad_g2 = np.array([0., 1.])
# Shadow prices (multipliers) from SLSQP
mu1 = -res.jac @ np.linalg.pinv(np.vstack([grad_g1, grad_g2])) if False else None
print(f"Optimal x: ({x_opt[0]:.4f}, {x_opt[1]:.4f})")
print(f"f(x*) = {f(x_opt):.4f}")
print(f"g1(x*) = {g1(x_opt):.4f} (active if ~0)")
print(f"g2(x*) = {g2(x_opt):.4f} (active if ~0)")
print(f"Constraint 1 active: {abs(g1(x_opt)) < 1e-4}")
print(f"Constraint 2 active: {abs(g2(x_opt)) < 1e-4}")Optimal x: (0.5000, 0.5000)
f(x*) = 0.2500
g1(x*) = 0.0000 (active if ~0)
g2(x*) = 0.3000 (active if ~0)
Constraint 1 active: True
Constraint 2 active: False
8.3 What the Conditions Actually Say
The stationarity condition says the gradient of the objective, at the optimum, must be a nonnegative linear combination of the active constraint gradients. Geometrically: the objective cannot point into the feasible set at the optimum — if it did, you could improve the objective by moving slightly into the feasible set, contradicting optimality.
Complementary slackness says an inactive constraint does not constrain the solution. Its multiplier is zero because relaxing it — pushing the boundary slightly further away — would not change the optimum. An active constraint’s multiplier measures how much the objective would improve if the constraint boundary were relaxed.
Together, these conditions characterize the constrained optimum just as \(f'(x) = 0\) characterizes the unconstrained optimum. They are necessary conditions under regularity assumptions (constraint qualifications ensure the multipliers exist). They are also sufficient conditions when the objective is convex and the feasible set is convex.
8.4 Constraint Qualifications
The KKT conditions hold under regularity conditions called constraint qualifications. The most common is linear independence constraint qualification (LICQ): the gradients of the active constraints at the solution are linearly independent. When this fails — when active constraints are redundant or degenerate — multipliers may not be unique or may not exist.
Constraint qualification failures are a common source of solver trouble in practice. A pressure constraint and a temperature constraint that are always simultaneously active because they share the same physical mechanism may produce a numerically rank-deficient constraint Jacobian, causing interior point solvers to report “restoration phase failure” and SQP solvers to report abnormal terminations. The fix is usually to identify and remove the redundant constraint.
8.5 The Road to Sufficiency: Convexity
The KKT conditions are necessary for any smooth NLP under regularity. For convex programs — convex objective and convex feasible set — they are also sufficient. Any KKT point is a global minimum.
This sufficiency result makes convex programming a qualitatively different field from general nonlinear programming. Gradient-based methods converge to global optima, not just local ones. The duality gap — the difference between the primal optimum and the dual optimum — is zero. The multipliers have a clean economic interpretation. Many structural problems in engineering (least-norm solutions, portfolio optimization, certain geometric design problems) are convex programs, and recognizing this structure is a significant practical advantage.
8.6 Summary
The Karush-Kuhn-Tucker conditions provide necessary (and for convex problems, sufficient) conditions for constrained optima. Stationarity requires the gradient of the objective to be a nonnegative combination of active constraint gradients. Dual feasibility requires multipliers on inequality constraints to be nonnegative. Complementary slackness requires that inactive constraints have zero multipliers. The conditions were derived by Karush in 1939 (unpublished) and Kuhn and Tucker in 1951 (published), and are the foundation of every constrained optimization algorithm in use today.
8.7 Further Reading
Kuhn and Tucker (1951) is the 1951 paper. Karush’s original 1939 thesis is reproduced in Kjeldsen’s “A Contextualized Historical Analysis of the Kuhn-Tucker Theorem in Nonlinear Programming” (Historia Mathematica, 27(4):331–361, 2000), which also documents the history of the naming controversy. Nocedal and Wright (2006) Chapter 12 provides a thorough modern treatment of KKT theory and constraint qualifications.