7.1 Lagrangian duality
The Lagrangian

Let us consider the general constrained optimization problem

$$
\begin{split}
\minimize&\quad f_0(x)\\
\text{subject to}&\quad f_i(x) \leq 0, \quad i = 1, \ldots, m\\
&\quad h_i(x) = 0, \quad i = 1, \ldots, p
\end{split}
\tag{1}
$$

The Lagrangian of problem (1) is defined as the function $L: \mathcal{D}\times \mathbb{R}^m\times \mathbb{R}^p \to \mathbb{R}$ such that

$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)
$$

where $\lambda \in \mathbb{R}^m$ and $\nu \in \mathbb{R}^p$ are the Lagrange multiplier vectors.
Comments

- $\lambda_i$ is the Lagrange multiplier associated with the $i$-th inequality constraint $f_i(x) \leq 0$;
- $\nu_i$ is the Lagrange multiplier associated with the $i$-th equality constraint $h_i(x) = 0$;
- the Lagrangian is also defined at infeasible points: it penalizes constraint violations through the Lagrange multipliers.
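To make the penalization concrete, here is a minimal numerical sketch; the one-constraint toy problem and the multiplier value are illustrative assumptions, not part of the notes above.

```python
# Toy problem (illustrative): minimize x^2 subject to 1 - x <= 0
f0 = lambda x: x**2          # objective
f1 = lambda x: 1 - x         # inequality constraint, f1(x) <= 0

def L(x, lam):
    # Lagrangian with a single multiplier lam >= 0
    return f0(x) + lam * f1(x)

# At an infeasible point (f1 > 0), a positive multiplier adds a penalty:
print(L(0.0, 2.0))   # 0 + 2*1 = 2.0, larger than f0(0) = 0
# At a feasible point (f1 <= 0), the multiplier term is non-positive:
print(L(2.0, 2.0))   # 4 + 2*(-1) = 2.0, no larger than f0(2) = 4
```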
The Lagrange dual function

By minimizing the Lagrangian over all $x$ in the domain $\mathcal{D}$, we get a very useful function, called the (Lagrange) dual function:

$$
g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu).
$$
Comments

- the dual function $g$ is always concave, even if problem (1) is not convex. Proof below!
- if the Lagrangian $L$ is unbounded below in the variable $x$, the dual function $g$ takes on the value $-\infty$;
- the dual function provides a lower bound on the optimal value $p^\star$ of problem (1).
To prove that $g$ is a concave function, recall that a function $g: \mathcal{C} \to \mathbb{R}$ is concave if 1) $\mathcal{C}$ is a convex set and 2) $-g$ is a convex function, i.e.,

$$
\text{for all } x, y \in \mathcal{C}, \text{ for all } \theta \in [0, 1], \quad g(\theta x + (1-\theta)y) \geq \theta g(x) + (1-\theta) g(y).
$$

Let us apply this definition to the dual function. For point 1), we have $\mathcal{C} = \mathbb{R}^m \times \mathbb{R}^p$, which is clearly convex. For point 2), write $\xi = (\lambda, \nu)$ for simplicity. Take $x \in \mathcal{D}$. For all $\xi, \xi'$ and for all $\theta \in [0, 1]$,

$$
L(x, \theta \xi + (1-\theta)\xi') = \theta L(x, \xi) + (1-\theta) L(x, \xi')
$$

by definition of the Lagrangian (it is affine in $(\lambda, \nu)$). Moreover, by definition of the Lagrange dual function, $L(x, \xi) \geq g(\xi)$. Therefore,

$$
L(x, \theta \xi + (1-\theta)\xi') \geq \theta g(\xi) + (1-\theta) g(\xi')
$$

and since this is true for any $x \in \mathcal{D}$, taking the infimum over $x$ yields

$$
g(\theta \xi + (1-\theta)\xi') = \inf_{x \in \mathcal{D}} L(x, \theta \xi + (1-\theta)\xi') \geq \theta g(\xi) + (1-\theta) g(\xi'),
$$

which shows that $g$ is concave.
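The concavity of $g$ can also be observed numerically: replacing the infimum by a minimum over a dense grid of $x$, midpoint values of $g$ lie above the chord. A sketch, using an assumed toy problem ($f_0(x) = x^2$, single constraint $1 - x \leq 0$):

```python
import numpy as np

# g(lam) = inf_x [x^2 + lam*(1 - x)], approximated by a minimum over a grid
xs = np.linspace(-5, 5, 10001)

def g(lam):
    return np.min(xs**2 + lam * (1 - xs))

# Midpoint concavity: g((l1 + l2)/2) >= (g(l1) + g(l2))/2 (up to grid error)
for l1, l2 in [(0.0, 2.0), (0.5, 3.0), (1.0, 4.0)]:
    mid = g(0.5*l1 + 0.5*l2)
    chord = 0.5*g(l1) + 0.5*g(l2)
    assert mid >= chord - 1e-6
```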
Another very important property is that $g$ gives a lower bound on the optimal value $p^\star$ of the initial problem (1): for any $\lambda \geq 0$ and any $\nu$, $g(\lambda, \nu) \leq p^\star$.
Proof 1

Suppose that $\tilde{x}$ is feasible and that $\lambda \geq 0$. Hence $f_i(\tilde{x}) \leq 0$ for $i = 1, \ldots, m$ and $h_i(\tilde{x}) = 0$ for $i = 1, \ldots, p$. It follows that

$$
\sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \leq 0.
$$

Therefore

$$
L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \leq f_0(\tilde{x}).
$$

Hence,

$$
f_0(\tilde{x}) \geq L(\tilde{x}, \lambda, \nu) \geq \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = g(\lambda, \nu).
$$

The inequality holds for every feasible $\tilde{x}$; in particular, for $\tilde{x} = x^\star$, one gets $g(\lambda, \nu) \leq p^\star$.
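The lower bound property is easy to check numerically. A sketch on an assumed toy problem (minimize $x^2$ subject to $1 - x \leq 0$, so $p^\star = 1$): every $\lambda \geq 0$ should give $g(\lambda) \leq p^\star$.

```python
import numpy as np

xs = np.linspace(-5, 5, 10001)
p_star = 1.0   # optimal value of min x^2 subject to x >= 1

def g(lam):
    # dual function, approximated by a grid minimum
    return np.min(xs**2 + lam * (1 - xs))

rng = np.random.default_rng(0)
for lam in rng.uniform(0, 10, size=20):
    assert g(lam) <= p_star + 1e-6   # weak duality lower bound
print(max(g(lam) for lam in np.linspace(0, 4, 401)))  # best bound, close to p* = 1
```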
The Lagrange dual problem

We can exploit the lower bound property of the dual function to obtain the best possible lower bound on the original problem (1) (which we call the primal problem). This yields the Lagrange dual problem

$$
\begin{split}
\text{maximize}&\quad g(\lambda, \nu)\\
\text{subject to}&\quad \lambda \geq 0.
\end{split}
$$
Comments

- a pair $(\lambda, \nu)$ is said to be dual feasible if $\lambda \geq 0$ and $g(\lambda, \nu) > -\infty$;
- the dual optimal parameters are denoted by $(\lambda^\star, \nu^\star)$;
- the dual problem is always convex (since $g$ is always concave), even if the primal problem isn't!
Weak and strong duality

Let $d^\star = \sup_{\lambda \geq 0} g(\lambda, \nu)$ be the optimal value of the Lagrange dual problem. By the lower bound property of $g$, we have the fundamental property of weak duality:

$$
d^\star \leq p^\star.
$$

Comments

- weak duality always holds, even if the primal problem (1) is not convex;
- the property holds even if $d^\star$ or $p^\star$ is infinite:
  - if the primal problem is unbounded below, $p^\star = -\infty$ and hence $d^\star = -\infty$ (the dual problem is infeasible);
  - if the dual problem is unbounded above, $d^\star = \infty$ and thus $p^\star = \infty$ (the primal problem is infeasible);
- the quantity $p^\star - d^\star \geq 0$ is called the duality gap.
We say that strong duality holds when $d^\star = p^\star$.

Comments

- strong duality does not always hold;
- strong duality is equivalent to having a zero duality gap;
- strong duality can be ensured by constraint qualifications.
For convex problems, a useful constraint qualification is known as Slater's criterion. If the primal problem is convex, i.e., it takes the form

$$
\begin{split}
\minimize&\quad f_0(x)\\
\text{subject to}&\quad f_i(x) \leq 0, \quad i = 1, \ldots, m\\
&\quad Ax = b
\end{split}
$$

where the $f_i$, $i = 0, \ldots, m$, are convex functions, and if there exists a feasible $\tilde{x}$ such that $f_i(\tilde{x}) < 0$, $i = 1, \ldots, m$, and $A\tilde{x} = b$ (i.e., the problem is strictly feasible), then strong duality holds.
In addition, when $d^\star > -\infty$, the dual optimum is attained.
Keep in mind that Slater's criterion only applies to convex problems; moreover, it is only a sufficient condition for strong duality: strong duality can hold even when Slater's criterion is not satisfied.
A few examples to put this notion into practice.
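For instance, consider the convex least-norm problem: minimize $\|x\|^2$ subject to $Ax = b$ (a hedged numerical sketch; this particular problem is an assumed example, not one from the notes). All its constraints are affine, so Slater's criterion reduces to mere feasibility, and the dual function has a closed form obtained by minimizing the Lagrangian over $x$:

```python
import numpy as np

# Least-norm problem: minimize ||x||^2 subject to Ax = b (assumed example)
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

# Primal optimum: x* = A^T (A A^T)^{-1} b
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# Lagrangian L(x, nu) = ||x||^2 + nu^T (Ax - b); minimizing over x gives
# x = -A^T nu / 2, hence g(nu) = -(1/4) nu^T A A^T nu - b^T nu
nu_star = -2 * np.linalg.solve(A @ A.T, b)   # maximizer of the concave g
d_star = -0.25 * nu_star @ (A @ A.T) @ nu_star - b @ nu_star

# Zero duality gap: strong duality holds, as Slater's criterion predicts
assert abs(p_star - d_star) < 1e-9
```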
Saddle-point interpretation

Consider, for simplicity, the optimization problem without equality constraints

$$
\begin{split}
\minimize&\quad f_0(x)\\
\text{subject to}&\quad f_i(x) \leq 0, \quad i = 1, \ldots, m
\end{split}
$$

with Lagrangian $L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)$. Observe that

$$
\sup_{\lambda \geq 0} L(x, \lambda) = \sup_{\lambda \geq 0} \left(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)\right) = \begin{cases}
f_0(x) & \text{ if } x \text{ is feasible}\\
\infty & \text{ otherwise.}
\end{cases}
$$

To see this, observe that if $x$ is feasible, then $f_i(x) \leq 0$ for all $i = 1, \ldots, m$, so the optimal choice is $\lambda = 0$. If $x$ is not feasible, then $f_i(x) > 0$ for some $i$; hence $L(x, \lambda)$ is unbounded above as a function of $\lambda$ and $\sup_{\lambda \geq 0} L(x, \lambda) = \infty$.
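This case distinction can be illustrated numerically (a sketch; the single-constraint problem below is the worked example used later in these notes):

```python
import numpy as np

# Problem: minimize (1/2) x^2 subject to 1 - x <= 0
f0 = lambda x: 0.5*x**2
f1 = lambda x: 1 - x
L = lambda x, lam: f0(x) + lam*f1(x)

lams = np.linspace(0, 1e6, 11)
# Feasible x (f1 <= 0): sup over lam >= 0 is attained at lam = 0 and equals f0(x)
assert np.max(L(2.0, lams)) == f0(2.0)
# Infeasible x (f1 > 0): L grows without bound as lam increases
assert L(0.0, lams[-1]) > 1e5
```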
The primal and dual optimal values can therefore be written as

$$
p^\star = \inf_{x} \sup_{\lambda \geq 0} L(x, \lambda), \qquad d^\star = \sup_{\lambda \geq 0} \inf_{x} L(x, \lambda).
$$

When strong duality holds, $p^\star = d^\star$ and thus

$$
\inf_{x} \sup_{\lambda \geq 0} L(x, \lambda) = \sup_{\lambda \geq 0} \inf_{x} L(x, \lambda),
$$

i.e., the order of maximization/minimization can be interchanged without changing the result.
This equality also characterizes the saddle-point property of the Lagrangian $L: \mathcal{D} \times \mathbb{R}^m_+ \to \mathbb{R}$: if $x^\star$ and $\lambda^\star$ are primal and dual optimal (and strong duality holds), one has

$$
L(x^\star, \lambda) \leq L(x^\star, \lambda^\star) \leq L(x, \lambda^\star) \quad \text{for any } x \text{ and } \lambda \geq 0,
$$

i.e., $(x^\star, \lambda^\star)$ defines a saddle point of the Lagrangian. Conversely, if $(x, \lambda)$ is a saddle point of the Lagrangian, then $x$ is primal optimal and $\lambda$ is dual optimal, with zero duality gap.
Consider the problem

$$
\begin{split}
\minimize&\quad \frac{1}{2}x^2\\
\text{subject to}&\quad 1 - x \leq 0.
\end{split}
$$

The solution to the primal problem is trivial: $x^\star = 1$ with $p^\star = \frac{1}{2}$. The Lagrangian is $L(x, \lambda) = \frac{1}{2}x^2 + \lambda(1 - x)$. Minimizing over $x$ (the minimizer is $x = \lambda$) gives the dual function $g(\lambda) = \inf_x L(x, \lambda) = \frac{\lambda^2}{2} + (1 - \lambda)\lambda = -\frac{\lambda^2}{2} + \lambda$. The function $g$ is maximal for $\lambda^\star = 1$; hence $d^\star = \frac{1}{2}$, and strong duality holds.
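The values $p^\star = d^\star = \frac{1}{2}$ can be double-checked numerically on a grid (a quick sketch):

```python
import numpy as np

xs = np.linspace(-2, 4, 6001)         # grid for x
lams = np.linspace(0, 4, 401)         # grid for lambda >= 0
Lvals = 0.5*xs[None, :]**2 + lams[:, None]*(1 - xs[None, :])

g_vals = Lvals.min(axis=1)            # dual function g(lambda) on the grid
d_star = g_vals.max()                 # dual optimal value

feasible = xs >= 1 - 1e-9             # constraint 1 - x <= 0
p_star = (0.5*xs**2)[feasible].min()  # primal optimal value

assert abs(p_star - 0.5) < 1e-6
assert abs(d_star - 0.5) < 1e-6       # zero duality gap: strong duality
```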
Let us compute the Hessian of the Lagrangian at the point $(x^\star, \lambda^\star) = (1, 1)$. One gets

$$
\nabla^2 L(1, 1) = \begin{bmatrix}
1 & -1\\
-1 & 0
\end{bmatrix},
$$

which has eigenvalues of opposite signs. This shows that the point $(1, 1)$ is indeed a saddle point of the Lagrangian.
```python
import numpy as np
import matplotlib.pyplot as plt

def L(x, lam):
    return 0.5*x**2 + lam*(1 - x)

x = np.linspace(-0.5, 2.5, 128)
lam = np.linspace(0, 2, 128)
xx, ll = np.meshgrid(x, lam)

# Contour plot of the Lagrangian: the saddle point sits at (x, lambda) = (1, 1)
plt.figure()
plt.contourf(x, lam, L(xx, ll))
plt.colorbar()
plt.axhline(1, color='k', linestyle='dashed')
plt.axvline(1, color='k', linestyle='dashed')
plt.scatter([1], [1], label='saddle point', color='r')
plt.legend()
plt.xlabel('$x$')
plt.ylabel(r'$\lambda$')

# Cross-sections through the saddle point: L(x, lambda*) has a minimum at x* = 1,
# while L(x*, lambda) is flat here (the constraint is active, so 1 - x* = 0)
fig, ax = plt.subplots(ncols=2, sharey=True)
ax[0].plot(x, L(x, 1), label=r'$L(x, \lambda^\star)$', color='b')
ax[1].plot(lam, L(1, lam), label=r'$L(x^\star, \lambda)$', color='r')
ax[0].set_xlabel('$x$')
ax[1].set_xlabel(r'$\lambda$')
fig.legend();
```
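The saddle-point inequalities $L(x^\star, \lambda) \leq L(x^\star, \lambda^\star) \leq L(x, \lambda^\star)$ can also be checked directly on grids (a sketch; note that $L(x^\star, \lambda)$ is constant in $\lambda$ here because the constraint is active at $x^\star$):

```python
import numpy as np

L = lambda x, lam: 0.5*x**2 + lam*(1 - x)
x_star, lam_star = 1.0, 1.0

xs = np.linspace(-3, 3, 601)
lams = np.linspace(0, 5, 501)

# L(x*, lam) <= L(x*, lam*) for all lam >= 0 (here with equality: 1 - x* = 0)
assert np.all(L(x_star, lams) <= L(x_star, lam_star) + 1e-12)
# L(x*, lam*) <= L(x, lam*) for all x
assert np.all(L(x_star, lam_star) <= L(xs, lam_star) + 1e-12)
```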