
Principle

Consider the optimization problem

$$
\begin{array}{ll}
\text{minimize} &\quad f_0(x)\\
\text{subject to} &\quad x \in \Omega
\end{array}
\tag{1}
$$

where $f_0$ is the objective and $\Omega \subset \mathbb{R}^n$ is the constraint set.

Key idea: replace (1) with the unconstrained optimization problem

$$
\begin{array}{ll}
\text{minimize} &\quad f_0(x) + \rho p(x)\\
\text{subject to} &\quad x \in \mathbb{R}^n
\end{array}
\qquad \rho > 0
$$

where $p : \mathbb{R}^n \rightarrow \mathbb{R}$ is a penalty function, i.e. a continuous function such that $p(x) = 0$ for all $x \in \Omega$ and $p(x) > 0$ for all $x \notin \Omega$.

The penalty parameter $\rho$ controls the cost of violating the constraints.

Choosing penalty functions

Recall that $\Omega$ is often parameterized by equality and inequality constraints:

$$
\Omega = \left\lbrace x \in \mathbb{R}^n \;\middle\vert\; \begin{array}{rl} f_i(x) \leq 0, & i = 1, \ldots, m\\ h_i(x) = 0, & i = 1, \ldots, p \end{array} \right\rbrace
$$

Typical choices of penalty functions: use quadratic functions,

$$
p(x) = \frac{1}{2}\sum_{i=1}^{m}\left[\max(0, f_i(x))\right]^2 + \frac{1}{2}\sum_{i=1}^{p} h_i(x)^2 ,
$$

which vanish exactly on $\Omega$ and grow with the amount of constraint violation.

Note that besides quadratic functions, one can also consider non-smooth penalties such as the $\ell_1$-norm, which has very interesting properties (outside the scope of this lecture).

Penalty method algorithm

We thus replace solving (1) by solving a succession of unconstrained penalized problems

$$
\begin{array}{ll}
\text{minimize} &\quad f_0(x) + \rho_k p(x)\\
\text{subject to} &\quad x \in \mathbb{R}^n
\end{array}
\qquad \rho_k > 0
$$

with increasing penalties $\rho_1 < \rho_2 < \cdots < \rho_k \rightarrow +\infty$.

This suggests the following algorithm: for $k = 1, 2, \ldots$, minimize the penalized objective with parameter $\rho_k$, warm-starting from the previous solution, and increase the penalty until the constraint violation is sufficiently small.
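A minimal Python sketch of this loop, assuming a generic unconstrained solver `solve(objective, x0)` (for instance a thin wrapper around `scipy.optimize.minimize`); the update factor `beta` and the tolerance `tol` are illustrative choices, not prescribed values:

```python
def penalty_method(f0, p, solve, x0, rho=1.0, beta=10.0, tol=1e-8, max_iter=20):
    """Sketch of the penalty method for minimizing f0 over {x : p(x) = 0}."""
    x = x0
    for _ in range(max_iter):
        # minimize the penalized objective, warm-started at the previous iterate
        x = solve(lambda z: f0(z) + rho * p(z), x)
        if p(x) <= tol:        # (near-)feasible point found: stop
            break
        rho *= beta            # otherwise increase the penalty and repeat
    return x
```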

How to compute solutions of penalized problems?

Consider a problem with one inequality constraint $f_1(x) \le 0$. The quadratic penalty is

$$
p(x) = \tfrac12\left[\max(0, f_1(x))\right]^2 = \tfrac12\left[f_1^+(x)\right]^2 .
$$

How to compute the gradient of the penalty function?

A potential concern is that $f_1^+$ is not differentiable at points where $f_1(x) = 0$. However, the penalty can be written as a composition

$$
p(x) = \gamma(f_1(x)), \qquad \gamma(y) = \tfrac12\left[\max(0, y)\right]^2 .
$$

Although the map $y \mapsto \max(0, y)$ is not differentiable at 0, the quadratic penalty $\gamma$ is differentiable everywhere, because

$$
\gamma'(y) = \max(0, y),
$$

and in particular $\gamma'(0) = 0$. Therefore, if $f_1$ is differentiable, the chain rule applies.

Using the chain rule,

$$
\nabla p(x) = \gamma'(f_1(x))\,\nabla f_1(x) = f_1^+(x)\,\nabla f_1(x) =
\begin{cases}
f_1(x)\,\nabla f_1(x), & f_1(x) > 0, \\
0, & f_1(x) \le 0 .
\end{cases}
$$

Thus the quadratic penalty is differentiable (but not twice differentiable) even at points where the constraint is active, and its gradient is given by the expression above.
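As a quick sanity check, here is a minimal implementation of this gradient for a hypothetical constraint $f_1(x) = \Vert x \Vert_2^2 - 1$ (an assumption for illustration), compared against central finite differences:

```python
import numpy as np

def f1(x):
    return x[0]**2 + x[1]**2 - 1.0      # example constraint f_1(x) <= 0

def grad_f1(x):
    return 2.0 * x

def p(x):
    return 0.5 * max(0.0, f1(x))**2     # quadratic penalty

def grad_p(x):
    return max(0.0, f1(x)) * grad_f1(x) # f_1^+(x) * grad f_1(x)

# finite-difference check at a point where the constraint is violated
x = np.array([1.0, 0.5])
eps = 1e-6
fd = np.array([(p(x + eps*e) - p(x - eps*e)) / (2*eps) for e in np.eye(2)])
print(grad_p(x), fd)   # the two gradients should agree to ~1e-6
```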

Examples

A simple problem

Consider the (very) simple problem

$$
\begin{array}{ll}
\text{minimize} &\quad x \\
\text{subject to} &\quad x \geq 1
\end{array}
$$

and solve it using the penalty method.
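Working this out by hand: the constraint can be written as $f_1(x) = 1 - x \le 0$, so the penalized objective is

$$
\varphi_\rho(x) = x + \frac{\rho}{2}\left[\max(0,\, 1 - x)\right]^2 .
$$

For $x < 1$, setting $\varphi_\rho'(x) = 1 - \rho(1 - x) = 0$ gives the minimizer $x_\rho = 1 - \frac{1}{\rho}$, which is infeasible for every finite $\rho$ but converges to the true solution $x^\star = 1$ as $\rho \rightarrow \infty$.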

Minimum-norm solution of linear equations

Using the penalty method, solve the following problem:

$$
\begin{array}{ll}
\text{minimize} &\quad \frac{1}{2}\Vert x \Vert_2^2 \\
\text{subject to} &\quad Ax = b
\end{array}
$$
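Setting the gradient of the penalized objective $\tfrac12\Vert x\Vert_2^2 + \tfrac{\rho}{2}\Vert Ax - b\Vert_2^2$ to zero gives the linear system $(I + \rho A^\top A)\,x_\rho = \rho A^\top b$, and $x_\rho$ approaches the minimum-norm solution $A^\dagger b$ as $\rho \rightarrow \infty$. A minimal numerical sketch (the random $A$ and $b$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))   # underdetermined: infinitely many solutions of Ax = b
b = rng.standard_normal(3)
n = A.shape[1]

x_star = np.linalg.pinv(A) @ b    # minimum-norm solution, for reference
for rho in [1e0, 1e2, 1e4, 1e6]:
    # normal equations of the penalized problem
    x_rho = np.linalg.solve(np.eye(n) + rho * A.T @ A, rho * A.T @ b)
    print(f"rho={rho:.0e}  ||Ax-b||={np.linalg.norm(A @ x_rho - b):.2e}"
          f"  ||x-x*||={np.linalg.norm(x_rho - x_star):.2e}")
```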

A nonconvex 2D example

We consider the following nonconvex optimization problem

$$
\begin{array}{ll}
\text{minimize} &\quad (x_1-2)^2 + (x_2+1)^2\\
\text{subject to} &\quad x_1^2 + x_2^2 - \sin(3x_1)\cos(2x_2) - 2 \leq 0
\end{array}
$$

The feasible set of this problem is nonconvex, as shown by the plot below (green area).

Source
import numpy as np
import matplotlib.pyplot as plt

# define objective and constraints
def f(x, y):
    return (x - 2)**2 + (y + 1)**2

def g(x, y):
    """Nonlinear inequality constraint f_1(x) <= 0 (feasible where g <= 0)."""
    return -(2 - x**2 - y**2 + np.sin(3*x) * np.cos(2*y))

# Grid for contour plots
xx = np.linspace(-2.5, 3, 400)
yy = np.linspace(-2.5, 2.5, 400)
X, Y = np.meshgrid(xx, yy)

F = f(X, Y)
G = g(X, Y)

plt.figure(figsize=(10, 8))

# Objective contours
CS1 = plt.contour(X, Y, F, levels=20, cmap='viridis', alpha=0.6)
plt.clabel(CS1, inline=1, fontsize=8)

# Constraint boundary (g(x,y)=0)
CS2 = plt.contour(X, Y, G, levels=[0], colors='red', linewidths=2)
plt.clabel(CS2, fmt={0: '$f_1(x) = 0$'}, fontsize=10)

# Feasible region shading
plt.contourf(X, Y, G, levels=[-10, 0], colors=['#d0ffd0'], alpha=0.3)
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.scatter(2, -1, color='black', marker='*', s=150, label='Unconstrained solution')
plt.legend()

Let us display how the penalized cost function looks for different values of $\rho$.

Source
rhos = [0, .1, .5, 1]
Gplus = np.maximum(0, G)
fig, ax = plt.subplots(ncols=len(rhos), figsize=(12, 4), sharex=True, sharey=True)
for i, rho in enumerate(rhos):
    # contours of the penalized objective f_0 + (rho/2) [f_1^+]^2
    ax[i].contour(X, Y, F + 0.5 * rho * Gplus**2)
    ax[i].set_title(r'$\rho = ' + str(rho) + '$')

We can iteratively minimize the penalized problems with increasing penalty parameter $\rho$. Let's do this numerically and plot the results.

Source
# redefine function to be used with scipy minimize
from scipy.optimize import minimize

def f(xy):
    x, y = xy
    return (x - 2)**2 + (y + 1)**2

def g(xy):
    x, y = xy
    return -(2- x**2-y**2 +  np.sin(3*x) * np.cos(2*y))

def g_plus(xy):
    """Positive part f_1^+ of the constraint."""
    return max(0.0, g(xy))

# penalized objective f_0 + (rho/2) [f_1^+]^2
def P_rho(xy, rho):
    return f(xy) + 0.5 * rho * g_plus(xy)**2


rhos = [1, 2, 5, 10, 100]
solutions = []

x0 = np.array([2.0, -1.0])   # initial guess: the unconstrained minimizer

for rho in rhos:

    # Wrap for minimize
    fun = lambda xy: P_rho(xy, rho)

    # Run optimizer
    res = minimize(fun, x0, method='BFGS') #options={'disp': True} to show details

    sol = res.x
    solutions.append(sol)


    # warm start next iteration
    x0 = sol

print("\nSolutions for increasing rho:")
for rho, sol in zip(rhos, solutions):
    print(f"rho={rho:4d}:  x={sol},  f_1(x)={g(sol)}")

## plots
xx = np.linspace(-2.5, 3, 400)
yy = np.linspace(-2.5, 2.5, 400)
X, Y = np.meshgrid(xx, yy)

F = (X - 2)**2 + (Y + 1)**2
G = X**2 + Y**2 - np.sin(3*X) * np.cos(2*Y) - 2

plt.figure(figsize=(10, 8))

# Objective contours
CS1 = plt.contour(X, Y, F, levels=20, cmap='viridis', alpha=0.6)
plt.clabel(CS1, inline=1, fontsize=8)

# Constraint boundary
CS2 = plt.contour(X, Y, G, levels=[0], colors='red', linewidths=2)
plt.clabel(CS2, fmt={0: "$f_1(x) = 0$"}, fontsize=10)

# Feasible region shading
plt.contourf(X, Y, G, levels=[-10, 0], colors=['#d0ffd0'], alpha=0.3)

# Plot solutions for each rho
colors = ['blue', 'green', 'orange', 'purple', 'red']
for sol, rho, col in zip(solutions, rhos, colors):
    plt.scatter(sol[0], sol[1], color=col, s=80, label=f"ρ={rho}")


plt.title("Penalty Method in 2D")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.scatter(2, -1, color='black', marker='*', s=150, label='Unconstrained solution')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

Solutions for increasing rho:
rho=   1:  x=[ 1.38285633 -0.93313364],  f_1(x)=0.5369100930598791
rho=   2:  x=[ 1.29431356 -0.86992303],  f_1(x)=0.3184002494305595
rho=   5:  x=[ 1.22615353 -0.8057165 ],  f_1(x)=0.13185377660643813
rho=  10:  x=[ 1.20427496 -0.78335207],  f_1(x)=0.06577643106933301
rho= 100:  x=[ 1.18560937 -0.76388718],  f_1(x)=0.006543126703541777

In the limit of large $\rho$, we would recover the (unique) solution of the initial problem. Stopping at a given value of $\rho$ yields an approximation of that point, characterized by the value of $f_1(x)$, which quantifies how much the constraint is violated.