2.4 Optimality conditions for unconstrained optimization
First, we consider the unconstrained optimization problem
$$
\begin{array}{ll}
\minimize & f_0(x)\\
\st & x \in \mathbb{R}^n
\end{array}
\tag{1}
$$

We assume that $f_0$ has “good” properties on $\mathbb{R}^n$, i.e., $f_0$ is twice continuously differentiable (or at least continuously differentiable).
Provided optima exist (not always guaranteed!), we will derive conditions on a point $x^\star$ to be a local optimum of the problem (in the unconstrained case, one also says that $x^\star$ is a local minimizer of $f_0$).
To be able to characterize global optimality, we’ll need to assume convexity of the objective function $f_0$.
Most of the results from this section are reproduced from Nocedal & Wright (2006, Chapter 2) , where detailed proofs are available.
Necessary conditions (unconstrained case)

We call $x^\star$ a stationary (or critical) point of $f_0$ if $\nabla f_0(x^\star) = 0$.
Any local optimum must be a stationary point.
Exercise 1
Prove Theorem 1 and Theorem 2 by contradiction using Taylor’s theorem.
Theorem 1 and Theorem 2 only give necessary conditions for a point $x^\star$ to be optimal.
Figure: some possible situations in 1D.
Figure: in higher dimensions (here 2D), the situation is even more subtle.
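For a quick numerical illustration (a small sketch, not part of the original notes), consider the three 1D functions $x^2$, $-x^2$ and $x^3$: all have a stationary point at $x = 0$, which is respectively a minimizer, a maximizer, and neither. Central finite differences recover the sign of the second derivative that distinguishes these cases:

import numpy as np

# Three 1D functions that all have a stationary point at x = 0.
funcs = {"x^2 (minimum)": lambda x: x**2,
         "-x^2 (maximum)": lambda x: -x**2,
         "x^3 (neither)": lambda x: x**3}

h = 1e-6  # step size for central finite differences
for name, f in funcs.items():
    grad = (f(h) - f(-h)) / (2 * h)            # approximates f'(0): ~0 in all three cases
    hess = (f(h) - 2 * f(0.0) + f(-h)) / h**2  # approximates f''(0): > 0, < 0, or = 0
    print(f"{name}: f'(0) ~ {grad:.1e}, f''(0) ~ {hess:.1e}")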
Sufficient conditions (unconstrained case)

Suppose that $\nabla^2 f_0$ is continuous in an open neighborhood of $x^\star$ and that $\nabla f_0(x^\star) = 0$ and $\nabla^2 f_0(x^\star) \succ 0$ (is positive definite). Then $x^\star$ is a strict local optimum of (1), or equivalently, $x^\star$ is a strict local minimizer of $f_0$.
The sufficient conditions guarantee that the local optimum is strict and, in fact, isolated. (Compare with the necessary conditions of Theorem 1 and Theorem 2.)
These sufficient conditions are not necessary: a point $x^\star$ can fail to satisfy the conditions and yet be a strict minimizer of $f_0$.
Example: $f(x) = x^4$; the point $x^\star = 0$ is a strict local minimizer (and global as well), but $f''(0) = 0$ shows that the Hessian is not positive definite at this point.
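As a sketch (not from the notes), the second-order test can be carried out numerically at a candidate point by checking that the gradient vanishes and that the eigenvalues of the Hessian are positive; applied to $f(x) = x^4$ at $x^\star = 0$, the test is inconclusive even though 0 is a strict global minimizer:

import numpy as np

def second_order_test(grad, hess, x, tol=1e-8):
    # Report whether the sufficient conditions hold at x:
    # grad(x) = 0 and hess(x) positive definite.
    if np.linalg.norm(grad(x)) > tol:
        return "not stationary"
    eigvals = np.linalg.eigvalsh(np.atleast_2d(hess(x)))
    return "strict local minimizer" if eigvals.min() > tol else "inconclusive"

# f(x) = x^4: x = 0 is a strict global minimizer, yet the test is inconclusive.
print(second_order_test(lambda x: 4 * x**3, lambda x: 12 * x**2, np.array([0.0])))

# f(x) = x_1^2 + x_2^2: the sufficient conditions hold at the origin.
print(second_order_test(lambda x: 2 * x, lambda x: 2 * np.eye(2), np.zeros(2)))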
Exercise 2 (Quadratic function)
Let $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ such that $f(x) = x_1^2 - x_2^2$. Compute its gradient and Hessian. List its critical points and their properties.
To help, here is how to plot this function in Python.
import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(-1, 1, 128)          # 1D grid on [-1, 1]
x1, x2 = np.meshgrid(u, u)           # 2D coordinate grids
plt.contourf(x1, x2, x1**2 - x2**2)  # filled contour plot of f
plt.colorbar()
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$');
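To double-check your hand computations, a computer algebra system can produce the gradient and Hessian symbolically; here is a minimal sympy sketch (classifying the critical points is still left to you):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 - x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]  # gradient of f
H = sp.hessian(f, (x1, x2))               # Hessian matrix of f
print(grad, H, sp.solve(grad, (x1, x2)))  # critical points: where the gradient vanishes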
Exercise 3 (Rosenbrock function)
Let $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ such that $f(x) = (1-x_1)^2 + 5(x_2-x_1^2)^2$.
Does the point $[1, 1]^\top$ satisfy the necessary conditions? The sufficient conditions?
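As with the previous exercise, a contour plot can help build intuition; here is a sketch along the same lines as the snippet above (the plotting ranges and the marker at $[1, 1]^\top$ are arbitrary choices for illustration):

import numpy as np
import matplotlib.pyplot as plt

u1 = np.linspace(-1.5, 1.5, 256)
u2 = np.linspace(-0.5, 2.0, 256)
x1, x2 = np.meshgrid(u1, u2)
f = (1 - x1)**2 + 5 * (x2 - x1**2)**2

plt.contourf(x1, x2, f, 50)          # filled contour plot of the function
plt.colorbar()
plt.plot(1, 1, 'r*', markersize=10)  # the candidate point [1, 1]
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$');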
Necessary and sufficient conditions (unconstrained case)

When $f_0$ is convex, there is a simple characterization of optimal points.
Consider the unconstrained optimization problem (1) and suppose that $f_0:\mathbb{R}^n\rightarrow \mathbb{R}$ is convex. A point $x^\star$ is a local optimum (hence global by Theorem 3) if and only if $x^\star$ is a stationary point of $f_0$, i.e., such that $\nabla f_0(x^\star) = 0$.
If $f_0$ is strictly convex, then this theorem characterizes the unique global optimum of the problem (when it exists). Finding points such that $\nabla f_0(x^\star) = 0$ (stationary points) is the foundation of many unconstrained optimization algorithms, even in the non-convex case. In Chapter 3, we will see an important application of this result to a very important category of optimization problems: (unconstrained) least-squares problems.
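To illustrate the last point, here is a minimal gradient-descent sketch on a strictly convex quadratic (the particular matrix, right-hand side, and step size below are arbitrary choices, not from the notes): the iterates approach the unique stationary point, which by the result above is the global minimizer.

import numpy as np

# Strictly convex quadratic f(x) = 0.5 * x^T A x - b^T x, with A positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

x = np.zeros(2)
step = 0.2                       # fixed step size, small enough for this A
for _ in range(200):
    x = x - step * (A @ x - b)   # gradient step: grad f(x) = A x - b

print(x, np.linalg.solve(A, b))  # the iterates approach the stationary point A^{-1} b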
Nocedal, J., & Wright, S. J. (2006). Numerical optimization (Second Edition). Springer.