
2.1 The mathematical toolbox

This section reviews essential mathematical concepts used throughout optimization. We focus on differentiability, gradients, Hessians, and related tools that form the foundation for analyzing and solving optimization problems.

Differentiability

Let $f: U \to \mathbb{R}$ be a function defined on an open subset $U$ of $\mathbb{R}^n$.

If $f$ is differentiable at $a \in U$, the differential $df_a$ can be expressed in terms of the partial derivatives of $f$ at $a$:

$$df_a(h) = \sum_{i=1}^n h_i \frac{\partial f}{\partial x_i}(a)$$

and as the limit, for any $h \in \mathbb{R}^n$,

$$df_a(h) = \lim_{\varepsilon \to 0} \frac{f(a + \varepsilon h) - f(a)}{\varepsilon}$$
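To make the two expressions concrete, here is a minimal numerical sketch (Python with NumPy; the particular $f$, point $a$, and direction $h$ are illustrative choices, not from the text). It compares the difference quotient from the limit definition with the sum of partial derivatives:

```python
import numpy as np

# Illustrative smooth function f(x) = x_0^2 + 3 x_0 x_1 and its partial derivatives.
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
grad_f = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

a = np.array([1.0, -2.0])  # base point a
h = np.array([0.5, 1.0])   # direction h

# Limit definition: df_a(h) ~ (f(a + eps*h) - f(a)) / eps for small eps.
eps = 1e-6
quotient = (f(a + eps * h) - f(a)) / eps

# Partial-derivative formula: df_a(h) = sum_i h_i * (df/dx_i)(a).
formula = grad_f(a) @ h

print(quotient, formula)  # the two values agree up to O(eps)
```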

If $f \in C^1$, we say it is continuously differentiable. If $f \in C^2$, we say it is twice continuously differentiable.

Every $C^1$ function is differentiable, but keep in mind that the converse is not true, as shown by the classic example $f(x) = x^2 \sin(1/x)$ for $x \neq 0$, with $f(0) = 0$: this $f$ is differentiable on all of $\mathbb{R}$, yet $f'$ is not continuous at $0$, so $f \notin C^1$.

Gradient and Hessian

Probably two of the most important tools for this lecture!
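For completeness, recall the standard definitions. For $f$ differentiable (respectively twice differentiable) at $a \in U$, the gradient is the vector of first-order partial derivatives and the Hessian is the matrix of second-order partial derivatives:

$$\nabla f(a) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) \\ \vdots \\ \frac{\partial f}{\partial x_n}(a) \end{pmatrix}, \qquad \nabla^2 f(a) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(a) \right]_{i,j=1}^{n}$$

When $f \in C^2$, the Hessian is symmetric by Schwarz's theorem, which is why the next subsection studies symmetric matrices.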

Positive definiteness

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix ($A^\top = A$).

Such a matrix is positive semidefinite (PSD) if $v^\top A v \geq 0$ for all $v \in \mathbb{R}^n$, and positive definite (PD) if $v^\top A v > 0$ for all $v \neq 0$. We write $A \succeq 0$ when it is PSD, and $A \succ 0$ when it is PD.

This gives a convenient characterization: a symmetric matrix is PSD if and only if all of its eigenvalues are non-negative, and PD if and only if all of its eigenvalues are positive.

Proof

Let $A \in \mathbb{R}^{n \times n}$ be symmetric. By the spectral theorem, $A$ is diagonalizable by an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$, so that $A = Q D Q^\top$, where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is a real diagonal matrix. Now let $v \in \mathbb{R}^n$ be arbitrary and denote by $w = Q^\top v$ its coordinate vector in the orthonormal basis formed by the columns of $Q$. Compute

$$v^\top A v = v^\top Q D Q^\top v = w^\top D w = \sum_{i=1}^n \lambda_i w_i^2$$

Since $Q$ is invertible, $w = Q^\top v$ ranges over all of $\mathbb{R}^n$ as $v$ does. Hence this expression is non-negative for all $v$ if and only if $\lambda_i \geq 0$ for every $i = 1, \ldots, n$, and strictly positive for all $v \neq 0$ if and only if $\lambda_i > 0$ for every $i = 1, \ldots, n$.
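In practice, this eigenvalue criterion is also the standard way to test definiteness numerically. A minimal sketch with NumPy (the helper names and test matrices are illustrative):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check positive semidefiniteness of a symmetric matrix via its eigenvalues."""
    # eigvalsh is the symmetric/Hermitian eigenvalue routine; it returns real eigenvalues.
    return np.all(np.linalg.eigvalsh(A) >= -tol)

def is_pd(A, tol=1e-10):
    """Check positive definiteness: all eigenvalues strictly positive."""
    return np.all(np.linalg.eigvalsh(A) > tol)

A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3: PD (hence PSD)
B = np.array([[1.0, 1.0], [1.0, 1.0]])    # eigenvalues 0 and 2: PSD but not PD

print(is_pd(A), is_psd(A))  # True True
print(is_pd(B), is_psd(B))  # False True
```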

As we shall see, the positive (semi)definiteness of the Hessian matrix is crucial for characterizing solutions of optimization problems.

Taylor’s theorem

Among different versions of Taylor’s theorem, the following will be useful for proofs later on.
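A version commonly used in optimization, following Nocedal & Wright (2006), reads: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $p \in \mathbb{R}^n$. Then

$$f(x + p) = f(x) + \nabla f(x + t p)^\top p \quad \text{for some } t \in (0, 1)$$

and, if $f$ is twice continuously differentiable,

$$f(x + p) = f(x) + \nabla f(x)^\top p + \frac{1}{2}\, p^\top \nabla^2 f(x + t p)\, p \quad \text{for some } t \in (0, 1)$$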

References
  1. Nocedal, J., & Wright, S. J. (2006). Numerical optimization (Second Edition). Springer.