2.1 The mathematical toolbox
This section reviews essential mathematical concepts used throughout optimization. We focus on differentiability, gradients, Hessians, and related tools that form the foundation for analyzing and solving optimization problems.
Differentiability

Let $f: U \to \mathbb{R}$ be a function defined on an open subset $U$ of $\mathbb{R}^n$.
If $f$ is differentiable at $a \in U$, the differential $df_a$ can be expressed in terms of the partial derivatives of $f$ at $a$:
$$
df_a(h) = \sum_{i=1}^n h_i \frac{\partial f}{\partial x_i}(a)
$$
and as the limit, for any $h \in \mathbb{R}^n$,
$$
df_a(h) = \lim_{\varepsilon \to 0} \frac{f(a+\varepsilon h) - f(a)}{\varepsilon}.
$$
If $f \in C^1$, we say it is continuously differentiable; if $f \in C^2$, we say it is twice continuously differentiable. In particular, every $C^1$ function is differentiable.
Keep in mind that the converse is not true, as shown by the following example.
The function $f(x_1, x_2) = |x_1 x_2|$ is differentiable at $(0, 0)$, but its partial derivatives do not exist everywhere around the origin, so $f$ is not $C^1$ in any neighborhood of $(0,0)$. (Prove it!)
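For a function that is smooth, the limit definition above can also be checked numerically. Below is a minimal sketch, assuming NumPy and a hypothetical test function $f(x) = x_1^2 + 3x_1x_2$ (an illustrative choice, not from these notes), comparing the difference quotient with the partial-derivative formula for $df_a(h)$.

```python
import numpy as np

# Hypothetical smooth test function (an assumption for illustration):
# f(x) = x1^2 + 3*x1*x2, with df/dx1 = 2*x1 + 3*x2 and df/dx2 = 3*x1.
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

def partials(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

a = np.array([1.0, -2.0])
h = np.array([0.5, 1.0])

# Difference quotient (f(a + eps*h) - f(a)) / eps for a small eps ...
eps = 1e-6
quotient = (f(a + eps * h) - f(a)) / eps

# ... versus the formula df_a(h) = sum_i h_i * df/dx_i(a)
exact = partials(a) @ h

print(quotient, exact)  # the two values agree up to O(eps)
```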
Gradient and Hessian

Probably two of the most important tools for this lecture!
Assume $f: \mathbb{R}^n \to \mathbb{R}$ is at least of class $C^1$.
For $a \in \mathbb{R}^n$, the vector of partial derivatives
$$
\nabla f(a) = \begin{bmatrix}
\dfrac{\partial f}{\partial x_1}(a)\\
\dfrac{\partial f}{\partial x_2}(a)\\
\vdots\\
\dfrac{\partial f}{\partial x_n}(a)
\end{bmatrix}
\in \mathbb{R}^n
$$
is called the gradient of $f$ at point $a$.
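As an illustration of this definition, here is a minimal sketch, assuming NumPy and a hypothetical quadratic $f(x) = \tfrac{1}{2}x^\top A x + b^\top x$ (for which $\nabla f(x) = Ax + b$ when $A$ is symmetric), comparing the analytic gradient with a central finite-difference approximation.

```python
import numpy as np

# Hypothetical quadratic (an assumption for illustration):
# f(x) = 0.5 * x^T A x + b^T x, whose gradient is A x + b for symmetric A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

def grad_f(x):
    return A @ x + b

def numerical_gradient(f, a, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at a."""
    g = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = eps
        g[i] = (f(a + e) - f(a - e)) / (2 * eps)
    return g

a = np.array([0.5, -1.5])
print(grad_f(a))                 # analytic gradient: A a + b
print(numerical_gradient(f, a))  # finite-difference approximation
```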
Let $f: \mathbb{R}^n \to \mathbb{R}$ be of class $C^2$.
For $a \in \mathbb{R}^n$, the matrix
$$
\nabla^2 f(a) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1\partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}(a) \\
\frac{\partial^2 f}{\partial x_2\partial x_1}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \cdots & \frac{\partial^2 f}{\partial x_2\partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n\partial x_1}(a) & \frac{\partial^2 f}{\partial x_n\partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(a)
\end{bmatrix} \in \mathbb{R}^{n\times n}
$$
is called the Hessian matrix of $f$ at point $a$. Sometimes the notation $\mathsf{Hess}\, f(a)$ is also used.
Since $f$ is of class $C^2$ on $\mathbb{R}^n$, Schwarz’s theorem ensures that for every $a \in \mathbb{R}^n$:
$$
\frac{\partial^2 f}{\partial x_i \partial x_j}(a) = \frac{\partial^2 f}{\partial x_j \partial x_i}(a) \quad \text{for all } (i, j).
$$
Hence the Hessian $\nabla^2 f(a)$ is a symmetric matrix.
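A quick numerical sanity check of this symmetry: the sketch below, assuming NumPy and a hypothetical test function (an illustrative choice, not from the notes), approximates the Hessian entry-wise with finite differences and verifies that the result is numerically symmetric.

```python
import numpy as np

# Hypothetical smooth test function (an assumption for illustration):
# f(x) = x1^2 * x2 + sin(x2)
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def numerical_hessian(f, a, eps=1e-4):
    """Finite-difference approximation of the Hessian of f at a."""
    n = a.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(a + ei + ej) - f(a + ei) - f(a + ej) + f(a)) / eps**2
    return H

a = np.array([1.0, 2.0])
H = numerical_hessian(f, a)
print(H)                               # approximately [[2*a2, 2*a1], [2*a1, -sin(a2)]]
print(np.allclose(H, H.T, atol=1e-4))  # symmetric, as Schwarz's theorem predicts
```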
Positive definiteness

Let $A \in \mathbb{R}^{n\times n}$ be a symmetric matrix ($A^\top = A$).
- $A$ is positive semi-definite (PSD) if for every $x \in \mathbb{R}^n$, $x^\top A x \geq 0$.
- $A$ is positive definite (PD) if for every $x \in \mathbb{R}^n \setminus \lbrace 0 \rbrace$, $x^\top A x > 0$.

We write $A \succeq 0$ when it is PSD, and $A \succ 0$ when it is PD.
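For instance (a small worked example), the symmetric matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ is positive definite, since for every $x \neq 0$,
$$
x^\top A x = 2x_1^2 + 2x_1 x_2 + 2x_2^2 = (x_1 + x_2)^2 + x_1^2 + x_2^2 > 0,
$$
whereas $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ is only positive semi-definite, because $x^\top A x = (x_1 + x_2)^2$ vanishes for $x = (1, -1)$.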
- $A \succeq 0$ if and only if all its eigenvalues are non-negative.
- $A \succ 0$ if and only if all its eigenvalues are strictly positive.

Proof
Let $A \in \mathbb{R}^{n\times n}$ be symmetric. By the spectral theorem, $A$ is diagonalizable by an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$, i.e. $A = Q D Q^\top$, where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is a real diagonal matrix.
Now let $v \in \mathbb{R}^n$ be arbitrary and denote by $w = Q^\top v$ its coordinate vector in the orthonormal basis defined by $Q$. Compute
$$
v^\top A v = v^\top Q D Q^\top v = w^\top D w = \sum_{i=1}^n \lambda_i w_i^2.
$$
Since $Q$ is invertible, $w$ ranges over all of $\mathbb{R}^n$ as $v$ does. Hence this expression is non-negative for all $v$ if and only if $\lambda_i \geq 0$ for every $i = 1, \ldots, n$, and strictly positive for all $v \neq 0$ if and only if $\lambda_i > 0$ for every $i = 1, \ldots, n$.
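In practice, the eigenvalue characterization above gives a convenient numerical test. Below is a minimal sketch assuming NumPy; the matrices are arbitrary illustrative choices, not taken from the notes.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """PSD test via the eigenvalue characterization (A assumed symmetric)."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

def is_pd(A, tol=1e-10):
    """PD test via the eigenvalue characterization (A assumed symmetric)."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

# Arbitrary illustrative matrices (assumptions, not from the notes)
A_pd  = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3  -> PD
A_psd = np.array([[1.0, 1.0], [1.0, 1.0]])   # eigenvalues 0 and 2  -> PSD, not PD
A_ind = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3 -> indefinite

for A in (A_pd, A_psd, A_ind):
    print(is_psd(A), is_pd(A))
```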
As we shall see, the positive (semi-)definiteness of the Hessian matrix is crucial for characterizing solutions of optimization problems.
Taylor’s theorem

Among the different versions of Taylor’s theorem, the following (see Nocedal & Wright, 2006) will be useful for proofs later on.
Suppose $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and let $x, p \in \mathbb{R}^n$. Then, for some $t \in (0, 1)$,
$$
f(x + p) = f(x) + \nabla f(x + t p)^\top p.
$$
Moreover, if $f$ is twice continuously differentiable,
$$
\nabla f(x + p) = \nabla f(x) + \int_0^1 \nabla^2 f(x + t p)\, p \, dt
$$
and
$$
f(x + p) = f(x) + \nabla f(x)^\top p + \frac{1}{2} p^\top \nabla^2 f(x + t p)\, p
$$
for some $t \in (0, 1)$.
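As a quick numerical illustration (a sketch assuming NumPy; the quadratic and the points are arbitrary choices, not from the notes), note that for a quadratic function the Hessian is constant, so the second-order expansion above holds with equality for any $t \in (0, 1)$:

```python
import numpy as np

# Hypothetical quadratic (an assumption for illustration):
# f(x) = 0.5 * x^T A x + b^T x, with constant Hessian A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

def grad_f(x):
    return A @ x + b

x = np.array([1.0, 2.0])
p = np.array([-0.5, 1.5])

lhs = f(x + p)
rhs = f(x) + grad_f(x) @ p + 0.5 * p @ A @ p
print(np.isclose(lhs, rhs))  # True: the second-order expansion is exact for quadratics
```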
Nocedal, J., & Wright, S. J. (2006). Numerical optimization (Second Edition). Springer.