
Principle of descent algorithms

Consider the unconstrained optimization problem

$$\begin{array}{ll} \text{minimize} &\quad f_0(x)\\ \text{subject to} &\quad x \in \mathbb{R}^n \end{array} \tag{1}$$

To solve (1), we construct a sequence $x^{(0)}, x^{(1)}, \ldots$ such that the $(k+1)$-th iteration reads

$$x^{(k+1)} = x^{(k)} + \alpha_k d_k \tag{2}$$

where $\alpha_k \geq 0$ is called the step size and $d_k$ is a direction at iteration $k$. The goal is to construct a sequence $\lbrace x^{(k)} \rbrace$ that describes a descent algorithm, i.e.,

$$f_0(x^{(0)}) > f_0(x^{(1)}) > \ldots > f_0(x^{(k)}) > f_0(x^{(k+1)}) > \ldots$$

so that, in the limit $k \to \infty$, $x^{(k)} \rightarrow x^\star$, with $x^\star$ being (at least) a local minimizer of $f_0$.
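To make the iteration above concrete, here is a minimal Python sketch of a generic descent loop. The names `descent`, `direction`, and `step_size` are illustrative placeholders rather than a reference implementation; the rules for choosing $d_k$ and $\alpha_k$ are left as user-supplied callables, and the stopping question is deferred to the discussion below.

```python
import numpy as np

def descent(f0, grad_f0, x0, direction, step_size, max_iter=100):
    """Generic descent loop: x_{k+1} = x_k + alpha_k * d_k.

    `direction` and `step_size` are user-supplied callables encoding the
    specific algorithm (gradient descent, Newton, ...); a stopping
    criterion (discussed below) would normally replace the fixed
    iteration budget.
    """
    x = np.asarray(x0, dtype=float)
    values = [f0(x)]                      # should be strictly decreasing
    for k in range(max_iter):
        d = direction(x, grad_f0)         # descent direction d_k
        alpha = step_size(f0, x, d)       # step size alpha_k >= 0
        x = x + alpha * d                 # update (2)
        values.append(f0(x))
    return x, values
```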

A zoo of descent algorithms

Simply put, strategies for choosing the direction $d_k$ define different categories of optimization algorithms. Generally speaking, $d_k$ is chosen as

$$d_k = -B_k^{-1}\nabla f_0(x^{(k)})$$

where $B_k$ is symmetric and invertible. In the next sections, we will see several classical choices for the matrix $B_k$.
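As an illustration of this update rule, the sketch below computes $d_k = -B_k^{-1}\nabla f_0(x^{(k)})$ for a quadratic toy function; the helper name `direction_from_Bk` and the example matrix are assumptions made for this snippet. Taking $B_k$ equal to the identity recovers the steepest-descent direction, while passing the Hessian yields the Newton direction.

```python
import numpy as np

def direction_from_Bk(grad, B=None):
    """Return d_k = -B_k^{-1} * grad f0(x_k).

    B=None is interpreted as B_k = I (steepest-descent direction);
    passing the Hessian for B gives the Newton direction.
    """
    if B is None:
        return -grad
    return -np.linalg.solve(B, grad)   # solve B_k d = -grad rather than inverting B_k

# Toy example: f0(x) = 0.5 * x^T A x, whose gradient is A x and Hessian is A
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, -1.0])
grad = A @ x
d_gradient = direction_from_Bk(grad)        # B_k = I
d_newton = direction_from_Bk(grad, B=A)     # B_k = Hessian
```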

How to define a stopping criterion

Before going over these methods in more detail, it is important to address the following question: how do we check that an algorithm has converged? In other words, when do we decide to stop the sequence (2)?

This is a wide topic, but the main idea is to stop the algorithm whenever, at the current iteration, a given criterion becomes smaller than a pre-defined (small) tolerance $\varepsilon$ (e.g., $\varepsilon = 10^{-6}$).

To avoid scaling issues, it is common to use a criterion calculated in relative terms, for example:

$$\frac{\vert f_0(x^{(k+1)}) - f_0(x^{(k)})\vert}{\vert f_0(x^{(k)}) \vert} \leq \varepsilon \quad \text{or} \quad \frac{\Vert x^{(k+1)} - x^{(k)}\Vert}{\Vert x^{(k)}\Vert} \leq \varepsilon$$
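A possible implementation of these relative criteria is sketched below; the helper name `should_stop` and the machine-epsilon safeguard against division by zero are additions for illustration, not part of the formulas above.

```python
import numpy as np

def should_stop(f_prev, f_new, x_prev, x_new, eps=1e-6):
    """Relative stopping criteria from the formulas above.

    The max(..., machine epsilon) guards protect against division by
    zero and are not part of the formulas in the text.
    """
    tiny = np.finfo(float).eps
    rel_f = abs(f_new - f_prev) / max(abs(f_prev), tiny)
    rel_x = np.linalg.norm(x_new - x_prev) / max(np.linalg.norm(x_prev), tiny)
    return rel_f <= eps or rel_x <= eps
```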

Now, let us turn our attention to gradient descent.