
4.7 Choosing an algorithm: convergence rates and computational cost tradeoffs

Consider the optimization problem

\begin{array}{ll} \text{minimize} & \quad f_0(x) \\ \text{subject to} & \quad x \in \mathbb{R}^n \end{array}

and assume a solution exists, i.e., there exists x^\star \in \mathbb{R}^n such that f_0(x^\star) = p^\star < \infty. The choice of an algorithm for solving the problem usually depends on a tradeoff between two essential properties: how fast the iterates converge to a solution, and how much each iteration costs to compute.

Typical convergence rates

Depending on how the error metric \varepsilon_k (for instance, f_0(x^{(k)}) - p^\star or \|x^{(k)} - x^\star\|) behaves asymptotically, we define several convergence rates:

Linear: \frac{\varepsilon_{k+1}}{\varepsilon_k} \underset{k\rightarrow \infty}{\rightarrow} \mu, \quad 0 < \mu < 1

Superlinear: \frac{\varepsilon_{k+1}}{\varepsilon_k} \underset{k\rightarrow \infty}{\rightarrow} 0

Quadratic: \frac{\varepsilon_{k+1}}{\varepsilon_k^2} \underset{k\rightarrow \infty}{\rightarrow} \mu, \quad \mu > 0
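As a quick numerical illustration (not from the text above), Newton's method applied to solving x^2 = 2 converges quadratically: the ratio \varepsilon_{k+1}/\varepsilon_k^2 settles near a constant \mu. A minimal sketch:

```python
import numpy as np

# Newton's method for f(x) = x^2 - 2: the iteration x_{k+1} = (x_k + 2/x_k)/2
# converges quadratically to sqrt(2).
root = np.sqrt(2.0)
x = 2.0
errors = [abs(x - root)]
for _ in range(4):
    x = 0.5 * (x + 2.0 / x)
    errors.append(abs(x - root))

# Quadratic convergence: eps_{k+1} / eps_k^2 approaches a constant mu
# (here mu = 1/(2 sqrt(2)) ~ 0.3536).
for k in range(len(errors) - 1):
    print(f"k={k}: eps_k={errors[k]:.3e}, ratio={errors[k+1] / errors[k]**2:.4f}")
```

The printed ratios approach 0.3536, while the errors themselves roughly square (double their number of correct digits) at each step.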

The terms linear, superlinear and quadratic refer to the shapes of the curves traced by the metric \varepsilon_k across iterations on a semi-log (semilogy) plot, as shown below.

import numpy as np
import matplotlib.pyplot as plt

iters = np.arange(0, 50)
eps0 = 1.0
rate = 0.95

# Model error sequences for each convergence rate
linear = eps0 * rate ** iters                # eps_k = eps_0 * rate^k
superlinear = eps0 * rate ** (iters ** 1.5)  # exponent grows faster than k
quadratic = eps0 * rate ** (2.0 ** iters)    # exponent doubles each iteration
quadratic[10:] = 0  # underflows almost immediately; semilogy drops zero values

plt.semilogy(iters, linear, 'o-', label='Linear')
plt.semilogy(iters, superlinear, 's-', label='Superlinear')
plt.semilogy(iters, quadratic, '^-', label='Quadratic')
plt.xlabel('Iteration')
plt.ylabel('Error metric')
plt.legend()
plt.title('Convergence Rates')
plt.show()

Computational complexity

The computational complexity of an algorithm is often measured in terms of flops (floating-point operations), which count the number of basic arithmetic operations such as addition and multiplication.

The table below summarizes the complexity of some basic operations.

| Operation | Description | Flops complexity |
|---|---|---|
| Dot product | $a^\top b$ with $a, b \in \mathbb{R}^n$ | $\mathcal{O}(n)$ |
| Matrix-vector product | $Ab$ with $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^n$ | $\mathcal{O}(mn)$ |
| Matrix-matrix product | $AB$ with $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times p}$ | $\mathcal{O}(mnp)$ |
| Matrix inversion | $A^{-1}$ with $A \in \mathbb{R}^{n\times n}$ | $\mathcal{O}(n^3)$ |

Note that these complexities correspond to standard, straightforward implementations; in some settings they can be improved by exploiting the structure (sparsity, etc.) of the matrices and vectors involved.
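To make the sparsity point concrete, here is a minimal sketch (assuming SciPy is available; the sizes and density are illustrative): a matrix-vector product with a sparse matrix costs on the order of the number of stored nonzeros, rather than \mathcal{O}(n^2) for a dense matrix of the same dimensions.

```python
import numpy as np
from scipy import sparse

n = 10_000
# A sparse n x n matrix with about 5 nonzeros per row: the product A @ b
# touches only the stored nonzeros, i.e., O(nnz) flops instead of O(n^2).
A = sparse.random(n, n, density=5 / n, format="csr", random_state=0)
b = np.random.default_rng(0).standard_normal(n)
y = A @ b
print(A.nnz, "nonzeros out of", n * n, "entries")
```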

The total computational cost of an algorithm depends on the sequence and combination of such operations. In practice, other factors like memory usage and data movement can also significantly affect performance.
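One classical illustration of how the combination of operations matters (a sketch with illustrative sizes, not from the text): for A \in \mathbb{R}^{m\times n}, B \in \mathbb{R}^{n\times p} and c \in \mathbb{R}^p, the products (AB)c and A(Bc) give the same vector, but the flop counts from the table above differ enormously.

```python
import numpy as np

# Flop estimates from the table: forming A B first costs O(mnp),
# while forming B c first costs only O(np) + O(mn).
m, n, p = 1000, 1000, 1000
left_first = m * n * p + m * p   # (A B) c
right_first = n * p + m * n      # A (B c)
print(f"(A B) c: ~{left_first:.1e} flops, A (B c): ~{right_first:.1e} flops")

# Sanity check on small random matrices: both orders agree numerically.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((4, 3))
c = rng.standard_normal(3)
assert np.allclose((A @ B) @ c, A @ (B @ c))
```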

Properties of some classical algorithms

More details will come in the dedicated chapter on convergence properties. These results apply to unconstrained problems; for constrained problems, some of them hold under additional conditions.

For example, Newton's method uses the update

x^{(k+1)} = x^{(k)} - \alpha_k \left[\nabla^2 f_0(x^{(k)})\right]^{-1}\nabla f_0(x^{(k)})

where \alpha_k > 0 is a step size.
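The Newton update above can be sketched in a few lines. This is a minimal illustration, not the text's implementation: the test function f_0(x) = \sum_i x_i^4 + \tfrac{1}{2}\|x\|^2 (minimizer x^\star = 0), the starting point, and the fixed step size \alpha_k = 1 are all illustrative choices.

```python
import numpy as np

# Gradient and Hessian of f0(x) = sum(x_i^4) + 0.5 * ||x||^2 (illustrative choice)
def grad(x):
    return 4 * x**3 + x

def hess(x):
    return np.diag(12 * x**2 + 1)

x = np.full(5, 2.0)
alpha = 1.0  # full Newton step; in practice a line search would pick alpha_k
for _ in range(20):
    # One O(n^3) linear solve per iteration (cheaper and more stable
    # than forming the inverse Hessian explicitly)
    x = x - alpha * np.linalg.solve(hess(x), grad(x))

print(np.linalg.norm(x))  # the iterates are driven to the minimizer x* = 0
```

Note the per-iteration cost: each step pays for one Hessian evaluation and one \mathcal{O}(n^3) linear solve, which is the price of the fast local convergence of Newton's method.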