Introduction
In this chapter, we will focus on studying convergence properties of unconstrained optimization algorithms. Specifically, we focus on problems of the form

$$\min_{x \in \mathbb{R}^d} f(x).$$
(Note that we will write $f$ for the objective function to simplify the notation in this chapter.) We also make two assumptions:
- the function $f$ is (at least) continuously differentiable;
- the problem admits at least one solution, i.e., there exists a (global) optimum $x^\star$, and we denote by $f^\star = f(x^\star)$ the optimal value.
Recall from Chapter 4 the principle of descent methods: starting from an initial point $x_0$, iteratively compute

$$x_{k+1} = x_k + \alpha_k d_k,$$

where $\alpha_k > 0$ is the stepsize and where $d_k$ is a descent direction, i.e., such that $\nabla f(x_k)^\top d_k < 0$.
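Here is a minimal Python sketch of this generic scheme; the helper name `descent_method`, the constant stepsize, and the fixed iteration budget are illustrative choices rather than the setting studied in the chapter:

```python
import numpy as np

def descent_method(f, direction, x0, stepsize=0.1, n_iters=100):
    """Generic descent scheme x_{k+1} = x_k + alpha_k * d_k (constant stepsize).

    `direction(x)` is assumed to return a descent direction d_k at x,
    i.e., a vector satisfying grad f(x) . d_k < 0 whenever x is not stationary.
    Returns the iterates x_k and the corresponding objective values f(x_k).
    """
    x = np.asarray(x0, dtype=float)
    xs, fs = [x.copy()], [f(x)]
    for _ in range(n_iters):
        x = x + stepsize * direction(x)   # x_{k+1} = x_k + alpha_k d_k
        xs.append(x.copy())
        fs.append(f(x))
    return xs, fs
```

In practice the stepsize $\alpha_k$ is often chosen by a line search rather than kept constant; the constant choice above only keeps the sketch short.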
Two important examples
- Gradient descent: first-order update rule
  $$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$
  i.e., $d_k = -\nabla f(x_k)$.
- Newton's method: second-order update rule
  $$x_{k+1} = x_k - \alpha_k \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k),$$
  i.e., $d_k = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$.

Both update rules are illustrated in the short code sketch below.
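As a purely illustrative instance of these two updates, the following sketch runs both on a small convex quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$; the matrix $A$, vector $b$, starting point, stepsize, and iteration count below are arbitrary choices for the example, not part of the chapter:

```python
import numpy as np

# Illustrative test problem: a convex quadratic f(x) = 1/2 x^T A x - b^T x,
# whose unique minimizer x* solves the linear system A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b      # gradient of f
hess_f = lambda x: A              # Hessian of f (constant for a quadratic)

x_star = np.linalg.solve(A, b)    # global minimizer x*

x_gd = np.zeros(2)                # gradient descent iterate, x_0 = 0
x_nt = np.zeros(2)                # Newton iterate, x_0 = 0
alpha = 0.2                       # constant stepsize for gradient descent
for k in range(20):
    # Gradient descent: d_k = -grad f(x_k)
    x_gd = x_gd - alpha * grad_f(x_gd)
    # Newton's method with unit stepsize (alpha_k = 1):
    # d_k = -[hess f(x_k)]^{-1} grad f(x_k)
    x_nt = x_nt - np.linalg.solve(hess_f(x_nt), grad_f(x_nt))

print("gradient descent error:", np.linalg.norm(x_gd - x_star))
print("Newton error:          ", np.linalg.norm(x_nt - x_star))
```

On this quadratic, the Newton iterate with unit stepsize reaches the minimizer after a single step (up to numerical precision), while gradient descent with a constant stepsize only approaches it gradually; this difference in behaviour is exactly the kind of convergence property studied in the rest of the chapter.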
In this chapter, we will study convergence properties of such algorithms. We will see how properties of the objective function $f$ impact what can be said about the sequence of iterates $(x_k)_{k \geq 0}$, in terms of type and rate of convergence.