Introduction
In this chapter, we will focus on studying convergence properties of unconstrained optimization algorithms. Specifically, we focus on problems of the form

$$\min_{x \in \mathbb{R}^d} f(x).$$
(Note that we will write $f$ for the objective function to simplify the notation in this chapter.) We also make two assumptions:
- the function $f$ is (at least) continuously differentiable;
- the problem admits at least one solution, i.e., there exists a (global) optimum $x^\star$, and we denote by $f^\star = f(x^\star)$ the optimal value.
Recall from Chapter 4 the principle of descent methods: starting from an initial point $x_0$, iteratively compute

$$x_{k+1} = x_k + \alpha_k d_k,$$

where $\alpha_k > 0$ is the stepsize and where $d_k$ is a descent direction, i.e., such that $\nabla f(x_k)^\top d_k < 0$.
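Here is a minimal Python sketch of this generic scheme; the helper name `descent_method`, the constant stepsize, and the fixed iteration budget are illustrative choices rather than the setting studied in the chapter:

```python
import numpy as np

def descent_method(f, direction, x0, stepsize=0.1, n_iters=100):
    """Generic descent scheme x_{k+1} = x_k + alpha_k * d_k (constant stepsize).

    `direction(x)` is assumed to return a descent direction d_k at x,
    i.e., a vector satisfying grad f(x) . d_k < 0 whenever x is not stationary.
    Returns the iterates x_k and the corresponding objective values f(x_k).
    """
    x = np.asarray(x0, dtype=float)
    xs, fs = [x.copy()], [f(x)]
    for _ in range(n_iters):
        x = x + stepsize * direction(x)   # x_{k+1} = x_k + alpha_k d_k
        xs.append(x.copy())
        fs.append(f(x))
    return xs, fs
```

In practice the stepsize $\alpha_k$ is often chosen by a line search rather than kept constant; the constant choice above only keeps the sketch short.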
Two important examples
- Gradient descent: first-order update rule
  $$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$
  i.e., $d_k = -\nabla f(x_k)$.
- Newton's method: second-order update rule
  $$x_{k+1} = x_k - \alpha_k \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k),$$
  i.e., $d_k = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$.

Both update rules are illustrated in the short code sketch below.
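As a purely illustrative instance of these two updates, the following sketch runs both on a small convex quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$; the matrix $A$, vector $b$, starting point, stepsize, and iteration count below are arbitrary choices for the example, not part of the chapter:

```python
import numpy as np

# Illustrative test problem: a convex quadratic f(x) = 1/2 x^T A x - b^T x,
# whose unique minimizer x* solves the linear system A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b      # gradient of f
hess_f = lambda x: A              # Hessian of f (constant for a quadratic)

x_star = np.linalg.solve(A, b)    # global minimizer x*

x_gd = np.zeros(2)                # gradient descent iterate, x_0 = 0
x_nt = np.zeros(2)                # Newton iterate, x_0 = 0
alpha = 0.2                       # constant stepsize for gradient descent
for k in range(20):
    # Gradient descent: d_k = -grad f(x_k)
    x_gd = x_gd - alpha * grad_f(x_gd)
    # Newton's method with unit stepsize (alpha_k = 1):
    # d_k = -[hess f(x_k)]^{-1} grad f(x_k)
    x_nt = x_nt - np.linalg.solve(hess_f(x_nt), grad_f(x_nt))

print("gradient descent error:", np.linalg.norm(x_gd - x_star))
print("Newton error:          ", np.linalg.norm(x_nt - x_star))
```

On this quadratic, the Newton iterate with unit stepsize reaches the minimizer after a single step (up to numerical precision), while gradient descent with a constant stepsize only approaches it gradually; this difference in behaviour is exactly the kind of convergence property studied in the rest of the chapter.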
In this chapter, we will study convergence properties of such algorithms. We will see how properties of the objective function $f$ impact what can be said about the sequence of iterates $(x_k)_{k \geq 0}$, in terms of type and rate of convergence.