
LW2: descent methods for unconstrained optimization problems


Part 1: gradient descent methods

In this notebook, we will explore first-order descent methods, namely gradient descent, to solve (unconstrained) optimization problems. The goals are to implement gradient descent with several step-size strategies (constant, optimal, backtracking), to monitor its convergence with suitable metrics, and to compare the different strategies on two example problems.

Just like in LW1, practical implementation of the algorithms relies on classical Python libraries:

  1. NumPy: for efficient numerical computations, matrix operations, and built-in linear algebra routines.

  2. Matplotlib: for visualizing the objective functions and displaying/interpreting results.

  3. SciPy: a NumPy-based Python library for scientific computing. In particular, we’ll use some functions from scipy.optimize.

Running the Notebook

This notebook can be executed locally (e.g., in Jupyter) or launched in Google Colab.

import numpy as np
import matplotlib.pyplot as plt 
import scipy as sp

First example: a quadratic function

Consider the following optimization problem

$$\min_{\mathbf{x} \in \mathbb{R}^2} f(\mathbf{x})$$

where

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^\top \mathbf{Q} \mathbf{x} - \mathbf{p}^\top \mathbf{x} \quad \text{with} \quad \begin{cases} \mathbf{Q} = \begin{bmatrix} 1 & 0 \\ 0 & \eta \end{bmatrix}, \ \eta > 0 \\ \mathbf{p} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{cases}$$
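As a reference for the programming questions below, here is a minimal sketch of how the objective and its gradient $\nabla f(\mathbf{x}) = \mathbf{Q}\mathbf{x} - \mathbf{p}$ can be coded; the names `f_quad` and `grad_quad` are illustrative, not imposed.

```python
import numpy as np

def f_quad(x, eta=5.0):
    """Quadratic objective f(x) = 1/2 x^T Q x - p^T x."""
    Q = np.array([[1.0, 0.0], [0.0, eta]])
    p = np.array([1.0, 1.0])
    return 0.5 * x @ Q @ x - p @ x

def grad_quad(x, eta=5.0):
    """Gradient of the quadratic objective: Q x - p."""
    Q = np.array([[1.0, 0.0], [0.0, eta]])
    p = np.array([1.0, 1.0])
    return Q @ x - p

# quick sanity check at an arbitrary point
x0 = np.array([-1.0, 1.0])
print(f_quad(x0), grad_quad(x0))
```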

Preliminary questions

  1. Recall the expression of $\nabla f(\mathbf{x})$.

  2. What is a minimizer of this optimization problem? Is it unique?

  3. Recall the principle of gradient descent using pseudo-code.

Write your answers here using Markdown.

Programming questions

  1. Create a function that returns the value of the objective function for a given vector $\mathbf{x} = [x_1, x_2]$, for $\eta = 5$.

  2. Display the objective function on the grid $[-1, 1] \times [-1, 1]$, with, say, 100 points on each axis (check the functions np.meshgrid, plt.pcolormesh, plt.contour, plt.contourf). Comment on the different graphical representations.

  3. Gradient descent (constant step-size)

    1. Write an algorithm that performs constant step-size gradient descent for a fixed number of iterations from an initial point $\mathbf{x}_0$ (a minimal sketch is given after this list of questions).

    2. Display these iterates on top of the cost function.

    3. Propose several metrics to assess the progress (or convergence) of the algorithm as the iterations go. Display them as a function of the iteration index (using plt.semilogy).

    4. Modify your algorithm to stop whenever one of these metrics drops below a prescribed threshold (fixed in advance).

    5. Using numerical experiments, comment on how to choose a constant step size.

  4. Gradient descent (optimal step-size)

    1. Recall the principle of the optimal step-size strategy.

    2. Adapt the previous algorithm to compute the optimal step size at each iteration.

    3. Display the evolution of the performance metrics along iterations.

  5. Compare both strategies in terms of convergence speed and comment. How does the parameter $\eta$ influence the algorithmic properties?
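For guidance only, here is a minimal sketch of a constant step-size gradient descent loop with a gradient-norm stopping criterion; the function name `gradient_descent`, the tolerance, and the test point are illustrative choices, not the required solution. For the quadratic objective, the exact line-search ("optimal") step along $-\mathbf{g}_k$ has the closed form $\alpha_k = \mathbf{g}_k^\top \mathbf{g}_k / (\mathbf{g}_k^\top \mathbf{Q}\, \mathbf{g}_k)$ and can replace the constant step.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, max_iter=500, tol=1e-8):
    """Constant step-size gradient descent.

    Stops when the gradient norm falls below `tol` or after `max_iter` iterations.
    Returns the array of iterates so they can be overlaid on the contour plot.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # convergence metric: gradient norm
            break
        x = x - step * g                 # constant step-size update
        iterates.append(x.copy())
    return np.array(iterates)

# Illustrative usage on the quadratic example with eta = 5
Q = np.array([[1.0, 0.0], [0.0, 5.0]])
p = np.array([1.0, 1.0])
traj = gradient_descent(lambda x: Q @ x - p, x0=[-1.0, 1.0], step=0.2)
print(traj.shape, traj[-1])              # last iterate should approach the minimizer
```

The returned iterates can then be overlaid on the contour plot and used to build the convergence curves requested above.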

# Question 1
# Question 2
# Question 3
# Question 4
# Question 5

Second example: a non-quadratic function

Example from Boyd and Vandenberghe.

Consider the optimization problem

$$\min_{x_1, x_2} f(x_1, x_2)$$

where

$$f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$$

In this example, we’ll implement the backtracking method seen in class.
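As an illustration only, here is a minimal sketch of the objective (vectorized so it can be evaluated on a grid) and of a contour plot on the domain suggested in the programming questions below; the name `f_bv` and the number of contour levels are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def f_bv(x1, x2):
    """Objective of the Boyd & Vandenberghe example (works on scalars or arrays)."""
    return (np.exp(x1 + 3 * x2 - 0.1)
            + np.exp(x1 - 3 * x2 - 0.1)
            + np.exp(-x1 - 0.1))

# contour plot on the grid suggested below
X1, X2 = np.meshgrid(np.linspace(-2, 0.5, 100), np.linspace(-0.5, 0.5, 100))
plt.contourf(X1, X2, f_bv(X1, X2), levels=30)
plt.colorbar()
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.show()
```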

Preliminary questions

  1. Compute the expression of $\nabla f(\mathbf{x})$.

  2. Is the problem convex? Comment on the existence and uniqueness of solutions to this optimization problem.

Write your answers here using Markdown.

Programming questions

  1. Create a function that returns the value of the objective function for a given vector $\mathbf{x} = [x_1, x_2]$.

  2. Display the objective function on the grid $[-2, 0.5] \times [-0.5, 0.5]$, with, say, 100 points on each axis.

  3. Gradient descent (constant step size and optimal step size)

    1. Following the quadratic example, write a constant step-size gradient descent algorithm for this problem.

    2. Write a second algorithm with optimal step-size selection at each iteration. Do not try to compute the optimal step analytically; rather, look at the function scipy.optimize.minimize_scalar to compute the step numerically (see the sketch after this list).

  4. Gradient descent (backtracking)

    1. Recall the principle of backtracking using pseudocode.

    2. For the first gradient descent iteration, implement the backtracking procedure seen in class for arbitrary parameters $(s, \eta)$.

    3. How does the number of backtracks vary as the parameters $(s, \eta)$ are changed?

    4. Incorporate the backtracking strategy into the gradient descent algorithm. Add a counter that tracks the number of backtracks and monitor this information (see the sketch after this list).

  5. Compare all three approaches through selected graphical plots.
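Before the answer cells, here is a minimal sketch of gradient descent on this objective with two step-size rules: a numerically "optimal" step obtained with scipy.optimize.minimize_scalar, and a backtracking step where $(s, \eta)$ are interpreted as the initial step and the shrink factor. The sufficient-decrease constant `beta`, the bounded search interval, and the function names are assumptions; the exact backtracking condition used in class may differ slightly.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f_bv(x):
    """Boyd & Vandenberghe objective, x = [x1, x2]."""
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_bv(x):
    """Gradient of the objective above."""
    x1, x2 = x
    a, b, c = np.exp(x1 + 3*x2 - 0.1), np.exp(x1 - 3*x2 - 0.1), np.exp(-x1 - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

def step_exact(x, d):
    """Numerically 'optimal' step along direction d via scipy.optimize.minimize_scalar."""
    res = minimize_scalar(lambda t: f_bv(x + t * d), bounds=(0.0, 10.0), method="bounded")
    return res.x

def step_backtracking(x, d, s=1.0, eta=0.5, beta=1e-4):
    """Backtracking: start from step s, shrink by eta until a sufficient
    (Armijo-type) decrease holds. Returns the step and the number of backtracks."""
    t, n_backtracks = s, 0
    fx, slope = f_bv(x), grad_bv(x) @ d      # slope < 0 for a descent direction
    while f_bv(x + t * d) > fx + beta * t * slope:
        t *= eta
        n_backtracks += 1
    return t, n_backtracks

def gd(x0, step_rule="backtracking", max_iter=200, tol=1e-8):
    """Gradient descent with a selectable step-size rule and a backtrack counter."""
    x = np.asarray(x0, dtype=float)
    total_backtracks = 0
    for _ in range(max_iter):
        g = grad_bv(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                                # steepest-descent direction
        if step_rule == "exact":
            t = step_exact(x, d)
        else:
            t, nb = step_backtracking(x, d)
            total_backtracks += nb
        x = x + t * d
    return x, total_backtracks

x_hat, nb = gd([-1.0, 0.3])
print(x_hat, f_bv(x_hat), nb)
```

The `total_backtracks` counter supports the monitoring requested in question 4.4.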

# Question 1
# Question 2
# Question 3
# Question 4
# Question 5

Part 2: second-order descent methods

Consider again the second optimization problem, i.e.,

$$\min_{x_1, x_2} f(x_1, x_2)$$

where

$$f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$$

Preliminary questions

  1. Compute the Hessian of $f$.

  2. Using pseudo-code, recall the principle of Newton's method and of quasi-Newton methods.

Write your answers here using Markdown.

Programming questions

  1. Implement Newton’s method for this problem.

  2. Implement the BFGS algorithm for this problem. For the first iteration, use $B_0 = \nabla^2 f(x^{(0)})$ to initialize the method (a sketch of both methods is given after this list).

  3. Compare gradient descent, Newton’s algorithm and BFGS through numerical experiments. Evaluate convergence speed, numerical complexity and timings.
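For reference only, here is a minimal sketch of pure Newton iterations and of a BFGS loop initialized with $B_0 = \nabla^2 f(x^{(0)})$; the function names, tolerances, and the use of a unit step (no line search) are simplifying assumptions, not the required implementation.

```python
import numpy as np

def f_bv(x):
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_bv(x):
    x1, x2 = x
    a, b, c = np.exp(x1 + 3*x2 - 0.1), np.exp(x1 - 3*x2 - 0.1), np.exp(-x1 - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

def hess_bv(x):
    """Hessian obtained by differentiating the gradient componentwise."""
    x1, x2 = x
    a, b, c = np.exp(x1 + 3*x2 - 0.1), np.exp(x1 - 3*x2 - 0.1), np.exp(-x1 - 0.1)
    return np.array([[a + b + c, 3*a - 3*b],
                     [3*a - 3*b, 9*a + 9*b]])

def newton(x0, max_iter=50, tol=1e-10):
    """Pure Newton iterations: solve H d = -g at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_bv(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess_bv(x), -g)   # Newton direction
        x = x + d                             # unit step (no line search here)
    return x

def bfgs(x0, max_iter=100, tol=1e-10):
    """BFGS with B_0 set to the true Hessian at x^(0), as requested above."""
    x = np.asarray(x0, dtype=float)
    B = hess_bv(x)                            # B_0 = Hessian at the initial point
    g = grad_bv(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(B, -g)            # quasi-Newton direction
        x_new = x + d                         # unit step for simplicity
        g_new = grad_bv(x_new)
        s, y = x_new - x, g_new - g
        Bs = B @ s
        # BFGS update of the Hessian approximation
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x, g = x_new, g_new
    return x

print(newton([-1.0, 0.3]), bfgs([-1.0, 0.3]))
```

In practice the unit step can be combined with the backtracking rule implemented earlier.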

# Question 1
# Question 2
# Question 3