Part 1: gradient descent methods¶
In this notebook, we will explore first-order descent methods -- namely gradient descent -- to solve (unconstrained) optimization problems. The goals are:
to write your own gradient descent algorithm;
to investigate different step-size strategies (fixed, optimal, and backtracking);
to compute convergence metrics that monitor the progress of the algorithm -- and to decide when to stop.
Just like in LW1, practical implementation of the algorithms relies on classical Python libraries:
NumPy: For efficient numerical computations, matrix operations, and solving least-squares problems using built-in linear algebra functions.
Matplotlib: For visualizing the data and displaying / interpreting results.
SciPy: a NumPy-based Python library for scientific computing. In particular, we'll use some functions from scipy.optimize.
Running the Notebook¶
This notebook can be executed in the following environments:
Google Colab: A convenient, cloud-based option that requires no setup. Simply open the notebook in Colab, and you’re ready to run the code.
Locally: You can run the notebook on your local machine using environments like JupyterLab or Jupyter Notebook. You can also run it directly in Visual Studio Code if you have the Jupyter extension. In all cases, ensure you have Python installed along with the required libraries,
NumPy, Matplotlib, and SciPy.
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
First example: a quadratic function¶
Consider the following optimization problem
$$\min_{x \in \mathbb{R}^2} f(x),$$
where $f$ is a given quadratic function.
Preliminary questions
Recall the expression of $\nabla f$.
What is a minimizer of this optimization problem? Is it unique?
Recall the principle of gradient descent using pseudo-code.
Write your answers here using markdown.
Programming questions
Create a function that returns the value of the objective function $f(x)$ for a given vector $x \in \mathbb{R}^2$.
Display the objective function on a 2-D grid with, say, 100 points on each axis (check the functions np.meshgrid, plt.pcolormesh, plt.contour, plt.contourf). Comment on the different graphical representations. A minimal plotting sketch is given below.
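For reference, here is a minimal plotting sketch. The quadratic used here (through a hypothetical matrix A and vector b) is only a placeholder for the notebook's objective; substitute your own f.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical quadratic, used only to illustrate the plotting calls;
# replace A, b (and hence f) with the objective given in the notebook.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

# Evaluate f on a 100 x 100 grid
u = np.linspace(-5.0, 5.0, 100)
X1, X2 = np.meshgrid(u, u)
pts = np.stack([X1.ravel(), X2.ravel()], axis=1)                     # shape (100*100, 2)
Z = (0.5 * np.einsum("ni,ij,nj->n", pts, A, pts) - pts @ b).reshape(X1.shape)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].pcolormesh(X1, X2, Z, shading="auto")   # heat map of f
axes[1].contour(X1, X2, Z, levels=20)           # level sets of f
plt.show()
```

plt.contourf produces filled level sets, a third representation worth comparing with the two above.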
Gradient descent (constant step-size)
Write an algorithm that performs constant step-size gradient descent for a fixed number of iterations from an initial point $x_0$ (a minimal sketch is given after this list).
Display the iterates on top of the cost function.
Propose several metrics to assess the progress (or convergence) of the algorithm as the iterations go, and display them as a function of the iteration count (using plt.semilogy).
Modify your algorithm to stop whenever one of these metrics falls below a prescribed threshold (fixed in advance).
Using numerical experiments, comment on how to choose a constant step size.
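Below is a minimal sketch of constant step-size gradient descent, again on a hypothetical quadratic: the matrix A, vector b, step size 0.2 and threshold 1e-8 are placeholder choices, not values from the notebook. The gradient norm is used as the stopping metric.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder quadratic: f(x) = 0.5 x^T A x - b^T x, grad f(x) = A x - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

def gradient_descent(x0, step, max_iter=500, tol=1e-8):
    """Constant step-size gradient descent; stops when ||grad f(x)|| < tol."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    grad_norms = [np.linalg.norm(grad_f(x))]
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:        # stopping criterion
            break
        x = x - step * g                   # gradient step
        iterates.append(x.copy())
        grad_norms.append(np.linalg.norm(grad_f(x)))
    return np.array(iterates), np.array(grad_norms)

iterates, grad_norms = gradient_descent(x0=[4.0, 4.0], step=0.2)
plt.semilogy(grad_norms)                   # metric vs. iteration, log scale
plt.xlabel("iteration")
plt.ylabel("gradient norm")
plt.show()
```

Other natural metrics include the objective decrease $f(x_k) - f(x^\star)$ (when the minimizer is known) and the distance between successive iterates.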
Gradient descent (optimal step-size)
Recall the principle of the optimal step-size strategy.
Adapt the previous algorithm to compute the optimal step size at each iteration (see the sketch after this list).
Display the evolution of the performance metrics along iterations.
Compare both strategies in terms of convergence speed and comment. How does the step size influence the algorithmic properties?
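For a quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$, minimizing $t \mapsto f(x - t \nabla f(x))$ has the closed form $t = \|g\|^2 / (g^\top A g)$ with $g = \nabla f(x)$. A sketch with the same placeholder A and b as above:

```python
import numpy as np

# Placeholder quadratic (same caveat as above: not the notebook's actual A, b)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

x = np.array([4.0, 4.0])
for k in range(200):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:
        break
    t = (g @ g) / (g @ A @ g)   # exact line-search step for a quadratic
    x = x - t * g

print("minimizer estimate:", x, "after", k, "iterations")
```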
# question 1
# question 2
# question 3
# question 4
# question 5
Second example: a non-quadratic function¶
Example from Boyd and Vandenberghe.
Consider the optimization problem
$$\min_{x \in \mathbb{R}^2} f(x),$$
where $f$ is a given non-quadratic function of two variables.
In this example, we’ll implement the backtracking method seen in class.
Preliminary questions
Compute the expression of $\nabla f$.
Is the problem convex? Comment on the existence and uniqueness of solutions to this optimization problem.
Write your answers here using markdown.
Programming questions
Create a function that returns the value of the objective function $f(x)$ for a given vector $x \in \mathbb{R}^2$.
Display the objective function on a 2-D grid with, say, 100 points on each axis (a minimal sketch follows this list).
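A minimal sketch of the objective and a contour plot. Important caveat: the formula below is the classical two-variable example from Boyd and Vandenberghe's book, used here only as an assumption; replace it with the function actually given in the notebook if it differs.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed objective (the classical Boyd & Vandenberghe 2-D example):
# f(x) = exp(x1 + 3 x2 - 0.1) + exp(x1 - 3 x2 - 0.1) + exp(-x1 - 0.1)
def f(x):
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_f(x):
    x1, x2 = x
    e1 = np.exp(x1 + 3*x2 - 0.1)
    e2 = np.exp(x1 - 3*x2 - 0.1)
    e3 = np.exp(-x1 - 0.1)
    return np.array([e1 + e2 - e3, 3*(e1 - e2)])

# Level sets on a 100 x 100 grid (the grid bounds are arbitrary illustrative choices)
u1 = np.linspace(-3.0, 1.0, 100)
u2 = np.linspace(-1.0, 1.0, 100)
X1, X2 = np.meshgrid(u1, u2)
Z = f([X1, X2])            # f only uses NumPy ufuncs, so it broadcasts over the grid
plt.contour(X1, X2, Z, levels=30)
plt.show()
```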
Gradient descent (constant step-size and optimal step-size)
Following the quadratic example, write a constant step-size gradient descent algorithm for this problem.
Write a second algorithm with optimal step-size selection at each iteration. Do not try to compute the optimal step analytically; rather, look at the function scipy.optimize.minimize_scalar to compute the step numerically (a sketch is given after this list).
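A sketch of the numerical optimal-step strategy: at each iteration, the scalar problem $t \mapsto f(x - t \nabla f(x))$ is minimized with scipy.optimize.minimize_scalar. The objective is the same assumed Boyd-Vandenberghe example as above; the bounds (0, 1) and the starting point are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed objective (same caveat as in the previous sketch)
def f(x):
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_f(x):
    x1, x2 = x
    e1 = np.exp(x1 + 3*x2 - 0.1)
    e2 = np.exp(x1 - 3*x2 - 0.1)
    e3 = np.exp(-x1 - 0.1)
    return np.array([e1 + e2 - e3, 3*(e1 - e2)])

x = np.array([-1.0, 1.0])
for k in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:
        break
    # one-dimensional minimization over the step size t
    res = minimize_scalar(lambda t: f(x - t * g), bounds=(0.0, 1.0), method="bounded")
    x = x - res.x * g

print("minimizer estimate:", x, "f(x) =", f(x))
```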
Gradient descent (backtracking)
Recall the principle of backtracking using pseudocode.
For the first gradient descent iteration, implement the backtracking procedure seen in class for arbitrary values of the backtracking parameters.
How does the number of backtracking steps vary as the parameters are changed?
Incorporate the backtracking strategy into the gradient descent algorithm. Add a counter that tracks the number of backtracking steps and monitor this information (a minimal sketch follows this list).
Compare all three approaches through selected graphical plots.
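A minimal sketch of a backtracking (Armijo) line search with a counter for the number of backtracking steps. The parameter values alpha = 0.3, beta = 0.8 and the starting step t0 = 1 are arbitrary illustrative choices, and the objective is again the assumed Boyd-Vandenberghe example.

```python
import numpy as np

def backtracking(f, grad_f, x, alpha=0.3, beta=0.8, t0=1.0):
    """Shrink t until the sufficient-decrease (Armijo) condition holds; count the shrinks."""
    g = grad_f(x)
    t, n_backtracks = t0, 0
    while f(x - t * g) > f(x) - alpha * t * (g @ g):
        t *= beta
        n_backtracks += 1
    return t, n_backtracks

# Assumed objective (same caveat as in the previous sketches)
def f(x):
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_f(x):
    x1, x2 = x
    e1 = np.exp(x1 + 3*x2 - 0.1)
    e2 = np.exp(x1 - 3*x2 - 0.1)
    e3 = np.exp(-x1 - 0.1)
    return np.array([e1 + e2 - e3, 3*(e1 - e2)])

t, n = backtracking(f, grad_f, x=np.array([-1.0, 1.0]))
print("accepted step:", t, "number of backtracks:", n)
```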
# Question 1
# Question 2
# Question 3
# Question 4
# Question 5
Part 2: second-order descent methods¶
Consider again the second optimization problem, i.e., the minimization of the non-quadratic function $f$ from Part 1.
Preliminary questions
Compute the Hessian of $f$.
Using pseudo-code, recall the principle of Newton's method and quasi-Newton methods.
Write your answers here using markdown.
Programming questions
Implement Newton’s method for this problem.
Implement the BFGS algorithm for this problem. For the first iteration, use the identity matrix to initialize the (inverse) Hessian approximation.
Compare gradient descent, Newton's algorithm, and BFGS through numerical experiments (a sketch of Newton iterations and a SciPy BFGS call follows). Evaluate convergence speed, numerical complexity, and timings.
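A sketch of pure Newton iterations on the assumed Boyd-Vandenberghe objective, together with a BFGS run via scipy.optimize.minimize for comparison. The Hessian below corresponds to that assumed formula; adapt it to the notebook's actual function.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed objective (same caveat as before), with gradient and Hessian
def f(x):
    x1, x2 = x
    return np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1) + np.exp(-x1 - 0.1)

def grad_f(x):
    x1, x2 = x
    e1, e2, e3 = np.exp(x1 + 3*x2 - 0.1), np.exp(x1 - 3*x2 - 0.1), np.exp(-x1 - 0.1)
    return np.array([e1 + e2 - e3, 3*(e1 - e2)])

def hess_f(x):
    x1, x2 = x
    e1, e2, e3 = np.exp(x1 + 3*x2 - 0.1), np.exp(x1 - 3*x2 - 0.1), np.exp(-x1 - 0.1)
    return np.array([[e1 + e2 + e3, 3*(e1 - e2)],
                     [3*(e1 - e2), 9*(e1 + e2)]])

# Newton iterations: x <- x - H(x)^{-1} grad f(x)
x = np.array([-1.0, 1.0])
for k in range(50):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:
        break
    x = x - np.linalg.solve(hess_f(x), g)   # Newton step

print("Newton:", x, "iterations:", k)

# Quasi-Newton (BFGS) via SciPy, for comparison
res = minimize(f, x0=np.array([-1.0, 1.0]), jac=grad_f, method="BFGS")
print("BFGS:  ", res.x, "iterations:", res.nit)
```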
# Question 1
# Question 2
# Question 3