

In this notebook, we will explore least squares, one of the first fundamental optimization problems encountered in this course. We'll focus on two classical problems: polynomial regression and signal denoising.

This first TP (lab session) involves the practical implementation of least-squares problems using two classical Python libraries:

  1. NumPy: For efficient numerical computations, matrix operations, and solving least-squares problems using built-in linear algebra functions.
  2. Matplotlib: For visualizing the data and displaying/interpreting the results.

Running the Notebook

This notebook can be executed in the following environments:

  • Google Colab;
  • a local Jupyter installation, after downloading the .ipynb file.

# load necessary dependencies 
import numpy as np
import matplotlib.pyplot as plt

1. Polynomial regression

This first exercise reviews basic Python instructions as well as standard functions from NumPy and Matplotlib. The goal is to estimate the parameters of a polynomial model from experimental (i.e., noisy) data using the least-squares method. We'll first define the true model, generate some noisy synthetic data, and then estimate the model parameters.

Consider the ground-truth polynomial model $y_{\text{true}}(t) = 10t^3 + 3t^2 + 2t + 1$ for $t \in \mathbb{R}$.

Questions

  1. Plot the function $y_{\text{true}}(t)$ on $[-1, 1]$.

  2. Given $N = 10$ equispaced time instants $t_1, \ldots, t_N$ over $[-1, 1]$, generate a noisy vector of measurements $\mathbf{y} \in \mathbb{R}^N$ with entries $y_n = y(t_n) = y_{\text{true}}(t_n) + \epsilon_n$, where $\epsilon_n \sim \mathcal{N}(0, 1)$ is Gaussian noise. Plot the noisy measurements and the ground-truth model on the same graph.

  3. From the data $\mathbf{y}$, we wish to fit a polynomial model of order $d < 10$ of the form

    $$y_{\text{model}}(t) = x_0 + x_1 t + x_2 t^2 + \ldots + x_d t^d$$

    Write the linear model matrix $\mathbf{A}$ for an arbitrary order $d$. Construct this matrix numerically, given $d$ and the vector of time instants $\mathbf{t} = [t_1, t_2, \ldots, t_N]$. (Hint: check the np.vander function.)

  4. Given an integer $d < 10$, write and solve the associated least-squares problem. Display the estimated vector of coefficients. Compare the solution obtained from the explicit (normal-equations) expression with the one obtained using the np.linalg.pinv function.

  5. Plot the estimated polynomial model for several values of $d$ (say $d = 1, 3, 7$). Comment. What happens for $d = 9$?

  6. (Bonus) The choice of the polynomial degree can be evaluated using a standard criterion such as the Akaike Information Criterion, defined as $\text{AIC} = 2p + 2\log r^2$, where $p$ is the number of parameters to be estimated (here $p = d + 1$) and $r$ is the norm of the least-squares residual. Plot this criterion for the different choices of $d$. For which value of $d$ is the minimum attained?

  7. (Bonus bis) Play around with different values of $N$, of the true polynomial $y_{\text{true}}(t)$, or of the noise level $\sigma$!

# question 1
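# A possible sketch (one solution among many): plot the ground-truth model.
def y_true(t):
    return 10 * t**3 + 3 * t**2 + 2 * t + 1

t_fine = np.linspace(-1, 1, 200)  # fine grid for a smooth curve
plt.plot(t_fine, y_true(t_fine), label="ground truth")
plt.xlabel("$t$")
plt.ylabel("$y_{true}(t)$")
plt.legend()
plt.show()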
# question 2
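# A possible sketch, reusing y_true and t_fine from the question 1 cell.
N = 10
t = np.linspace(-1, 1, N)  # equispaced time instants
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
y = y_true(t) + rng.standard_normal(N)  # y_n = y_true(t_n) + eps_n, eps_n ~ N(0, 1)
plt.plot(t_fine, y_true(t_fine), label="ground truth")
plt.scatter(t, y, color="red", label="noisy measurements")
plt.legend()
plt.show()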
# question 3
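# A possible sketch: with increasing=True, np.vander returns the columns
# [1, t, t^2, ..., t^d], matching the ordering of the model above.
def build_A(t, d):
    return np.vander(t, d + 1, increasing=True)

d = 3  # arbitrary example degree
A = build_A(t, d)
print(A.shape)  # (N, d + 1)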
# question 4
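# A possible sketch, reusing A and y from the previous cells: explicit
# (normal-equations) solution vs. pseudo-inverse.
x_ne = np.linalg.solve(A.T @ A, A.T @ y)  # (A^T A)^{-1} A^T y
x_pinv = np.linalg.pinv(A) @ y
print("normal equations:", x_ne)
print("pseudo-inverse  :", x_pinv)  # should match when A has full column rank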
# question 5
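# A possible sketch, reusing build_A, t, y, and t_fine from the previous cells.
plt.scatter(t, y, color="red", label="measurements")
for d in (1, 3, 7):
    x_hat = np.linalg.pinv(build_A(t, d)) @ y
    plt.plot(t_fine, build_A(t_fine, d) @ x_hat, label=f"d = {d}")
plt.legend()
plt.show()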
# question 6 (bonus)
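# A possible sketch of the AIC, with p = d + 1 parameters and r the residual norm.
degrees = np.arange(1, 10)
aic = []
for d in degrees:
    A = build_A(t, d)
    x_hat = np.linalg.pinv(A) @ y
    r2 = np.sum((y - A @ x_hat) ** 2)  # squared residual norm r^2
    aic.append(2 * (d + 1) + 2 * np.log(r2))
plt.plot(degrees, aic, "o-")
plt.xlabel("$d$")
plt.ylabel("AIC")
plt.show()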

2. Denoising

This second problem is denoising, which corresponds to the reconstruction of a signal $x(t)$ from noisy measurements $y(t) = x(t) + n(t)$, where $n(t)$ is some noise. A typical assumption is that the noise is Gaussian, i.e., for a given time instant $t_i$, $n(t_i) \sim \mathcal{N}(0, \sigma^2)$ where $\sigma^2$ is the noise variance.

The denoising problem can be formulated as a least-squares problem. Let $t_1, \ldots, t_N$ be $N$ time instants and denote by $\mathbf{x}, \mathbf{y}$ the vectors encoding the signal and the measurements, respectively. The denoising problem seeks to solve

$$\min_{\mathbf{x}\in\mathbb{R}^N} \Vert \mathbf{y}-\mathbf{x}\Vert^2_2$$

Preliminary questions

  1. What is the solution $\hat{\mathbf{x}}$ to the least-squares problem?

Usually, we have access to extra information about the signal $\mathbf{x}$: e.g., it is non-negative, it satisfies some constraints, it exhibits regularity in some representation domain, etc. A classical assumption is that the signal is smooth. At first order, smoothness is encoded by the function

$$r(\mathbf{x}) = \sum_{n=1}^{N-1} (x_{n+1} - x_n)^2$$

  2. Show that $r(\mathbf{x})$ can be expressed as $r(\mathbf{x}) = \Vert \mathbf{D}_1\mathbf{x}\Vert^2_2$ for some matrix $\mathbf{D}_1$ to be determined.

To incorporate the smoothness of the solution into the denoising problem, we formulate a new problem, called regularized (or penalized) least squares. It reads

$$\min_{\mathbf{x}\in \mathbb{R}^N} \Vert \mathbf{y} - \mathbf{x}\Vert^2_2 + \lambda\, r(\mathbf{x}), \quad \lambda \geq 0$$

where

  • $\Vert \mathbf{y}-\mathbf{x}\Vert^2_2$ is called the data-fidelity term;
  • $r(\mathbf{x})$ is the regularization (or penalty) term and $\lambda \geq 0$ is called the regularization parameter.
  3. Compute the solution to the regularized least-squares problem above with $r(\mathbf{x}) = \Vert \mathbf{D}_1\mathbf{x}\Vert^2_2$. Comment on the uniqueness of the solution. What happens when $\lambda \rightarrow 0$? When $\lambda \rightarrow \infty$?

Write your answers to the preliminary questions here, using Markdown.

Implementation tasks

  1. Generate a smooth signal $x(t) = \cos(2\pi f_0 t)$, where $f_0 = 5$ Hz and $t \in [0, 1]$ s. Choose a sampling frequency that satisfies the Shannon-Nyquist criterion.
  2. Generate noisy measurements by corrupting the signal vector $\mathbf{x}$ with additive white Gaussian noise at SNR = 10 dB. (Recall the formula for the SNR!)
  3. Solve the regularized least-squares problem for different values of $\lambda$, and plot the corresponding solutions. Compare them to the ground-truth signal.
  4. A common question in penalized optimization problems is how to choose the hyperparameter $\lambda$. A classical approach is to use a heuristic called the L-curve: for various values of $\lambda$, plot the data-fidelity term versus the penalty term of the cost function. Based on this tool, what would be a good choice of $\lambda$, and why?
  5. Redo the exercise with the second-order penalty function
    $$r(\mathbf{x}) = \sum_{n=2}^{N-1} (x_{n+1} - 2x_n + x_{n-1})^2$$
    Compare your results with those obtained with the first-order difference penalty.
# question 1
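# A possible sketch: fs = 100 Hz is an arbitrary choice well above the
# Nyquist rate 2 * f0 = 10 Hz.
f0 = 5.0  # Hz
fs = 100.0  # sampling frequency, Hz
t = np.arange(0, 1, 1 / fs)  # time instants over [0, 1) s
x = np.cos(2 * np.pi * f0 * t)
N = x.size
plt.plot(t, x)
plt.xlabel("$t$ (s)")
plt.show()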
# question 2
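# A possible sketch, using SNR_dB = 10 log10(P_signal / P_noise) to set the
# noise standard deviation.
snr_db = 10.0
sigma = np.sqrt(np.mean(x**2) / 10 ** (snr_db / 10))
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
y = x + sigma * rng.standard_normal(N)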
# question 3
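# A possible sketch, assuming the closed form derived in the preliminary
# questions: x_hat = (I + lambda * D1^T D1)^{-1} y.
D1 = np.diff(np.eye(N), axis=0)  # first-order difference matrix, shape (N-1, N)
for lam in (0.1, 10.0, 1000.0):  # arbitrary values, to be explored
    x_hat = np.linalg.solve(np.eye(N) + lam * D1.T @ D1, y)
    plt.plot(t, x_hat, label=f"$\\lambda = {lam}$")
plt.plot(t, x, "k--", label="ground truth")
plt.legend()
plt.show()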
# question 4
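# A possible sketch of the L-curve, reusing D1 from the previous cell.
lambdas = np.logspace(-2, 4, 30)
fidelity, penalty = [], []
for lam in lambdas:
    x_hat = np.linalg.solve(np.eye(N) + lam * D1.T @ D1, y)
    fidelity.append(np.sum((y - x_hat) ** 2))  # data-fidelity term
    penalty.append(np.sum((D1 @ x_hat) ** 2))  # penalty term
plt.loglog(fidelity, penalty, "o-")
plt.xlabel("data fidelity")
plt.ylabel("penalty")
plt.show()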
# question 5
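# A possible sketch: same pipeline with the second-order difference matrix.
D2 = np.diff(np.eye(N), n=2, axis=0)  # rows encode x_{n+1} - 2 x_n + x_{n-1}
for lam in (0.1, 10.0, 1000.0):
    x_hat = np.linalg.solve(np.eye(N) + lam * D2.T @ D2, y)
    plt.plot(t, x_hat, label=f"$\\lambda = {lam}$")
plt.plot(t, x, "k--", label="ground truth")
plt.legend()
plt.show()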