

In this notebook, we will explore least squares, one of the first fundamental optimization problems encountered in this course. We'll focus on two classical problems: polynomial regression and signal denoising.

This first TP (lab session) involves the practical implementation of least-squares problems using two classical Python libraries:

  1. NumPy: For efficient numerical computations, matrix operations, and solving least-squares problems using built-in linear algebra functions.
  2. Matplotlib: For visualizing the data and displaying/interpreting the results.

Running the Notebook

This notebook can be executed in the following environments:

  • Google Colab;
  • a local Jupyter installation, after downloading the .ipynb file.

# load necessary dependencies 
import numpy as np
import matplotlib.pyplot as plt

1. Polynomial regression

This first exercise reviews basic Python instructions as well as standard functions from NumPy and Matplotlib. The goal is to estimate the parameters of a polynomial model from experimental (i.e., noisy) data using the least-squares method. We'll first define the true model, generate some noisy synthetic data, and then estimate the model parameters.

Consider the ground-truth polynomial model $y_{\text{true}}(t) = 10t^3 + 3t^2 + 2t + 1$ for $t \in \mathbb{R}$.

Questions

  1. Plot the function $y_{\text{true}}(t)$ on $[-1, 1]$.

  2. Given $N = 10$ equispaced time instants $t_1, \ldots, t_N$ over $[-1, 1]$, generate a noisy vector of measurements $\mathbf{y} \in \mathbb{R}^N$ with entries $y_n = y(t_n) = y_{\text{true}}(t_n) + \epsilon_n$, where $\epsilon_n \sim \mathcal{N}(0, 1)$ is Gaussian noise. Plot the noisy measurements and the ground-truth model on the same graph.

  3. From the data $\mathbf{y}$, we wish to fit a polynomial model of order $d < 10$ of the form

    $$y_{\text{model}}(t) = x_0 + x_1 t + x_2 t^2 + \ldots + x_d t^d$$

    Write the linear model matrix $\mathbf{A}$ for an arbitrary order $d$. Construct this matrix numerically, given $d$ and the vector of time instants $\mathbf{t} = [t_1, t_2, \ldots, t_N]$. (Hint: check the np.vander function.)

  4. Given an integer $d < 10$, write and solve the associated least-squares problem. Display the estimated vector of coefficients. Compare the solution obtained from the explicit (normal-equations) expression with the one obtained using the np.linalg.pinv function.

  5. Plot the estimated polynomial model for several values of $d$ (say $d = 1, 3, 7$). Comment. What happens for $d = 9$?

  6. (Bonus) The choice of the polynomial degree can be evaluated using a standard criterion such as the Akaike Information Criterion, defined as $\text{AIC} = 2p + 2\log r^2$, where $p$ is the number of parameters to be estimated (here $p = d + 1$) and $r$ is the norm of the least-squares residual. Plot this criterion for the different choices of $d$. For which value of $d$ is the minimum attained?

  7. (Bonus bis) Play around with different values of $N$, of the true polynomial $y_{\text{true}}(t)$, or of the noise level $\sigma$!

# question 1
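# A possible sketch (one solution among many): plot the ground-truth model.
def y_true(t):
    return 10 * t**3 + 3 * t**2 + 2 * t + 1

t_fine = np.linspace(-1, 1, 200)  # fine grid for a smooth curve
plt.plot(t_fine, y_true(t_fine), label="ground truth")
plt.xlabel("$t$")
plt.ylabel("$y_{true}(t)$")
plt.legend()
plt.show()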
# question 2
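# A possible sketch, reusing y_true and t_fine from the question 1 cell.
N = 10
t = np.linspace(-1, 1, N)  # equispaced time instants
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
y = y_true(t) + rng.standard_normal(N)  # y_n = y_true(t_n) + eps_n, eps_n ~ N(0, 1)
plt.plot(t_fine, y_true(t_fine), label="ground truth")
plt.scatter(t, y, color="red", label="noisy measurements")
plt.legend()
plt.show()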
# question 3
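# A possible sketch: with increasing=True, np.vander returns the columns
# [1, t, t^2, ..., t^d], matching the ordering of the model above.
def build_A(t, d):
    return np.vander(t, d + 1, increasing=True)

d = 3  # arbitrary example degree
A = build_A(t, d)
print(A.shape)  # (N, d + 1)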
# question 4
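# A possible sketch, reusing A and y from the previous cells: explicit
# (normal-equations) solution vs. pseudo-inverse.
x_ne = np.linalg.solve(A.T @ A, A.T @ y)  # (A^T A)^{-1} A^T y
x_pinv = np.linalg.pinv(A) @ y
print("normal equations:", x_ne)
print("pseudo-inverse  :", x_pinv)  # should match when A has full column rank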
# question 5
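# A possible sketch, reusing build_A, t, y, and t_fine from the previous cells.
plt.scatter(t, y, color="red", label="measurements")
for d in (1, 3, 7):
    x_hat = np.linalg.pinv(build_A(t, d)) @ y
    plt.plot(t_fine, build_A(t_fine, d) @ x_hat, label=f"d = {d}")
plt.legend()
plt.show()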
# question 6 (bonus)
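# A possible sketch of the AIC, with p = d + 1 parameters and r the residual norm.
degrees = np.arange(1, 10)
aic = []
for d in degrees:
    A = build_A(t, d)
    x_hat = np.linalg.pinv(A) @ y
    r2 = np.sum((y - A @ x_hat) ** 2)  # squared residual norm r^2
    aic.append(2 * (d + 1) + 2 * np.log(r2))
plt.plot(degrees, aic, "o-")
plt.xlabel("$d$")
plt.ylabel("AIC")
plt.show()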

2. Denoising

This second problem is denoising, which corresponds to the reconstruction of a signal $x(t)$ from noisy measurements $y(t) = x(t) + n(t)$, where $n(t)$ is some noise. A typical assumption is that the noise is Gaussian, i.e., for a given time instant $t_i$, $n(t_i) \sim \mathcal{N}(0, \sigma^2)$ where $\sigma^2$ is the noise variance.

The denoising problem can be formulated as a least-squares problem. Let $t_1, \ldots, t_N$ be $N$ time instants and denote by $\mathbf{x}, \mathbf{y}$ the vectors encoding the signal and the measurements, respectively. The denoising problem seeks to solve

$$\min_{\mathbf{x}\in\mathbb{R}^N} \Vert \mathbf{y}-\mathbf{x}\Vert^2_2$$

Preliminary questions

  1. What is the solution $\hat{\mathbf{x}}$ to the least-squares problem?

Usually, we have access to extra information about the signal $\mathbf{x}$: e.g., it is non-negative, it satisfies some constraints, it exhibits regularity in some representation domain, etc. A classical assumption is that the signal is smooth. At first order, smoothness is encoded by the function

$$r(\mathbf{x}) = \sum_{n=1}^{N-1} (x_{n+1} - x_n)^2$$

  2. Show that $r(\mathbf{x})$ can be expressed as $r(\mathbf{x}) = \Vert \mathbf{D}_1\mathbf{x}\Vert^2_2$ for some matrix $\mathbf{D}_1$ to be determined.

To incorporate the smoothness of the solution into the denoising problem, we formulate a new problem, called regularized (or penalized) least squares. It reads

$$\min_{\mathbf{x}\in \mathbb{R}^N} \Vert \mathbf{y} - \mathbf{x}\Vert^2_2 + \lambda\, r(\mathbf{x}), \quad \lambda \geq 0$$

where

  • $\Vert \mathbf{y}-\mathbf{x}\Vert^2_2$ is called the data-fidelity term;
  • $r(\mathbf{x})$ is the regularization (or penalty) term and $\lambda \geq 0$ is called the regularization parameter.
  3. Compute the solution to the regularized least-squares problem above with $r(\mathbf{x}) = \Vert \mathbf{D}_1\mathbf{x}\Vert^2_2$. Comment on the uniqueness of the solution. What happens when $\lambda \rightarrow 0$? When $\lambda \rightarrow \infty$?

Write your answers to the preliminary questions here, using Markdown.

Implementation tasks

  1. Generate a smooth signal $x(t) = \cos(2\pi f_0 t)$, where $f_0 = 5$ Hz and $t \in [0, 1]$ s. Choose a sampling frequency that satisfies the Shannon-Nyquist criterion.
  2. Generate noisy measurements by corrupting the signal vector $\mathbf{x}$ with additive white Gaussian noise at SNR = 10 dB. (Recall the formula for the SNR!)
  3. Solve the regularized least-squares problem for different values of $\lambda$, and plot the corresponding solutions. Compare them to the ground-truth signal.
  4. A common question in penalized optimization problems is how to choose the hyperparameter $\lambda$. A classical approach is to use a heuristic called the L-curve: for various values of $\lambda$, plot the data-fidelity term versus the penalty term of the cost function. Based on this tool, what would be a good choice of $\lambda$, and why?
  5. Redo the exercise with the second-order penalty function
    $$r(\mathbf{x}) = \sum_{n=2}^{N-1} (x_{n+1} - 2x_n + x_{n-1})^2$$
    Compare your results with those obtained with the first-order difference penalty.
# question 1
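# A possible sketch: fs = 100 Hz is an arbitrary choice well above the
# Nyquist rate 2 * f0 = 10 Hz.
f0 = 5.0  # Hz
fs = 100.0  # sampling frequency, Hz
t = np.arange(0, 1, 1 / fs)  # time instants over [0, 1) s
x = np.cos(2 * np.pi * f0 * t)
N = x.size
plt.plot(t, x)
plt.xlabel("$t$ (s)")
plt.show()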
# question 2
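# A possible sketch, using SNR_dB = 10 log10(P_signal / P_noise) to set the
# noise standard deviation.
snr_db = 10.0
sigma = np.sqrt(np.mean(x**2) / 10 ** (snr_db / 10))
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
y = x + sigma * rng.standard_normal(N)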
# question 3
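# A possible sketch, assuming the closed form derived in the preliminary
# questions: x_hat = (I + lambda * D1^T D1)^{-1} y.
D1 = np.diff(np.eye(N), axis=0)  # first-order difference matrix, shape (N-1, N)
for lam in (0.1, 10.0, 1000.0):  # arbitrary values, to be explored
    x_hat = np.linalg.solve(np.eye(N) + lam * D1.T @ D1, y)
    plt.plot(t, x_hat, label=f"$\\lambda = {lam}$")
plt.plot(t, x, "k--", label="ground truth")
plt.legend()
plt.show()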
# question 4
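# A possible sketch of the L-curve, reusing D1 from the previous cell.
lambdas = np.logspace(-2, 4, 30)
fidelity, penalty = [], []
for lam in lambdas:
    x_hat = np.linalg.solve(np.eye(N) + lam * D1.T @ D1, y)
    fidelity.append(np.sum((y - x_hat) ** 2))  # data-fidelity term
    penalty.append(np.sum((D1 @ x_hat) ** 2))  # penalty term
plt.loglog(fidelity, penalty, "o-")
plt.xlabel("data fidelity")
plt.ylabel("penalty")
plt.show()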
# question 5
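# A possible sketch: same pipeline with the second-order difference matrix.
D2 = np.diff(np.eye(N), n=2, axis=0)  # rows encode x_{n+1} - 2 x_n + x_{n-1}
for lam in (0.1, 10.0, 1000.0):
    x_hat = np.linalg.solve(np.eye(N) + lam * D2.T @ D2, y)
    plt.plot(t, x_hat, label=f"$\\lambda = {lam}$")
plt.plot(t, x, "k--", label="ground truth")
plt.legend()
plt.show()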