3.4 Back to least squares: existence, uniqueness, and expression of solutions
Given $y \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m\times n}$, a general linear least squares problem is written as

$$
\begin{array}{ll}
\minimize & \Vert y - Ax\Vert_2^2\\
\st & x \in \mathbb{R}^n
\end{array}
$$

As we already saw, this is an unconstrained quadratic program with $Q = A^\top A$ and $p = A^\top y$. Setting the gradient to zero yields the so-called *normal equations*:

$$
A^\top A x = A^\top y
$$
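As a quick sanity check, here is a minimal NumPy sketch (the dimensions and random data are arbitrary) verifying that the solution returned by a generic least squares solver satisfies the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5                       # arbitrary sizes with m >= n
A = rng.standard_normal((m, n))    # full column rank with probability 1
y = rng.standard_normal(m)

# Solve the least squares problem with a generic solver
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# At the minimizer, the normal equations A^T A x = A^T y hold,
# i.e. the gradient 2(A^T A x - A^T y) vanishes
print(np.allclose(A.T @ A @ x_hat, A.T @ y))   # True (up to rounding)
```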
Existence of solutions

A few observations are in order:

- we have $Q = A^\top A$, which shows that $Q \succeq 0$;
- we have in addition $p = A^\top y \in \operatorname{Im} Q$. To see this, observe that $\operatorname{Im} A^\top A = \operatorname{Im} A^\top$: since $A^\top A$ is symmetric, $\operatorname{Im} A^\top A = (\ker A^\top A)^\perp$, and combining $\ker A = \ker A^\top A$ with $\ker A = (\operatorname{Im} A^\top)^\perp$ gives the claim.

Comparing with all possible cases for QPs, we only have two choices:

- either $Q \succ 0$, and there is a unique solution;
- or $Q \succeq 0$ with $\lambda_{\min}(Q) = 0$; since $p \in \operatorname{Im} Q$, there are infinitely many solutions.

In both cases at least one solution exists. A small numerical check of the second observation is given below.
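The following sketch (with a deliberately rank-deficient $A$ built from hypothetical random factors) checks that the normal equations remain consistent, i.e. that $p = A^\top y$ indeed lies in $\operatorname{Im} Q$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 15, 6, 3                                             # r < n: rank-deficient A
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank(A) = r
y = rng.standard_normal(m)

Q = A.T @ A
p = A.T @ y

# p lies in Im(Q) iff the system Q x = p is consistent
x_ls, res, rank, sv = np.linalg.lstsq(Q, p, rcond=None)
print(np.allclose(Q @ x_ls, p))   # True: the normal equations have a solution
print(rank)                       # expected 3: Q is singular, yet solutions exist
```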
Typology of solutions

The rank of $A$ determines the number of solutions:

- If $\rank A = n$: this means that $A$ has full column rank ($n$ linearly independent columns). Then $\rank A^\top A = n$ and thus $Q = A^\top A$ is invertible. The solution is unique and reads
  $$
  \hat{x} = Q^{-1}p = \left(A^\top A\right)^{-1}A^\top y
  $$
  This case is also referred to as the overdetermined case ($m \geq n$).
- If $\rank A < n$: the matrix $A^\top A$ is singular and there are infinitely many solutions, of the form
  $$
  \hat{x} = x_0 + \mathbf{n}, \quad \text{where } A^\top A x_0 = A^\top y,\ \mathbf{n} \in \ker A
  $$
  This case is referred to as the underdetermined case.

Both cases are illustrated numerically below.
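Here is a short NumPy sketch (arbitrary random data; the rank-deficient matrix is built by duplicating a column) contrasting the two cases: the closed-form expression $(A^\top A)^{-1}A^\top y$ in the full-column-rank case, and a family of equally good solutions in the rank-deficient case:

```python
import numpy as np

rng = np.random.default_rng(2)

# Case 1: full column rank -> unique solution, closed form applies
m, n = 20, 4
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
x_closed = np.linalg.solve(A.T @ A, A.T @ y)      # (A^T A)^{-1} A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_closed, x_lstsq))             # True

# Case 2: rank deficient -> infinitely many solutions
B = np.hstack([A, A[:, :1]])                      # duplicate a column: rank(B) = n < n + 1
x0, *_ = np.linalg.lstsq(B, y, rcond=None)        # one particular minimizer
kernel_dir = np.zeros(n + 1)
kernel_dir[0], kernel_dir[n] = 1.0, -1.0          # B @ kernel_dir = 0
x1 = x0 + 3.0 * kernel_dir                        # another minimizer
print(np.allclose(np.linalg.norm(y - B @ x0),
                  np.linalg.norm(y - B @ x1)))    # True: same objective value
```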
About the Moore-Penrose pseudo-inverse

Consider a linear system of equations $y = Ax$. Often the solution to this system is denoted by

$$
\hat{x} = A^\dagger y
$$

where $A^\dagger$ is the Moore-Penrose pseudo-inverse of $A$. While it relates to least squares problems, there are some key subtleties to keep in mind:
- When $\rank A = n$ (and thus $m \geq n$), the pseudo-inverse is defined as
  $$
  A^\dagger = \left(A^\top A\right)^{-1} A^\top
  $$
  which matches the solution to the least squares problem with objective $\Vert y - Ax\Vert_2^2$.
- When $\rank A = m < n$ (underdetermined), $A^\dagger$ gives the minimum-norm solution to the system of equations $y = Ax$. Its expression differs and will be discussed in later lectures.
- When $\rank A = n$, $A^\dagger$ is a left inverse of $A$: $A^\dagger A = I_n$.
- If $m = n$ and $A$ is invertible, then $A^\dagger = A^{-1}$.

These properties are checked numerically below.
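A final NumPy sketch (assumed random data) checking these properties with `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(3)

# Tall, full column rank: pinv(A) = (A^T A)^{-1} A^T and pinv(A) @ A = I_n
m, n = 12, 4
A = rng.standard_normal((m, n))
A_dag = np.linalg.pinv(A)
print(np.allclose(A_dag, np.linalg.solve(A.T @ A, A.T)))   # True
print(np.allclose(A_dag @ A, np.eye(n)))                   # True: left inverse

# Wide, full row rank: pinv(B) @ y is the minimum-norm solution of y = B x
p, q = 4, 10
B = rng.standard_normal((p, q))
y = rng.standard_normal(p)
x_min = np.linalg.pinv(B) @ y
x_other = x_min + np.linalg.svd(B)[2][-1]   # shift by a kernel direction of B
print(np.allclose(B @ x_min, y), np.allclose(B @ x_other, y))  # both solve y = B x
print(np.linalg.norm(x_min) < np.linalg.norm(x_other))         # True: minimum norm
```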