
An important special case of (unconstrained) optimization problems is non-linear least squares:

$$
\begin{array}{ll}
\text{minimize} &\quad f_0(x) := \dfrac{1}{2} \sum_{j=1}^m r_j(x)^2\\
\text{subject to} &\quad x \in \mathbb{R}^n
\end{array}
$$

where $r_j: \mathbb{R}^n \to \mathbb{R}$ are residual functions. Equivalently, define

$$
R(x) = \left[r_1(x), r_2(x), \ldots, r_m(x)\right]^\top,
$$

such that the optimization problem above can be written as

$$
\begin{array}{ll}
\text{minimize} &\quad f_0(x) := \dfrac{1}{2}\Vert R(x)\Vert_2^2, \quad R: \mathbb{R}^n \to \mathbb{R}^m\\
\text{subject to} &\quad x \in \mathbb{R}^n
\end{array}
$$
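As a concrete illustration (not taken from the text), consider fitting the model $y \approx x_1 e^{x_2 t}$ to a handful of observations $(t_j, y_j)$; each residual measures the misfit at one data point. The data values and the helper names `R` and `f0` below are illustrative assumptions.

```python
import numpy as np

# Hypothetical data-fitting example: fit y ~ x[0] * exp(x[1] * t) to m = 5
# observations (t_j, y_j). Each residual is r_j(x) = x[0] * exp(x[1] * t_j) - y_j.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.6, 2.7, 4.4, 7.3])

def R(x):
    """Stacked residual vector R(x) in R^m."""
    return x[0] * np.exp(x[1] * t) - y

def f0(x):
    """Least-squares objective f0(x) = 0.5 * ||R(x)||_2^2."""
    r = R(x)
    return 0.5 * r @ r
```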

Exploiting the structure of the problem

Both formulations of the problem have a specific structure that can be exploited to devise efficient algorithms.

Let $J(x) \in \mathbb{R}^{m \times n}$ denote the Jacobian matrix of $R$ at $x$, whose $j$-th row is $\nabla r_j(x)^\top$. The Jacobian gives simple expressions for the gradient and Hessian of $f_0(x) = \frac{1}{2}\Vert R(x)\Vert^2_2$:

$$
\begin{align*}
\nabla f_0(x) &= \sum_{j=1}^m r_j(x)\nabla r_j(x) = J(x)^\top R(x)\\
\nabla^2 f_0(x) &= \sum_{j=1}^m \nabla r_j(x)\nabla r_j(x)^\top + \sum_{j=1}^m r_j(x)\nabla^2 r_j(x)\\
&= J(x)^\top J(x) + \sum_{j=1}^m r_j(x)\nabla^2 r_j(x)
\end{align*}
$$
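To make the gradient formula concrete, here is a small numpy sketch continuing the hypothetical exponential-fit example above (the `jacobian` helper and the test point are illustrative assumptions): it compares $J(x)^\top R(x)$ with a finite-difference approximation of $\nabla f_0(x)$.

```python
def jacobian(x):
    """Jacobian J(x) in R^{m x n}; row j holds the partial derivatives of r_j."""
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])

x = np.array([1.0, 0.9])
grad = jacobian(x).T @ R(x)    # gradient formula: nabla f0(x) = J(x)^T R(x)

# Central finite-difference check of the gradient formula.
eps = 1e-6
fd = np.array([(f0(x + eps * np.eye(2)[i]) - f0(x - eps * np.eye(2)[i])) / (2 * eps)
               for i in range(2)])
print(np.allclose(grad, fd, atol=1e-5))    # expected: True
```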

Why are these expressions interesting? Because, in many applications:

1. the gradient $\nabla f_0(x) = J(x)^\top R(x)$ and the first term $J(x)^\top J(x)$ of the Hessian only require first derivatives of the residuals, which are often cheap to compute;
2. the residuals $r_j(x)$ are small near a solution, so the second term $\sum_{j=1}^m r_j(x)\nabla^2 r_j(x)$ is often negligible and $J(x)^\top J(x)$ is a good approximation of $\nabla^2 f_0(x)$.

We exploit these properties to construct dedicated nonlinear least squares algorithms, known as the Gauss-Newton method and the Levenberg-Marquardt method.

Algorithms for nonlinear least squares

The Gauss-Newton method can be seen as a quasi-Newton method with matrix $B_k = J(x^{(k)})^\top J(x^{(k)})$ and constant step size $\alpha_k = 1$.
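As a minimal sketch (not a reference implementation), each Gauss-Newton iteration computes the direction $p$ from $J^\top J\, p = -J^\top R$ and takes a unit step. The loop below reuses the hypothetical `R` and `jacobian` helpers defined above and solves the least-squares subproblem with `lstsq`, which is equivalent to the normal equations but numerically safer.

```python
def gauss_newton(x0, n_iter=20):
    """Minimal Gauss-Newton loop: B_k = J^T J, constant step size 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        J, r = jacobian(x), R(x)
        # Gauss-Newton direction: solve min_p ||J p + r||_2, i.e. J^T J p = -J^T r.
        p, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + p
    return x

print(gauss_newton([1.0, 0.5]))   # rough fit of the exponential model above
```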

The Levenberg-Marquardt method can also be interpreted as a quasi-Newton method with constant step size $\alpha_k = 1$. The method introduces a regularization parameter $\lambda_k$, which is tuned throughout iterations.
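A similarly minimal Levenberg-Marquardt sketch under the same assumptions: each step solves the regularized system $(J^\top J + \lambda_k I)\, p = -J^\top R$, and $\lambda_k$ is decreased after a successful step and increased otherwise (one simple heuristic among several used in practice).

```python
def levenberg_marquardt(x0, lam=1e-2, n_iter=50):
    """Minimal Levenberg-Marquardt loop with a simple lambda update heuristic."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        J, r = jacobian(x), R(x)
        # Regularized normal equations: (J^T J + lam * I) p = -J^T r.
        p = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        if f0(x + p) < f0(x):
            x, lam = x + p, lam / 3.0   # step accepted: relax the regularization
        else:
            lam *= 3.0                  # step rejected: regularize more strongly
    return x

print(levenberg_marquardt([1.0, 0.5]))
```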

The Levenberg-Marquardt method is one of the standard algorithms for non-linear least squares problems. It is implemented in many libraries, such as scipy.optimize.least_squares (use option method='lm').
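For instance, a small usage sketch reusing the hypothetical residual function `R` defined above:

```python
from scipy.optimize import least_squares

# method='lm' selects the MINPACK Levenberg-Marquardt implementation;
# it needs at least as many residuals as variables and does not support bounds.
result = least_squares(R, x0=[1.0, 0.5], method='lm')
print(result.x, result.cost)   # fitted parameters and 0.5 * ||R(x)||_2^2
```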
