Start by evaluating the distance and exploit the fact that ∇f(x)⋆=0
x(k+1)−x⋆=x(k)−x⋆−[∇2f(x(k))]−1∇f(x(k))=[∇2f(x(k))]−1[∇2f(x(k))(x(k)−x⋆)−(∇f(x(k))−∇f(x⋆))] Recall Taylor’s theorem (Chapter 2) applied to gradients:
∇f(x(k))−∇f(x⋆)=∫01∇2f(x(k)+t(x⋆−x(k)))(x(k)−x⋆)dt We are now ready to bound the second parenthesis in the RHS above. Then
∥∥∇2f(x(k))(x(k)−x⋆)−(∇f(x(k))−∇f(x⋆))∥∥2=∥∥∫01[∇2f(x(k))−∇2f(x(k)+t(x⋆−x(k)))](x(k)−x⋆)dt∥∥2≤∥x(k)−x⋆∥2∫01∥∥∇2f(x(k))−∇2f(x(k)+t(x⋆−x(k)))∥∥dt≤∥x(k)−x⋆∥22∫01Ltdt(Lipschitz-continuity of the Hessian near x⋆)=21L∥x(k)−x⋆∥22 Moreover, ∇2f(x⋆) is non-singular and continuous. Thus there is a radius r>0 such that ∥∇2f(x(k))−1∥≤2∥∇2f(x⋆)−1∥ for every x(k) s.t. ∥x(k)−x⋆∥≤r.
By substitution, we get
∥x(k+1)−x⋆∥2≤2L∥∇2f(x(k))−1∥∥x(k)−x⋆∥22≤=L′L∥∇2f(x⋆)−1∥∥x(k)−x⋆∥22 If we choose x(0) such that ∥x(0)−x⋆∥≤min(r,1/(2L′)) we obtain quadratic convergence to x⋆ as
∥x(k+1)−x⋆∥2≤L′∥x(k)−x⋆∥22≤L′⋅(L′)2∥x(k−1)−x⋆∥24≤L′(L′)2⋯(L′)2k+1∥x(0)−x⋆∥22k+1=(L′)2k+1−1∥x(0)−x⋆∥22k+1 Regarding the gradients, note p(k)=−[∇2f(x(k))]−1∇f(x(k)) and let us observe that
and x(k+1)−x(k)=p(k)∇f(x(k))+∇2f(x(k))p(k)=0 Then,
∥∇f(x(k+1))∥2=∥∇f(x(k+1))−∇f(x(k))−∇2f(x(k))p(k)∥2=∥∥∫01∇2f(x(k)+tp(k))p(k)dt−∇2f(x(k))p(k)∥∥2≤21L∥p(k)∥22≤L′∥∇f(x(k))∥22 which shows that gradient norms converge quadratically.