
In this section we review some important concepts about objective functions $f:\mathbb{R}^n \to \mathbb{R}$. We assume throughout that $f$ is (at least) continuously differentiable.

Convex functions

Recall from Chapter 2 two useful characterizations of convex functions.

Smooth functions

Another important notion is the smoothness of a function $f$, which relates to Lipschitz continuity properties of its gradient $\nabla f$.

In particular, functions that are not differentiable, or differentiable with non-continuous derivatives, are nonsmooth. Below are two important properties of smooth functions. Conversely, if $f$ is $L$-smooth, then $f$ is continuously differentiable.
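Smoothness is a quantitative, global condition on the gradient. As a small numerical illustration (the function $x \mapsto x^4$ below is an illustrative choice, not from the text), a continuously differentiable function can still fail to be $L$-smooth for any finite $L$:

```python
import numpy as np

# f(x) = x**4 is continuously differentiable everywhere, but it is NOT
# L-smooth on all of R: the Lipschitz ratio |f'(x+1) - f'(x)| / 1 of its
# gradient f'(x) = 4x**3 grows without bound as |x| increases.
def gradf(x):
    return 4 * x**3

xs = np.array([1.0, 10.0, 100.0])
ratios = np.abs(gradf(xs + 1.0) - gradf(xs))  # Lipschitz ratios over unit steps
print(ratios)  # strictly increasing, so no single L works globally
```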

Proof

The proof is adapted from Wright & Recht (2022). Let $f$ be continuously differentiable. Recall the integral version of Taylor's theorem

$$f(x + p) = f(x) + \int_0^1 \nabla f(x + \gamma p)^\top p \,\mathrm{d}\gamma$$

for any $x, p \in \mathbb{R}^n$. Then, given $x, y \in \mathbb{R}^n$, we can build the quantity (setting $x + p = y$)

$$f(y) - f(x) - \nabla f(x)^\top (y-x) = \int_0^1 \left[\nabla f(x + \gamma(y-x)) - \nabla f(x)\right]^\top (y-x)\,\mathrm{d}\gamma$$

Using $L$-smoothness of $f$, we bound the integrand as

$$\begin{align*} \left[\nabla f(x + \gamma(y-x)) - \nabla f(x)\right]^\top (y-x) &\leq \Vert \nabla f(x + \gamma(y-x)) - \nabla f(x) \Vert \cdot \Vert y - x \Vert & \text{(Cauchy-Schwarz)}\\ &\leq L \gamma \Vert y - x \Vert^2 & \text{($L$-smoothness)} \end{align*}$$

Finally, by integrating the upper bound

$$\int_0^1 \left[\nabla f(x + \gamma(y-x)) - \nabla f(x)\right]^\top (y-x)\,\mathrm{d}\gamma \leq \frac{1}{2}L\Vert y - x \Vert^2$$

and thus the desired result

$$f(y) \leq f(x) + \nabla f(x)^\top (y-x) + \frac{1}{2}L\Vert y - x \Vert^2.$$

Writing $g_x: y \mapsto f(x) + \nabla f(x)^\top (y-x) + \frac{1}{2}L\Vert y - x \Vert^2$, we observe that $g_x(x) = f(x)$, $\nabla g_x(x) = \nabla f(x)$ and $\nabla^2 g_x(x) = LI_n$. This means that $f$ is upper-bounded by a quadratic that takes, at $x$, the same value $f(x)$ and gradient $\nabla f(x)$, with Hessian $L I_n$ (curvature $L$ in all directions).

Remark. The notation $\nabla^2 f(x) \preceq L I_n$ means that the matrix $LI_n - \nabla^2 f(x)$ is positive semidefinite or, in other terms, that the eigenvalues of the Hessian are upper-bounded by $L$.
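The quadratic upper bound above can be verified numerically. A minimal sanity check, using a least-squares objective and random data of my own choosing (not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def f(x):
    # least-squares objective: f(x) = 0.5 * ||Ax - b||^2
    return 0.5 * np.sum((A @ x - b) ** 2)

def gradf(x):
    return A.T @ (A @ x - b)

# the Hessian is A^T A everywhere, so f is L-smooth with L = lambda_max(A^T A)
L = np.linalg.eigvalsh(A.T @ A).max()

# check f(y) <= f(x) + grad f(x)^T (y - x) + (L/2) ||y - x||^2 at random pairs
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    bound = f(x) + gradf(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2)
    assert f(y) <= bound + 1e-9
print("quadratic upper bound holds at all sampled pairs")
```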

Proof

The proof is adapted from Wright & Recht (2022). We suppose that $f$ is twice continuously differentiable.

  • Suppose that $f$ is $L$-smooth. Let $y = x + \alpha p$; using Property 1, one gets

    $$f(x+\alpha p) - f(x) - \alpha \nabla f(x)^\top p \leq \frac{L}{2}\alpha^2 \Vert p \Vert^2.$$

    On the other hand, using Taylor’s theorem (second-order mean-value form) one gets

    $$f(x+\alpha p) - f(x) - \alpha \nabla f(x)^\top p = \frac{1}{2}\alpha^2 p^\top \nabla^2 f(x+\gamma \alpha p)\, p \quad \text{for some } \gamma \in (0, 1)$$

    Comparing the two expressions we get the inequality

    $$p^\top \nabla^2 f(x+\gamma \alpha p)\, p \leq L \Vert p \Vert^2$$

    In the limit $\alpha \to 0$, we get $p^\top \nabla^2 f(x)\, p \leq L \Vert p \Vert^2$ (using continuity of the Hessian). Taking $p$ as any eigenvector of $\nabla^2 f(x)$ shows that the associated eigenvalue is bounded by $L$. Hence, $\nabla^2 f(x) \preceq L I_n$.

  • Suppose that $-LI_n \preceq \nabla^2 f(x) \preceq LI_n$ for all $x$. As a result, the operator norm satisfies $\Vert\nabla^2 f(x)\Vert_2 = \max(\vert \lambda_1\vert, \ldots, \vert \lambda_n\vert)$, where the $\lambda_i$ are the eigenvalues of $\nabla^2 f(x)$. Since by assumption all eigenvalues are smaller than $L$ in absolute value, we get $\Vert \nabla^2 f(x)\Vert_2 \leq L$. Now, recall another version of Taylor's theorem, i.e.,

    $$\nabla f(x+p) = \nabla f(x) + \int_0^1 \nabla^2 f(x+\gamma p)\, p \,\mathrm{d}\gamma$$

    We can apply this result directly to bound $\Vert \nabla f(y)-\nabla f(x)\Vert$ as

    $$\begin{align*} \Vert \nabla f(y)-\nabla f(x)\Vert &= \left\Vert \int_0^1 \nabla^2 f(x+\gamma (y-x))(y-x) \,\mathrm{d}\gamma \right\Vert & \text{(Taylor's theorem)}\\ &\leq \int_0^1 \left\Vert \nabla^2 f(x+\gamma (y-x))\right\Vert \Vert y-x \Vert \,\mathrm{d}\gamma &\\ &\leq \int_0^1 L \Vert y -x \Vert \,\mathrm{d}\gamma & \text{(bounded operator norm)}\\ &= L\Vert y -x \Vert & \end{align*}$$

    which shows that $f$ is $L$-smooth.
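The bounded-Hessian characterization is easy to probe numerically. A sketch using a trigonometric function of my own choosing: for $f(x) = \sum_i \cos(x_i)$ the Hessian is $\mathrm{diag}(-\cos x_i)$, so $-I_n \preceq \nabla^2 f(x) \preceq I_n$, and the gradient should be $1$-Lipschitz.

```python
import numpy as np

rng = np.random.default_rng(1)

# f(x) = sum_i cos(x_i): grad f(x) = -sin(x) (elementwise) and the Hessian
# diag(-cos x_i) has all eigenvalues in [-1, 1], so f is 1-smooth (L = 1).
def gradf(x):
    return -np.sin(x)

for _ in range(100):
    x, y = rng.uniform(-5, 5, 4), rng.uniform(-5, 5, 4)
    lhs = np.linalg.norm(gradf(y) - gradf(x))
    assert lhs <= np.linalg.norm(y - x) + 1e-12  # ||grad f(y) - grad f(x)|| <= L ||y - x||
print("gradient is 1-Lipschitz at all sampled pairs")
```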

Strong convexity

Remarks

The following properties are direct consequences of strong convexity and/or smoothness applied to convex functions.

This result shows that the Hessian eigenvalues are lower-bounded by $m$.

Proof
  • $\Rightarrow$. Suppose that $f$ is $m$-strongly convex. For any $x, p \in \mathbb{R}^n$ and $\alpha > 0$, Taylor's theorem gives

    $$f(x+\alpha p) = f(x) + \alpha \nabla f(x)^\top p + \frac{1}{2}\alpha^2 p^\top \nabla^2 f(x+\gamma \alpha p)\, p \quad \text{for some } \gamma \in (0, 1).$$

    Using the strong convexity property, one gets

    $$f(x+\alpha p) \geq f(x) + \alpha \nabla f(x)^\top p + \frac{m}{2}\alpha^2\Vert p\Vert^2$$

    Combining the two statements, one gets the inequality

    $$p^\top \nabla^2 f(x+\gamma \alpha p)\, p \geq m \Vert p\Vert^2$$

    As before, taking the limit $\alpha \to 0$, one has $p^\top \nabla^2 f(x)\, p \geq m \Vert p\Vert^2$ and thus $\nabla^2 f(x) \succeq m I_n$.

  • $\Leftarrow$. Suppose that $\nabla^2 f(x) \succeq m I_n$. Using once again Taylor's theorem,

    $$f(y) = f(x) + \nabla f(x)^\top (y-x) + \frac{1}{2}(y-x)^\top \nabla^2 f(x + \gamma (y-x))\, (y-x) \quad \text{for some } \gamma \in (0, 1).$$

    Since $\nabla^2 f(x) \succeq m I_n$, the matrix $\nabla^2 f(x) - m I_n$ is positive semidefinite, hence for any $z, x \in \mathbb{R}^n$ one has $z^\top (\nabla^2 f(x) - m I_n)\,z \geq 0$, i.e., $z^\top \nabla^2 f(x)\,z \geq m \Vert z\Vert^2$. Thus,

    $$(y-x)^\top \nabla^2 f(x + \gamma (y-x))\, (y-x) \geq m \Vert y -x \Vert^2$$

    and

    $$f(y) \geq f(x) + \nabla f(x)^\top (y-x) + \frac{1}{2}m \Vert y -x\Vert^2$$

    which shows that $f$ is $m$-strongly convex.
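The strong-convexity lower bound can be checked numerically in the same spirit as before. A sketch with a quadratic of my own choosing, built so that its Hessian satisfies $\nabla^2 f \succeq m I_n$ with $m \geq 0.5$:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
Q = M.T @ M + 0.5 * np.eye(5)  # symmetric, eigenvalues >= 0.5

def f(x):
    return 0.5 * x @ Q @ x     # Hessian is Q everywhere

def gradf(x):
    return Q @ x

m = np.linalg.eigvalsh(Q).min()  # f is m-strongly convex with this m

# check f(y) >= f(x) + grad f(x)^T (y - x) + (m/2) ||y - x||^2 at random pairs
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lower = f(x) + gradf(x) @ (y - x) + 0.5 * m * np.sum((y - x) ** 2)
    assert f(y) >= lower - 1e-9
print("strong-convexity lower bound holds at all sampled pairs")
```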

Another property is the following.

Proof

Let $f$ be a twice differentiable convex function.

  • $\Rightarrow$ Suppose that $f$ is $L$-smooth. Then by Property 2, $\nabla^2 f(x) \preceq L I_n$. Moreover, since $f$ is convex, $\nabla^2 f(x) \succeq 0$, so that $0 \preceq \nabla^2 f(x) \preceq L I_n$.

  • $\Leftarrow$ Suppose that $0 \preceq \nabla^2 f(x) \preceq L I_n$. Then, in particular, $-LI_n \preceq \nabla^2 f(x) \preceq L I_n$. By Property 2, $f$ is $L$-smooth.
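For a convex, $L$-smooth example one can check $0 \preceq \nabla^2 f(x) \preceq LI_n$ directly. The logistic loss below is an illustrative choice (not from the text); its Hessian $\sigma(1-\sigma)\,ww^\top$ is rank one with eigenvalues in $[0, \Vert w\Vert^2/4]$.

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal(4)

# f(x) = log(1 + exp(w^T x)) is convex; its Hessian is sigma(1 - sigma) w w^T
# with sigma = 1 / (1 + exp(-w^T x)), so 0 <= eigenvalues <= L = ||w||^2 / 4.
L = np.dot(w, w) / 4

for _ in range(100):
    x = rng.standard_normal(4)
    s = 1 / (1 + np.exp(-w @ x))
    H = s * (1 - s) * np.outer(w, w)         # Hessian at x
    eig = np.linalg.eigvalsh(H)
    assert eig.min() >= -1e-12 and eig.max() <= L + 1e-12
print("0 <= Hessian eigenvalues <= L at all sampled points")
```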

Summary: smoothness and strong convexity

The following numerical example illustrates how, for a given function $f$, its properties (convexity, smoothness, strong convexity) make it possible to bound the function on its domain by simple (linear or quadratic) functions.

Let us consider $f: x \mapsto \frac{1}{2}x^2$. The function is obviously convex, $L$-smooth (for any $L \geq 1$) and $m$-strongly convex (for any $0 < m \leq 1$). We can then display the upper and lower bounds induced by smoothness, convexity and strong convexity alone, or by combining them. These bounds ultimately become important when one wants to characterize the convergence properties of optimization algorithms.

Source
import numpy as np
import matplotlib.pyplot as plt
import palettable as pal
from ipywidgets import interact, FloatSlider

cmap = pal.colorbrewer.qualitative.Paired_4.mpl_colors

myblue = cmap[1]
mygreen = cmap[3]
def f(x):
    return 0.5*x**2

def gradf(x):
    return x

def upper_bound(x0, x, L):
    y = f(x0) + gradf(x0)*(x-x0) + 0.5*L*(x-x0)**2
    return y

def lower_bound(x0, x, m):
    y = f(x0) + gradf(x0)*(x-x0) + 0.5*m*(x-x0)**2
    return y

x = np.linspace(-1,2, 100)
L = 1.5
m = 0.6

def plot_bounds(x0):
    fig, ax = plt.subplots(figsize=(12, 8), ncols=2, nrows=2, sharex=True, sharey=True)

    # L-smoothness
    ax[0, 0].plot(x, f(x), color=myblue, linewidth=2)
    ax[0, 0].plot(x, upper_bound(x0, x, L), color=mygreen, linewidth=2)
    ax[0, 0].plot(x, upper_bound(x0, x, -L), color=mygreen, linewidth=2)
    ax[0, 0].fill_between(x, upper_bound(x0, x, -L),\
        upper_bound(x0, x, L), alpha=0.1, color=mygreen)
    ax[0, 0].scatter(x0, f(x0), marker='o', color='k', zorder=10)

    # strong convexity
    ax[0, 1].plot(x, f(x), color=myblue, linewidth=2)
    ax[0, 1].plot(x, lower_bound(x0, x, m), color=mygreen, linewidth=2)
    ax[0, 1].fill_between(x, lower_bound(x0, x, m), 2, alpha=0.1, color=mygreen)
    ax[0, 1].scatter(x0, f(x0), marker='o', color='k', zorder=10)

    # L-smooth, convex
    ax[1, 0].plot(x, f(x), color=myblue, linewidth=2)
    ax[1, 0].plot(x, upper_bound(x0, x, L), color=mygreen, linewidth=2)
    ax[1, 0].plot(x, upper_bound(x0, x, 0), color=mygreen, linewidth=2)
    ax[1, 0].fill_between(x, upper_bound(x0, x, 0),\
        upper_bound(x0, x, L), alpha=0.1, color=mygreen)
    ax[1, 0].scatter(x0, f(x0), marker='o', color='k', zorder=10)

    # L-smooth, m-strg convex
    ax[1, 1].plot(x, f(x), color=myblue, linewidth=2)
    ax[1, 1].plot(x, upper_bound(x0, x, L), color=mygreen, linewidth=2)
    ax[1, 1].plot(x, lower_bound(x0, x, m), color=mygreen, linewidth=2)
    ax[1, 1].fill_between(x, lower_bound(x0, x, m),\
        upper_bound(x0, x, L), alpha=0.1, color=mygreen)
    ax[1, 1].scatter(x0, f(x0), marker='o', color='k', zorder=10)

    # clean
    for axis in np.ravel(ax):
        axis.xaxis.set_visible(False)
        axis.yaxis.set_visible(False)
        for side in ['top','right','bottom','left']:
            axis.spines[side].set_visible(False)

    fig.tight_layout()
    ax[0, 0].set_ylim(-.2, 1)
    labelsize=20
    ax[0, 0].set_title('$L$-smooth', size=labelsize, y=0.9)
    ax[0, 1].set_title('$m$-strongly convex', size=labelsize, y=0.9)
    ax[1, 0].set_title('$L$-smooth, convex', size=labelsize, y=0.9)
    ax[1, 1].set_title('$L$-smooth, $m$-strongly convex', size=labelsize, y=0.9)
    plt.show()

interact(plot_bounds, x0=FloatSlider(value=0.5, min=-1, max=2, step=0.01, description='$x_0$'));

Assessing convergence

The kind of convergence guarantee one can obtain depends strongly on the regularity properties (convexity, smoothness, strong convexity) of the objective $f$:

In some cases, both types of convergence can be related.
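As an illustration of how these regularity properties translate into guarantees, here is a sketch of the classical result that gradient descent with step $1/L$ on an $m$-strongly convex, $L$-smooth function satisfies $f(x_k) - f^\star \leq (1 - m/L)^k (f(x_0) - f^\star)$; the quadratic test problem below is my own choice.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
Q = M.T @ M + np.eye(5)          # m-strongly convex, L-smooth quadratic
eig = np.linalg.eigvalsh(Q)
m, L = eig.min(), eig.max()      # strong-convexity and smoothness constants

def f(x):
    return 0.5 * x @ Q @ x       # minimizer x* = 0, optimal value f* = 0

x = rng.standard_normal(5)
vals = [f(x)]
for _ in range(200):
    x = x - (1 / L) * (Q @ x)    # gradient descent step with step size 1/L
    vals.append(f(x))

# linear convergence: f(x_k) - f* <= (1 - m/L)^k * (f(x_0) - f*)
rate = 1 - m / L
assert all(vals[k] <= rate**k * vals[0] + 1e-12 for k in range(len(vals)))
print("linear convergence at rate 1 - m/L verified")
```

On a quadratic the iteration contracts each Hessian eigencomponent by $(1 - \lambda_i/L)^2 \leq 1 - m/L$, which is exactly the claimed rate.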

References
  1. Wright, S. J., & Recht, B. (2022). Optimization for data analysis. Cambridge University Press.