Gradient rules
A cheat sheet of differentiation rules for the \( \nabla \) operator: how to compute the gradient of various expressions.
\( \newcommand{\dotprod}[2]{ \langle #1 \cdot #2 \rangle } \)
Usual operations
Sum rule, Product rule, Division rule, Scalar rule.
\(f\) and \(g\) both scalar functions
\(g: \mathbb R^n \rightarrow \mathbb R\)
\(f: \mathbb R^n \rightarrow \mathbb R\)
$$ \begin{array}{lcl} \nabla [ f + g ] & = &\nabla f + \nabla g \\ \nabla [ f . g ] & = & \nabla f . g + f . \nabla g \\ \nabla \left [ \frac{f}{g} \right ] & = &\frac{\nabla f . g - f . \nabla g}{g^2} \\ \nabla [ \alpha . f ] & = & \alpha . \nabla f \end{array} $$
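For example, taking \( f(\vec x) = x_1 x_2 \) and \( g(\vec x) = x_1^2 + x_2^2 \), the product rule gives: $$ \nabla [ f . g ] = \{x_2, x_1\} . (x_1^2 + x_2^2) + x_1 x_2 . \{2 x_1, 2 x_2\} = \{3 x_1^2 x_2 + x_2^3, \ x_1^3 + 3 x_1 x_2^2\} $$ which matches differentiating \( f . g = x_1^3 x_2 + x_1 x_2^3 \) directly.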
Gradient of the norm
The norm is a scalar function, \( \| \|: \mathbb R^n \rightarrow \mathbb R\)
\( \vec x \) is a vector of dimension \(n\):
\( \{x_1, \cdots, x_n\} \)
$$\nabla \| \vec {x} \| = \frac{ \vec {x}}{ \| \vec {x} \|} $$ Re-writing the norm as a function \( norm() \) may be more legible to some people: $$ \nabla \, norm(x_1, \cdots, x_n) = \frac{ \{x_1, \cdots, x_n\} } { norm(x_1, \cdots, x_n) } $$
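This follows from differentiating \( \| \vec x \| = \sqrt{x_1^2 + \cdots + x_n^2} \) component by component: $$ \frac{\partial \| \vec x \|}{\partial x_i} = \frac{2 x_i}{2 \sqrt{x_1^2 + \cdots + x_n^2}} = \frac{x_i}{\| \vec x \|} $$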
Related: gradient of the squared norm (i.e. the dot product of the vector with itself).
The squared norm is a scalar function, \( \| \|^2: \mathbb R^n \rightarrow \mathbb R\)
$$\nabla \Big [ \| \vec {x} \|^2 \Big ] = \nabla \dotprod{\vec x }{ \vec x} = 2 . \vec {x} $$
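Again, this can be checked component by component: $$ \frac{\partial \| \vec x \|^2}{\partial x_i} = \frac{\partial}{\partial x_i} \left ( x_1^2 + \cdots + x_n^2 \right ) = 2 x_i $$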
Not to be confused with the case where we compose with a vector-valued function:
\( \| \vec u(t) \|^2: \mathbb R \rightarrow\mathbb R^n \rightarrow \mathbb R\)
\(\Big[\| \vec u(t) \|^2\Big]_t = \Big[\dotprod{\vec u}{\vec u}\Big]_t = 2 \dotprod{\vec u}{\vec u'}\)
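For example, with \( \vec u(t) = \{\cos t, \sin t\} \) the squared norm is constant and the formula correctly yields zero: $$ \Big[\| \vec u(t) \|^2\Big]_t = 2 \dotprod{\vec u}{\vec u'} = 2 \left ( \cos t . (-\sin t) + \sin t . \cos t \right ) = 0 $$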
Gradient of a matrix-vector product
With \( M \) a \( n \times n \) matrix:
$$\nabla [ M \vec {x} ] = M $$Likewise for rigid transformations (rotations \( M \in \mathbb R^{3 \times 3}\) and translations \(\vec t \in \mathbb R^3\)):
$$\nabla [ M \vec {x} + \vec t ] = M $$Strictly speaking, \( \nabla [ M \vec x ] \) here denotes the Jacobian; under the convention where \( \nabla \) is the transpose of the Jacobian one gets \( M^\mathsf{T} \) instead, which equals \( M \) if, and only if, \( M \) is symmetric.
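As a small \( 2 \times 2 \) illustration (reading \( \nabla \) as the Jacobian): $$ M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad M \vec x = \{ a x_1 + b x_2, \ c x_1 + d x_2 \}, \quad \nabla [ M \vec x ] = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = M $$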
Chain rules
With \(s: \mathbb R \rightarrow \mathbb R\) a univariate function and \(f: \mathbb R^n \rightarrow \mathbb R\) a multivariate real-valued function, the operation boils down to a uniform scaling of the gradient:
$$ \nabla \left [ s( f(\vec {x}) ) \right ] = s'( f(\vec {x}) ) \nabla f(\vec {x}) $$
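For example, with \( s(y) = y^2 \) and \( f(\vec x) = \| \vec x \| \) we recover the gradient of the squared norm: $$ \nabla \left [ \| \vec x \|^2 \right ] = s'( \| \vec x \| ) . \nabla \| \vec x \| = 2 \| \vec x \| . \frac{\vec x}{\| \vec x \|} = 2 \vec x $$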
Similarly, consider \(s: \mathbb R^n \rightarrow \mathbb R\) and plug into each of its parameters a scalar function \(f_i: \mathbb R^n \rightarrow \mathbb R\). The gradient is now a weighted sum of the \( \nabla f_i \):
$$ \begin{array}{lcl}
\nabla \left [ s \left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) \right ] & = & \frac{ \partial s}{\partial x_1}\left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) . \nabla f_1(\vec {p}) + \cdots + \frac{ \partial s}{\partial x_n}\left( f_1(\vec {p}), \cdots, f_n(\vec {p}) \right) . \nabla f_n(\vec {p}) \\
\text{(or expressed as a matrix product)} & = & \left [ \nabla f_1(\vec {p}), \cdots, \nabla f_n(\vec {p}) \right ] . \nabla s( f_1(\vec p), \cdots, f_n(\vec p))
\end{array}
$$
where \( \left [ \nabla f_1(\vec p), \cdots, \nabla f_n(\vec p) \right ] \) is the \( n \times n \) matrix whose columns are the gradients \( \nabla f_i(\vec p) \).
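For example, with \( s(x_1, x_2) = x_1 . x_2 \) and two scalar functions \( f_1, f_2 \), this recovers the product rule: $$ \nabla \left [ f_1(\vec p) . f_2(\vec p) \right ] = f_2(\vec p) . \nabla f_1(\vec p) + f_1(\vec p) . \nabla f_2(\vec p) $$ since \( \frac{\partial s}{\partial x_1} = x_2 \) evaluates to \( f_2(\vec p) \) and \( \frac{\partial s}{\partial x_2} = x_1 \) evaluates to \( f_1(\vec p) \).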
With \(m: \mathbb R^n \rightarrow \mathbb R^n\) a deformation map and \(f: \mathbb R^n \rightarrow \mathbb R\) a multivariate scalar function, the operation boils down to transforming the gradient by a matrix: $$ \nabla \left [ f( m(\vec {x}) ) \right ] = \mathbf{J}\left [ m(\vec {x}) \right ]^\mathsf{T} \nabla f(m(\vec {x}))$$
Where \( \mathbf{J}\left [ m(\vec {x}) \right ]^\mathsf{T} \) denotes the transpose of the \(n \times n \) Jacobian matrix.
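For example, for an affine map \( m(\vec x) = M \vec x + \vec t \) the Jacobian is simply \( M \), so: $$ \nabla \left [ f( M \vec x + \vec t ) \right ] = M^\mathsf{T} \nabla f( M \vec x + \vec t ) $$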
Related Jacobian rules
The Jacobian of a linear function is the identity matrix \( I \) times the slope \( a \): $$ J[a \vec {x} + \vec c] = a.I $$
Vector-valued function differentiation
Vector-valued functions or parametric functions have the following form: \(\vec g(t) = \{g_1(t), \cdots, g_n(t) \}^T \)
Chain rule
Differentiating with respect to a single variable can still lead to the use of \( \nabla \):
\(f: \mathbb R^n \rightarrow \mathbb R\) multivariable scalar function
\(\vec g: \mathbb R \rightarrow \mathbb R^n\) a parametric function with components \(\vec g(x) = \{g_1, \cdots, g_n \}^T \):
$$ \begin{array}{lcl} \left [ f(\vec g(x)) \right ] ' & = & \dotprod{\nabla f( \vec g(x))}{ \vec g'(x)} \\ \text{(otherwise said)} & = & \frac{ \partial f(\vec {g}) }{\partial x_1} . \frac{ \partial g_1(x) }{\partial x} + \cdots + \frac{ \partial f(\vec {g}) }{\partial x_n} . \frac{ \partial g_n(x) }{\partial x} \end{array} $$
In short, we take the dot product between the gradient of \( f \) and the velocity (derivative) of \( g \).
Example: differentiate the norm of the position vector
Consider \(\| \vec g(t) \|: \mathbb R \rightarrow \mathbb R^n \rightarrow \mathbb R\) where:
\(\| \|: \mathbb R^n \rightarrow \mathbb R\) is a multivariable scalar function,
\(\vec g: \mathbb R \rightarrow \mathbb R^n\) is a parametric function.
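Applying the chain rule above together with \( \nabla \| \vec x \| = \vec x / \| \vec x \| \): $$ \Big [ \| \vec g(t) \| \Big ]' = \dotprod{ \nabla \| \vec g(t) \| }{ \vec g'(t) } = \frac{ \dotprod{ \vec g(t) }{ \vec g'(t) } }{ \| \vec g(t) \| } $$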
Usual operations
Sum rule, Product rule, Division rule, and Scalar rule, but also the dot product rule and the cross product rule.
Two vector-valued (\( \cong \) parametric) functions:
\(\vec u: \mathbb R \rightarrow \mathbb R^n\)
\(\vec v: \mathbb R \rightarrow \mathbb R^n\)
$$ \newcommand{annotation}[1]{ ~~~~~~\raise 0.8em {\color{grey}\scriptsize{ \text{#1} }} } \begin{array}{lcl} (\vec u + \vec v)' & = & \vec u' + \vec v' & \annotation{sum rule (component wise)} \\ (\vec u / \vec v)' & = & \frac{\vec u' . \vec v - \vec v' . \vec u }{ \vec v^2 } & \annotation{division rule (component wise)} \\ (\alpha . \vec u)' & = & \alpha \vec u' & \annotation{scalar rule} \\ \dotprod{ \vec u }{ \vec v }' & = & \dotprod{\vec u' }{ \vec v} + \dotprod{\vec v' }{ \vec u } & \annotation{dot product rule} \\ (\vec u \times \vec v)' & = & \vec u' \times \vec v + \vec u \times \vec v' & \annotation{cross product rule (warning: non-commutative!) } \\ \end{array} $$
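As a quick check of the dot product rule, take \( \vec u(t) = \{t, t^2\} \) and \( \vec v(t) = \{1, t\} \): $$ \dotprod{ \vec u }{ \vec v } = t + t^3 \quad \Rightarrow \quad \dotprod{ \vec u }{ \vec v }' = 1 + 3 t^2 = \underbrace{(1 + 2 t^2)}_{ \dotprod{\vec u' }{ \vec v} } + \underbrace{t^2}_{ \dotprod{\vec v' }{ \vec u } } $$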
\(f: \mathbb R \rightarrow \mathbb R\)
The usual univariate chain rule holds:
$$ \newcommand{annotation}[1]{ ~~~~~~\raise 0.8em {\color{grey}\scriptsize{ #1 }} } \Big [\vec u \big (f(x) \big ) \Big]' = \vec u'\big(f(x)\big) . f'(x) $$
Component-wise division rule:
$$ \left [ {\vec u(t) } / {f(t)} \right ]' = (\vec u' . f - f' . \vec u) / f^2 $$
Property: constant norm equates to an orthogonal derivative
If the norm of the vector-valued function \(u\) is constant, then its derivative \(u'\) is perpendicular to it and the converse is also true:
$$ \| \vec u(t) \| = c \quad \Leftrightarrow \quad \vec u \perp \vec u' \annotation{c \in \mathbb R } $$
Constant norm also implies \( \vec u \) stays on a circle (a sphere in higher dimensions) centered at the origin; however, \( \vec u \) describing a circle does not imply the above (e.g. a circle not centered at the origin).
When the norm is constant, the squared norm is constant as well, and can be written as a dot product: $$ \left \{ \begin{matrix} \| \vec u(t) \| & = & c \\ \| \vec u(t) \|^2 & = & c^2 \\ \dotprod{\vec u(t) }{ \vec u(t) } & = & c^2 \\ \end{matrix} \right . \quad \Leftrightarrow \quad \left . \begin{matrix} \vec u \perp \vec u{\color{purple}'} \\ \dotprod{\vec u(t) }{ \vec u{\color{purple}'}(t) } = 0 \\ \end{matrix} \right . $$
Differentiate the dot product of \( \vec u(t) \) with itself: $$ \begin{aligned} \Big( \dotprod{ \vec u(t) }{ \vec u(t) } \Big)' & = \Big( c^2 \Big )' \\ \dotprod{ \vec u'(t) }{ \vec u(t) } + \dotprod{ \vec u(t) }{ \vec u'(t) } & = 0 \annotation{\text{apply dot product rule}} \\ 2 \dotprod{ \vec u' }{ \vec u } & = 0 \\ \dotprod{ \vec u' }{ \vec u } & = 0 \\ 😀\\ \end{aligned} $$ A null dot product \( \dotprod{ \vec u' }{ \vec u } \) means \( \vec u' \) and \( \vec u \) are perpendicular: \( \vec u \perp \vec u' \). In physics, it is well known that a constant speed \( \| \vec v \| = c \) implies an orthogonal acceleration \( \vec v' \)! In that case, only the direction of the velocity can change, not its magnitude. On the other hand, the acceleration's magnitude affects the rate at which the velocity's orientation changes. More on this is discussed here.
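For example, circular motion \( \vec u(t) = \{r \cos t, \ r \sin t\} \) has constant norm \( \| \vec u(t) \| = r \) and indeed: $$ \dotprod{ \vec u }{ \vec u' } = r \cos t . (-r \sin t) + r \sin t . r \cos t = 0 $$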
Some links
Wikipedia:
- Gradient as a derivative
- Vector Calculus identities
- Vector algebra relations
Others:
- Multivariate / multivariable chain rule
- Vector-valued functions derivatives
- List of univariate differentiation rules