How to cite this WIREs title:
WIREs Data Mining Knowl Discov

# A review of automatic differentiation and its efficient implementation


Run time to solve and differentiate a system of algebraic equations. All three solvers deploy Powell's dogleg method, an iterative algorithm that uses gradients to find the root, y*, of a function f = f(y, θ). The standard solver (green) uses an analytical expression for Jy and automatic differentiation (AD) to differentiate the iterative algorithm. The super node solver (blue) also uses an analytical expression for Jy, but applies the implicit function theorem to compute derivatives. The built‐in solver (red) uses AD to compute Jy and the implicit function theorem. The computer experiment is run 100 times and the shaded areas represent the region encompassed by the 5th and 95th percentiles. The plot on the right shows the run time on a log scale for clarity. Plot generated with ggplot2 (Wickham)
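The implicit function theorem idea behind the super node and built-in solvers can be sketched in a few lines. The scalar equation below and the Newton iteration (a simple stand-in for Powell's dogleg) are illustrative assumptions, not the paper's code: once the root y* of f(y, θ) = 0 is found, dy*/dθ = −(∂f/∂y)⁻¹ ∂f/∂θ evaluated at y*, so the solver's iterations never need to be differentiated.

```python
import math

def f(y, theta):
    """Toy algebraic equation (an assumption for illustration): f(y, theta) = 0."""
    return y ** 3 + theta * y - 1.0

def f_y(y, theta):
    """Analytical Jacobian J_y = df/dy."""
    return 3.0 * y ** 2 + theta

def f_theta(y, theta):
    """Partial derivative df/dtheta."""
    return y

def solve(theta, y0=1.0, tol=1e-12):
    """Newton iteration, a simple stand-in for Powell's dogleg method."""
    y = y0
    for _ in range(100):
        step = f(y, theta) / f_y(y, theta)
        y -= step
        if abs(step) < tol:
            break
    return y

def sensitivity(theta):
    """Implicit function theorem: dy*/dtheta = -J_y^{-1} df/dtheta at the
    root y*, so the iterations of the solver are never differentiated."""
    y_star = solve(theta)
    return -f_theta(y_star, theta) / f_y(y_star, theta)

theta = 2.0
y_star = solve(theta)
dy = sensitivity(theta)

# sanity check against a central finite difference of the solver itself
eps = 1e-6
fd = (solve(theta + eps) - solve(theta - eps)) / (2.0 * eps)
```

Because the sensitivity is computed from quantities at the root only, its cost does not grow with the number of solver iterations, which is the advantage the figure measures.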
Checkpointing with the store‐all approach. Here, the forward sweep records the intermediate values at each checkpoint. During the reverse sweep, the forward sweep can then be rerun between two checkpoints, as opposed to starting from the input
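The store-all scheme can be sketched on a toy chain of elementary operations. The chain, its hand-coded derivatives, and the checkpoint spacing `K` below are illustrative assumptions: the forward sweep saves state only at checkpoints, and the reverse sweep reruns short forward segments from the nearest checkpoint instead of from the input.

```python
import math

# Toy chain of elementary operations with hand-coded derivatives
# (an assumption for illustration, not the paper's example).
fs  = [math.sin, math.exp, math.tanh, math.sin]
dfs = [math.cos, math.exp, lambda v: 1.0 - math.tanh(v) ** 2, math.cos]

K = 2  # distance between checkpoints

def grad_store_all(x):
    """Store-all checkpointing: the forward sweep saves the state at every
    K-th step; the reverse sweep reruns the forward pass only from the
    nearest checkpoint rather than from the input."""
    ckpts = {0: x}
    v = x
    for i, f in enumerate(fs):          # forward sweep
        v = f(v)
        if (i + 1) % K == 0:
            ckpts[i + 1] = v            # record at checkpoints only
    adj = 1.0                           # seed adjoint d(output)/d(output)
    for i in reversed(range(len(fs))):  # reverse sweep
        j = (i // K) * K                # nearest checkpoint at or before i
        v = ckpts[j]
        for k in range(j, i):           # short forward rerun between checkpoints
            v = fs[k](v)
        adj *= dfs[i](v)                # chain rule for step i
    return adj

def compose(x):
    for f in fs:
        x = f(x)
    return x

g = grad_store_all(0.3)
eps = 1e-6
fd = (compose(0.3 + eps) - compose(0.3 - eps)) / (2.0 * eps)  # reference
```

Choosing `K` trades memory (number of stored states) against recomputation (length of the forward reruns).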
Checkpointing with the recompute‐all approach. In the above sketch, the target function is broken into four segments, using three checkpoints (white nodes). The yellow and red nodes, respectively, represent the input and output of the function. During the forward sweep, we record values only where the arrow is thick. The dotted arrow represents a reverse automatic differentiation (AD) sweep. After each reverse sweep, we start a new forward evaluation sweep from the input
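In its purest form, the recompute-all idea stores nothing but the input: before each reverse sweep over a segment, a fresh forward sweep from the input regenerates the value that segment consumed. The toy chain and hand-coded derivatives below are assumptions for illustration, not the paper's example.

```python
import math

# Toy chain of elementary operations with hand-coded derivatives
# (an assumption for illustration, not the paper's example).
fs  = [math.sin, math.exp, math.tanh, math.sin]
dfs = [math.cos, math.exp, lambda v: 1.0 - math.tanh(v) ** 2, math.cos]

def grad_recompute_all(x):
    """Recompute-all: nothing is taped; before the reverse sweep over
    step i, a fresh forward sweep from the input regenerates the
    value that step i consumed."""
    adj = 1.0
    for i in reversed(range(len(fs))):
        v = x
        for k in range(i):      # forward evaluation sweep from the input
            v = fs[k](v)
        adj *= dfs[i](v)        # reverse AD sweep over segment i
    return adj

def compose(x):
    for f in fs:
        x = f(x)
    return x

g = grad_recompute_all(0.3)
eps = 1e-6
fd = (compose(0.3 + eps) - compose(0.3 - eps)) / (2.0 * eps)  # reference
```

For an n-step chain this costs O(n²) re-evaluations but only O(1) storage; checkpointing interpolates between this extreme and a full tape.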
Expression graph for the log‐normal density. The above graph is generated by the computer code for Equation 5. Each node represents a variable, labeled v1 through v10, which is calculated by applying a mathematical operator to variables lower on the expression graph. The top node (v10) is the output variable and the gray nodes are the input variables for which we require sensitivities. The arrows represent the flow of information when we numerically evaluate the log‐normal density. (Reprinted with permission from a figure of Carpenter et al. Copyright: https://arxiv.org/licenses/nonexclusive‐distrib/1.0/license.html)
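Such an expression graph can be sketched with a minimal reverse-mode tape. The `Var` class and the normal log-density log N(y | μ, σ) used below are assumptions for illustration (this is not Stan's implementation): each operation records its parents and local partial derivatives, and a reverse sweep in reverse topological order propagates adjoints from the output node back to the gray input nodes.

```python
import math

class Var:
    """A node in the expression graph: a value plus links to the nodes
    it was computed from, with the local partial derivatives."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.adj = val, list(parents), 0.0

    def _wrap(self, o):
        return o if isinstance(o, Var) else Var(o)

    def __sub__(self, o):
        o = self._wrap(o)
        return Var(self.val - o.val, [(self, 1.0), (o, -1.0)])

    def __mul__(self, o):
        o = self._wrap(o)
        return Var(self.val * o.val, [(self, o.val), (o, self.val)])
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = self._wrap(o)
        return Var(self.val / o.val,
                   [(self, 1.0 / o.val), (o, -self.val / o.val ** 2)])

def log(x):
    return Var(math.log(x.val), [(x, 1.0 / x.val)])

def grad(out):
    """Reverse sweep: visit nodes in reverse topological order and
    propagate adjoints from the output toward the inputs."""
    topo, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            topo.append(n)
    visit(out)
    out.adj = 1.0
    for node in reversed(topo):
        for parent, local in node.parents:
            parent.adj += node.adj * local

# log N(y | mu, sigma) = -0.5 log(2 pi) - log(sigma) - (y - mu)^2 / (2 sigma^2)
y, mu, sigma = Var(1.3), Var(0.5), Var(1.2)
lp = (Var(-0.5 * math.log(2.0 * math.pi)) - log(sigma)
      - (y - mu) * (y - mu) / (2.0 * sigma * sigma))
grad(lp)

# analytic sensitivities for comparison
d_mu = (1.3 - 0.5) / 1.2 ** 2
d_sigma = -1.0 / 1.2 + (1.3 - 0.5) ** 2 / 1.2 ** 3
```

After `grad(lp)`, the adjoints `mu.adj` and `sigma.adj` hold the required sensitivities of the log density with respect to the input nodes.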
Run time to solve and differentiate a system of algebraic equations. This time we only compare the solvers which use the implicit function theorem. The “super node + analytical” solver uses an analytical expression for Jy both to solve the equation and compute sensitivities. The built‐in solver uses automatic differentiation (AD) to compute Jy. The former is approximately two to three times faster. The computer experiment is run 100 times and the shaded areas represent the region encompassed by the 5th and 95th percentiles. Plot generated with ggplot2 (Wickham)
