This text explains automatic differentiation (AD), a technique for computing derivatives of functions expressed as computer programs. AD works by applying the chain rule and comes in two flavors, forward mode and reverse mode; reverse mode AD is a generalization of backpropagation that also handles functions with multiple outputs, and a solid grasp of the chain rule is essential for understanding it. The explanation begins with linear chain graphs, using the sigmoid function as a running example: the function is decomposed into a sequence of primitive operations represented as a computational graph, and reverse mode AD applies the chain rule backward through this graph, starting from the final output and propagating derivatives toward the inputs. The concept is then extended to general directed acyclic graphs (DAGs), covering functions with multiple inputs and outputs by means of Jacobian matrices and the multivariate chain rule, and detailing how derivatives are computed and propagated through fan-in and fan-out nodes. Finally, a complete example with a more complex function illustrates the application of reverse mode AD to a general DAG.
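To make the linear-chain case concrete, here is a minimal Python sketch of the idea as summarized above: the sigmoid is decomposed into primitive operations, then the chain rule is applied backward from the output. The function name `sigmoid_grad` and the intermediate names `v1`..`v4` are illustrative choices, not necessarily the article's own code.

```python
import math

def sigmoid_grad(x):
    # Forward pass: decompose sigmoid(x) = 1 / (1 + exp(-x))
    # into a linear chain of primitive operations.
    v1 = -x               # negation
    v2 = math.exp(v1)     # exponential
    v3 = 1.0 + v2         # add constant
    v4 = 1.0 / v3         # reciprocal; v4 == sigmoid(x)

    # Reverse pass: apply the chain rule backward through the chain,
    # starting from d(v4)/d(v4) = 1 and multiplying by local derivatives.
    dv4 = 1.0
    dv3 = dv4 * (-1.0 / v3**2)   # d(1/v3)/dv3 = -1/v3^2
    dv2 = dv3 * 1.0              # d(1 + v2)/dv2 = 1
    dv1 = dv2 * math.exp(v1)     # d(exp(v1))/dv1 = exp(v1)
    dx  = dv1 * (-1.0)           # d(-x)/dx = -1
    return v4, dx

if __name__ == "__main__":
    s, ds = sigmoid_grad(0.5)
    # The analytic derivative is sigmoid(x) * (1 - sigmoid(x)).
    assert abs(ds - s * (1.0 - s)) < 1e-12
```

For the DAG case, the key extra rule is that derivative contributions arriving at a fan-out node are summed, per the multivariate chain rule. The short sketch below illustrates this with a hypothetical example, f(x) = x * sin(x), where x is used twice; it is not the article's worked example.

```python
import math

def prod_sin_grad(x):
    # Forward pass: f(x) = x * sin(x). The input x fans out to two uses,
    # so the graph is a DAG rather than a linear chain.
    v1 = math.sin(x)
    v2 = x * v1

    # Reverse pass: the two contributions to dx are added together.
    dv2 = 1.0
    dv1 = dv2 * x            # d(x*v1)/dv1 = x
    dx  = dv2 * v1           # contribution through the multiplication
    dx += dv1 * math.cos(x)  # contribution through sin(x)
    return v2, dx

if __name__ == "__main__":
    f, df = prod_sin_grad(0.7)
    assert abs(df - (math.sin(0.7) + 0.7 * math.cos(0.7))) < 1e-12
```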
