Chapter 5: Matrix Calculus for Machine Learning

Learning Objectives 

  • Apply vector and matrix chain rules 
  • Construct Taylor series function approximations 
  • Design gradient descent algorithm stopping criteria 
  • Solve constrained optimization problems 

The objective of Chapter 5 is to introduce important matrix calculus concepts and tools for supporting machine learning algorithm analysis and design. The chapter begins with a review of relevant elementary real analysis concepts, including the convergence of sequences of high-dimensional vectors and the properties of continuous functions. Matrix calculus identities are then derived using the vector and matrix chain rules, and these identities are used to derive gradient descent algorithms for linear regression, logistic regression, softmax regression, multilayer perceptrons, and deep learning. The vector Taylor series expansion is then introduced and used to show that a gradient descent algorithm minimizing an objective function updates the system state vector such that the value of the objective function decreases at each iteration, provided the stepsize is sufficiently small. Next, explicit criteria are provided for determining whether a learning algorithm has converged to a critical point, a local minimizer, or a global minimizer. Finally, the Lagrange Multiplier Theorem is used to derive the principal components analysis (PCA) unsupervised learning algorithm, optimal linear recoding transformations, the soft margin support vector machine, multilayer perceptron gradient descent, and recurrent neural network gradient descent.
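
To make the descent argument concrete before the formal development, consider the following sketch, which uses generic notation that may differ from the chapter's. A first-order Taylor series expansion of an objective function V about the current state x(t), evaluated at the gradient descent update x(t+1) = x(t) - γ∇V(x(t)), gives

\[
V\big(\mathbf{x}(t+1)\big) \;\approx\; V\big(\mathbf{x}(t)\big) \;-\; \gamma\,\big\|\nabla V\big(\mathbf{x}(t)\big)\big\|^{2} \;<\; V\big(\mathbf{x}(t)\big),
\]

whenever the stepsize γ > 0 is sufficiently small and the gradient is nonzero, since the squared gradient norm is strictly positive. The same observation suggests a practical stopping criterion: iterate the update until the gradient norm is approximately zero, so that the state is approximately a critical point. The Python sketch below illustrates this idea for a least squares linear regression objective; the function names, stepsize, and tolerance are illustrative assumptions rather than the book's recommendations.

```python
# Minimal sketch (not from the text): batch gradient descent with a
# gradient-norm stopping criterion, applied to a least squares objective.
import numpy as np

def gradient_descent(grad, x0, stepsize=0.01, tol=1e-6, max_iters=10_000):
    """Iterate x(t+1) = x(t) - stepsize * grad(x(t)) until the gradient norm
    falls below tol (approximate critical point) or max_iters is reached."""
    x = np.asarray(x0, dtype=float)
    for t in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stopping criterion: near a critical point
            return x, t
        x = x - stepsize * g
    return x, max_iters

# Example: least squares objective V(w) = (1/(2n)) * ||X w - y||^2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

grad_V = lambda w: X.T @ (X @ w - y) / len(y)   # gradient of V at w
w_hat, iters = gradient_descent(grad_V, np.zeros(3), stepsize=0.1)
print(w_hat, iters)
```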

The podcast LM101-083: Ch5: How to Use Calculus to Design Learning Machines provides an overview of the main ideas of this chapter, some tips to help students read the chapter, and some guidance to instructors for teaching it.