Learning Objectives
- Design Automatic Learning Rate Adjustment Algorithms for Batch Learning
- Design Convergent Batch Learning Algorithms
- Design Classical Gradient Descent Type Optimization Algorithms
The objective of Chapter 7 is to provide explicit conditions for ensuring the convergence of a large class of commonly used batch learning algorithms. The Zoutendijk-Wolfe convergence theorem is introduced as a tool for investigating the asymptotic behavior of batch learning algorithms. The change in the parameter estimates at each iteration is defined as a stepsize multiplied by a search direction vector. The search direction must satisfy a downhill condition, which requires that the angle between the search direction and the negative gradient is less than ninety degrees. Rather than choosing a computationally expensive optimal stepsize, the concept of a Wolfe stepsize is introduced; it includes an optimal stepsize as a special case but also permits cheaper, less precise stepsize choices. The Zoutendijk-Wolfe convergence theorem provides sufficient conditions ensuring that a batch learning algorithm with a Wolfe stepsize and a downhill search direction converges to a set of critical points of a possibly nonconvex objective function. The theorem is then used to analyze the convergence of commonly used parameter estimation algorithms, including gradient descent, Newton-Raphson, Levenberg-Marquardt, limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS), conjugate gradient, block coordinate gradient descent, natural gradient descent, and variants of RMSPROP, applied to models such as linear regression, logistic regression, and multilayer perceptrons.
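The update rule described above (new estimate = old estimate + stepsize times search direction) can be sketched in a few lines of code. The following minimal Python example is an illustration rather than the book's own implementation: it uses steepest descent, which trivially satisfies the downhill condition, SciPy's line_search routine as a stand-in for selecting a Wolfe stepsize (it enforces both the sufficient-decrease and curvature conditions), and the Rosenbrock function as an arbitrary smooth nonconvex objective.

```python
import numpy as np
from scipy.optimize import line_search

def f(theta):
    # Rosenbrock function, used here only as a stand-in for an empirical risk
    x, y = theta
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def grad_f(theta):
    x, y = theta
    return np.array([
        -2 * (1 - x) - 400 * x * (y - x ** 2),
        200 * (y - x ** 2),
    ])

theta = np.array([-1.2, 1.0])          # initial parameter estimate
for k in range(2000):
    g = grad_f(theta)
    if np.linalg.norm(g) < 1e-6:       # gradient near zero: close to a critical point
        break
    d = -g                             # steepest-descent direction
    assert d @ g < 0                   # downhill condition: angle with -g is under 90 degrees
    # line_search returns a stepsize satisfying the Wolfe conditions (or None on failure)
    gamma = line_search(f, grad_f, theta, d)[0]
    if gamma is None:
        gamma = 1e-3                   # fall back to a small fixed stepsize
    theta = theta + gamma * d          # update: stepsize multiplied by search direction
print(f"stopped after {k} iterations at theta = {theta}")
```

Any search direction whose angle with the negative gradient stays bounded away from ninety degrees could replace d = -g in this sketch; that flexibility is exactly what the Zoutendijk-Wolfe theorem exploits to cover the wider family of algorithms analyzed in the chapter.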
The podcast episode LM101-085: Ch7: How to Guarantee your Batch Learning Algorithm Converges provides an overview of the main ideas of this chapter, tips to help students read it, and guidance for instructors teaching it.