amilich

Are other optimization algorithms ever used instead, like coordinate descent or Newton's method (a.k.a. Newton-Raphson)? Each has its own advantages/disadvantages, so I'm wondering if gradient descent is generally best.
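
For context, here's a minimal sketch (on a toy 1-D objective f(x) = x^4, not anything from the lecture) of how the two update rules differ: gradient descent scales the gradient by a fixed step size, while Newton's method also divides by the second derivative, at the cost of computing (and in higher dimensions, inverting) the Hessian.

```python
def grad(x):
    return 4 * x**3           # f'(x) for f(x) = x^4

def hess(x):
    return 12 * x**2          # f''(x)

x_gd, x_newton, step = 2.0, 2.0, 0.01
for _ in range(100):
    x_gd -= step * grad(x_gd)                     # gradient descent update
    x_newton -= grad(x_newton) / hess(x_newton)   # Newton(-Raphson) update

print(x_gd, x_newton)  # both approach the minimizer x = 0, Newton much faster here
```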

szzheng

For those interested, here's a paper on Adam, an adaptive step-size technique: https://arxiv.org/pdf/1412.6980.pdf

An advantage of this is that the effective step size grows when recent gradients have been small, so gradient descent can keep making progress through flat regions of the loss and may converge more quickly.
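
To make that concrete, here's a minimal sketch of the Adam-style update from the linked paper; `grad_fn`, the hyperparameter values, and the toy objective below are illustrative choices, not taken from the slide. The per-coordinate step size lr / (sqrt(v_hat) + eps) grows when the running average of squared gradients is small.

```python
import numpy as np

def adam(grad_fn, theta, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = np.zeros_like(theta)   # running mean of gradients (first moment)
    v = np.zeros_like(theta)   # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)   # bias correction for zero initialization
        v_hat = v / (1 - beta2**t)
        # effective step size is large where recent gradients have been small
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Illustrative use: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
print(adam(lambda th: 2 * th, np.ones(3)))
```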