Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
Lecture Notes in Computer Science, 2018, pp. 392–402
Top 10% of 2018 papers
Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
Related Papers
- A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima (2021)
- Global descent replaces gradient descent to avoid local minima problem in learning with artificial neural networks (2002)
- A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast (2020)
- A Diffusion Theory For Minima Selection: Stochastic Gradient Descent Exponentially Favors Flat Minima (2020)