Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
Lecture Notes in Computer Science, 2018, pp. 392–402
Top 10% of 2018 papers
Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
Related Papers
- A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima (2021)
- Global descent replaces gradient descent to avoid local minima problem in learning with artificial neural networks (2002)
- A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast (2020)
- A Diffusion Theory For Minima Selection: Stochastic Gradient Descent Exponentially Favors Flat Minima (2020)