Gradient descent optimizes over-parameterized deep ReLU networks
Machine Learning (2019), Vol. 109(3), pp. 467–492
Citations: top 1% of 2019 papers
Related Papers
- Analysis of weight initialization methods for gradient descent with momentum (2015), 11 citations
- A Survey on Activation Functions and their relation with Xavier and He Normal Initialization (2020), 50 citations
- Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent (2020)