Deformable convolutional networks J Dai*, H Qi*, Y Xiong*, Y Li*, G Zhang*, H Hu, Y Wei (* indicates co-first author) International Conference on Computer Vision, 2017 | 6836 | 2017 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2209 | 2023 |
Picking Winning Tickets Before Training by Preserving Gradient Flow C Wang, G Zhang, R Grosse International Conference on Learning Representations, 2020 | 699 | 2020 |
Benchmarking Model-Based Reinforcement Learning T Wang, X Bao, I Clavera, J Hoang, Y Wen, E Langlois, S Zhang, G Zhang, ... | 473 | 2019 |
Functional Variational Bayesian Neural Networks S Sun*, G Zhang*, J Shi*, R Grosse (* indicates co-first author) International Conference on Learning Representations, 2019 | 308 | 2019 |
Three Mechanisms of Weight Decay Regularization G Zhang, C Wang, B Xu, R Grosse International Conference on Learning Representations, 2019 | 299 | 2019 |
Noisy Natural Gradient as Variational Inference G Zhang*, S Sun*, D Duvenaud, R Grosse (* indicates co-first author) International Conference on Machine Learning, 2018 | 249 | 2018 |
Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks G Zhang, J Martens, R Grosse Advances in Neural Information Processing Systems, 2019 | 150 | 2019 |
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model G Zhang, L Li, Z Nado, J Martens, S Sachdeva, G Dahl, C Shallue, ... Advances in Neural Information Processing Systems, 2019 | 149 | 2019 |
EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis C Wang, R Grosse, S Fidler, G Zhang International Conference on Machine Learning, 2019 | 133 | 2019 |
On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach Y Wang*, G Zhang*, J Ba (* indicates co-first author) International Conference on Learning Representations, 2020 | 117 | 2020 |
Differentiable Compositional Kernel Learning for Gaussian Processes S Sun, G Zhang, C Wang, W Zeng, J Li, R Grosse International Conference on Machine Learning, 2018 | 90 | 2018 |
Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization G Zhang, Y Wang, L Lessard, R Grosse International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 | 63 | 2022 |
An empirical study of stochastic gradient descent with structured covariance noise Y Wen, K Luk, M Gazeau, G Zhang, H Chan, J Ba International Conference on Artificial Intelligence and Statistics, 3621-3631, 2020 | 57* | 2020 |
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise G Zhang, K Hsu, J Li, C Finn, R Grosse Advances in Neural Information Processing Systems, 2021 | 35 | 2021 |
Deep transformers without shortcuts: Modifying self-attention for faithful signal propagation B He, J Martens, G Zhang, A Botev, A Brock, SL Smith, YW Teh arXiv preprint arXiv:2302.10322, 2023 | 33 | 2023 |
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers G Zhang, A Botev, J Martens International Conference on Learning Representations, 2022 | 32 | 2022 |
A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints G Zhang, X Bao, L Lessard, R Grosse Journal of Machine Learning Research, 2021 | 32 | 2021 |
Eigenvalue Corrected Noisy Natural Gradient J Bae, G Zhang, R Grosse Neural Information Processing Systems (Bayesian Deep Learning Workshop), 2018 | 24 | 2018 |
On the suboptimality of negative momentum for minimax optimization G Zhang, Y Wang International Conference on Artificial Intelligence and Statistics, 2021 | 23 | 2021 |