Llama 2: Open Foundation and Fine-Tuned Chat Models H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ... arXiv preprint arXiv:2307.09288, 2023 | 8620 | 2023 |
fairseq: A Fast, Extensible Toolkit for Sequence Modeling M Ott, S Edunov, A Baevski, A Fan, S Gross, N Ng, D Grangier, M Auli arXiv preprint arXiv:1904.01038, 2019 | 3174 | 2019 |
Language modeling with gated convolutional networks YN Dauphin, A Fan, M Auli, D Grangier Proceedings of the 34th International Conference on Machine Learning-Volume …, 2017 | 2834 | 2017 |
Hierarchical Neural Story Generation A Fan, M Lewis, Y Dauphin arXiv preprint arXiv:1805.04833, 2018 | 1655 | 2018 |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model BS Workshop, TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, ... arXiv preprint arXiv:2211.05100, 2022 | 1493* | 2022 |
Wizard of Wikipedia: Knowledge-Powered Conversational Agents E Dinan, S Roller, K Shuster, A Fan, M Auli, J Weston arXiv preprint arXiv:1811.01241, 2018 | 970 | 2018 |
Beyond English-Centric Multilingual Machine Translation A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky, S Goyal, M Baines, ... Journal of Machine Learning Research 22 (107), 1-48, 2021 | 766 | 2021 |
Pay Less Attention with Lightweight and Dynamic Convolutions F Wu, A Fan, A Baevski, YN Dauphin, M Auli arXiv preprint arXiv:1901.10430, 2019 | 678 | 2019 |
No Language Left Behind: Scaling Human-Centered Machine Translation N Team, MR Costa-jussà, J Cross, O Çelebi, M Elbayad, K Heafield, ... | 608* | 2022 |
Reducing Transformer Depth on Demand with Structured Dropout A Fan, E Grave, A Joulin arXiv preprint arXiv:1909.11556, 2019 | 604 | 2019 |
Multilingual Translation from Denoising Pre-Training Y Tang, C Tran, X Li, PJ Chen, N Goyal, V Chaudhary, J Gu, A Fan Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 …, 2021 | 492* | 2021 |
ELI5: Long Form Question Answering A Fan, Y Jernite, E Perez, D Grangier, J Weston, M Auli arXiv preprint arXiv:1907.09190, 2019 | 483 | 2019 |
KILT: a benchmark for knowledge intensive language tasks F Petroni, A Piktus, A Fan, P Lewis, M Yazdani, N De Cao, J Thorne, ... arXiv preprint arXiv:2009.02252, 2020 | 464 | 2020 |
The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation N Goyal, C Gao, V Chaudhary, PJ Chen, G Wenzek, D Ju, S Krishnan, ... Transactions of the Association for Computational Linguistics 10, 522-538, 2022 | 391 | 2022 |
Controllable abstractive summarization A Fan, D Grangier, M Auli arXiv preprint arXiv:1711.05217, 2017 | 328 | 2017 |
The Llama 3 Herd of Models A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... arXiv preprint arXiv:2407.21783, 2024 | 285 | 2024 |
Nearest Neighbor Machine Translation U Khandelwal, A Fan, D Jurafsky, L Zettlemoyer, M Lewis arXiv preprint arXiv:2010.00710, 2020 | 283 | 2020 |
Training with quantization noise for extreme model compression A Fan, P Stock, B Graham, E Grave, R Gribonval, H Jégou, A Joulin arXiv preprint arXiv:2004.07320, 2020 | 256 | 2020 |
Strategies for Structuring Story Generation A Fan, M Lewis, Y Dauphin arXiv preprint arXiv:1902.01109, 2019 | 239 | 2019 |
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the Web H Schwenk, G Wenzek, S Edunov, E Grave, A Joulin, A Fan Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021 | 221 | 2021 |