Self-attentive model for headline generation D Gavrilov, P Kalaidin, V Malykh Advances in Information Retrieval: 41st European Conference on IR Research …, 2019 | 72 | 2019 |
Learn your reference model for real good alignment A Gorbatovski, B Shaposhnikov, A Malakhov, N Surnachev, Y Aksenov, ... arXiv preprint arXiv:2404.09656, 2024 | 25 | 2024 |
Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning E Lagutin, D Gavrilov, P Kalaidin Proceedings of the 16th Conference of the European Chapter of the …, 2021 | 18 | 2021 |
PALBERT: Teaching ALBERT to Ponder N Balagansky, D Gavrilov Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 14002 …, 2022 | 14 | 2022 |
Diffusion Language Models Generation Can Be Halted Early SM Lo Cicero Vaina, N Balagansky, D Gavrilov arXiv preprint arXiv:2305.10818, 2023 | 6 | 2023 |
Classifiers are better experts for controllable text generation A Sitdikov, N Balagansky, D Gavrilov, A Markov arXiv preprint arXiv:2205.07276, 2022 | 6 | 2022 |
Linear transformers with learnable kernel functions are better in-context models Y Aksenov, N Balagansky, SMLC Vaina, B Shaposhnikov, A Gorbatovski, ... arXiv preprint arXiv:2402.10644, 2024 | 5 | 2024 |
Linear interpolation in parameter space is good enough for fine-tuned language models M Rofin, N Balagansky, D Gavrilov arXiv preprint arXiv:2211.12092, 2022 | 3 | 2022 |
Mechanistic Permutability: Match Features Across Layers N Balagansky, I Maksimov, D Gavrilov arXiv preprint arXiv:2410.07656, 2024 | 2 | 2024 |
Ahead-of-Time P-Tuning D Gavrilov, N Balagansky arXiv preprint arXiv:2305.10835, 2023 | 2 | 2023 |
Weight squeezing: Reparameterization for extreme compression and fast inference A Chumachenko, D Gavrilov, N Balagansky, P Kalaidin arXiv preprint arXiv:2010.06993, 2020 | 2 | 2020 |
You Do Not Fully Utilize Transformer's Representation Capacity G Gerasimov, Y Aksenov, N Balagansky, V Sinii, D Gavrilov arXiv preprint arXiv:2502.09245, 2025 | | 2025 |
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models D Laptev, N Balagansky, Y Aksenov, D Gavrilov arXiv preprint arXiv:2502.03032, 2025 | | 2025 |
The Differences Between Direct Alignment Algorithms are a Blur A Gorbatovski, B Shaposhnikov, V Sinii, A Malakhov, D Gavrilov arXiv preprint arXiv:2502.01237, 2025 | | 2025 |
FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks M Zubkov, D Gavrilov arXiv preprint arXiv:2202.11364, 2022 | | 2022 |