Michael Noukhovitch
Hi, I’m a PhD candidate in artificial intelligence at Mila, affiliated with Université de Montréal, and supervised by Aaron Courville. I finished my Master’s there in 2019, and before that I graduated in 2017 with a Bachelor’s in Software Engineering from the University of Waterloo, including an exchange term at Lund University in Sweden.
My research goal is to learn how to use language efficiently and effectively for complex interactions with humans and AI. My interests span reinforcement learning, NLP, competitive multi-agent games, LLM efficiency, contrastive learning, and more.
Papers
- Olmo 3, 3.1. Olmo Team, Allen Institute for AI; core contributor, lead of RL-Zero. Released Nov 2025 · Repro Code
- Learning Robust Social Strategies with Large Language Models. D Piche, M Muqeeth, M Aghajohari, J Duque, M Noukhovitch, A Courville. Under review
- Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models. S Lavoie, M Noukhovitch, A Courville. NeurIPS 2025 · Code · Podcast Discussion
- Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models. M Noukhovitch, S Huang, S Xhonneux, A Hosseini, R Agarwal, A Courville. ICLR 2025 · Code · Recorded Talk
- The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization. S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall. COLM 2024 · Code
- Learning Multi-Agent Communication with Contrastive Learning. Yat Long Lo, Biswa Sengupta, Jakob Nicolaus Foerster, M Noukhovitch. ICLR 2024
- Language Model Alignment with Elastic Reset. M Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville. NeurIPS 2023 · Code
- Simplicial Embeddings in Self-Supervised Learning and Downstream Classification. Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, M Noukhovitch, Kenji Kawaguchi, Aaron Courville. ICLR 2023 · Spotlight (Top 25%) · Code
- Countering Language Drift with KL Regularization. M Noukhovitch, Samuel Lavoie, Issam H. Laradji, Douwe Kiela, Florian Strub, Aaron Courville. InterNLP Workshop @ NeurIPS 2022
- Competition Exacerbates Language Drift. M Noukhovitch, Aaron Courville, Issam H. Laradji. Machine Learning and the Evolution of Language Workshop @ JCoLE 2022
- Pretraining Representations for Data-Efficient Reinforcement Learning. Max Schwarzer, Nitarshan Rajkumar, M Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman, Aaron Courville. NeurIPS 2021 · Code
- Emergent Communication under Competition. M Noukhovitch, Travis LaCroix, Angeliki Lazaridou, Aaron Courville. AAMAS 2021 · Circular Game · Negotiation Game · Talk & Slides
- Emergence of Communication with Selfish Agents. M Noukhovitch, Travis LaCroix, Angeliki Lazaridou, Aaron Courville. EVOLANG 13 · Short paper, Oral
- Considering Assumptions of Emergent Communication. M Noukhovitch. Montreal AI Symposium 2020 · Short paper, Poster
- Emerging Communication between Competitive Agents. M Noukhovitch. Master’s Thesis, 2020
- Systematic Generalization: What Is Required and Can It Be Learned? Dzmitry Bahdanau, Shikhar Murty, M Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville. ICLR 2019 · Code
- Selective Emergent Communication with Partially Aligned Agents. M Noukhovitch, Aaron Courville. Emergent Communication Workshop @ NeurIPS 2018
- Oríon: Experiment Version Control for Efficient Hyperparameter Optimization. Christos Tsirigotis, Xavier Bouthillier, François Corneau-Tremblay, Peter Henderson, Reyhane Askari, Samuel Lavoie-Marchildon, Tristan Deleu, Dendi Suhubdy, M Noukhovitch, Frédéric Bastien, Pascal Lamblin. Workshop on Automatic Machine Learning @ ICML 2018 · Code
- Commonsense Mining as Knowledge Base Completion? A Study on the Impact of Novelty. Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, M Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung. Workshop on New Forms of Generalization in Deep Learning and Natural Language Processing @ NAACL 2018 · Code
Projects
- Open-Instruct (pytorch · vllm · transformers). Open-sourcing SOTA RL techniques for LLMs, built around concurrency: asynchronous RLHF, active sampling, and in-flight updates
- Hugging Face TRL (pytorch · transformers). Open-source contributions to a great RLHF library, correcting its PPO and GRPO implementations
- Oríon (Python). Mila’s asynchronous, distributed hyperparameter optimization framework for deep neural networks
Experience
- Allen Institute for AI, Research Intern (pytorch · vllm · open-instruct). RL training of Olmo 3 with Nathan Lambert, helping build infra and leading the RL-Zero effort to make a SOTA, fully open RL reasoning benchmark
- ServiceNow Research, Visiting Researcher (pytorch · transformers). NLP research with Issam Laradji on semi-supervised learning for dialogue state tracking and on improving RLHF robustness
- Meta AI, Research Intern (pytorch · fairseq). Scaling NLP with Douwe Kiela, improving language finetuning for translation and interaction, and scaling to larger models
- Element AI, Research Intern (pytorch · transformers). Text-to-SQL with Dzmitry Bahdanau and Harm de Vries, early language model finetuning, and building infra at a fun Montreal startup
- NextAI, Scientist in Residence. Advised the 2019 Montreal cohort on machine learning for water data, supply chains, dark web tracking, and dental information
- Mila, Research Intern (Python · Theano). Backpropagation through stochastic discrete neurons with Yoshua Bengio, applying novel ideas to GANs and word embeddings
- Google Research, Software Engineering Intern (Python · TensorFlow · Blender). Building deep baselines and pipelines with Wei Hua, designing benchmarks, and developing deep learning CV models for motion tracking
- Premise, Social Capital Fellowship Intern (Python · Java · Scala · Spark). Backend and data engineering for data analysis, an Android app, and server code to enable crowd-sourced economic data mining
- Yelp, Software Engineering Intern (Python · JavaScript). Full-stack engineer on biz.yelp.com, building a new landing page and internationalized check-in offers, and improving the purchasing flow
- Watrhub, Software Engineering Intern (Python · Elasticsearch). Algorithm engineering for semi-supervised classification, web scraping, and Elasticsearch integrations to parse wastewater data PDFs into structured formats
- Canadian Government, Data Scientist (Python). Implemented statistical machine learning for pattern detection and classification, optimizing classification effectiveness and inference speed
Talks
Olmo 3
- Courville Group Meeting Dec 2025
- AllenNLP Team Meeting @ Ai2 Oct 2025
Modern Post-Training of LLMs · RL exercise slides · code
- Armenia LLM Summer School Jul 2025
Asynchronous RLHF · recorded talk
- Cohere Labs RG Jun 2025
- Samsung-Mila-NYU Meeting Jun 2025
- Deep Learning: Classics and Trends Mar 2025
- NVIDIA Jan 2025
- U of T Safety RG Nov 2024
N+ Implementation Details of RLHF with PPO
- Andreas Group Meeting @ MIT Jul 2024
- Courville Group Meeting Oct 2023
- Sony-Mila Collaboration Report May 2023
- WARA Summer School Jul 2022
- WARA Podcast
Competitive Emergent Communication
- DeepMind LIG RG May 2021
- OATML Group Meeting @ Oxford May 2019
Contact
I am mnoukhov pretty much everywhere. Feel free to email me at that handle @gmail.com, or find me on Twitter, GitHub, or LinkedIn.