I am interested in Deep Reinforcement Learning and its application to continuous-control tasks.
My research improved the optimization stability of off-policy gradient-based reinforcement learning algorithms.
I've written two works in this research direction:
- Stabilizing Q-Learning for Continuous Control
  David Yu-Tung Hui
  MSc Thesis, University of Montreal, 2022
  I first investigated the duality between maximizing entropy and maximizing likelihood in the context of RL. I then showed that LayerNorm reduced divergence in $Q$-learning, especially in high-dimensional continuous control tasks (a minimal code sketch of this change appears below the list).
  [.pdf] [Errata]
- Double Gumbel Q-Learning
  David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon
  Spotlight at NeurIPS 2023
  We showed that function approximation in $Q$-learning induces two heteroscedastic Gumbel noise sources. An algorithm modeling these noise sources attained almost $2\times$ the aggregate performance of SAC at 1M timesteps over 33 continuous control tasks.
  [.pdf] [Reviews] [Poster (.png)] [5-min talk] [1-hour seminar] [Code (GitHub)] [Errata]
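
For readers curious what the LayerNorm modification looks like in practice, here is a minimal PyTorch sketch of a continuous-control $Q$-network with LayerNorm after each hidden linear layer. The class name, layer widths, and activation choices are illustrative assumptions, not the exact architecture used in the thesis.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Critic mapping (state, action) pairs to scalar Q-values.

    The nn.LayerNorm layers after each hidden linear layer are the
    stabilizing change discussed above (sizes here are assumptions).
    """
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),   # normalize hidden pre-activations
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),   # scalar Q-value estimate
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```

In an otherwise standard actor-critic training loop, this critic would simply replace an un-normalized MLP; the only difference from a typical SAC/TD3-style critic is the two `nn.LayerNorm` layers.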