Reproduce CQL with PARL

Based on PARL, we reproduce the CQL algorithm for deep reinforcement learning, reaching the same level of performance reported in the paper on the continuous-control datasets of the D4RL benchmark.

Paper: CQL in Conservative Q-Learning for Offline Reinforcement Learning (arXiv:2006.04779)
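
For quick reference, the core of CQL is a conservative regularizer added to the usual Bellman error: it pushes Q-values down on out-of-distribution actions (via a log-sum-exp over sampled candidate actions) and up on the actions actually in the dataset. The snippet below is a minimal NumPy sketch of this CQL(H) penalty, not the PARL implementation; all names are illustrative.

import numpy as np

def cql_h_penalty(q_sampled, q_data, alpha=1.0):
    # q_sampled: [batch, n_candidates] Q-values for actions drawn from a
    #            uniform/policy proposal (out-of-distribution candidates)
    # q_data:    [batch] Q-values for the actions stored in the dataset
    # alpha:     weight of the conservative term in the critic loss
    m = q_sampled.max(axis=1, keepdims=True)
    # numerically stable log-sum-exp over the candidate actions
    logsumexp = m.squeeze(1) + np.log(np.exp(q_sampled - m).sum(axis=1))
    # large Q on OOD actions is penalized, large Q on dataset actions is rewarded
    return alpha * (logsumexp - q_data).mean()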

Env and dataset introduction

  • D4RL datasets: The algorithm is tested on D4RL, one of the most commonly used benchmarks for offline RL. See the D4RL repository (https://github.com/rail-berkeley/d4rl) to learn more about the datasets; note that D4RL requires MuJoCo as a dependency. For more details on using D4RL, please refer to its guide; a minimal loading sketch also follows this list.
  • MuJoCo simulator: See the MuJoCo website (https://mujoco.org) to learn more about the simulator and obtain a license.
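
Once gym, mujoco-py, and d4rl are installed, loading a dataset takes a few lines. This sketch uses the standard d4rl API; the environment name is only an example.

import gym
import d4rl  # registers the D4RL environments with gym

env = gym.make('halfcheetah-medium-expert-v0')

# full dataset as a dict of numpy arrays
dataset = env.get_dataset()
print(dataset['observations'].shape, dataset['actions'].shape)

# (s, a, r, s', done) view that is convenient for Q-learning
transitions = d4rl.qlearning_dataset(env)
print(transitions.keys())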

Benchmark result

Figure: learning curves on the D4RL continuous-control tasks.
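
D4RL results are conventionally reported as normalized scores, where 0 corresponds to a random policy and 100 to an expert policy. Assuming evaluation returns have been collected, d4rl's built-in helper converts them; the return value below is illustrative.

import gym
import d4rl

env = gym.make('halfcheetah-medium-expert-v0')
episode_return = 5000.0  # undiscounted return from one evaluation episode

# get_normalized_score returns a value on a 0-1 scale; papers report it x100
print(100 * env.get_normalized_score(episode_return))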

How to use

Dependencies:
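
At a minimum, this example relies on PARL and PaddlePaddle, plus gym, mujoco-py, and d4rl for the environments and datasets; pin versions to match your MuJoCo setup:

  • paddlepaddle
  • parl
  • gym
  • mujoco-py
  • d4rl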

Start Training:

Train

# To train on halfcheetah-medium-expert-v0 (the default), or any task of the form
# [halfcheetah/hopper/walker2d/ant]-[random/medium/expert/medium-expert/medium-replay]-[v0/v2]
python train.py --env [ENV_NAME]

# To reproduce the benchmark performance, enable automatic entropy tuning
python train.py --env [ENV_NAME] --with_automatic_entropy_tuning
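
For example, to train on hopper-medium-v0 (any task name matching the pattern above works the same way):

python train.py --env hopper-medium-v0 --with_automatic_entropy_tuning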