| ✅ Proximal Policy Optimization (PPO) | `ppo.py`, docs |
| | `ppo_atari.py`, docs |
| | `ppo_continuous_action.py`, docs |
| | `ppo_atari_lstm.py`, docs |
| | `ppo_atari_envpool.py`, docs |
| | `ppo_atari_envpool_xla_jax.py`, docs |
| | `ppo_atari_envpool_xla_jax_scan.py`, docs |
| | `ppo_procgen.py`, docs |
| | `ppo_atari_multigpu.py`, docs |
| | `ppo_pettingzoo_ma_atari.py`, docs |
| | `ppo_continuous_action_isaacgym.py`, docs |
| ✅ Deep Q-Learning (DQN) | `dqn.py`, docs |
| | `dqn_atari.py`, docs |
| | `dqn_jax.py`, docs |
| | `dqn_atari_jax.py`, docs |
| ✅ Categorical DQN (C51) | `c51.py`, docs |
| | `c51_atari.py`, docs |
| | `c51_jax.py`, docs |
| | `c51_atari_jax.py`, docs |
| ✅ Soft Actor-Critic (SAC) | `sac_continuous_action.py`, docs |
| | `sac_atari.py`, docs |
| ✅ Deep Deterministic Policy Gradient (DDPG) | `ddpg_continuous_action.py`, docs |
| | `ddpg_continuous_action_jax.py`, docs |
| ✅ Twin Delayed Deep Deterministic Policy Gradient (TD3) | `td3_continuous_action.py`, docs |
| | `td3_continuous_action_jax.py`, docs |
| ✅ Phasic Policy Gradient (PPG) | `ppg_procgen.py`, docs |
| ✅ Random Network Distillation (RND) | `ppo_rnd_envpool.py`, docs |
| ✅ QDagger | `qdagger_dqn_atari_impalacnn.py`, docs |
| | `qdagger_dqn_atari_jax_impalacnn.py`, docs |