https://chatgpt.com/share/67b31115-6508-800d-8ada-662b2fed132e
- Basics
- Monte Carlo, Temporal Difference
- Extensions
- Policy Optimization
- Soft Actor-Critic
- Q-learning
- Double Q-learning
- Avoiding the overestimation issue: IQL
- Interpolating Between Policy Optimization and Q-Learning
- Soft Actor-Critic
- DDPG
- Policy Optimization
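
The overestimation issue listed under Q-learning, and the reason Double Q-learning exists, can be illustrated with a small simulation. This is only a sketch under stated assumptions: every action has a true value of 0, the estimates are corrupted by independent Gaussian noise, and the function name `overestimation_demo` and its parameters are made up for illustration. It compares the Q-learning style target (max over a single noisy estimate) with a Double Q-learning style target (one estimate picks the action, a second independent estimate evaluates it).

```python
import random


def overestimation_demo(n_actions=10, n_trials=20000, noise_std=1.0, seed=0):
    """Compare the bias of single-estimator and double-estimator max targets.

    Assumption for illustration: the true Q-value of every action is 0,
    so any nonzero average target is pure estimation bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        # Two independent noisy estimates of the same (all-zero) Q-values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]

        # Q-learning style target: select and evaluate with the same estimate.
        single_total += max(q_a)

        # Double Q-learning style target: select with q_a, evaluate with q_b.
        best_action = max(range(n_actions), key=q_a.__getitem__)
        double_total += q_b[best_action]

    return single_total / n_trials, double_total / n_trials


single_bias, double_bias = overestimation_demo()
print(f"single-estimator bias: {single_bias:.3f}")
print(f"double-estimator bias: {double_bias:.3f}")
```

Under these assumptions the single-estimator average comes out clearly positive, while the double-estimator average stays close to zero, because the noise that selects the action is independent of the noise in the value used to evaluate it.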