This repository contains the programming exercises for the Reinforcement Learning lecture at the University of Stuttgart: https://ipvs.informatik.uni-stuttgart.de/mlr/reinforcement-learning-ss-20
All exercises use Python 3.
The first exercise uses numpy and matplotlib:
python3 -m pip install numpy matplotlib --user
Later exercises use OpenAI Gym (https://gym.openai.com/):
python3 -m pip install gym --user
The Deep Reinforcement Learning exercises use TensorFlow 2 (https://www.tensorflow.org/):
python3 -m pip install tensorflow --user
Epsilon-greedy action selection for a k-armed bandit.
The action-value function Q is estimated from the rewards observed for each action, as an estimate of that action's expected reward. At each time step, the greedy action (the one with the highest estimated Q value) is chosen with probability 1-epsilon; otherwise, with probability epsilon, a random action is chosen.
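The selection rule above can be sketched as follows. This is a minimal illustration using only the standard library, not the exercise solution: the function name run_bandit, the Gaussian reward model with unit variance, and the incremental sample-average update are all assumptions made for the example.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a k-armed bandit with sample-average Q estimates.

    true_means is a hypothetical list of mean rewards, one per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated action values
    n = [0] * k     # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)      # assumed reward model
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental sample mean
    return q, n

q, n = run_bandit([0.2, 0.8, 0.5])
```

With epsilon = 0 the agent never explores and can lock onto a suboptimal arm; larger values of epsilon trade reward for better estimates of all arms.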
In the Frozen Lake environment, the value function of every possible policy is computed as the fixed point of the Bellman equation. The optimal policy is the one whose value function is maximal in all states.
This brute-force approach (evaluating every possible policy) is, however, intractable for large state-action spaces.
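A minimal sketch of the brute-force idea on a toy problem is shown below. The 3-state chain, the table format P[s][a] = [(probability, next_state, reward)], the discount factor, and the iterative (rather than closed-form) policy evaluation are all assumptions made for the example; the exercise itself works on Frozen Lake.

```python
import itertools

# Hypothetical 3-state chain: state 2 is terminal, entering it pays reward 1.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9

def evaluate(policy, tol=1e-10):
    """Iterate the Bellman expectation equation until it reaches its fixed point."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v

def brute_force():
    """Evaluate every deterministic policy and keep the best one."""
    states = sorted(P)
    best_policy, best_v = None, None
    for actions in itertools.product([0, 1], repeat=len(states)):
        policy = dict(zip(states, actions))
        v = evaluate(policy)
        # An optimal policy is at least as good in every state,
        # so it also maximizes the sum of the state values.
        if best_v is None or sum(v.values()) > sum(best_v.values()):
            best_policy, best_v = policy, v
    return best_policy, best_v

best_policy, best_v = brute_force()
```

With 3 states and 2 actions there are only 2^3 = 8 deterministic policies to evaluate. A 4x4 Frozen Lake map with 16 states and 4 actions already has 4^16 (about 4.3 billion), which is why the approach does not scale.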
Implementation of the Value Iteration algorithm in the Frozen Lake environment.
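As a rough sketch of what Value Iteration does, the example below applies the Bellman optimality backup to a toy MDP until the values stop changing, then reads off the greedy policy. The 3-state chain, the table format P[s][a] = [(probability, next_state, reward)], and the helper names are assumptions for illustration and not the repository's code.

```python
# Hypothetical 3-state chain: state 2 is terminal, entering it pays reward 1.
CHAIN = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}

def q_value(P, v, s, a, gamma):
    """Expected return of taking action a in state s, then following values v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, tol=1e-10):
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best one-step lookahead value
            best = max(q_value(P, v, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: q_value(P, v, s, a, gamma)) for s in P}
    return v, policy

values, greedy_policy = value_iteration(CHAIN)
```

Unlike enumerating policies, each sweep costs only one pass over all state-action pairs, so the work grows with the size of the MDP rather than with the number of policies.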
Implementation of Monte Carlo ES (exploring starts) to obtain the optimal policy and state-value function for Blackjack.
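The core of a Monte Carlo method is averaging the returns observed after each state-action pair. The sketch below shows only that step, as a first-visit update over one finished episode; generating Blackjack episodes and the exploring-starts initialization are left out. The function name mc_update, the dictionary-based Q table, and the example state encoding (player sum, dealer card, usable ace) are assumptions made for the example.

```python
def mc_update(episode, Q, counts, gamma=1.0):
    """First-visit Monte Carlo update of Q from one finished episode.

    episode is a list of (state, action, reward) tuples, where reward is the
    reward received after taking the action.
    """
    # Time index of the first occurrence of each state-action pair
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)

    g = 0.0
    for t in range(len(episode) - 1, -1, -1):   # walk the episode backwards
        s, a, r = episode[t]
        g = r + gamma * g                       # return from time t onward
        if first_visit[(s, a)] == t:
            counts[(s, a)] = counts.get((s, a), 0) + 1
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + (g - old) / counts[(s, a)]  # running average

Q, counts = {}, {}
episode = [((13, 10, False), "hit", 0.0), ((18, 10, False), "stick", 1.0)]
mc_update(episode, Q, counts)
```

After each update, the policy would be made greedy with respect to Q in the states that were visited.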
Implementation of the SARSA (on-policy) and Q-learning (off-policy) algorithms to solve the Frozen Lake environment.
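The two algorithms differ only in the target they bootstrap from, which the following sketch makes explicit. The function names, the list-of-lists Q table, and the numbers in the example are assumptions chosen for illustration.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done):
    """On-policy: bootstrap from the action a2 actually chosen in s2."""
    target = r if done else r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha, gamma, done):
    """Off-policy: bootstrap from the greedy action in s2."""
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])

# Same transition, two different updates (2 states, 2 actions).
Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 3.0]]
sarsa_update(Q_sarsa, s=0, a=0, r=0.0, s2=1, a2=0, alpha=0.5, gamma=0.9, done=False)
q_learning_update(Q_qlearn, s=0, a=0, r=0.0, s2=1, alpha=0.5, gamma=0.9, done=False)
```

In this example the behaviour policy picked the worse action a2 = 0 in the next state, so SARSA moves Q[0][0] toward 0.9 while Q-learning moves it toward 2.7.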
Implementation of n-step SARSA to solve the 8x8 Frozen Lake environment.
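n-step SARSA replaces the one-step target with the discounted sum of the next n rewards plus a bootstrapped Q value. A small sketch of that target, with a hypothetical helper name and made-up numbers:

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """G = R_1 + gamma*R_2 + ... + gamma^(n-1)*R_n + gamma^n * Q(S_n, A_n).

    rewards holds the n rewards observed after the updated state-action pair;
    bootstrap_value is Q(S_n, A_n), or 0.0 if the episode ended within n steps.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g   # fold in the rewards from last to first
    return g

# 3-step target with rewards 0, 0, 1 and Q(S_3, A_3) = 2.0
g = n_step_return([0.0, 0.0, 1.0], bootstrap_value=2.0, gamma=0.5)
```

The value being updated then moves toward g with step size alpha, exactly as in one-step SARSA.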
Implementation of Q(λ) and SARSA(λ) with state aggregation to solve the Mountain Car environment.
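State aggregation maps the continuous Mountain Car observation (position, velocity) to a discrete index so that a tabular method can be used. The sketch below pairs a uniform grid with one SARSA(λ) step using accumulating traces. The bin count, the function names, and the choice of accumulating traces are assumptions; the observation bounds are those of Gym's MountainCar-v0.

```python
POS_RANGE = (-1.2, 0.6)     # position bounds of MountainCar-v0
VEL_RANGE = (-0.07, 0.07)   # velocity bounds of MountainCar-v0
N_BINS = 20                 # hypothetical grid resolution per dimension

def aggregate(position, velocity, n_bins=N_BINS):
    """Map a continuous observation to one of n_bins * n_bins discrete states."""
    def to_bin(x, lo, hi):
        idx = int((x - lo) / (hi - lo) * n_bins)
        return min(max(idx, 0), n_bins - 1)   # clip values on or past the bounds
    return to_bin(position, *POS_RANGE) * n_bins + to_bin(velocity, *VEL_RANGE)

def sarsa_lambda_step(Q, E, s, a, r, s2, a2, alpha, gamma, lam, done):
    """One SARSA(lambda) update with accumulating eligibility traces E."""
    target = r if done else r + gamma * Q[s2][a2]
    delta = target - Q[s][a]
    E[s][a] += 1.0                            # mark the visited pair
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]  # credit recently visited pairs
            E[i][j] *= gamma * lam              # decay all traces

# Tiny example with 2 aggregated states and 2 actions
Q = [[0.0, 0.0], [0.0, 0.0]]
E = [[0.0, 0.0], [0.0, 0.0]]
sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s2=1, a2=0,
                  alpha=0.5, gamma=0.9, lam=0.8, done=False)
```

A finer grid represents the value function more accurately but needs more experience per cell; the traces E are reset to zero at the start of every episode.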
Reinforcement Learning (policy search), Deep Learning (CNN and transfer learning), image identification (YOLO) and stochastic optimization (genetic algorithm) techniques were used to optimize the policy for the OpenAI Gym LunarLander-v2 environment.
Developed with the TensorFlow and DEAP packages.