rl-course

Code repository for the programming exercises of the Reinforcement Learning lecture at the University of Stuttgart: https://ipvs.informatik.uni-stuttgart.de/mlr/reinforcement-learning-ss-20

Requirements

All exercises use Python 3.

The first exercise uses numpy and matplotlib:

python3 -m pip install numpy matplotlib --user

Later exercises use OpenAI Gym (https://gym.openai.com/):

python3 -m pip install gym --user

The Deep Reinforcement Learning exercises use TensorFlow 2 (https://www.tensorflow.org/):

python3 -m pip install tensorflow --user

Exercises

Exercise 01 - k-armed Bandit

Epsilon-greedy action selection for a bandit with k arms.

The action-value function Q is estimated as the expected reward of each action. At each time step, the action that maximizes Q is chosen with probability 1-epsilon; otherwise, with probability epsilon, a random action is chosen.
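The selection rule above can be sketched as follows. This is a minimal illustration, not the exercise solution: the Gaussian reward model, the function name `run_bandit`, and all parameter values are assumptions, and only the standard library is used so the snippet runs without extra packages.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a k-armed bandit with unit-variance Gaussian rewards (assumed)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated action values
    n = [0] * k     # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: uniform random arm
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: greedy arm (ties -> lowest index)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental sample mean of rewards
    return q, n

q, n = run_bandit([0.1, 0.5, 0.9], epsilon=0.1, steps=5000)
print("estimates:", q)
print("pull counts:", n)
```

The incremental update keeps a running average per arm, so no reward history has to be stored.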

Exercise 02 - Brute force value function

In the FrozenLake environment, the value function of every possible policy is computed as the fixed point of the Bellman equation. The optimal policy is the one whose value is maximal in every state.

This brute-force approach (evaluating all possible policies) is, however, intractable for large state-action spaces.
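To make the idea concrete, here is a small sketch on a hypothetical three-state chain instead of FrozenLake (the environment, `step`, and `evaluate` are assumptions for illustration). For scale: the default 4x4 FrozenLake map has 16 states and 4 actions, which already gives 4^16 (about 4.3 billion) deterministic policies if every state is counted.

```python
import itertools

GAMMA = 0.9
N_ACTIONS, TERMINAL = 2, 2   # states 0 and 1 are non-terminal, state 2 is terminal

def step(s, a):
    """Deterministic toy chain: action 0 moves left, action 1 moves right."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def evaluate(policy, sweeps=200):
    """Iterate the Bellman expectation equation towards its fixed point."""
    v = [0.0] * (TERMINAL + 1)
    for _ in range(sweeps):
        for s in range(TERMINAL):        # the terminal state keeps value 0
            s2, r = step(s, policy[s])
            v[s] = r + GAMMA * v[s2]
    return v

best_policy, best_v = None, None
for policy in itertools.product(range(N_ACTIONS), repeat=TERMINAL):
    v = evaluate(policy)
    # keep the policy whose value is at least as large in every state
    if best_v is None or all(x >= y for x, y in zip(v, best_v)):
        best_policy, best_v = policy, v

print(best_policy, best_v)
```

The loop visits all `N_ACTIONS ** TERMINAL` policies, which is exactly the term that explodes for larger problems.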

Exercise 03 - Dynamic Programming

Implementation of the Value Iteration algorithm in the FrozenLake environment.
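A minimal sketch of value iteration, assuming a transition model `P[s][a]` given as a list of `(probability, next_state, reward, done)` tuples. The tiny model below is hypothetical and stands in for FrozenLake; the function names and the stopping threshold `theta` are also assumptions.

```python
def q_value(P, v, s, a, gamma):
    """Expected one-step return of taking action a in state s under values v."""
    return sum(p * (r + (0.0 if done else gamma * v[s2]))
               for p, s2, r, done in P[s][a])

def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    v = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(P, v, s, a, gamma) for a in range(n_actions))
            delta = max(delta, abs(best - v[s]))
            v[s] = best                      # Bellman optimality backup
        if delta < theta:
            break
    # greedy policy with respect to the converged values
    policy = [max(range(n_actions), key=lambda a: q_value(P, v, s, a, gamma))
              for s in range(n_states)]
    return v, policy

# Hypothetical 3-state model: state 2 is terminal, moving right from state 1 slips 20% of the time.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(0.8, 2, 1.0, True), (0.2, 1, 0.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
v, policy = value_iteration(P, n_states=3, n_actions=2)
print(v, policy)
```

Unlike the brute-force approach, each sweep costs only one pass over all state-action pairs.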

Exercise 04 - Monte Carlo ES (exploring starts)

Implementation of Monte Carlo ES to obtain the optimal policy and value function for Blackjack.
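The per-episode update of Monte Carlo ES can be sketched as below. Only the update is shown: generating episodes that begin with a randomly chosen state-action pair (the exploring start) is left out. The function name, the state tuples, and the action encoding (0 = stick, 1 = hit) are assumptions for illustration.

```python
from collections import defaultdict

def mc_es_update(episode, q, counts, policy, n_actions, gamma=1.0):
    """First-visit Monte Carlo update for one episode.

    episode: list of (state, action, reward) tuples in time order.
    """
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)          # remember earliest occurrence
    g = 0.0
    for t in range(len(episode) - 1, -1, -1):      # walk backwards to accumulate returns
        s, a, r = episode[t]
        g = gamma * g + r
        if first_visit[(s, a)] == t:
            counts[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / counts[(s, a)]   # running average of returns
            policy[s] = max(range(n_actions), key=lambda b: q[(s, b)])  # greedy improvement

q = defaultdict(float)
counts = defaultdict(int)
policy = {}
# Hypothetical episode: hit on 13 against a dealer 10, then stick on 19 and win.
episode = [((13, 10, False), 1, 0.0), ((19, 10, False), 0, 1.0)]
mc_es_update(episode, q, counts, policy, n_actions=2)
print(policy)
```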

Exercise 05 - TD learning - SARSA and Q-learning

Implementation of the SARSA (on-policy) and Q-learning (off-policy) algorithms to solve the FrozenLake environment.
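The difference between the two algorithms lies in the bootstrap target, which the following sketch isolates. The function names, the list-of-lists Q-table, and the step sizes are assumptions for illustration.

```python
def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.99, done=False):
    """On-policy: bootstrap from the action a2 actually taken in s2."""
    target = r + (0.0 if done else gamma * q[s2][a2])
    q[s][a] += alpha * (target - q[s][a])

def q_learning_update(q, s, a, r, s2, alpha=0.1, gamma=0.99, done=False):
    """Off-policy: bootstrap from the greedy action in s2."""
    target = r + (0.0 if done else gamma * max(q[s2]))
    q[s][a] += alpha * (target - q[s][a])

# Same transition, different targets.
q_sarsa = [[0.0, 0.0], [1.0, 2.0]]
q_qlearn = [[0.0, 0.0], [1.0, 2.0]]
sarsa_update(q_sarsa, s=0, a=0, r=0.0, s2=1, a2=0, alpha=0.5, gamma=1.0)
q_learning_update(q_qlearn, s=0, a=0, r=0.0, s2=1, alpha=0.5, gamma=1.0)
print(q_sarsa[0][0], q_qlearn[0][0])
```

In this example SARSA moves towards the value of the chosen next action (1.0), while Q-learning moves towards the largest next-state value (2.0).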

Exercise 06 - n-Step TD learning

Implementation of n-step SARSA to solve the 8x8 FrozenLake environment.
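The core of n-step SARSA is the n-step return, which sums n discounted rewards and then bootstraps from the current estimate. A minimal sketch, where the function name and arguments are assumptions:

```python
def n_step_target(rewards, gamma, bootstrap_value=0.0):
    """Return R_1 + gamma*R_2 + ... + gamma^(n-1)*R_n + gamma^n * bootstrap_value.

    rewards: the n rewards observed after the updated state-action pair.
    bootstrap_value: Q(S_n, A_n), or 0.0 if the episode ended within n steps.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

target = n_step_target([0.0, 0.0, 1.0], gamma=0.5, bootstrap_value=4.0)
print(target)
```

The estimate for the pair visited n steps earlier is then moved towards this target, as in one-step SARSA.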

Exercise 07 - Function approximations and Eligibility traces

Implementation of Q(λ) and SARSA(λ) with state aggregation to solve the Mountain Car environment.
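A sketch of the two ingredients: state aggregation maps the continuous (position, velocity) observation to a grid cell, and SARSA(λ) spreads each TD error over recently visited cells through eligibility traces. The bin count, the function names, and the use of accumulating traces are assumptions; the value ranges are those of Gym's MountainCar-v0.

```python
POS_RANGE = (-1.2, 0.6)     # position bounds of MountainCar-v0
VEL_RANGE = (-0.07, 0.07)   # velocity bounds of MountainCar-v0
BINS = 20                   # assumed grid resolution per dimension

def aggregate(position, velocity, bins=BINS):
    """Map a continuous observation to one of bins*bins discrete cells."""
    def bucket(x, lo, hi):
        i = int((x - lo) / (hi - lo) * bins)
        return min(max(i, 0), bins - 1)         # clamp the upper edge into the last bin
    return bucket(position, *POS_RANGE) * bins + bucket(velocity, *VEL_RANGE)

def sarsa_lambda_step(q, e, s, a, r, s2, a2,
                      alpha=0.1, gamma=0.99, lam=0.9, done=False):
    """One SARSA(lambda) update with accumulating traces on tables q and e."""
    target = r + (0.0 if done else gamma * q[s2][a2])
    delta = target - q[s][a]
    e[s][a] += 1.0                              # mark the visited pair as eligible
    for i in range(len(q)):
        for j in range(len(q[i])):
            q[i][j] += alpha * delta * e[i][j]  # credit proportional to eligibility
            e[i][j] *= gamma * lam              # decay all traces

n_actions = 3
q = [[0.0] * n_actions for _ in range(BINS * BINS)]
e = [[0.0] * n_actions for _ in range(BINS * BINS)]
s = aggregate(-0.5, 0.0)
s2 = aggregate(-0.49, 0.01)
sarsa_lambda_step(q, e, s, 2, -1.0, s2, 2)
```

Traces are usually reset to zero at the start of each episode.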

Exercise 99 - Policy Search

Reinforcement learning (policy search), deep learning (CNN and transfer learning), image identification (YOLO), and stochastic optimization (genetic algorithm) techniques were used to optimize the policy for the OpenAI Gym LunarLander-v2 environment.

Developed with the TensorFlow and DEAP packages.
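As a rough illustration of genetic-algorithm policy search, the sketch below evolves a parameter vector against a fitness function. It deliberately uses plain Python rather than DEAP, and the fitness function is a hypothetical stand-in for the episode return of a LunarLander policy; `evolve`, the selection scheme, and all hyperparameters are assumptions.

```python
import random

def evolve(fitness, n_params, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Elitist genetic algorithm: keep the top quarter, refill with mutated crossovers."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))           # best fitness of this generation
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # uniform crossover followed by Gaussian mutation
            child = [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(p1, p2)]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), history

# Hypothetical fitness: negative squared distance to a target parameter vector.
TARGET = [0.5, -0.2, 0.1]
def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

best, history = evolve(toy_fitness, n_params=3)
print(best, history[-1])
```

Because the elite survive unchanged, the best fitness never decreases from one generation to the next. In a real setup, evaluating `fitness` would mean running one or more episodes with the candidate policy.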
