MIST101 Workshop 6: Reinforcement Learning was held this Thursday, November 23rd.
In this workshop, we began with some preliminary mathematical models for reinforcement learning, then dove into its approaches and applications.
A Markov Decision Process (MDP) is a mathematical framework useful for studying optimization problems in which decisions are made in sequence. Dynamic programming can be used to compute the value functions and infer the optimal policy, while the Monte Carlo method estimates the value function by random sampling. The decision-making process is based only on the current state (the Markov property). Iterative approaches such as policy iteration and value iteration are greedy techniques that estimate the value function and keep improving the estimate until convergence.
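To make value iteration concrete, here is a minimal sketch on a toy problem. The 4-state chain, its rewards, the discount factor, and all names (`step`, `value_iteration`, `greedy_policy`) are assumptions made up for illustration; they are not taken from the workshop slides.

```python
# Hypothetical toy MDP (not from the workshop): a chain of states 0..3.
# State 3 is terminal, and entering it gives a reward of 1.
GAMMA = 0.9                  # discount factor (assumed value)
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
TERMINAL = 3


def step(state, action):
    """Deterministic model of the environment: return (next_state, reward)."""
    if action == "left":
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def action_value(state, action, values):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    """Sweep over the states, applying the greedy backup until values settle."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == TERMINAL:
                continue  # the terminal state keeps value 0
            best = max(action_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best lookahead."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values))
        for s in STATES
        if s != TERMINAL
    }


if __name__ == "__main__":
    v = value_iteration()
    print(v)
    print(greedy_policy(v))
```

Note that the decision in each state uses only that state's lookahead, which is the Markov property at work. This sketch assumes the transition model is known; a Monte Carlo approach would instead average returns over sampled episodes.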
Three different approaches were illustrated in detail: value-based, policy-based, and actor-critic. Actor-critic is a form of generalised policy iteration, alternating between a policy evaluation step and a policy improvement step. Rather than relying only on sampled returns, it relies on the trained model (the critic). The results of the different approaches on the value function and policy are summarized in the table below.
Afterwards, we moved on to the applications of reinforcement learning in AlphaGo, gaming bots, and self-driving cars. AlphaGo uses a value network to reduce search depth and a policy network to reduce search breadth. Together these shrink the search space, and the next move is chosen by maximizing the estimated reward. Gaming bots for games such as Dota 2 and StarCraft II utilise multi-agent reinforcement learning. Self-driving cars have an architecture that includes a state space and an action space, which are used to develop the best driving policy.
The slides of workshop #6 can be found at: https://docs.google.com/presentation/d/1ED_iHBrR7PYTgOrxy8yUqZflpIIVi1d5ox2v4PWjCTc/edit?usp=sharing
This concludes our MIST101 Machine Learning Introductory Workshop Series.
It has been a great pleasure being a part of your machine learning journey. We would love to have your continued support, and we look forward to seeing you at our future events!
University of Toronto Machine Intelligence Student Team