Sponsor
Portland State University. Systems Science Graduate Program
First Advisor
Bruno Jedynak
Term of Graduation
Fall 2022
Date of Publication
11-2-2021
Document Type
Thesis
Degree Name
Master of Science (M.S.) in Systems Science
Department
Systems Science
Language
English
Subjects
Artificial intelligence, Machine learning, Reinforcement learning
DOI
10.15760/etd.7738
Physical Description
1 online resource (vi, 95 pages)
Abstract
In this paper I will explain the AlphaGo family of algorithms starting from first principles, requiring little prior knowledge from the reader. The focus will be on one of the more recent versions, AlphaZero, but I hope to explain the core principles that allowed these algorithms to be so successful. I will generally use "AlphaZero" to refer to this core set of principles and will make it clear when I am referring to a specific algorithm of the AlphaGo family. In short, AlphaZero combines Monte Carlo Tree Search (MCTS) with deep learning and self-play. We will see how these three concepts fit together, break down each of them, and look at examples to clarify understanding. I implemented a simplified version of the algorithm for TicTacToe and Connect4; the code is available online, along with a simple web app that allows you to play against a trained agent.
https://github.com/befeltingu/VikingZero
https://github.com/befeltingu/VikingDashboard
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/36926
Recommended Citation
SeWell, David Robert, "From MDP to AlphaZero" (2021). Dissertations and Theses. Paper 5867.
https://doi.org/10.15760/etd.7738