
CS534 Spring 2019 Adversarial Search, Game Playing Showcase

by: Varun Bhat, Ruofan Hu, Jiayi Li, Justin Seeley, and Matthew Szpunar

Showcasing work by: David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis on AlphaGo

References and Resources

- Chen, Jim X. "The Evolution of Computing: AlphaGo." Computing in Science & Engineering 18.4 (2016): 4-7. Web.
- Silver, David, et al. "Mastering the Game of Go with Deep Neural Networks and Tree Search." Nature 529.7587 (2016): 484-489. Web.
- GeeksforGeeks. (2019). "What is Reinforcement Learning?" [online] Available at: https://www.geeksforgeeks.org/what-is-reinforcement-learning/ [Accessed 3 Feb. 2019].
- Gibney, Elizabeth. "What Google's Winning Go Algorithm Will Do Next." Nature 531.7594 (2016): 284-285. Web.
- Bae, Jonghoon, et al. "Social Networks and Inference About Unknown Events: A Case of the Match Between Google's AlphaGo and Sedol Lee." PLoS ONE 12.2 (2017): e0171472. Web.
- Langford, John. "AlphaGo Is Not the Solution to AI." Blog entry. 59.6 (2016): n. pag. Print.
- Senseis.xmp.net. (2019). "Number of Possible Go Games." Sensei's Library. [online] Available at: https://senseis.xmp.net/?NumberOfPossibleGoGames [Accessed 3 Feb. 2019].
- Shannon, Claude E. "XXII. Programming a Computer for Playing Chess." The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 41.314 (1950): 256-275. Web.
- Hui, Jonathan. (2019). "AlphaGo: How It Works Technically." Medium. [online] Available at: https://medium.com/@jonathan_hui/alphago-how-it-works-technically-26ddcc085319 [Accessed 5 Feb. 2019].

CS534 Artificial Intelligence: Bhat - Hu - Li - Seeley - Szpunar

Outline

- AlphaGo Summary/Introduction
  - What is AlphaGo?
  - Differences between AlphaGo and other Go AI
- AlphaGo Algorithms
  - Monte Carlo Tree Search
  - Convolutional Neural Networks
  - Reinforcement Learning

AlphaGo Introduction

- What is AlphaGo?

- Difference between AlphaGo and other Go AI

What is AlphaGo?

- An AI that plays the game of Go
- Go is a territory-control game with 2 * 10^170 possible game states
- Go is played on a 19x19 board
- Two players (Black and White) take turns placing stones on the board
- The winner wins by:
  - Controlling more space on the board
  - Capturing more pieces
- To capture a piece: surround it on all sides

[Image: Game of Go in progress - https://www.mastersofgames.com/images/orientalboard/go-table-boardpay.jpg]
[Image: Position for Black to capture - https://online-go.com/puzzle/14036]

Worcester Polytechnic Institute

What is AlphaGo?

- Success in 2016 against Lee Sedol, the 2nd-highest-ranked Go player
- Won the series 4-1

[Image: Screenshot of AlphaGo playing Lee Sedol (2016) - https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2017/01/09112900/alphago-vs-lee-sedol-2_w_600.jpg]

AlphaGo vs. Other Go AI

- The strongest Go AIs all rely on Monte Carlo Tree Search (MCTS)
- AlphaGo, however, makes extensive use of machine learning to avoid using hand-crafted rules

[Figure: Various Go AIs vs. skill ranking (Elo)]

Figure from "Mastering the Game of Go with Deep Neural Networks and Tree Search" (2016)

AlphaGo Algorithms

- Monte Carlo Tree Search
- Convolutional Neural Network
- Reinforcement Learning

Monte Carlo Tree Search (MCTS)

- Go has approximately 360 possible actions
- Typical rules give 30 seconds to 1 minute for each move
- A time-efficient algorithm is needed

Problems with Minimax

- Takes a long time if an optimal solution is sought
- Relies on an evaluation function and heuristics to prune the minimax tree when an optimal solution would take too much time to calculate

AlphaGo uses Monte Carlo Tree Search instead:

- Can be interrupted at any time and return the best path found so far
- Does not need an explicit evaluation function, so it can be used in games without developed theory
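As a sketch of the contrast, plain minimax over a toy game tree shows why the approach struggles at Go's scale: every leaf must be scored exactly, which forces an evaluation function once the real tree is too deep to enumerate. The nested-list game tree here is an invented stand-in for real game states, not anything from the slides.

```python
def minimax(tree, maximizing=True):
    """Plain minimax over a game tree given as nested lists, where
    leaves are scores for the maximizing player. In Go the full tree
    is far too large to enumerate like this."""
    if not isinstance(tree, list):
        return tree                      # leaf: exact score, no heuristic needed
    values = [minimax(branch, not maximizing) for branch in tree]
    return max(values) if maximizing else min(values)

# Two-ply toy tree: MAX picks a branch, MIN then picks the worst leaf.
print(minimax([[3, 5], [2, 9]]))   # MIN reduces branches to 3 and 2; MAX picks 3
```

MCTS avoids this exhaustive recursion entirely, which is why the slides turn to it next.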

How to make MCTS work for Go?

[Picture from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf]

How to make MCTS work for Go?

Two ideas to make MCTS work for Go:

- Idea 1: A value function to truncate the tree -> shallower MCTS
- Idea 2: Better tree & default policies -> smarter MCTS

Value function

- The expected future reward from a board position, assuming we play perfectly from that point

Tree & default policy

- Tree policy: selects which part of the search tree to expand
- Default policy: determines how simulations are run

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]
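Idea 1 can be sketched as a truncated simulation: follow the default (random) policy for a bounded number of steps, then fall back on the value function V(s) instead of playing to the end. The countdown toy game and all function names below are hypothetical stand-ins for illustration, not AlphaGo's implementation.

```python
import random

def truncated_rollout(state, legal_moves, step, score, value_fn, max_depth=20):
    """Run a random default-policy simulation, but stop at max_depth
    and use the value function V(s) instead of playing out (Idea 1)."""
    for _ in range(max_depth):
        moves = legal_moves(state)
        if not moves:                       # terminal: use the true outcome
            return score(state)
        state = step(state, random.choice(moves))
    return value_fn(state)                  # truncated: estimated value

# Toy demo: a countdown "game" where the only move removes one stone.
legal = lambda n: [1] if n > 0 else []
step = lambda n, m: n - m
score = lambda n: 1.0                       # reaching 0 counts as a win here
value = lambda n: 0.5                       # neutral estimate for unfinished games

print(truncated_rollout(10, legal, step, score, value, max_depth=3))   # 0.5 (truncated)
print(truncated_rollout(2, legal, step, score, value, max_depth=20))   # 1.0 (played out)
```

The payoff is a shallower tree: simulations cost max_depth steps instead of a full game.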

How to make MCTS work for Go?

In AlphaGo:

- Both ideas are used to improve MCTS
- Two resources are used to train:
  - Expert data
  - A simulator (self-play)

How to make MCTS work for Go?

Main idea of AlphaGo: use Convolutional Neural Networks to train better policies and value functions.

Large search tree

[Image from https://i0.wp.com/erickimphotography.com/blog/wp-content/uploads/2018/09/alphago-netflix-documentary-4.png?w=1600]

Convolutional Neural Network

- Use the network to take the current game state and produce a smaller subset of actions, reducing the search space
- Policy network: predicts what the next move will be
- Value network: estimates the value of that predicted move

Policy network (supervised learning)

- Reduces action candidates (breadth reduction)
- Trained by supervised learning from a database of human professional games
- The output is a probability value for each possible legal move

[Picture from https://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works]
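The "probability value for each possible legal move" output can be sketched as a masked softmax over the network's raw scores. This is a common pattern rather than AlphaGo's exact code, and the 3-point toy board and logits below are invented for the demo.

```python
import numpy as np

def move_probabilities(logits, legal_mask):
    """Turn raw network scores into a probability for each legal move,
    masking out illegal points before the softmax. The real network
    outputs a probability for every point on the 19x19 board."""
    scores = np.where(legal_mask, logits, -np.inf)   # illegal moves get zero mass
    scores = scores - scores.max()                   # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

# Toy 3-point "board": two legal points, one occupied.
logits = np.array([1.0, 2.0, 0.0])
legal = np.array([True, True, False])
probs = move_probabilities(logits, legal)
print(probs)   # sums to 1; the illegal point gets probability 0
```

MCTS can then explore the few high-probability moves first instead of all 361 points.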

Value network

- Position evaluation ahead of time (depth reduction)
- There is no need to simulate to the maximum depth if there is a function V(s) that can evaluate the board at state s

[Picture from https://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works]

Reinforcement learning

AlphaGo collects moves from 30 million games. It then uses these positions to train a rollout policy, which is used to quickly compute the probability of the moves an expert human might make.

[Picture from medium.com]

Reinforcement learning - training

[Picture from medium.com]

Thank you! Questions?

Backup Slides

How does Reinforcement Learning Work?

- Input: an initial state from which the model will start
- Output: there are many possible outputs, since there is a variety of solutions to a particular problem
- Training: training is based on the input; the model returns a state, and the user decides whether to reward or punish the model based on its output. The model continues to learn, and the best solution is decided by the maximum reward.

[Picture from GeeksforGeeks]
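The reward/punish loop above can be sketched with a multi-armed bandit, the simplest reinforcement-learning setting: the agent acts, the environment rewards or punishes, and value estimates improve until the best action dominates. The three-armed environment, epsilon value, and episode count here are invented for the demo, not drawn from AlphaGo.

```python
import random

def train_bandit(rewards, episodes=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: `rewards[i]` is the win probability of
    action i. Returns the learned value estimate of each action."""
    rng = random.Random(seed)
    n = len(rewards)
    value = [0.0] * n                 # learned value estimate per action
    count = [0] * n
    for _ in range(episodes):
        if rng.random() < epsilon:    # explore a random action
            a = rng.randrange(n)
        else:                         # exploit the best current estimate
            a = max(range(n), key=lambda i: value[i])
        r = 1.0 if rng.random() < rewards[a] else 0.0   # reward or punish
        count[a] += 1
        value[a] += (r - value[a]) / count[a]           # running average
    return value

vals = train_bandit([0.2, 0.8, 0.5])
print(max(range(3), key=lambda i: vals[i]))   # the 0.8 arm is learned to be best
```

The epsilon parameter balances exploring untried actions against exploiting the best one so far, the same tension MCTS resolves with its tree policy.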

Go positions according to policies

How to make MCTS work for Go?

For each move:

- There is a time constraint
- Deepen/build our MCTS search tree within it
- Select our optimal move and only consider the subtree from there

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Loop: Select - Expand - Explore - Update

- An application of the bandit-based method
- Two fundamental concepts:
  - The true value of any action can be approximated by running several random simulations
  - These values can be efficiently used to adjust the policy (strategy) toward a best-first strategy
- Builds a partial game tree before each move; then a selection is made, moves are explored, and values are updated/estimated

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Selecting

- Select the node in the tree that has the highest possibility of winning
- For example: among children with winning possibilities 2/3, 0/1, and 1/2 under the first move 4/6, the node 2/3 has the highest possibility of winning

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]
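The selection rule in this example can be written directly as picking the child with the best wins/visits ratio. Children are modeled as (wins, visits) pairs, an assumption made for the sketch; a full implementation would use UCB, which also rewards rarely visited nodes, rather than the raw win rate.

```python
def pick_by_win_rate(children):
    """Pick the child with the highest observed win rate (wins / visits)."""
    return max(children, key=lambda c: c[0] / c[1] if c[1] else 0.0)

# The three children under the 4/6 root from the example above:
children = [(2, 3), (0, 1), (1, 2)]
print(pick_by_win_rate(children))   # (2, 3): win rate 0.67 beats 0.0 and 0.5
```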

How does Monte Carlo Tree Search Work?

Expanding

- After selecting the right node, expanding increases the options further in the game by expanding the selected node (2/3) and creating child nodes (only one child node is used in this case)
- These child nodes are the future moves that can be played in the game

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Simulating | Exploring

- Since the best child is unknown, we need to find the best-performing move that isn't a dead state
- To do that, we use reinforcement learning to make random decisions in the game further down from every child node
- A value is assigned to every child node by calculating how close the output of its random decisions was to the final output needed to win the game

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Updating | Back-propagation

- The total scores of the parent nodes must be updated by going back up the tree one by one
- The newly updated scores change the state of the tree and may also change the node chosen by future selection passes
- The loop then begins again using the newly updated probabilities

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Loop: Select - Expand - Explore - Update

Instead of brute-forcing through millions of possible paths, the Monte Carlo Tree Search algorithm chooses the best possible move from the current state of the game tree with the help of reinforcement learning.

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]
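The whole select/expand/simulate/update loop fits in a short program. As a sketch, here is MCTS on Nim (take 1-3 stones; whoever takes the last stone wins) rather than Go, since a Go engine needs far more machinery; the iteration count, exploration constant, and Nim rules are choices made for this demo.

```python
import math
import random

class Node:
    """One Nim position; `wins` counts wins for the player who just moved."""
    def __init__(self, stones, parent=None, move=None):
        self.stones, self.parent, self.move = stones, parent, move
        self.children, self.wins, self.visits = [], 0, 0
        self.untried = [m for m in (1, 2, 3) if m <= stones]

def ucb(child, parent_visits, c=1.4):
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_best_move(stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        # 1. Select: descend via the tree policy (UCB) to an expandable node.
        node = root
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # 2. Expand: add one unexplored child.
        if node.untried:
            m = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - m, node, m)
            node.children.append(child)
            node = child
        # 3. Simulate/Explore: random default policy until the game ends.
        s, mover_wins, opponent_turn = node.stones, True, True
        while s > 0:
            s -= rng.choice([m for m in (1, 2, 3) if m <= s])
            if s == 0:
                mover_wins = not opponent_turn
            opponent_turn = not opponent_turn
        # 4. Update: back-propagate, flipping the result at each level.
        win = mover_wins
        while node is not None:
            node.visits += 1
            node.wins += win
            win = not win
            node = node.parent
    # Play the most-visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts_best_move(5))   # leave a multiple of 4 for the opponent: take 1
```

No evaluation function or Nim theory is coded anywhere: the win rates emerge from random simulations, which is exactly the property that makes MCTS attractive for Go.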

How does Monte Carlo Tree Search Work?

Selecting

- Start at the root node
- Based on the tree policy, select a child
- Apply recursively, descending through the tree
- Stop when an expandable node is reached
- Expandable: a node that is non-terminal and has unexplored children

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]
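A standard choice of tree policy (the slide does not name one, so UCB1 is an assumption here) scores each child as its win rate plus an exploration bonus that grows for rarely visited nodes:

```python
import math

def ucb1(wins, visits, parent_visits, c=1.41):
    """UCB1 tree policy: exploitation (win rate) + exploration bonus."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# With the earlier example counts (parent visited 6 times): the less-visited
# 1/2 child can outscore the 2/3 child once the exploration term is included.
print(ucb1(2, 3, 6) < ucb1(1, 2, 6))   # True
```

The constant c trades off exploitation against exploration; with c = 0 selection degenerates to picking the raw best win rate.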

How does Monte Carlo Tree Search Work?

Expanding

- Add one or more child nodes to the tree
- Depends on what actions are available from the current position
- The method used depends on the tree policy

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Simulating | Exploring

- Run a simulation of the path that was selected
- Get the position at the end of the simulation
- The default policy determines how the simulation is run
- The board outcome determines the value

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

How does Monte Carlo Tree Search Work?

Updating | Back-propagation

- Move backward through the saved path
- The value of a node represents the benefit of going down that path from its parent
- Values are updated depending on how the simulated game ends

[Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa]

Increased Proficiency with More Power

[Figure: skill (Elo ranking) vs. hardware]

Convolutional Neural Networks
