1 00:00:00,000 --> 00:00:00,500 2 00:00:00,500 --> 00:00:04,230 BRIAN YU: In this lab your task is going to be to write a program in Python 3 00:00:04,230 --> 00:00:07,260 to simulate the results of a sports tournament. 4 00:00:07,260 --> 00:00:10,770 In a sports tournament, like the FIFA World Cup or other sport tournaments 5 00:00:10,770 --> 00:00:15,360 as well, oftentimes tournaments end up in a single elimination bracket 6 00:00:15,360 --> 00:00:18,840 where you end up with a bunch of teams each of which play each other, 7 00:00:18,840 --> 00:00:22,050 where the winners then move on to the next round and then play each other, 8 00:00:22,050 --> 00:00:24,760 the winners then move on to the next round, play each other 9 00:00:24,760 --> 00:00:27,420 and then finally the last two teams play each other 10 00:00:27,420 --> 00:00:30,210 and whichever team wins between those last two teams 11 00:00:30,210 --> 00:00:34,010 is ultimately declared the winner of the tournament. 12 00:00:34,010 --> 00:00:36,470 How might we simulate this type of tournament? 13 00:00:36,470 --> 00:00:40,400 Well in order to do so, we need some idea of how good each of these teams 14 00:00:40,400 --> 00:00:43,430 actually is so that we can compare two teams 15 00:00:43,430 --> 00:00:47,510 and make some prediction about who is likely to win a game between those two 16 00:00:47,510 --> 00:00:48,650 teams. 17 00:00:48,650 --> 00:00:51,890 So oftentimes teams or players will have ratings, 18 00:00:51,890 --> 00:00:56,370 some number that determines how good that particular team or player is 19 00:00:56,370 --> 00:00:59,120 and as a result, we can use that information 20 00:00:59,120 --> 00:01:04,099 to compare two ratings to determine who might win a game between any two teams 21 00:01:04,099 --> 00:01:05,209 for instance. 
22 00:01:05,209 --> 00:01:08,900 Ultimately your program is going to use this kind of information, 23 00:01:08,900 --> 00:01:12,980 a listing of teams and what their ratings are, to simulate a tournament 24 00:01:12,980 --> 00:01:16,850 and simulate what the probability is that any particular team is 25 00:01:16,850 --> 00:01:18,560 going to win that tournament. 26 00:01:18,560 --> 00:01:21,320 In order to do so you'll need access to some data, 27 00:01:21,320 --> 00:01:24,260 so we'll give you some data formatted as a CSV file, 28 00:01:24,260 --> 00:01:27,980 comma separated values, where every line corresponds 29 00:01:27,980 --> 00:01:30,260 to a team that has two values. 30 00:01:30,260 --> 00:01:33,650 First the name of the team, in other words, like what country for example 31 00:01:33,650 --> 00:01:37,190 that team is from, followed by a comma, and then the rating 32 00:01:37,190 --> 00:01:41,330 for that team, some number representing the strength of that team where 33 00:01:41,330 --> 00:01:43,850 a higher rating means that team is better 34 00:01:43,850 --> 00:01:47,420 and is therefore more likely to win a game against a lower rated 35 00:01:47,420 --> 00:01:48,830 team for example. 36 00:01:48,830 --> 00:01:51,950 The bigger the difference between the ratings of those two teams, 37 00:01:51,950 --> 00:01:55,280 the more likely it is that the team with the higher rating 38 00:01:55,280 --> 00:01:58,080 is going to win that game. 
39 00:01:58,080 --> 00:02:01,140 If we stored this information inside of a CSV file, 40 00:02:01,140 --> 00:02:03,420 then your program is going to work as follows, 41 00:02:03,420 --> 00:02:07,590 you'll run python tournament.py followed by a CSV file, the one 42 00:02:07,590 --> 00:02:12,330 we have here is the 2018 men's FIFA World Cup teams, 43 00:02:12,330 --> 00:02:16,080 and after that your program is going to simulate a whole bunch of tournaments, 44 00:02:16,080 --> 00:02:19,810 maybe simulating 1,000 different tournaments within these teams 45 00:02:19,810 --> 00:02:23,220 and then printing out based on those results what the program thinks 46 00:02:23,220 --> 00:02:26,670 the probability is that any particular country will 47 00:02:26,670 --> 00:02:30,290 be the eventual winner of the entire tournament. 48 00:02:30,290 --> 00:02:31,522 How are you going to do that? 49 00:02:31,522 --> 00:02:33,980 Well, let's start by taking a look at the distribution code 50 00:02:33,980 --> 00:02:36,290 that we give to you as part of this lab. 51 00:02:36,290 --> 00:02:38,960 For this lab, we give to you a couple of files. 52 00:02:38,960 --> 00:02:41,390 We give you some CSV files, each of which 53 00:02:41,390 --> 00:02:44,900 is going to contain a listing of teams as well as what rating each 54 00:02:44,900 --> 00:02:48,830 of those teams has, and we give you that for a couple of different tournaments 55 00:02:48,830 --> 00:02:52,160 but then tournament.py is where all of the logic is. 56 00:02:52,160 --> 00:02:54,230 This is the Python file that you're going 57 00:02:54,230 --> 00:02:58,080 to use to actually simulate one of these sports tournaments. 58 00:02:58,080 --> 00:03:00,840 We start here by defining a variable n which 59 00:03:00,840 --> 00:03:03,600 is equal to the number of simulations to run, 60 00:03:03,600 --> 00:03:06,930 and by default we're going to simulate 1,000 different tournaments 61 00:03:06,930 --> 00:03:08,940 with these teams. 
62 00:03:08,940 --> 00:03:11,430 Inside the main function, we check to make sure 63 00:03:11,430 --> 00:03:14,340 that the program is being used correctly with a file name provided 64 00:03:14,340 --> 00:03:15,920 as an argument. 65 00:03:15,920 --> 00:03:18,830 Then we define a variable called teams, which 66 00:03:18,830 --> 00:03:21,260 initially is just going to be an empty list, 67 00:03:21,260 --> 00:03:23,220 there are no teams we know of yet. 68 00:03:23,220 --> 00:03:27,290 But the first thing you'll want to do is to read from the CSV file 69 00:03:27,290 --> 00:03:32,060 all of those teams and store each team inside of this list of teams, 70 00:03:32,060 --> 00:03:35,120 storing each team as a dictionary where that dictionary is going 71 00:03:35,120 --> 00:03:37,550 to store values for both the name of the team 72 00:03:37,550 --> 00:03:42,380 as well as the rating for that team. 73 00:03:42,380 --> 00:03:46,550 After that, we define another dictionary called count. 74 00:03:46,550 --> 00:03:49,700 And count is going to be a dictionary that maps keys to values 75 00:03:49,700 --> 00:03:52,050 as all dictionaries do, where in this case, 76 00:03:52,050 --> 00:03:55,460 the keys are going to be the names of the teams 77 00:03:55,460 --> 00:04:00,320 and the values are going to be how many tournaments that team has won. 78 00:04:00,320 --> 00:04:03,260 Because ultimately we're going to simulate n tournaments, where 79 00:04:03,260 --> 00:04:08,090 by default n is going to be 1,000 and we want to keep track of how many times 80 00:04:08,090 --> 00:04:11,180 any given team wins a tournament. 81 00:04:11,180 --> 00:04:14,060 And if a team wins the tournament 100 times then 82 00:04:14,060 --> 00:04:17,029 that team name is going to be the key and 100 83 00:04:17,029 --> 00:04:19,339 is going to be the value, so that we can remember 84 00:04:19,339 --> 00:04:24,400 for any given team how many tournaments they won according to our simulation. 
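To make the count dictionary concrete, here is a minimal sketch of how win counts could be turned into win percentages. The team names, the counts, and the helper name `win_percentages` are hypothetical and chosen only for illustration; they are not taken from the lab's distribution code.

```python
N = 1000  # number of simulated tournaments, matching the default described above


def win_percentages(count, n):
    """Convert a {team name: tournaments won} dictionary into percentages."""
    return {team: wins * 100 / n for team, wins in count.items()}


# Hypothetical counts: the values add up to N simulated tournaments.
count = {"Alpha": 600, "Beta": 300, "Gamma": 100}
percentages = win_percentages(count, N)

# Print the teams from most to least likely winner.
for team in sorted(percentages, key=percentages.get, reverse=True):
    print(f"{team}: {percentages[team]:.1f}% chance of winning")
```

With these made-up counts, a team that won 100 of 1,000 simulated tournaments is reported as having a 10% chance of winning.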
85 00:04:24,400 --> 00:04:26,660 And based on that simulation, we've already 86 00:04:26,660 --> 00:04:29,690 written code for you that goes through each of those teams 87 00:04:29,690 --> 00:04:32,900 and prints out what probability we expect them to have 88 00:04:32,900 --> 00:04:35,690 of winning the entire tournament. 89 00:04:35,690 --> 00:04:38,270 We've also given to you a couple of other functions. 90 00:04:38,270 --> 00:04:42,920 We've given you a simulate_game function that accepts two teams as input. 91 00:04:42,920 --> 00:04:46,430 And what it's going to do is return true if based on the simulation 92 00:04:46,430 --> 00:04:49,400 team 1 wins and false otherwise. 93 00:04:49,400 --> 00:04:51,860 This function utilizes some randomness, it's 94 00:04:51,860 --> 00:04:54,470 not always going to return the same result to you 95 00:04:54,470 --> 00:04:58,100 every time, just as when two of the same teams play a game 96 00:04:58,100 --> 00:05:00,230 it's not always going to be the case most likely 97 00:05:00,230 --> 00:05:02,420 that the same team is going to win every time. 98 00:05:02,420 --> 00:05:05,510 There is some variability in the function as well. 99 00:05:05,510 --> 00:05:08,330 What the function does is it looks at the rating for both 100 00:05:08,330 --> 00:05:12,800 of those teams, rating 1 and rating 2, and uses that information 101 00:05:12,800 --> 00:05:18,540 to calculate what the probability is that team 1 for example wins the game. 102 00:05:18,540 --> 00:05:21,020 And then randomly using that probability, 103 00:05:21,020 --> 00:05:26,210 returns true sometimes if team 1 wins and false otherwise. 104 00:05:26,210 --> 00:05:29,210 We've also given you a function called simulate_round, 105 00:05:29,210 --> 00:05:32,270 which does the same thing, but not just for one game 106 00:05:32,270 --> 00:05:36,010 but for an entire round of games between many different teams. 
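The simulate_game behavior described above could look roughly like the sketch below. The transcript does not state the formula, so the Elo-style curve and the scale constant of 600 are assumptions, as is the helper name `win_probability`; the "team" and "rating" keys follow the dictionary layout described in this lab.

```python
import random


def win_probability(rating1, rating2, scale=600):
    """Probability that the team rated rating1 beats the team rated rating2.

    Assumption: an Elo-style logistic curve; the scale of 600 is illustrative.
    """
    return 1 / (1 + 10 ** ((rating2 - rating1) / scale))


def simulate_game(team1, team2):
    """Return True if team1 wins the simulated game, False otherwise."""
    probability = win_probability(team1["rating"], team2["rating"])
    # random.random() is uniform on [0, 1), so this is True with that probability.
    return random.random() < probability


# Hypothetical teams used only to exercise the functions.
strong = {"team": "Alpha", "rating": 1600}
weak = {"team": "Beta", "rating": 1000}
print(win_probability(strong["rating"], weak["rating"]))
print(simulate_game(strong, weak))
```

Equal ratings give a probability of exactly 0.5, and the probability moves toward 1 as the rating gap in team 1's favor grows, which matches the behavior described for the function.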
107 00:05:36,010 --> 00:05:39,990 The simulate_round function accepts as input a list of teams, 108 00:05:39,990 --> 00:05:42,290 and what the simulate_round function will do 109 00:05:42,290 --> 00:05:47,030 is consider each of those pairs of teams one at a time, teams 0 and 1, then 2 110 00:05:47,030 --> 00:05:51,410 and 3, then 4 and 5, and simulate the game between each of them, 111 00:05:51,410 --> 00:05:55,700 returning to you a list of the winners of that round. 112 00:05:55,700 --> 00:06:00,140 So if you give to simulate_round a list of eight teams for example, 113 00:06:00,140 --> 00:06:02,810 then simulate_round will return to you a list 114 00:06:02,810 --> 00:06:07,410 of the four winners from pairing up teams 0 and 1, 2 and 3, 4 and 115 00:06:07,410 --> 00:06:10,840 5, 6 and 7 for example. 116 00:06:10,840 --> 00:06:14,460 Finally, here is the simulate_tournament function. 117 00:06:14,460 --> 00:06:18,450 This function ultimately should simulate the entire tournament, starting out 118 00:06:18,450 --> 00:06:21,600 with all of the teams, which you can assume will be some power of 2, 119 00:06:21,600 --> 00:06:23,800 like 16 teams for example. 120 00:06:23,800 --> 00:06:27,630 And then repeatedly simulating rounds until we're down to just one 121 00:06:27,630 --> 00:06:30,690 winner of the entire tournament and it's going to be left up 122 00:06:30,690 --> 00:06:33,900 to you to complete that function. 123 00:06:33,900 --> 00:06:38,070 So let's recap what you'll need to do in tournament.py. 124 00:06:38,070 --> 00:06:43,200 First, you should complete the main function. Using csv.DictReader, 125 00:06:43,200 --> 00:06:46,620 you can read teams from the CSV file one at a time, 126 00:06:46,620 --> 00:06:50,790 treating each team as a dictionary, where there's a key called team that 127 00:06:50,790 --> 00:06:54,180 represents the team's name as well as a key called rating that 128 00:06:54,180 --> 00:06:56,160 represents the team's rating. 
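Reading the teams with csv.DictReader might be sketched as follows. The sample data, its ratings, and the helper name `read_teams` are hypothetical, and an in-memory string stands in for the CSV file so the sketch runs on its own. Because csv.DictReader yields every field as a string, the sketch converts each rating with int().

```python
import csv
import io

# Hypothetical sample data; the ratings are made up for illustration.
# The header row supplies the "team" and "rating" keys.
SAMPLE = "team,rating\nAlpha,1500\nBeta,1400\n"


def read_teams(file):
    """Read teams from an open CSV file into a list of dictionaries."""
    teams = []
    reader = csv.DictReader(file)
    for row in reader:
        # Every CSV field is read as a string, so convert the rating to an int.
        teams.append({"team": row["team"], "rating": int(row["rating"])})
    return teams


teams = read_teams(io.StringIO(SAMPLE))
print(teams)
```

With a real file, the same function could be called inside `with open(filename) as file:`, where `filename` would come from the command-line argument.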
129 00:06:56,160 --> 00:07:00,300 Now by default when you read files as a CSV file it's going to treat everything 130 00:07:00,300 --> 00:07:02,867 as a string and because the rating is a number 131 00:07:02,867 --> 00:07:04,950 you'll want to make sure that you actually convert 132 00:07:04,950 --> 00:07:07,740 that rating to an integer first. 133 00:07:07,740 --> 00:07:10,830 Once you do, you're going to store each team as a dictionary 134 00:07:10,830 --> 00:07:13,030 inside that list of teams. 135 00:07:13,030 --> 00:07:17,770 So teams ends up being a list of dictionaries, one dictionary per team. 136 00:07:17,770 --> 00:07:20,220 And once you have that list of teams, you can then 137 00:07:20,220 --> 00:07:22,800 simulate n tournaments, where n by default 138 00:07:22,800 --> 00:07:27,320 is 1,000, by calling your simulate_tournament function. 139 00:07:27,320 --> 00:07:29,070 After each of those tournaments, which you 140 00:07:29,070 --> 00:07:31,200 might imagine having in some sort of loop 141 00:07:31,200 --> 00:07:34,500 that's going to repeatedly simulate one tournament after another, 142 00:07:34,500 --> 00:07:38,790 you'll want to keep track of the win count inside of your count dictionary. 143 00:07:38,790 --> 00:07:42,600 Keeping track for any given team how many times that team 144 00:07:42,600 --> 00:07:45,600 has won one of your simulated tournaments. 145 00:07:45,600 --> 00:07:49,340 You'll also want to complete the simulate_tournament function. 146 00:07:49,340 --> 00:07:53,160 The simulate_tournament function again should simulate an entire tournament, 147 00:07:53,160 --> 00:07:55,950 accepting a list of teams and producing who 148 00:07:55,950 --> 00:07:58,710 the winner of the simulated tournament is. 
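The loop that simulates n tournaments and updates the count dictionary could be sketched like this. The `simulate_tournament` shown here is only a hypothetical stand-in that picks a winner uniformly at random, so that the counting pattern can run by itself; the helper name `count_wins` and the sample teams are also assumptions.

```python
import random

N = 1000  # number of tournaments to simulate


def simulate_tournament(teams):
    """Hypothetical stand-in: returns the name of a randomly chosen team."""
    return random.choice(teams)["team"]


def count_wins(teams, n=N):
    """Simulate n tournaments and count how many times each team wins."""
    count = {}
    for _ in range(n):
        winner = simulate_tournament(teams)
        # Start a team at 0 the first time it wins, then add one win.
        count[winner] = count.get(winner, 0) + 1
    return count


# Hypothetical teams with made-up ratings.
teams = [{"team": "Alpha", "rating": 1500}, {"team": "Beta", "rating": 1400}]
count = count_wins(teams)
print(count)
```

Using `count.get(winner, 0)` avoids a separate check for whether the team already has an entry; an explicit `if winner in count` test would work equally well.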
149 00:07:58,710 --> 00:08:03,300 In doing so, you'll probably want to call the simulate_round function, which 150 00:08:03,300 --> 00:08:06,990 we've already written for you, which accepts a list of teams 151 00:08:06,990 --> 00:08:10,260 and returns a list of the winners from that round. 152 00:08:10,260 --> 00:08:13,830 And likely you'll want to run this function multiple times, 153 00:08:13,830 --> 00:08:17,970 repeatedly simulating rounds until only one team is left. 154 00:08:17,970 --> 00:08:20,700 If you start off with a tournament with 16 teams 155 00:08:20,700 --> 00:08:23,430 and you pass those teams into simulate_round, 156 00:08:23,430 --> 00:08:25,830 you'll get back a list of eight winners. 157 00:08:25,830 --> 00:08:28,050 If you simulate a round with those eight winners, 158 00:08:28,050 --> 00:08:31,320 you'll get back a list of four winners and then two, all the way down 159 00:08:31,320 --> 00:08:34,350 until one team is left in this tournament. 160 00:08:34,350 --> 00:08:36,809 And once you're down to just one winning team, 161 00:08:36,809 --> 00:08:40,000 you're going to return the name of that winning team 162 00:08:40,000 --> 00:08:43,110 so that you can use that name inside your count dictionary 163 00:08:43,110 --> 00:08:47,460 to figure out who ultimately is going to win the simulation. 164 00:08:47,460 --> 00:08:49,320 After you've done all of that, you should 165 00:08:49,320 --> 00:08:52,500 be able to run tournament.py on a CSV file that 166 00:08:52,500 --> 00:08:57,120 contains teams and their ratings and figure out the approximate probability 167 00:08:57,120 --> 00:09:00,300 that any given team is going to win the tournament. 168 00:09:00,300 --> 00:09:04,070 My name is Brian and this was World Cup.
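The repeated-rounds idea described above can be sketched as follows. This is one possible shape for simulate_tournament, not the official solution. The simulate_game formula (an Elo-style curve with a scale of 600) is an assumption, the simulate_round shown here is a reconstruction from the description of pairing teams 0 and 1, 2 and 3, and so on, and the eight sample teams are hypothetical.

```python
import random


def simulate_game(team1, team2):
    """Return True if team1 wins. Assumed Elo-style formula, scale 600."""
    probability = 1 / (1 + 10 ** ((team2["rating"] - team1["rating"]) / 600))
    return random.random() < probability


def simulate_round(teams):
    """Pair up teams 0 and 1, 2 and 3, ... and return the list of winners."""
    winners = []
    for i in range(0, len(teams), 2):
        if simulate_game(teams[i], teams[i + 1]):
            winners.append(teams[i])
        else:
            winners.append(teams[i + 1])
    return winners


def simulate_tournament(teams):
    """Simulate rounds until one team remains; return that team's name.

    Assumes the number of teams is a power of 2.
    """
    while len(teams) > 1:
        teams = simulate_round(teams)
    return teams[0]["team"]


# Hypothetical bracket of eight teams with made-up ratings.
teams = [{"team": f"Team {i}", "rating": 1000 + 50 * i} for i in range(8)]
winner = simulate_tournament(teams)
print(winner)
```

Rebinding the local name `teams` inside the loop leaves the caller's list untouched, so the same list can be passed to simulate_tournament again for the next simulation.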