1 00:00:00,000 --> 00:00:00,150 2 00:00:00,150 --> 00:00:01,858 BRIAN: Let's take a look at how you might 3 00:00:01,858 --> 00:00:03,750 have gone about solving World Cup. 4 00:00:03,750 --> 00:00:06,060 In this problem, your task was to write a Python 5 00:00:06,060 --> 00:00:10,290 program to run multiple simulations of some kind of a sports tournament. 6 00:00:10,290 --> 00:00:12,300 And let's see how you would have done that. 7 00:00:12,300 --> 00:00:15,600 Inside of the main function, the first thing you needed to do 8 00:00:15,600 --> 00:00:18,540 was to read in all of the teams into memory, 9 00:00:18,540 --> 00:00:21,930 because the teams were stored inside of a CSV file that 10 00:00:21,930 --> 00:00:25,770 had the name of each team, as well as a rating for each team representing 11 00:00:25,770 --> 00:00:28,970 how good that particular team is. 12 00:00:28,970 --> 00:00:34,190 That CSV file was specified as a command line argument, which in Python, you 13 00:00:34,190 --> 00:00:38,930 can access using sys dot argv, where sys dot argv 1, in this case, 14 00:00:38,930 --> 00:00:41,370 is going to be the name of the file. 15 00:00:41,370 --> 00:00:44,630 So we'll store that inside of a variable called filename. 16 00:00:44,630 --> 00:00:47,660 Next we're going to open that filename, opening that filename 17 00:00:47,660 --> 00:00:49,400 and calling the file f. 18 00:00:49,400 --> 00:00:53,660 And now we want to read that file as a CSV file. 19 00:00:53,660 --> 00:00:56,000 To do so, we can use the CSV module. 
20 00:00:56,000 --> 00:00:58,160 And in particular, using something called 21 00:00:58,160 --> 00:01:03,680 CSV dot DictReader, which will let us read a file, f, one row at a time, 22 00:01:03,680 --> 00:01:07,190 treating each row of the CSV file as a dictionary, 23 00:01:07,190 --> 00:01:10,070 with keys corresponding to the names of each of the columns, 24 00:01:10,070 --> 00:01:15,110 and values corresponding to the values actually found inside that row of data 25 00:01:15,110 --> 00:01:17,520 in the CSV file. 26 00:01:17,520 --> 00:01:20,130 So once we have this DictReader, which in this case, 27 00:01:20,130 --> 00:01:23,528 I'm just calling reader, let's loop over all of the data 28 00:01:23,528 --> 00:01:25,070 that that reader is going to give us. 29 00:01:25,070 --> 00:01:27,560 For team in that reader, each time, we're 30 00:01:27,560 --> 00:01:30,710 going to get one dictionary representing that team that we 31 00:01:30,710 --> 00:01:33,620 want to add to our list of teams. 32 00:01:33,620 --> 00:01:37,610 But remember, when we read data from a CSV file, the data, by default, 33 00:01:37,610 --> 00:01:40,850 is all going to be strings, just textual data. 34 00:01:40,850 --> 00:01:42,290 But the rating is a number. 35 00:01:42,290 --> 00:01:44,120 And to be able to do math with that number 36 00:01:44,120 --> 00:01:46,400 to calculate probabilities and the likelihood 37 00:01:46,400 --> 00:01:48,800 that one team wins over another team, we need 38 00:01:48,800 --> 00:01:51,960 to turn those ratings into integers. 39 00:01:51,960 --> 00:01:56,322 So what we're going to do here is say team square bracket rating-- 40 00:01:56,322 --> 00:01:58,280 and recall that you can use that square bracket 41 00:01:58,280 --> 00:02:03,320 notation to access the value for a particular key inside of a dictionary-- 42 00:02:03,320 --> 00:02:07,760 is going to be updated to be equal to team square bracket rating cast 43 00:02:07,760 --> 00:02:09,300 into an integer. 
44 00:02:09,300 --> 00:02:12,320 So we're going to take that value, turn it into an integer, 45 00:02:12,320 --> 00:02:16,400 and store that value inside of the dictionary instead. 46 00:02:16,400 --> 00:02:20,030 Once we've done that, we can add this new team to the list of teams 47 00:02:20,030 --> 00:02:23,090 by using this append method, which is a function 48 00:02:23,090 --> 00:02:26,300 we can use on an existing list, which just takes a value 49 00:02:26,300 --> 00:02:29,390 and adds it to the end of the list. 50 00:02:29,390 --> 00:02:33,050 The effect of this will be that we'll have read all of these teams 51 00:02:33,050 --> 00:02:34,430 into memory. 52 00:02:34,430 --> 00:02:38,750 And using those teams, we now want to be able to simulate tournaments. 53 00:02:38,750 --> 00:02:41,278 So how might we have gone about simulating tournaments? 54 00:02:41,278 --> 00:02:43,070 Well, let's look at the simulate tournament 55 00:02:43,070 --> 00:02:45,980 function down near the bottom of our file. 56 00:02:45,980 --> 00:02:49,190 To simulate a tournament, we need to take a list of teams 57 00:02:49,190 --> 00:02:52,250 and repeatedly run rounds on that list of teams, 58 00:02:52,250 --> 00:02:54,470 until we're down to just a single team. 59 00:02:54,470 --> 00:02:57,680 And to do so, we could have used the simulate round 60 00:02:57,680 --> 00:03:01,610 function that accepts a list of teams and gives us back 61 00:03:01,610 --> 00:03:05,300 a list of all of the winners in that round. 62 00:03:05,300 --> 00:03:09,950 So we want to keep simulating rounds as long as we still haven't yet 63 00:03:09,950 --> 00:03:11,820 gotten down to one team. 
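The file-reading steps described so far (a DictReader over the open file, casting each rating to an integer, and appending each team to a list) might look roughly like the sketch below. The function name load_teams and the sample teams "Alpha" and "Beta" with their ratings are assumptions made up for illustration; only the "team" and "rating" column names come from the walkthrough. An in-memory file stands in for the CSV file named on the command line.

```python
import csv
import io


def load_teams(f):
    """Read teams from an open CSV file that has 'team' and 'rating' columns."""
    teams = []
    reader = csv.DictReader(f)
    for team in reader:
        # CSV values arrive as strings; convert the rating so we can do math.
        team["rating"] = int(team["rating"])
        teams.append(team)
    return teams


# Hypothetical sample data; the real program would instead open the
# file whose name is given in sys.argv[1].
sample = io.StringIO("team,rating\nAlpha,1200\nBeta,1100\n")
teams = load_teams(sample)
print(teams)
```

In the actual program the same loop would presumably run inside main, with the file opened from the name stored in filename rather than from an in-memory string.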
64 00:03:11,820 --> 00:03:16,610 So as long as len teams is greater than 1-- here we're using a while loop 65 00:03:16,610 --> 00:03:19,640 and taking advantage of the len function in Python, 66 00:03:19,640 --> 00:03:21,810 which gets us the length of some sequence, 67 00:03:21,810 --> 00:03:24,320 for example, the number of items in a list. 68 00:03:24,320 --> 00:03:26,930 As long as there is more than one team, then 69 00:03:26,930 --> 00:03:29,750 there's another round that we need to simulate. 70 00:03:29,750 --> 00:03:34,550 So we'll go ahead and simulate a round with those teams and set 71 00:03:34,550 --> 00:03:38,360 that to be the new value of this variable called teams. 72 00:03:38,360 --> 00:03:42,710 This is going to repeatedly simulate rounds, and then update teams to just 73 00:03:42,710 --> 00:03:45,110 be the list of all of the winners. 74 00:03:45,110 --> 00:03:48,380 So by the end of it, teams is a list that has one team in it, 75 00:03:48,380 --> 00:03:50,330 the winning team of the tournament. 76 00:03:50,330 --> 00:03:52,700 And if I have a list with only one thing in it, 77 00:03:52,700 --> 00:03:56,370 I can access that first and only element as teams 78 00:03:56,370 --> 00:03:59,690 square bracket 0 to get the first and, in this case, 79 00:03:59,690 --> 00:04:01,910 only element in the list. 80 00:04:01,910 --> 00:04:04,070 And then because each team is a dictionary 81 00:04:04,070 --> 00:04:06,410 and I want to access the name of the team, 82 00:04:06,410 --> 00:04:09,800 I'm going to say square bracket team to just access 83 00:04:09,800 --> 00:04:13,400 the value for the team column of that CSV file 84 00:04:13,400 --> 00:04:17,540 or the team key inside of this dictionary. 85 00:04:17,540 --> 00:04:20,570 I'm then going to use that simulate tournament function 86 00:04:20,570 --> 00:04:24,500 inside of my main function to keep track of how many times 87 00:04:24,500 --> 00:04:26,990 each team has won a tournament. 
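As a rough sketch of the loop just described: simulate_tournament keeps calling simulate_round while more than one team remains, then returns the name of the last team standing. The simulate_round shown here is NOT the helper from the problem; it is a deliberately simplified, deterministic stand-in (the higher-rated team of each adjacent pair always advances) so that the example runs on its own. The team names and ratings are likewise made up.

```python
def simulate_round(teams):
    """Hypothetical stand-in: pair adjacent teams; the higher rating advances.

    The real helper is probabilistic. This version also assumes the
    number of teams is a power of two.
    """
    winners = []
    for i in range(0, len(teams), 2):
        first, second = teams[i], teams[i + 1]
        winners.append(first if first["rating"] >= second["rating"] else second)
    return winners


def simulate_tournament(teams):
    """Run rounds until one team remains, then return that team's name."""
    while len(teams) > 1:
        # Replace the list of teams with just the winners of this round.
        teams = simulate_round(teams)
    return teams[0]["team"]


# Made-up teams for illustration only.
teams = [
    {"team": "Alpha", "rating": 1200},
    {"team": "Beta", "rating": 1100},
    {"team": "Gamma", "rating": 1300},
    {"team": "Delta", "rating": 1000},
]
winner = simulate_tournament(teams)
print(winner)
```

Because teams is only reassigned inside the function, the caller's original list is left unchanged, which matters when the same list is reused for many simulated tournaments.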
88 00:04:26,990 --> 00:04:31,280 I'm going to loop N times, where, by default, N is going to be 1,000. 89 00:04:31,280 --> 00:04:34,310 And I'm going to simulate a tournament with those teams, 90 00:04:34,310 --> 00:04:37,560 and then save the result in winner. 91 00:04:37,560 --> 00:04:40,170 And now, I need to keep track of how many times 92 00:04:40,170 --> 00:04:42,060 each team has won a tournament. 93 00:04:42,060 --> 00:04:45,690 If the winner is already in my dictionary of counts, 94 00:04:45,690 --> 00:04:48,840 then I'm going to increase their number of wins by 1. 95 00:04:48,840 --> 00:04:51,570 If they've already won five tournaments, for example, 96 00:04:51,570 --> 00:04:54,270 I'm going to increase their win count up to 6 97 00:04:54,270 --> 00:04:58,080 to now keep track of this additional tournament that they've won. 98 00:04:58,080 --> 00:05:01,680 But otherwise, if they're not already in my dictionary of counts, 99 00:05:01,680 --> 00:05:05,550 that means that this team hasn't won any tournaments before. 100 00:05:05,550 --> 00:05:09,390 This is the first tournament they've won, according to our simulations. 101 00:05:09,390 --> 00:05:14,250 And so we're going to set counts winner equal to 1 to mean that now they've won 102 00:05:14,250 --> 00:05:16,330 one tournament. 103 00:05:16,330 --> 00:05:19,300 And so at that point, counts by the end of this loop 104 00:05:19,300 --> 00:05:22,420 will have counted up the number of times that every team wins 105 00:05:22,420 --> 00:05:23,500 different tournaments. 106 00:05:23,500 --> 00:05:25,810 And then we can use that data to figure out 107 00:05:25,810 --> 00:05:28,180 the approximate probability that any team is 108 00:05:28,180 --> 00:05:30,750 going to win the entire tournament. 109 00:05:30,750 --> 00:05:31,740 My name is Brian. 110 00:05:31,740 --> 00:05:34,370 And this was World Cup. 111 00:05:34,370 --> 00:05:35,000
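The counting step can be sketched on its own as follows. The helper names tally_wins and win_probabilities are assumptions introduced for illustration, and the short list of winners is made up; in the walkthrough, each winner would come from one call to simulate_tournament inside a loop that runs N times.

```python
def tally_wins(winners):
    """Count how many tournaments each team won."""
    counts = {}
    for winner in winners:
        if winner in counts:
            # Seen before: one more tournament win for this team.
            counts[winner] += 1
        else:
            # First tournament this team has won in our simulations.
            counts[winner] = 1
    return counts


def win_probabilities(counts, n):
    """Estimate each team's chance of winning as wins divided by n runs."""
    return {team: wins / n for team, wins in counts.items()}


# Made-up results from five hypothetical simulated tournaments.
results = ["Alpha", "Gamma", "Gamma", "Alpha", "Gamma"]
counts = tally_wins(results)
probabilities = win_probabilities(counts, len(results))
print(counts)
print(probabilities)
```

Dividing each count by the number of simulations is one straightforward way to get the approximate probabilities mentioned at the end; teams that never win simply do not appear in counts.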