1
00:00:00,000 --> 00:00:00,150


2
00:00:00,150 --> 00:00:01,858
BRIAN: Let's take a
look at how you might

3
00:00:01,858 --> 00:00:03,750
have gone about solving World Cup.

4
00:00:03,750 --> 00:00:06,060
In this problem, your
task was to write a Python

5
00:00:06,060 --> 00:00:10,290
program to run multiple simulations
of some kind of a sports tournament.

6
00:00:10,290 --> 00:00:12,300
And let's see how you
would have done that.

7
00:00:12,300 --> 00:00:15,600
Inside of the main function,
the first thing you needed to do

8
00:00:15,600 --> 00:00:18,540
was to read in all of
the teams into memory,

9
00:00:18,540 --> 00:00:21,930
because the teams were stored
inside of a CSV file that

10
00:00:21,930 --> 00:00:25,770
had the name of each team, as well as
a rating for each team representing

11
00:00:25,770 --> 00:00:28,970
how good that particular team is.

12
00:00:28,970 --> 00:00:34,190
That CSV file was specified as a command
line argument, which in Python, you

13
00:00:34,190 --> 00:00:38,930
can access using sys dot argv,
where sys dot argv 1, in this case,

14
00:00:38,930 --> 00:00:41,370
is going to be the name of the file.

15
00:00:41,370 --> 00:00:44,630
So we'll store that inside of
a variable called filename.

16
00:00:44,630 --> 00:00:47,660
Next we're going to open that
filename, opening that filename

17
00:00:47,660 --> 00:00:49,400
and calling the file f.

18
00:00:49,400 --> 00:00:53,660
And now we want to read
that file as a CSV file.

19
00:00:53,660 --> 00:00:56,000
To do so, we can use the csv module.

20
00:00:56,000 --> 00:00:58,160
And in particular,
using something called

21
00:00:58,160 --> 00:01:03,680
csv dot DictReader, which will let
us read a file, f, one row at a time,

22
00:01:03,680 --> 00:01:07,190
treating each row of the
CSV file as a dictionary,

23
00:01:07,190 --> 00:01:10,070
with keys corresponding to the
names of each of the columns,

24
00:01:10,070 --> 00:01:15,110
and values corresponding to the values
actually found inside that row of data

25
00:01:15,110 --> 00:01:17,520
in the CSV file.

26
00:01:17,520 --> 00:01:20,130
So once we have this
DictReader, which in this case,

27
00:01:20,130 --> 00:01:23,528
I'm just calling reader, let's
loop over all of the data

28
00:01:23,528 --> 00:01:25,070
that that reader is going to give us.

29
00:01:25,070 --> 00:01:27,560
For team in that
reader, each time, we're

30
00:01:27,560 --> 00:01:30,710
going to get one dictionary
representing that team that we

31
00:01:30,710 --> 00:01:33,620
want to add to our list of teams.

32
00:01:33,620 --> 00:01:37,610
But remember, when we read data from
a CSV file, the data, by default,

33
00:01:37,610 --> 00:01:40,850
is all going to be
strings, just textual data.

34
00:01:40,850 --> 00:01:42,290
But the rating is a number.

35
00:01:42,290 --> 00:01:44,120
And to be able to do
math with that number

36
00:01:44,120 --> 00:01:46,400
to calculate probabilities
and the likelihood

37
00:01:46,400 --> 00:01:48,800
that one team wins over
another team, we need

38
00:01:48,800 --> 00:01:51,960
to turn those ratings into integers.

39
00:01:51,960 --> 00:01:56,322
So what we're going to do here is
say team square bracket rating--

40
00:01:56,322 --> 00:01:58,280
and recall that you can
use that square bracket

41
00:01:58,280 --> 00:02:03,320
notation to access the value for a
particular key inside of a dictionary--

42
00:02:03,320 --> 00:02:07,760
is going to be updated to be equal
to team square bracket rating cast

43
00:02:07,760 --> 00:02:09,300
into an integer.

44
00:02:09,300 --> 00:02:12,320
So we're going to take that
value, turn it into an integer,

45
00:02:12,320 --> 00:02:16,400
and store that value inside
of the dictionary instead.

46
00:02:16,400 --> 00:02:20,030
Once we've done that, we can add
this new team to the list of teams

47
00:02:20,030 --> 00:02:23,090
by using this append
method, which is a function

48
00:02:23,090 --> 00:02:26,300
we can use on an existing
list, which just takes a value

49
00:02:26,300 --> 00:02:29,390
and adds it to the end of the list.

50
00:02:29,390 --> 00:02:33,050
The effect of this will be that
we'll have read all of these teams

51
00:02:33,050 --> 00:02:34,430
into memory.
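
(A minimal sketch of the loading steps just described, not the official solution. The helper name load_teams is hypothetical; the column names team and rating are the ones mentioned in this walkthrough.)

```python
import csv


def load_teams(filename):
    """Read teams from a CSV file that has 'team' and 'rating' columns."""
    teams = []
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        for team in reader:
            # CSV values arrive as strings; convert the rating for later math.
            team["rating"] = int(team["rating"])
            teams.append(team)
    return teams


# In main, the filename would come from the command line, for example:
#     teams = load_teams(sys.argv[1])
```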

52
00:02:34,430 --> 00:02:38,750
And using those teams, we now want
to be able to simulate tournaments.

53
00:02:38,750 --> 00:02:41,278
So how might we have gone about
simulating tournaments?

54
00:02:41,278 --> 00:02:43,070
Well, let's look at
the simulate tournament

55
00:02:43,070 --> 00:02:45,980
function down near the
bottom of our file.

56
00:02:45,980 --> 00:02:49,190
To simulate a tournament, we
need to take a list of teams

57
00:02:49,190 --> 00:02:52,250
and repeatedly run rounds
on that list of teams,

58
00:02:52,250 --> 00:02:54,470
until we're down to just a single team.

59
00:02:54,470 --> 00:02:57,680
And to do so, we could have
used the simulate round

60
00:02:57,680 --> 00:03:01,610
function that accepts a list
of teams and gives us back

61
00:03:01,610 --> 00:03:05,300
a list of all of the
winners in that round.

62
00:03:05,300 --> 00:03:09,950
So we want to keep simulating rounds
as long as we still haven't yet

63
00:03:09,950 --> 00:03:11,820
gotten down to one team.

64
00:03:11,820 --> 00:03:16,610
So as long as len teams is greater
than 1-- here we're using a while loop

65
00:03:16,610 --> 00:03:19,640
and taking advantage of
the len function in Python,

66
00:03:19,640 --> 00:03:21,810
which gets us the
length of some sequence,

67
00:03:21,810 --> 00:03:24,320
for example, the number
of items in a list.

68
00:03:24,320 --> 00:03:26,930
As long as there is
more than one team, then

69
00:03:26,930 --> 00:03:29,750
there's another round
that we need to simulate.

70
00:03:29,750 --> 00:03:34,550
So we'll go ahead and simulate
a round with those teams and set

71
00:03:34,550 --> 00:03:38,360
that to be the new value of
this variable called teams.

72
00:03:38,360 --> 00:03:42,710
This is going to repeatedly simulate
rounds, and then update teams to just

73
00:03:42,710 --> 00:03:45,110
be the list of all of the winners.

74
00:03:45,110 --> 00:03:48,380
So by the end of it, teams is
a list that has one team in it,

75
00:03:48,380 --> 00:03:50,330
the winning team of the tournament.

76
00:03:50,330 --> 00:03:52,700
And if I have a list with
only one thing in it,

77
00:03:52,700 --> 00:03:56,370
I can access that first
and only element as teams

78
00:03:56,370 --> 00:03:59,690
square bracket 0 to get
the first and, in this case,

79
00:03:59,690 --> 00:04:01,910
only element in the list.

80
00:04:01,910 --> 00:04:04,070
And then because each
team is a dictionary

81
00:04:04,070 --> 00:04:06,410
and I want to access
the name of the team,

82
00:04:06,410 --> 00:04:09,800
I'm going to say square
bracket team to just access

83
00:04:09,800 --> 00:04:13,400
the value for the team
column of that CSV file

84
00:04:13,400 --> 00:04:17,540
or the team key inside
of this dictionary.
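
(A rough sketch of the loop just described. The simulate_round below is only a deterministic stand-in so the example runs on its own: it assumes an even number of teams and lets the higher rating advance, whereas the real function is described here only as taking a list of teams and returning the winners.)

```python
def simulate_round(teams):
    """Stand-in (assumption): pair adjacent teams, higher rating advances."""
    winners = []
    for i in range(0, len(teams), 2):
        first, second = teams[i], teams[i + 1]
        winners.append(first if first["rating"] >= second["rating"] else second)
    return winners


def simulate_tournament(teams):
    """Run rounds until one team remains, then return that team's name."""
    while len(teams) > 1:
        teams = simulate_round(teams)
    # One team is left: index 0 gets it, and the 'team' key gets its name.
    return teams[0]["team"]
```

(Reassigning teams inside the function only rebinds the local name, so the caller's original list is left untouched and can be reused for the next simulation.)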

85
00:04:17,540 --> 00:04:20,570
I'm then going to use that
simulate tournament function

86
00:04:20,570 --> 00:04:24,500
inside of my main function to
keep track of how many times

87
00:04:24,500 --> 00:04:26,990
each team has won a tournament.

88
00:04:26,990 --> 00:04:31,280
I'm going to loop N times, where,
by default, N is going to be 1,000.

89
00:04:31,280 --> 00:04:34,310
And I'm going to simulate a
tournament with those teams,

90
00:04:34,310 --> 00:04:37,560
and then save the result in winner.

91
00:04:37,560 --> 00:04:40,170
And now, I need to keep
track of how many times

92
00:04:40,170 --> 00:04:42,060
each team has won a tournament.

93
00:04:42,060 --> 00:04:45,690
If the winner is already
in my dictionary of counts,

94
00:04:45,690 --> 00:04:48,840
then I'm going to increase
their number of wins by 1.

95
00:04:48,840 --> 00:04:51,570
If they've already won
five tournaments, for example,

96
00:04:51,570 --> 00:04:54,270
I'm going to increase
their win count up to 6

97
00:04:54,270 --> 00:04:58,080
to now keep track of this additional
tournament that they've won.

98
00:04:58,080 --> 00:05:01,680
But otherwise, if they're not
already in my dictionary of counts,

99
00:05:01,680 --> 00:05:05,550
that means that this team hasn't
won any tournaments before.

100
00:05:05,550 --> 00:05:09,390
This is the first tournament they've
won, according to our simulations.

101
00:05:09,390 --> 00:05:14,250
And so we're going to set counts winner
equal to 1 to mean that now they've won

102
00:05:14,250 --> 00:05:16,330
one tournament.

103
00:05:16,330 --> 00:05:19,300
And so at that point, counts
by the end of this loop

104
00:05:19,300 --> 00:05:22,420
will have counted up the number
of times that every team wins

105
00:05:22,420 --> 00:05:23,500
different tournaments.
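
(A small sketch of that counting loop. The helper name count_wins and the idea of passing the simulation function in as an argument are assumptions made so the example stands alone; in the walkthrough this loop lives directly in main.)

```python
N = 1000  # default number of simulated tournaments


def count_wins(teams, simulate_tournament, n=N):
    """Simulate n tournaments and count how many each team wins."""
    counts = {}
    for _ in range(n):
        winner = simulate_tournament(teams)
        if winner in counts:
            # Seen before: one more tournament win.
            counts[winner] += 1
        else:
            # First tournament this team has won.
            counts[winner] = 1
    return counts
```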

106
00:05:23,500 --> 00:05:25,810
And then we can use
that data to figure out

107
00:05:25,810 --> 00:05:28,180
the approximate probability
that any team is

108
00:05:28,180 --> 00:05:30,750
going to win the entire tournament.
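
(One way that last step might look, as a sketch: divide each win count by the number of simulations. The helper name win_probabilities is hypothetical, and expressing the result as a percentage is an assumption.)

```python
def win_probabilities(counts, n):
    """Convert win counts into approximate win percentages."""
    return {team: wins * 100 / n for team, wins in counts.items()}


def print_probabilities(counts, n):
    """Print teams from most to least likely to win."""
    chances = win_probabilities(counts, n)
    for team in sorted(chances, key=chances.get, reverse=True):
        print(f"{team}: {chances[team]:.1f}% chance of winning")
```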

109
00:05:30,750 --> 00:05:31,740
My name is Brian.

110
00:05:31,740 --> 00:05:34,370
And this was World Cup.

111
00:05:34,370 --> 00:05:35,000