SPEAKER 1: This is a seminar on two of my favorite buzz terms-- machine learning and computer vision. You throw those out there in a party conversation and, besides people saying "nerd," they'll also think it's pretty badass. If they're computer scientists, they might question how much you actually know.

So, machine learning and computer vision. I've put up a cutesy little slide on what they are. I'll usually refer to them by their full names, but if I ever say ML, I mean machine learning, and if I ever say CV, I mean computer vision. Pretty easy.

And these are two of the myths that people have about machine learning. They're usually on one of two extremes: either it's impossible and only for the super technical, geeky wizards who can sit at their computers for hours on end, or it's not that interesting because we're better at doing what it does. And if you fall anywhere between those two extremes, that's cool too.

But basically, the context for why we would need machine learning and computer vision is that we want to be able to solve problems that people solve all day, every day-- but programmatically. So I want to be able to take a computer program, have it look at something and say, that's a cat, that's a dog, that's a person, that's a couch-- and not accidentally look at a person and say, oh, that's a car, where's its license plate? A real-world example you could imagine existing is traffic recognition. If you're just trying to figure out what something is, computer vision is super important, because I want to be able to tell whether the thing that just ran my red light is a brown bear or a car. And that's important. That's really what we're after in general here, but we're going to approach a very specific problem just to give us some context.

As for context on me and my background in these two things: I had never coded before a year ago, a year ago being 2016, in case you're watching this in the future. And basically, for my CS50 final project, I wanted to do something really cool.
I wanted to do it on my own, and I wanted it to be something that was accessible. And a lot of people were like, oh, machine learning and computer vision break two of those three criteria: you can't do that on your own, and it's not accessible. Those things are impossible to do. Why would you ever approach that? And yes, that is maybe true from the theoretical side. I didn't sit down and teach myself all of the math behind machine learning. I didn't sit down and teach myself how to do contouring with computer vision. I sat down and played around with some YouTube videos.

And that's basically the point of the seminar: to teach, or show, or prove that these two concepts-- while they are buzzwords, and they are super cool, and there are whole fields around each of them-- are accessible to you, the CS50 student who just started CS and had never seen a line of code or what it meant before. That's who this is for. And if you're some sort of CS guru who does know all sorts of things about machine learning and computer vision, I'm a little confused why you're watching, but also really appreciative. Hopefully you'll find it at least entertaining and maybe a little bit informative.

So basically, my story was that I came in and did a final project using these two packages, or pieces of software: Keras, which is the machine learning part, and OpenCV (the cv2 module), which is the computer vision part. And I built an algorithm, or a piece of software, that would allow me to do a very specific task: I wanted to convert images of sheet music into machine-readable versions of music. That's software that doesn't exist right now. And I found out why: because it's really freaking hard. But it was a cool process anyway, and I thought it was a worthwhile endeavor. I would encourage everyone to try it if they want to.

So, just as a real-world example-- and this will ease us into the idea of pattern recognition, which is what we're after with machine learning anyway-- we have four people. I have four stock images; they came with the template of the PowerPoint slide, and I just changed their descriptions.
And basically, I'm going to ask you to think in your head about what patterns you can find between these four people. As a human being, you can find-- I'm hesitant to say hundreds; I don't know if there's really enough data for hundreds, but maybe-- a very large number of patterns.

So now I'm going to restrict that, in the sense that machines have a limited amount of data they can pick up. So do we, but we pick up a lot more, kind of instantaneously, than a machine really is going to-- especially the kind of machine we're working with, on the smaller software side. So you get these three categories, and they're basically: how many eyes do you have, are you human, and do you have a ponytail. I was short on time.

And so, from these three categories, if I were to point at someone and say, based on these categories, which category do they belong in, you should be able to do that. And human beings, we're pretty good at that. So if I say, is this a person, you'd say with pretty much 100% confidence, yes it is. Same here. Same with that one. Same with number four.

It's harder if I say, is this a girl? Girl isn't really one of the categories. And looking at the data-- well, maybe you have some extraneous data that says people with ponytails are probably girls, which is a little sexist, but we're going to ignore that issue and just run with the assumption that ponytails probably mean girls. Well, you get this one with, we'll say, 95% confidence. But for all the rest you'd go, well, no ponytail, so I'm 100% sure they're not girls. Slight problem: that one is.

And so this is a very contrived but kind of interesting example: if I limit the amount of data that you're able to look at, you are now restricted in what patterns you can find. And that's a very intuitive thing. It should be almost an instantaneous revelation-- a shower thought, if you will.
But that has some severe manifestations when we're trying to apply machine learning and computer vision. So one of the assumptions that people make is that we can accumulate as much data as we need. However, I'm not Google. You're also not Google. You're not Amazon or Microsoft. We're not one of these big companies that have access to exabytes of data. And so a lot of people go, well, then machine learning is not for me; when I work at Google, I'll do it then. And they move on.

But you don't need exabytes, or necessarily even gigabytes, of data to get machine learning to work. If I were to give you a simple pattern-- we'll talk about the logic gate AND-- it takes two inputs, each 0 or 1, and it returns one output, either 0 or 1. If I say AND of 1 and 1, you return 1. If I say AND of any other input, you give me 0. It's a very small amount of data. In fact, I can represent the inputs as single bits, and I can represent AND in, we'll say, less than 10 bits. Now you have learned an entire pattern-- a complete pattern, if you will-- and it took you less than even a kilobyte. So patterns don't require lots of data. Complicated patterns maybe require more data, as you'll see. And that's kind of intuitive.
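Here's a minimal sketch of that AND example in Keras-- a toy I'm adding for illustration, not the seminar's distribution code; the layer sizes and epoch count are arbitrary choices:

    # Learning the AND gate from all four possible inputs -- the entire
    # "data set" is four rows.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # every possible input
    y = np.array([0, 0, 0, 1])                      # AND(1, 1) = 1; everything else is 0

    model = Sequential([
        Dense(8, activation="relu", input_shape=(2,)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=500, verbose=0)

    print(model.predict(X).round().flatten())       # should recover [0, 0, 0, 1]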
But we can kind of get past this problem without just collecting more data. We don't need to sit down for hours and hours labeling things, manually going: all right, that's a cat, that goes here; that's a dog, it goes here; that's a triangle-- why is a triangle in here? You don't need to actually sit there doing that. There are all these other techniques that exist, and I think they're not as well publicized. And if you had known about these sorts of things, maybe you wouldn't have turned away from machine learning in the first place.

So one of the first ones I have listed there is called data augmentation. Another one of those things that sounds like a buzzword: you throw it out there and you're just like, I augmented some data, and then you just kind of move on and hope that nobody actually asks what that means. But all it really is, is taking the data that you have and creating new data from it-- making sure the patterns are preserved, but changing what the data looks like.

And what I mean by that is, if you take a picture of someone's face and you stretch it out, to a point, you can still recognize them. But to a machine, that stretched picture is more data. Because I've now taken this picture and I've stretched it out: that's the same person, but these are two different images. If I were to compare the two images bit by bit, they are not the same. And that's important. So that's one of the techniques: stretching an image. In the example code, or distribution code, that you'll get access to later, you'll see that there's an entire configuration file dedicated to how you augment your data.

What if you rotated something? Is a triangle still a triangle if I turn it sideways? Yeah. And the same thing with a face: if I have your face and I turn it upside down, well, we people will turn ourselves upside down and try to look at your face. But the same thing is applicable to machines and how they learn. If I can take a piece of data, even a small amount of data, I can amplify it according to all of these different ways of shifting it.

What if color doesn't matter? What if I were using emojis and they could be any color? You could still recognize the emoji-- it's still a smiley face emoji, whether it's yellow or black or blue or pink. But in the particular case that I've just given, there's a slight problem: some emojis do use color to convey meaning. The angry emoji is red. If you had an angry emoji that was bright pink, it might look a little different; we might get a different message. So it's important that when you're augmenting your data, you keep in mind which patterns can change and which ones can't-- what information you're actually after.
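As a taste of what a configuration file like that controls, here's a hedged sketch using Keras's ImageDataGenerator; the parameter values and the data/train path are illustrative, not the distribution code's actual settings:

    # Each flag answers the question: "is the pattern preserved if I change this?"
    from keras.preprocessing.image import ImageDataGenerator

    augmenter = ImageDataGenerator(
        rotation_range=20,       # a triangle is still a triangle turned sideways
        width_shift_range=0.1,   # nudge the subject around the frame
        height_shift_range=0.1,
        zoom_range=0.2,          # stretching -- same face, different bits
        horizontal_flip=True,    # fine for faces; bad for letters like "b" vs "d"
        # deliberately NO color shifting: an angry emoji has to stay red
    )

    # Yields endless augmented variants of a small directory of images.
    train_gen = augmenter.flow_from_directory("data/train", target_size=(200, 200))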
And that comes into the next point, clever data gathering, which is basically what you're doing when you're augmenting your data-- you're just not going outside to get more. So if I was collecting data-- say I was picking up images off the internet, which is often what I'll do if I'm trying to build a machine learning model-- I have to make sure that I collect maybe not more data, but the right kinds. So if I gathered 30 pictures of the same cat and then said, all right, machine, tell me if this is a cat or a dog-- not very good data gathering. I've just picked up the same thing 30 times. And even a human would say, well, if I only had that information, I could only tell you whether or not it's that cat.

But that's basically the idea here. If you were teaching a toddler, or a small kid, or even a full-on college student a complicated concept, you have to give them enough of the pattern that they can get it right every time-- that they can extrapolate from the pattern they're given. So if I were to give you a number sequence-- 1, 1, 2-- then some people might go, oh, that's the Fibonacci sequence. Well, no. It's 1, 1, 2, 1, 1, 2. And so that's a very contrived example of a pattern where I just didn't give you enough information.

And this sort of thing, where you're using an image-- that's a lot of patterns. It could be eye color. It could be hair color. What if they're not people? It could be shape. It could be the angle at which things intersect. There's a lot of information there, and we pick it up almost instantly. You look at a single picture and feel like you could name 400 patterns from it-- not that you know exactly 400 patterns, but you could start enumerating: well, there's this attribute, and this attribute, and this one. So you have to make sure that your machine has enough data, and the right type of data, that it can actually pick things up.
And a good benchmark for that: if you kind of narrow your mind-- and this is one of the few cases where I'll say, just be narrow-minded-- and only look at something in the context of what you're given, and you can still figure it out, there's probably a way for the machine to do it too. And if you can't, the machine probably can't do it either.

And so automated data gathering is one of the next solutions to this "I don't have enough data" problem. Because basically, the original, brute-force solution is to take a bunch of pictures and label them. And we're talking specifically about image classification here. There are other kinds of machine learning, but I'm kind of gearing this more toward image classification because it's a little bit easier to understand and intuit. So say I was manually labeling images. And I've done this before. It's awful. If you can avoid it, don't do it. But you can sit there and say, well, this is an A, this is a B, this is a C. But you need enough data to make sure that your patterns are complete, so you might be doing that for seven, eight hours. It's horrible. Find some good music; it'll make it a little bit easier to do.

But you can automate that process. Let's say we're doing some sort of letter recognition. If I could generate all 26 letters-- we'll say maybe 52, if you do lowercase and uppercase-- in 100 different fonts, then all my data gathering is pretty easy. Click a button, it's done. So if there's a way for you to automate your data collection, do it.
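Here's a hedged sketch of that letter-generation idea, using Pillow; the font paths and image sizes are assumptions of mine, since the seminar doesn't actually build this one:

    # Render every letter in every font you have: labels come for free,
    # because each filename records which letter the image contains.
    import os
    import string
    from PIL import Image, ImageDraw, ImageFont

    fonts = ["arial.ttf", "times.ttf"]  # in practice, point at ~100 font files
    os.makedirs("letters", exist_ok=True)

    for font_path in fonts:
        font = ImageFont.truetype(font_path, 48)
        for letter in string.ascii_letters:  # 52 classes: a-z and A-Z
            img = Image.new("L", (64, 64), color=255)                # white canvas
            ImageDraw.Draw(img).text((8, 4), letter, font=font, fill=0)
            name = os.path.splitext(os.path.basename(font_path))[0]
            # ord() avoids a/A filename clashes on case-insensitive systems
            img.save(f"letters/{ord(letter):03d}_{name}.png")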
And now that you've automated it, you might as well generate as much data as you want. And you might find that there are some time restrictions, because as your machine learns on more and more data, it might take more and more time. So now you have a balancing act that you have to perform: do I want to generate more data and get a better machine learning model, or do I want to generate less data and have it be done faster? And is there a point where giving it more data doesn't make it any better, but does make it slower? Because at that point you should stop. If it's not getting any better, then maybe you need to change your model.

And that's the last point there: we sometimes need beefier models and sometimes more clever models. And those are somewhat interchangeable, and sometimes not. Just because you have a bigger, heftier model-- that's what I kind of mean by beefier-- doesn't mean it's better at learning things. People who are just bigger don't inherently learn things faster. But if I have a model that's a little bit more clever about how it learns something, it can pick up on a pattern faster, and it's probably going to do a little bit better, depending on the circumstance.

So one of my favorite myths about machine learning, which I'm also blocking part of on the slide, is that it takes a long time-- that if I want to take a model and get it to work, I have to train it for hours and hours and hours. And that's not true. That's people. People take a long time. You train people for hours and hours. One of the benefits of machine learning is that you don't have to train models that long. That's with the caveat that if you're training on an enormous data set, or you're training a particularly complicated model, it might take a long time. But given that we're doing some sort of CS50 final project, this is not a problem for you. This is not something that pushes this project out of the reach of your grasp. It's one of those things that is actually just a myth-- that machine learning takes too long.

And a kind of parallel myth is that computer vision perfectly captures all the data in an image. Maybe the way that these are parallel is not immediately apparent, but the same idea is present here. This concept that computer vision is a perfect representation of whatever data it sees pushes it outside of our grasp. Because if it is a perfect representation, and we can't learn something from that perfect representation, then it's not doable.
We might as well throw up our hands and give up. But that's not true. Computer vision is a little subjective. I can choose how my machine sees. I can choose how well it picks up on patterns, how well it distinguishes between the foreground and the background. All of those things come into play when you're trying to pick up data. And so we use a very simple example of computer vision in the distribution code, more just to give you a taste of how to interface with OpenCV through cv2. But it does exist, and it is something that I think is particularly important for image classifiers.

But when we're choosing software to do all of these things-- to do machine learning, to do computer vision, even just to program in general-- it becomes very important to see the trade-offs between different pieces of software. So in this project, and in general, I go to Keras and OpenCV. However, underneath the hood, Keras uses TensorFlow. Or at least, I have it use TensorFlow; you could also have it use Theano. I don't use Theano. It's a little bit mathier; it was a little bit above my intellectual level. But TensorFlow, I thought, was pretty accessible, and I like the company behind it, just in general. I swear I don't work for them. They're just really cool.

And so these two things actually have the same benefits, at least from my point of view. I was a college student who was just learning all of these things, just learning computer science in general. And so I was like, well, you know, what projects exist? I asked my TF, and he was like, oh, go look up OpenCV, see what that does. And I asked him, what can I do to have a machine learn something? I want to do AI. And he was like, oh, AI sounds a little scary-- but machine learning is also scary. Well, pick one. And I was like, OK, we'll do machine learning. And you'll notice they're not super different. And part of what I mean, beyond a high-level interface, is open source: it's open to the public, and I didn't have to pay for it, [INAUDIBLE] college student.

But providing a high-level interface is something that I've kind of done here as well. The product, the distribution code that I'll have at the end, is a high-level interface on a high-level interface. It makes things accessible in that you just have to say, build model, and it takes care of everything underneath the hood. It just builds the model, whatever that means. And if you want to go look underneath the hood-- which I advise that you do if you're building this as a project-- you can then see what's going on. But if you don't, and you just want it to work, that's what this does.
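To make that concrete, here's a minimal sketch of what such a wrapper can look like; the class name and the layers are illustrative guesses of mine, not the distribution code itself:

    # One build_model() call hides the Keras plumbing; peek inside when curious.
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    class EmojiClassifier:
        def __init__(self, n_classes=3, input_shape=(200, 200, 3)):
            self.n_classes = n_classes
            self.input_shape = input_shape
            self.model = None

        def build_model(self):
            """Builds and compiles the model -- whatever that means."""
            self.model = Sequential([
                Conv2D(32, (3, 3), activation="relu", input_shape=self.input_shape),
                MaxPooling2D((2, 2)),
                Flatten(),
                Dense(64, activation="relu"),
                Dense(self.n_classes, activation="softmax"),  # one score per class
            ])
            self.model.compile(optimizer="adam",
                               loss="categorical_crossentropy",
                               metrics=["accuracy"])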
I don't have to sit down and say, oh, crap, I carried a 1 wrong in my math here, so my machine now says everything is a square. That would be kind of annoying. And maybe tracing between those two things is very difficult, especially if you're using an algorithm that does something you don't fully understand. You're just sitting there like, that's a lot of math. I think that's a sigma. And there's another letter here, and I don't know why. And then you go look it up on Wikipedia-- which I've done-- and there are just names, and there are no actual numbers in the math anymore. It's all symbols, and it becomes very difficult to read. And from there it becomes inaccessible. And then maybe you give up. Or you just get frustrated and you go eat a piece of cake. That's what I did. And so it's very frustrating.

But the main reason that I chose these two pieces of software was that they were usable. I could figure out how to use them. Somewhat ironically, figuring out how to get them downloaded and working was very difficult. I spent around 20 hours doing that. And admittedly, that was because I didn't really know how to read documentation at the time. I also didn't know how to read through code on a GitHub page or anything. But I know that I'm not the only one. In fact, there are about 728 students in roughly the same place right now in that class called CS50.
And so I think that having everything gathered into one place, with an easy way of installing things, is a much easier introduction. So that's what this distribution code is, too. In case you're looking for the distribution code and you don't want to listen to the rest of my talk, it's toward the end of the lecture; it'll be there after I get there. But if you do, hang around.

So basically, I took all of the packages-- there are a lot of them that need to be installed to get OpenCV to work. That's the really annoying one. Keras is fine: you just do pip3 install Keras and it works fine. OpenCV is awful. Nothing against OpenCV-- I very much appreciate that the project exists, and I'm super psyched that I get to use it. It's just a pain in the ass to install. Or at least it was when I was young and naive. And it was very hard for me to sit there reading through documentation and not knowing what they meant by certain terms. What does it mean to use a virtual environment to install things? Why is that necessary? If I don't do that, did it break my download? Why doesn't my download work? There are all of these terms and things that get thrown around because they're taken for granted. I know that that is very scary, and at the very least, it's incredibly frustrating.

And so what I ended up doing was I said, OK, I'm going to just try all of the solutions, and whichever one works, works. So my computer at one point had something like 40 different versions of OpenCV on it. Every programmer, I think, has had that sort of experience where they've just downloaded hundreds of things. I built from source at one point, and I was like, cool, I don't know what this means, but I did it. And that worked well-- that actually was the one I ended up using for my final project. I would not recommend doing that unless you know what you're doing. I screwed it up horribly, and I didn't even realize I was missing half of OpenCV. I didn't need it, apparently. But bad, bad deal.
So I've finally gotten to the point where we have some code. If you want them, there are the bit.ly links. These are the actual slides, in case you want them; they have these links on them, so you can go to the slides and then click the links. The GitHub link is my personal GitHub. I didn't realize we were supposed to use our actual names when we created GitHub accounts for school. My name is not "powerhouse of the cell"-- it's actually Nick. But I got to keep it. My TF was fine with it, so we kept it. And this is the bit.ly version, slightly shorter.

So I'll leave those up there while I talk a little bit. And eventually I will pull up some code, and we'll get to coding things, or at least giving demonstrations of what the code does. I've found that when you're actually coding things up in front of people, you make about 400 more typos per second. It's really just not a good deal, so I don't particularly like coding in front of other people.

But what you'll find on that GitHub is basically a lot of very cheesy README files. I included as many emojis as I thought were necessary. I don't generally use emojis, and I also don't use them often when I write things on my GitHub-- that's the only repo that has them. But GitHub has a nice interface for including emojis. And the reason for that is the problem I wanted to solve with this machine learning and computer vision for this seminar: classifying emojis. And that's a very broad problem, and I didn't solve all of it. I actually didn't even really solve it, but I started us on that path. And so I said, OK, I want to do something with emojis, because that's kind of hip, kind of cool, also kind of dorky. And that kind of fits me-- the latter one, not the first two. And so what I ended up doing was I said, you know what, we're not going to classify the hundreds of emojis that exist.
We're going to just take about 15 happy-looking ones, 15 kind of neutral-ish ones-- I think I actually got 13 of those-- and then 15 kind of angry or negative-looking ones. We're going to call those three groups classes. We're going to say there's positive, there's neutral, and there's negative. And I want the machine to be able to tell me whether an arbitrary emoji that I'm looking at is positive, neutral, or negative.

And that seems kind of trivial. Human beings do that all the time, and we're very subjective about it, too. We're kind of like, ooh, that person, she's looking at me just angrily. He's got just an aggressive face on him. He's just chilling there, sipping his tea. You messed up her shoe. Something like that can be very difficult for us to perceive. And even in that example, it's totally subjective: what I just said is basically up to whoever is viewing it. And that is where this becomes a difficult problem-- how do you provide enough data to get this to work?

So I actually did a couple of disservices to you, the user of this code. One: the machine that you're provided with-- well, it does work, and it will train and learn, but it doesn't do it very well. By the end, it's basically randomly guessing. It'll say there's about, I don't know, a 33% chance of this, a 33% chance of that, and a 33% chance of that. You'll notice there are three categories: 100% spread across all three categories is about 33% each. So the machine doesn't do a very good job of figuring it out. And sometimes-- which I found when I was testing which one I was going to demo-- the machine that I provided you actually just gets it completely wrong, but it's super sure of its answer. I gave it a very happy-looking emoji, and it was like, I'm 100% sure that is negative. Negative-looking emoji right there.
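To make those percentages concrete, here's a tiny NumPy-only sketch of how a three-class softmax output turns into that 33/33/33 guess; the raw scores are made up:

    import numpy as np

    def softmax(logits):
        """Turn raw scores into probabilities that sum to 1."""
        e = np.exp(logits - logits.max())
        return e / e.sum()

    classes = ["positive", "neutral", "negative"]
    # An untrained model's raw scores are nearly identical, so every class
    # lands near 1/3 -- "basically randomly guessing."
    probs = softmax(np.array([0.02, 0.00, -0.01]))
    for name, p in zip(classes, probs):
        print(f"{name}: {p:.0%}")                  # ~33% / 33% / 33%
    print("best guess:", classes[int(np.argmax(probs))])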
And so that's kind of one of the funny things. I think that a lot of times when you're doing machine learning, you feel like you're training a toddler-- a particularly annoying toddler, one that is not really a danger to itself or anything around it, but that particularly hates you. In fact, it wants to make sure that you never get whatever assignment you're trying to do done. That's how I've kind of learned machine learning works, especially when I was working on this, being like, oh, crap, I have a seminar to teach, and this was not working. It just refused to do what I wanted it to, and that was very frustrating.

But it's something that shares a lot of parallels with this toddler analogy. If you were to take a toddler, and every time they didn't do what you wanted them to do, you just went, all right, well, getting a new one, and went and got a new toddler, that would be weird in a number of ways. It's also kind of weird with this, too, though a little less extreme-- you can trade out machines, no problem. But you'll find that the machine I have handed you actually does have a couple of things that can be modified within it to make it a lot better. And that doesn't mean you just copy and paste the same layers over and over again and make your machine much longer; rather, you can make it a little bit more clever.

So basically, when we were training the machines-- or when I was training; I say we, I mean I, when I was lonely in my room training the machine models-- I was saying, all right, I've got to get this to work somehow. I don't know what I'm going to do. And if you'll recall, I said I had about 15 images each of positive and neutral and sad. So I had roughly 45 images total. That's a very small amount of data. So I actually used some techniques in the code-- and I'll try to point them out when I go over there-- that allow me to augment my data. We did that first step: we augmented our data.

What I didn't do was add very good data. My data collection-- partially a product of my laziness, but now, retrospectively, a teaching moment-- was done kind of arbitrarily, without much thought as to what patterns were being picked up. I kind of ignored my own second rule, if you will. And yes, that is mostly because I was just being lazy.
I just picked a bunch of data, threw it in there, and hoped it worked. And it doesn't work that well. That strategy won't help you very much. I didn't think about it that hard, but you can get there by thinking about it just minimally. If you have a smiley face emoji, for example-- since that's what we're talking about in this case-- it's not too hard to find enough different smiley faces to cover the general case. It's mostly that smiley little half circle on the bottom of the face. And so covering as much of the pattern as you can, while still keeping the data set small, is not that difficult. You just have to be a little bit smarter about it than I was. And I believe the data is included on that GitHub page, so you'll see my crappily collected data there as well. I really hope it's not a copyright problem. We'll find out if someone shows up to arrest me.

So we're going to actually switch over to looking at the actual GitHub page. And this is where the cheesiness comes in: I called it machine feeling. I was feeling a little dorky that day. I'm feeling a little dorky every day. And then I included as many emojis as I could, because I was like, oh, crap, you can include emojis in Markdown. That's cool. So I threw those in there. And you can read this on your own if you want to. Maybe you don't. I don't blame you if you don't.

There is this requirements.txt right here, and that allows you to basically just immediately install all the requirements for this entire project-- pip3 install -r requirements.txt. That's it. No 20 hours of searching through Google or anything. Just that.

And then we have our source folder. Originally, I was going to provide some skeleton code to kind of complement the actual code. I decided against that, because I ran out of time, and I also thought it was a little mean. So instead we have fully working sample code. And I've separated things out a little bit, just to make it easier to comprehend what's going on.
And so right up here, at the top, you basically have the computer vision folder. There's only one thing in it: the file that provides the computer vision properties to our code. And then you have the data, so we'll take a short look in here. It's not very large. There is testing data and there is training data-- it's pretty well segmented out. But you can see that it's basically just a bunch of .pngs. This one happens to be pretty sad. And they're also all cropped so that they're the same height and width: they're all 200 by 200 pixels. You don't have to do that, although I would recommend it as just one of the things you can normalize across. Because what if the size of the data is different? There are techniques for dealing with that. You can shrink it to the right aspect ratio; you can do a variety of things. But if you can, you want to keep your data pretty consistent across things that don't matter.

So whether a face is this size or this size, it's still a sad face. It's still negative. The classification doesn't really depend on what size it is. So I wanted to keep all of my data the same size, so that the machine couldn't decide, oh, images that are 201 pixels, those are sad; images that are 400 pixels, they're happy. That would be really unfortunate, because that's not even close to the actual pattern we're going after.
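Here's a minimal sketch of that normalization step, assuming OpenCV's cv2 module; test.png is the test image that turns up in the repo a bit later:

    # Force every image to the same 200x200 shape so that size can never
    # become one of the "patterns" the model picks up on.
    import cv2

    img = cv2.imread("test.png")               # load any emoji image
    img = cv2.resize(img, (200, 200))          # same height and width for all
    img = img.astype("float32") / 255.0        # scale pixel values into [0, 1]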
And you can understand that even in complicated examples, you'll want to be aware of which patterns matter, which ones don't, and which ones you're actually introducing into your models. That might sound kind of complicated, but the example I just gave there-- making sure the size of the image doesn't actually play into what the machine learns-- is not too difficult. It's very intuitive, and most of these considerations are like that. They're pretty intuitive. This sort of project, even though it sounds kind of complicated, isn't too bad. It's not particularly complex if you make it analogous to human beings, or toddlers. You can think of it as your young niece, nephew, daughter-- if you have a child-- brother, sibling; smaller people, little human beings, and how you teach them. And if you can teach a child that, you can probably teach the machine, with some caveats.

So in the rest of this folder, we have the ML folder-- it was right around here. It's just that: the machine learning portion. There's a file in there that does all the machine learning parts. I built a class for us, to give a very high-level model, but it also gives you a low enough level that you can tinker with the model itself, depending on what you want. And we have a config file, which has a bunch of variables inside of it-- we might take a look inside in a little bit. And then we have our actual run file, which allows us to execute the entire piece of software. And then I have a test.png, which is just a test image that I was using earlier. I left it in; it's kind of cute.

So that's all of the code on the GitHub. You're welcome to clone it, download it, make pull requests-- preferably don't sell it for a profit. If you do, that's really cool; I'm just proud that it worked. Any of those things is awesome.

But we're actually going to show just a little bit of code over here. So I'm already within the directory of the actual code. If you had git cloned this, you'd have ended up somewhere around here. So this is the actual root directory. I know this is kind of a boring terminal screen, but it'll get more interesting shortly. And here you can see there's just a bunch of files: the license, MIT; what the README says; there's some caching here, source files, and requirements. We're going to go into src, because that's the source, and then specifically into the sample directory. And we're back where we started. So if I want to run this, I can basically say ./run.py and-- oh, yep. Like I said, typing in front of people, you make so many more mistakes.
728 00:30:17,530 --> 00:30:20,010 And this will bring up kind of our help screen 729 00:30:20,010 --> 00:30:23,080 which is meant to be as non-obscure as possible. 730 00:30:23,080 --> 00:30:26,704 However, I am no expert coder so it might be a little obscure. 731 00:30:26,704 --> 00:30:28,620 It's intended to be pretty easy to use though. 732 00:30:28,620 --> 00:30:30,477 So there's -o for an output file. 733 00:30:30,477 --> 00:30:32,310 And this is all just kind of software stuff. 734 00:30:32,310 --> 00:30:33,510 Not particularly interesting. 735 00:30:33,510 --> 00:30:36,040 If you're interested afterward, please do go ahead and let me know. 736 00:30:36,040 --> 00:30:38,100 And I'll be happy to talk with you about it. 737 00:30:38,100 --> 00:30:42,080 But if you wanted to just run this, then we can say, OK, well, 738 00:30:42,080 --> 00:30:45,450 I want my output file to be seminar. 739 00:30:45,450 --> 00:30:48,600 And I want it to go through, we're going to say, one round of training, 740 00:30:48,600 --> 00:30:52,859 unless you guys want to sit here for the next 45 minutes. 741 00:30:52,859 --> 00:30:55,650 And I don't want it to load another model, one that already exists. 742 00:30:55,650 --> 00:30:57,780 I want it to just kind of do its own thing. 743 00:30:57,780 --> 00:31:00,560 And that's all I really need to do. 744 00:31:00,560 --> 00:31:02,580 From the command line, that will train it. 745 00:31:02,580 --> 00:31:04,930 And I say that and this'll be the one time that it 746 00:31:04,930 --> 00:31:07,810 breaks which is absolutely fantastic. 747 00:31:07,810 --> 00:31:10,477 But it tells you that it's going to use the TensorFlow back end. 748 00:31:10,477 --> 00:31:12,393 It found all of the data that I had handed it. 749 00:31:12,393 --> 00:31:13,470 And now begins training. 750 00:31:13,470 --> 00:31:16,470 And so the reason I bring this up is because I think it's kind of-- 751 00:31:16,470 --> 00:31:18,746 you're not quite sure what each of these things mean. 752 00:31:18,746 --> 00:31:21,870 And what's kind of funny is you can customize each of these metrics anyway. 753 00:31:21,870 --> 00:31:24,552 754 00:31:24,552 --> 00:31:26,510 So basically, if you're looking at this screen, 755 00:31:26,510 --> 00:31:30,230 you see this kind of cool little animation, if you will. 756 00:31:30,230 --> 00:31:33,590 But this is really just telling you how many steps through the training round 757 00:31:33,590 --> 00:31:34,710 it's gotten. 758 00:31:34,710 --> 00:31:38,750 Epoch is usually going to be the actual training 759 00:31:38,750 --> 00:31:40,070 round that it's on. 760 00:31:40,070 --> 00:31:44,640 And so within a training round, your machine is basically saying, all right, 761 00:31:44,640 --> 00:31:47,310 I'm given some amount of data that you specify. 762 00:31:47,310 --> 00:31:50,200 And I've got to figure out what the hell this means. 763 00:31:50,200 --> 00:31:51,920 I've got to classify it. 764 00:31:51,920 --> 00:31:55,130 And what it does is it sits there and it says, hm, that looks like a cat. 765 00:31:55,130 --> 00:31:55,820 That's a bird. 766 00:31:55,820 --> 00:31:56,810 That's a dog. 767 00:31:56,810 --> 00:32:00,740 Or, in this case, that's positive, that's negative, that's neutral. 768 00:32:00,740 --> 00:32:02,220 And it throws out those answers. 769 00:32:02,220 --> 00:32:03,620 And it encodes them somehow-- 770 00:32:03,620 --> 00:32:06,512 0, 1, 2, totally reasonable.
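One common way to do that encoding, for what it's worth, is Keras's own helper, which turns integer class labels into one-hot vectors (a sketch; the 0/1/2 mapping to positive/negative/neutral is just this project's three classes, and whether the repo uses this exact helper is an assumption):

    from keras.utils import to_categorical

    labels = [0, 1, 2, 2, 0]  # e.g. 0 = positive, 1 = negative, 2 = neutral
    one_hot = to_categorical(labels, num_classes=3)
    # one_hot[0] is [1., 0., 0.] -- the class index turned into a vector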
771 00:32:06,512 --> 00:32:09,470 And what it does is it says, OK, here are my answers to all of the data 772 00:32:09,470 --> 00:32:10,520 that I've been handed. 773 00:32:10,520 --> 00:32:11,922 And then it looks at the answers. 774 00:32:11,922 --> 00:32:14,630 And it says, oh, crap, I missed this one, this one, and this one. 775 00:32:14,630 --> 00:32:16,100 So I've got to do some magic. 776 00:32:16,100 --> 00:32:17,850 I'm going to re-weight some of my numbers. 777 00:32:17,850 --> 00:32:20,180 I'm going to do some hardcore math stuff. 778 00:32:20,180 --> 00:32:21,704 And then I'm going to try again. 779 00:32:21,704 --> 00:32:23,120 And that's the new training round. 780 00:32:23,120 --> 00:32:27,690 Now you'll notice that kind of towards the right side of each bar or each row, 781 00:32:27,690 --> 00:32:30,680 there's this val loss and val acc. 782 00:32:30,680 --> 00:32:33,500 And they correspond to loss and acc over here. 783 00:32:33,500 --> 00:32:37,820 Loss being, well, loss, which is a metric used in the actual algorithm 784 00:32:37,820 --> 00:32:39,230 or the math underneath. 785 00:32:39,230 --> 00:32:43,580 And acc being accuracy, or the accuracy of the model given 786 00:32:43,580 --> 00:32:47,690 that it is categorically trying to tell what sort of image it 787 00:32:47,690 --> 00:32:49,350 is with multiple categories. 788 00:32:49,350 --> 00:32:53,421 And you can specify all this within the file that builds this model. 789 00:32:53,421 --> 00:32:55,670 But for now we're going to just kind of take it as is. 790 00:32:55,670 --> 00:32:58,820 You don't need to use that accuracy metric, for example. 791 00:32:58,820 --> 00:33:01,937 The val versions of each of those are the validation versions. 792 00:33:01,937 --> 00:33:03,770 They're the ones that say, all right, here's 793 00:33:03,770 --> 00:33:05,870 one that you've never seen before. 794 00:33:05,870 --> 00:33:06,887 How do you do on that? 795 00:33:06,887 --> 00:33:08,970 And for that one, it doesn't readjust its weights. 796 00:33:08,970 --> 00:33:10,469 It just evaluates it a little bit. 797 00:33:10,469 --> 00:33:12,400 It just checks that you're not overfitting. 798 00:33:12,400 --> 00:33:14,150 So that was the first term that was thrown 799 00:33:14,150 --> 00:33:16,570 at me when I was starting to learn this: 800 00:33:16,570 --> 00:33:18,590 what does it mean to overfit your data. 801 00:33:18,590 --> 00:33:19,980 And it's kind of intuitive. 802 00:33:19,980 --> 00:33:21,710 You're doing the fitting too much. 803 00:33:21,710 --> 00:33:25,050 And if you think of this training process as fitting, 804 00:33:25,050 --> 00:33:26,780 then you're just training it too much. 805 00:33:26,780 --> 00:33:28,820 And you can think of this as like with a toddler: 806 00:33:28,820 --> 00:33:32,750 if you give it too limited a pattern, maybe 807 00:33:32,750 --> 00:33:38,000 you tell it, the machine, that everything is kind of so and so. 808 00:33:38,000 --> 00:33:40,670 You give it all the data pieces that it gets. 809 00:33:40,670 --> 00:33:44,570 And it just memorizes the data, but not the actual patterns. 810 00:33:44,570 --> 00:33:45,737 People do this all the time. 811 00:33:45,737 --> 00:33:48,570 You give them a bunch of chemistry facts, they memorize those facts. 812 00:33:48,570 --> 00:33:51,380 If you ask them an extrapolation on those facts, they have no idea. 813 00:33:51,380 --> 00:33:54,800 That's a very common problem, especially in public school systems for example. 814 00:33:54,800 --> 00:33:56,540 Small, political jab. 815 00:33:56,540 --> 00:33:58,970 But that is something that happens with machines too. 816 00:33:58,970 --> 00:34:01,460 If they just end up memorizing their data, 817 00:34:01,460 --> 00:34:06,260 yes they get it right, at least on this kind of loss accuracy metric, 818 00:34:06,260 --> 00:34:08,671 but they won't get it right on the validation accuracy 819 00:34:08,671 --> 00:34:10,670 because that is stuff they've never seen before. 820 00:34:10,670 --> 00:34:12,003 They couldn't have memorized it.
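In Keras, holding out that validation data is one argument to fit; a minimal sketch, assuming you already have a model built and numpy arrays x and y of images and labels:

    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    # Keras sets aside the last 20% of x and y, never trains on it,
    # and reports val_loss and val_acc on it after every epoch.
    history = model.fit(x, y, epochs=10, batch_size=32, validation_split=0.2)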
821 00:34:12,003 --> 00:34:14,810 It's basically the same idea behind test taking. 822 00:34:14,810 --> 00:34:15,800 I give you some data. 823 00:34:15,800 --> 00:34:18,300 I expect you to learn the patterns, not the actual data. 824 00:34:18,300 --> 00:34:21,170 And then I give you a test that has data you've never seen before 825 00:34:21,170 --> 00:34:23,920 but has the same patterns, and you should be able to figure it out. 826 00:34:23,920 --> 00:34:27,850 And so you can see-- unfortunately it got kind of locked over to the next 827 00:34:27,850 --> 00:34:31,100 one, but we're going to say that these 0.3s are roughly about the same. 828 00:34:31,100 --> 00:34:36,610 You'll notice they are, it's like 0.32, 0.35, 0.38, 0.38. 829 00:34:36,610 --> 00:34:39,120 And you'll notice they're roughly guessing. 830 00:34:39,120 --> 00:34:43,317 The machine is basically saying, hey, if I say that this one is neutral, 831 00:34:43,317 --> 00:34:45,900 this one is negative, this one's positive-- neutral, negative, 832 00:34:45,900 --> 00:34:46,620 positive-- 833 00:34:46,620 --> 00:34:49,830 I get it roughly a third right, which is not so good. 834 00:34:49,830 --> 00:34:52,170 And it's because the model that I handed you is, well, 835 00:34:52,170 --> 00:34:53,489 not particularly intelligent. 836 00:34:53,489 --> 00:34:55,650 Also the data is not very well collected either. 837 00:34:55,650 --> 00:34:57,480 And you'll notice that even the accuracy-- 838 00:34:57,480 --> 00:35:01,650 it didn't really get the chance to memorize everything, thank god-- 839 00:35:01,650 --> 00:35:02,920 is still pretty low. 840 00:35:02,920 --> 00:35:04,960 It's roughly guessing here too. 841 00:35:04,960 --> 00:35:06,220 So that's not too good. 842 00:35:06,220 --> 00:35:08,560 And it asked me, do I want to save the model. 843 00:35:08,560 --> 00:35:11,280 And you'll notice, as one of my points about the myths of machine 844 00:35:11,280 --> 00:35:13,320 learning, that didn't take that long. 845 00:35:13,320 --> 00:35:15,970 We were talking here for a couple of minutes and it's done. 846 00:35:15,970 --> 00:35:16,786 It's now trained. 847 00:35:16,786 --> 00:35:18,660 So now we have kind of a computer vision part 848 00:35:18,660 --> 00:35:22,800 of this, which is somewhat annoying because I did hack it together. 849 00:35:22,800 --> 00:35:23,880 But that's OK. 850 00:35:23,880 --> 00:35:29,290 So we have our kind of live feed of the screen as it's going right now. 851 00:35:29,290 --> 00:35:32,310 And this allows us to take pictures of things. 852 00:35:32,310 --> 00:35:34,170 So there is actual screenshot software 853 00:35:34,170 --> 00:35:36,003 that you could use that is easier than this, 854 00:35:36,003 --> 00:35:40,320 but it was an easy way for me to introduce the ideas of computer vision 855 00:35:40,320 --> 00:35:41,440 specifically. 856 00:35:41,440 --> 00:35:45,294 So what I can do is I can say, all right, let's pull up an emoji.
857 00:35:45,294 --> 00:35:47,460 Because I want to take a picture of that emoji and I 858 00:35:47,460 --> 00:35:49,376 want my machine to tell me what that emoji is. 859 00:35:49,376 --> 00:35:51,810 Is it positive, negative, or neutral? 860 00:35:51,810 --> 00:35:54,799 And I can say smiley emoji into Google. 861 00:35:54,799 --> 00:35:57,090 And the reason I do this live is to prove to you that I 862 00:35:57,090 --> 00:35:58,950 didn't just hardcode it into the machine. 863 00:35:58,950 --> 00:36:01,290 I'm willing to bet it will not get it correct. 864 00:36:01,290 --> 00:36:03,770 But if it does, kudos. 865 00:36:03,770 --> 00:36:04,770 So we have this emoji. 866 00:36:04,770 --> 00:36:07,579 And you'll see it pops up in our feed right over here. 867 00:36:07,579 --> 00:36:09,870 It actually pops up I think an infinite number of times 868 00:36:09,870 --> 00:36:11,453 if you were to like look close enough. 869 00:36:11,453 --> 00:36:13,320 But the reason I have it pop up in the feed 870 00:36:13,320 --> 00:36:18,820 is that I can drag over the actual feed and have it select that picture. 871 00:36:18,820 --> 00:36:20,445 And so it takes that picture. 872 00:36:20,445 --> 00:36:22,320 Oh, I'm so psyched that I got that one right. 873 00:36:22,320 --> 00:36:24,070 That's lit. 874 00:36:24,070 --> 00:36:27,120 So you'll notice that in the actual terminal output, 875 00:36:27,120 --> 00:36:31,920 it gave me probabilities that the thing was correct in being 876 00:36:31,920 --> 00:36:33,747 positive or negative or neutral. 877 00:36:33,747 --> 00:36:36,830 I'm really just psyched that it got it right with really high probability. 878 00:36:36,830 --> 00:36:39,150 If you're above like 70% probability and you 879 00:36:39,150 --> 00:36:42,440 have a good enough number of labels, you should be pretty psyched. 880 00:36:42,440 --> 00:36:45,690 Even though this one got it with 94% likelihood, 881 00:36:45,690 --> 00:36:47,220 it was probably just guessing. 882 00:36:47,220 --> 00:36:51,300 It's like the toddler or this small child that's like, it's that one. 883 00:36:51,300 --> 00:36:52,960 And they happen to be correct. 884 00:36:52,960 --> 00:36:56,940 And they're like, yes, I'm so smart. 885 00:36:56,940 --> 00:36:59,280 Like this machine, it's really not that great. 886 00:36:59,280 --> 00:37:02,880 But to its credit and to the seminar's credit, 887 00:37:02,880 --> 00:37:06,930 we have a dumb machine that I've handed very little data 888 00:37:06,930 --> 00:37:13,060 and I've trained for a total of like four minutes in front of all of us. 889 00:37:13,060 --> 00:37:13,990 And it got it right. 890 00:37:13,990 --> 00:37:15,400 It was able to figure stuff out. 891 00:37:15,400 --> 00:37:16,540 It figured out a pattern. 892 00:37:16,540 --> 00:37:19,244 And it said that it also could have been a little bit negative. 893 00:37:19,244 --> 00:37:21,910 And you'll notice that there are some attributes that are shared 894 00:37:21,910 --> 00:37:24,040 among smiley faces and frowny faces. 895 00:37:24,040 --> 00:37:26,970 They both have eyes, and in particular emojis are pretty standard. 896 00:37:26,970 --> 00:37:28,330 They have that same rough shape. 897 00:37:28,330 --> 00:37:29,530 They're the same color. 898 00:37:29,530 --> 00:37:33,040 And they do kind of have the same width of smile or frown 899 00:37:33,040 --> 00:37:35,210 even though it's in different orientations. 900 00:37:35,210 --> 00:37:38,530 So the machine didn't do a terrible job. 901 00:37:38,530 --> 00:37:40,300 And that's kind of nuts.
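Those per-class probabilities are just what a softmax classifier's predict call spits out; a rough sketch, assuming a trained Keras model and a screenshot saved to a made-up file name:

    import cv2
    import numpy as np

    img = cv2.imread("capture.png")             # the region grabbed from the feed
    img = cv2.resize(img, (200, 200)) / 255.0   # match the training size and scale
    probs = model.predict(np.expand_dims(img, 0))[0]  # shape (3,): one score each
    for name, p in zip(["positive", "negative", "neutral"], probs):
        print("{}: {:.0%}".format(name, p))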
902 00:37:40,300 --> 00:37:44,440 And hopefully that proves at least in a small way that it is accessible 903 00:37:44,440 --> 00:37:46,000 and it is easy to do. 904 00:37:46,000 --> 00:37:48,106 You might have to sit there and tinker with code. 905 00:37:48,106 --> 00:37:50,230 But if you're not sitting there tinkering with code, 906 00:37:50,230 --> 00:37:51,580 are you really coding? 907 00:37:51,580 --> 00:37:55,440 If you're not sitting there debugging things, why are you here? 908 00:37:55,440 --> 00:37:57,330 So the debugging is a good amount of coding. 909 00:37:57,330 --> 00:37:59,163 And this, just like any other piece of code, 910 00:37:59,163 --> 00:38:03,260 can be done and debugged even by people that just started programming. 911 00:38:03,260 --> 00:38:06,887 You can do this at this level in a couple of hours, maybe 912 00:38:06,887 --> 00:38:07,595 a couple of days. 913 00:38:07,595 --> 00:38:09,594 You might have to research a little bit and say, 914 00:38:09,594 --> 00:38:11,337 oh, crap, what does it mean to overfit? 915 00:38:11,337 --> 00:38:14,420 What did that guy say, the crazy dude that talked about cats and triangles 916 00:38:14,420 --> 00:38:15,740 for a while? 917 00:38:15,740 --> 00:38:17,260 Well, that's OK. 918 00:38:17,260 --> 00:38:18,790 That's how this works. 919 00:38:18,790 --> 00:38:22,510 But it's not any more difficult than if I said, oh, go use an API 920 00:38:22,510 --> 00:38:26,200 and retrieve some information via PUT request for me. 921 00:38:26,200 --> 00:38:27,970 Just as complicated sounding. 922 00:38:27,970 --> 00:38:30,447 But it's all the same idea. 923 00:38:30,447 --> 00:38:32,780 You have to just sit down and learn it for a little bit. 924 00:38:32,780 --> 00:38:35,080 And in this case, you have a pretty decent example. 925 00:38:35,080 --> 00:38:39,490 That was-- I'm going to stress that that was kind of luck that that worked. 926 00:38:39,490 --> 00:38:40,370 I'm very proud of it. 927 00:38:40,370 --> 00:38:42,430 But still kind of ridiculous. 928 00:38:42,430 --> 00:38:46,190 So let's say that we wanted to improve on a model that already exists. 929 00:38:46,190 --> 00:38:49,960 So there are smarter people than me that have written lots and lots of machine 930 00:38:49,960 --> 00:38:50,840 learning algorithms. 931 00:38:50,840 --> 00:38:53,890 There are, I would argue, more people more intelligent than I 932 00:38:53,890 --> 00:38:56,720 am that have done this than not. 933 00:38:56,720 --> 00:39:00,760 So I actually included one of those pre-trained models 934 00:39:00,760 --> 00:39:02,890 because I figured it'd be kind of cool to demo. 935 00:39:02,890 --> 00:39:04,880 And so in the code that you have, you actually 936 00:39:04,880 --> 00:39:06,880 have the ability to pull up a pre-trained model. 937 00:39:06,880 --> 00:39:09,130 It's called Inception V3. 938 00:39:09,130 --> 00:39:12,390 I think it's pretty bad ass that they call it that. 939 00:39:12,390 --> 00:39:15,980 A lot of the other ones are like VGG16 and stuff like that. 940 00:39:15,980 --> 00:39:17,710 But this one is called Inception V3. 941 00:39:17,710 --> 00:39:20,670 I like the sound of that name. 942 00:39:20,670 --> 00:39:24,550 And so you can run this program with that flag, the pre-trained flag. 943 00:39:24,550 --> 00:39:28,060 It still pulls up the TensorFlow back end because it is a TensorFlow model.
944 00:39:28,060 --> 00:39:32,510 TensorFlow being the underlying machine learning software of Keras, 945 00:39:32,510 --> 00:39:34,359 or at least the way that I designed it. 946 00:39:34,359 --> 00:39:37,150 It still loads the data even though it doesn't have to in this case 947 00:39:37,150 --> 00:39:39,460 because we're going to look at a different piece of data. 948 00:39:39,460 --> 00:39:40,730 I don't really want to save the model. 949 00:39:40,730 --> 00:39:41,590 It's a little big. 950 00:39:41,590 --> 00:39:43,423 But it's going to bring up that same feed so 951 00:39:43,423 --> 00:39:46,000 that I can take a picture of my screen. 952 00:39:46,000 --> 00:39:48,010 And basically, what we're going to do is 953 00:39:48,010 --> 00:39:51,930 we're going to pull up a picture of a cat, particularly this cat. 954 00:39:51,930 --> 00:39:52,890 I really like this cat. 955 00:39:52,890 --> 00:39:53,640 It's kind of cute. 956 00:39:53,640 --> 00:39:56,064 So this is an Egyptian cat. 957 00:39:56,064 --> 00:39:58,230 And what I want to do is I'm going to take my mouse, 958 00:39:58,230 --> 00:40:00,180 I'm going to click, drag it over. 959 00:40:00,180 --> 00:40:02,930 I'm going to take a picture of that cat. 960 00:40:02,930 --> 00:40:04,940 And what I can do is then I can say, all right, 961 00:40:04,940 --> 00:40:07,790 let's take a look at what my machine said it was. 962 00:40:07,790 --> 00:40:11,419 And if you'll read carefully, this one returns five labels. 963 00:40:11,419 --> 00:40:13,460 There are actually 1,000 labels it has access to. 964 00:40:13,460 --> 00:40:14,501 It's not just these five. 965 00:40:14,501 --> 00:40:16,070 I just picked the top five. 966 00:40:16,070 --> 00:40:18,050 And you'll notice that while the bottom one is 967 00:40:18,050 --> 00:40:25,370 a Windows screen, which is not wrong, that isn't the most accurate one, 968 00:40:25,370 --> 00:40:26,330 not even close. 969 00:40:26,330 --> 00:40:28,820 Because these are percentages, not fractions. 970 00:40:28,820 --> 00:40:33,170 The closest one by far was at 94%, or roughly the same confidence 971 00:40:33,170 --> 00:40:35,130 that my other model had. 972 00:40:35,130 --> 00:40:36,360 And it's an Egyptian cat. 973 00:40:36,360 --> 00:40:38,818 And so that's one of the powerful parts of machine learning: 974 00:40:38,818 --> 00:40:41,530 this model was even faster than the previous one. 975 00:40:41,530 --> 00:40:44,010 And it got just an arbitrary picture of a cat 976 00:40:44,010 --> 00:40:49,380 that I picked off the internet correct with 94% confidence. 977 00:40:49,380 --> 00:40:50,400 That's nuts. 978 00:40:50,400 --> 00:40:54,930 I just took a random picture and then picked it and it works.
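For reference, pulling up a pre-trained Inception V3 yourself is only a few lines through Keras's stock applications module (this is the generic route, not necessarily exactly how the repo wires it in; cat.png is a made-up file name):

    import numpy as np
    from keras.applications.inception_v3 import (
        InceptionV3, preprocess_input, decode_predictions)
    from keras.preprocessing import image

    model = InceptionV3(weights="imagenet")   # downloads the trained weights
    img = image.load_img("cat.png", target_size=(299, 299))  # V3 expects 299x299
    x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
    print(decode_predictions(model.predict(x), top=5)[0])  # top 5 of 1,000 labels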
979 00:40:54,930 --> 00:40:57,120 And that's really the point that I want to stress 980 00:40:57,120 --> 00:40:59,490 here is in a couple of minutes, admittedly 981 00:40:59,490 --> 00:41:02,700 I've had the advantage of prepping this for a little while, 982 00:41:02,700 --> 00:41:05,310 you can sit here and build an algorithm that 983 00:41:05,310 --> 00:41:08,020 identifies things pretty accurately. 984 00:41:08,020 --> 00:41:11,790 And so if you wanted to build a facial recognition software algorithm, 985 00:41:11,790 --> 00:41:12,360 it's this. 986 00:41:12,360 --> 00:41:13,410 It's the same idea. 987 00:41:13,410 --> 00:41:14,580 You just change the data. 988 00:41:14,580 --> 00:41:15,780 And you change your model a little bit. 989 00:41:15,780 --> 00:41:16,821 Make it a little smarter. 990 00:41:16,821 --> 00:41:19,237 Make it better suited specifically for faces. 991 00:41:19,237 --> 00:41:19,820 But that's it. 992 00:41:19,820 --> 00:41:22,180 That's really the only big difference here. 993 00:41:22,180 --> 00:41:25,860 This idea, these buzzwords, machine learning, computer vision, 994 00:41:25,860 --> 00:41:29,550 they're just as accessible to you and me as beginners in computer science 995 00:41:29,550 --> 00:41:33,270 as they are to someone who has done a bunch of years of computer science 996 00:41:33,270 --> 00:41:34,950 and is maybe a computer wizard. 997 00:41:34,950 --> 00:41:36,790 Maybe they can do cooler stuff with it. 998 00:41:36,790 --> 00:41:40,680 They can put all sorts of APIs and other acronyms and scary sounding words 999 00:41:40,680 --> 00:41:42,010 behind it. 1000 00:41:42,010 --> 00:41:43,440 It's the same thing underneath. 1001 00:41:43,440 --> 00:41:49,014 It's all just working as a machine should work, deterministically 1002 00:41:49,014 --> 00:41:50,430 and hopefully the way you want it. 1003 00:41:50,430 --> 00:41:53,640 So we're going to look a little bit at the code, because I think 1004 00:41:53,640 --> 00:41:55,569 that that is a worthwhile endeavor. 1005 00:41:55,569 --> 00:41:58,860 This is also like the worst possible way you can check what directory you're in, 1006 00:41:58,860 --> 00:42:02,500 but I'm talking at the same time so I feel like it's justified. 1007 00:42:02,500 --> 00:42:04,050 I use Visual Studio Code. 1008 00:42:04,050 --> 00:42:07,180 And I really hope that I don't have anything like ridiculous open. 1009 00:42:07,180 --> 00:42:11,330 We're going to just expand it a little bit, make it easier to read. 1010 00:42:11,330 --> 00:42:12,907 So we have code. 1011 00:42:12,907 --> 00:42:15,490 And this is usually the part where people are like, all right, 1012 00:42:15,490 --> 00:42:16,930 now I'm out. 1013 00:42:16,930 --> 00:42:19,400 We got there, we're done. 1014 00:42:19,400 --> 00:42:20,740 And if you're not, awesome. 1015 00:42:20,740 --> 00:42:22,840 I would have thought that the math would have scared you away. 1016 00:42:22,840 --> 00:42:24,260 And since I've shown that there's no math, 1017 00:42:24,260 --> 00:42:25,930 I'm hoping that you're still here. 1018 00:42:25,930 --> 00:42:28,420 So we're sitting here looking at a pretty random file, 1019 00:42:28,420 --> 00:42:30,320 but this is actually the ML model file. 1020 00:42:30,320 --> 00:42:34,000 So this is a file that tells you-- or, actually, 1021 00:42:34,000 --> 00:42:37,600 that codes in-- all of the attributes of the actual model. 1022 00:42:37,600 --> 00:42:41,410 This is the class that has the save method of the model. 1023 00:42:41,410 --> 00:42:44,215 It has the part that builds it or predicts on data. 1024 00:42:44,215 --> 00:42:46,090 It has all of the things that you could maybe 1025 00:42:46,090 --> 00:42:50,360 need to get what we just showed you in the example. 1026 00:42:50,360 --> 00:42:54,910 So we're looking at here and what I want to kind of draw our attention to 1027 00:42:54,910 --> 00:42:57,940 is right around here. 1028 00:42:57,940 --> 00:42:59,950 Looking at this part. 1029 00:42:59,950 --> 00:43:03,700 This is pretty much the bulk of the model that you just saw. 1030 00:43:03,700 --> 00:43:04,720 That's it. 1031 00:43:04,720 --> 00:43:10,500 If you don't count the empty lines, it's just five lines of code, 1032 00:43:10,500 --> 00:43:12,880 and my mouse highlighting everything. 1033 00:43:12,880 --> 00:43:15,244 So it's pretty simple.
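In the spirit of those five lines, here's a minimal sketch of a Keras stack with the same ingredients (the filter counts, the 16 neurons, and the relu activation are assumptions for illustration, not copied from the repo):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(16, (3, 3), activation="relu", input_shape=(200, 200, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))  # the 2 by 2 pool discussed below
    model.add(Dropout(0.5))                    # drop half the activations in training
    model.add(Flatten())                       # collapse the image data to one vector
    model.add(Dense(16, activation="relu"))    # 16 fully connected neurons
    model.add(Dense(3, activation="softmax"))  # one score per class: pos/neg/neutral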
1034 00:43:15,244 --> 00:43:16,410 It's pretty straightforward. 1035 00:43:16,410 --> 00:43:19,190 Now, a lot of these terms are maybe a little bit more confusing. 1036 00:43:19,190 --> 00:43:22,610 Max pooling with dropout and then you flatten it like a pancake 1037 00:43:22,610 --> 00:43:24,950 and then you do a dense of something, god knows what. 1038 00:43:24,950 --> 00:43:27,280 You activate that but there's a pool size here, 1039 00:43:27,280 --> 00:43:28,530 there's a random number there. 1040 00:43:28,530 --> 00:43:29,430 I think it's magic. 1041 00:43:29,430 --> 00:43:30,140 I don't know why. 1042 00:43:30,140 --> 00:43:32,970 And then it gets very complicated very quickly. 1043 00:43:32,970 --> 00:43:36,020 But again, like in CS50 and like in any problem, 1044 00:43:36,020 --> 00:43:38,840 really just break it into smaller and smaller pieces. 1045 00:43:38,840 --> 00:43:41,780 Let's start with maybe the easiest piece of code here-- 1046 00:43:41,780 --> 00:43:42,620 flatten. 1047 00:43:42,620 --> 00:43:44,010 It has no arguments. 1048 00:43:44,010 --> 00:43:46,100 So all we had to do was add flatten. 1049 00:43:46,100 --> 00:43:50,090 And maybe even easier is why are we adding things to the model. 1050 00:43:50,090 --> 00:43:51,540 How does this model work? 1051 00:43:51,540 --> 00:43:54,830 You can think of it as like a stack of layers. 1052 00:43:54,830 --> 00:43:57,920 And you take the input and depending on how your stack is oriented, 1053 00:43:57,920 --> 00:44:01,594 not the data structure stack, but like a literal physical stack, 1054 00:44:01,594 --> 00:44:04,010 you're either dropping inputs in or you're putting them up, 1055 00:44:04,010 --> 00:44:05,870 but either way it's going through the stack. 1056 00:44:05,870 --> 00:44:08,300 And then that first layer takes in that input and it says, 1057 00:44:08,300 --> 00:44:10,610 all right, we're going to do some magic with that, and drops 1058 00:44:10,610 --> 00:44:11,390 it into the next layer. 1059 00:44:11,390 --> 00:44:12,850 And then that does the same thing. 1060 00:44:12,850 --> 00:44:16,770 Until it gets to the last one, which is located here. 1061 00:44:16,770 --> 00:44:20,690 And that last one says, I know what it is. 1062 00:44:20,690 --> 00:44:21,650 It's a triangle. 1063 00:44:21,650 --> 00:44:23,840 And it throws out that number to you. 1064 00:44:23,840 --> 00:44:25,430 And that's all this really does. 1065 00:44:25,430 --> 00:44:29,347 It's just a bunch of math that takes in data points and does stuff with them. 1066 00:44:29,347 --> 00:44:30,680 So that's why we're adding them. 1067 00:44:30,680 --> 00:44:33,840 And the order in which we add them changes the order of the stack. 1068 00:44:33,840 --> 00:44:35,490 And that's not too bad. 1069 00:44:35,490 --> 00:44:39,050 But then we have these weird words, like max pooling, dropout, flatten, 1070 00:44:39,050 --> 00:44:40,370 and dense. 1071 00:44:40,370 --> 00:44:43,204 And those aren't as difficult to understand as you may think either. 1072 00:44:43,204 --> 00:44:45,870 We're going to start with flatten because it takes no arguments. 1073 00:44:45,870 --> 00:44:48,270 But it will be pretty easy to move on from there. 1074 00:44:48,270 --> 00:44:53,240 So adding a flattening layer, this might seem a little ridiculous. 1075 00:44:53,240 --> 00:44:55,080 It might seem unnecessary even.
1076 00:44:55,080 --> 00:44:59,680 But if you're looking at a picture, and that picture captured all sorts of data 1077 00:44:59,680 --> 00:45:07,670 points, and maybe it was x long and y wide, and some amount thick, 1078 00:45:07,670 --> 00:45:10,580 we really only have to worry about the width and the height. 1079 00:45:10,580 --> 00:45:13,010 And every other piece of information can probably somehow 1080 00:45:13,010 --> 00:45:15,800 be encoded without having it be stretched out like this. 1081 00:45:15,800 --> 00:45:18,980 Like let's say that that stretching out is color, r, g, and b. 1082 00:45:18,980 --> 00:45:23,520 So even if we have our image kind of laid out in kind of this rectangle, 1083 00:45:23,520 --> 00:45:25,722 there are three layers of depth to it. 1084 00:45:25,722 --> 00:45:28,430 The first layer is how much red is in that pixel, how much green, 1085 00:45:28,430 --> 00:45:29,870 and then how much blue. 1086 00:45:29,870 --> 00:45:33,590 But what if we don't really care, or we can encode that data somehow 1087 00:45:33,590 --> 00:45:34,850 some other way? 1088 00:45:34,850 --> 00:45:38,240 Then we can flatten the picture, so to speak, and hand you 1089 00:45:38,240 --> 00:45:41,110 a two dimensional thing instead of a three dimensional one. 1090 00:45:41,110 --> 00:45:43,610 And if you do the same but take the two dimensional thing 1091 00:45:43,610 --> 00:45:46,640 and collapse it into a line, you've flattened again. 1092 00:45:46,640 --> 00:45:48,990 And so this concept is really not that difficult. 1093 00:45:48,990 --> 00:45:51,170 It's actually something that we do anyway. 1094 00:45:51,170 --> 00:45:54,560 If you wanted to analyze an image and you didn't really care about the color, 1095 00:45:54,560 --> 00:45:57,740 for example, you could flatten it, make it black and white. 1096 00:45:57,740 --> 00:45:59,270 You've now flattened an image. 1097 00:45:59,270 --> 00:46:03,080 And so this, although it might be a little bit strange or weirdly worded, 1098 00:46:03,080 --> 00:46:05,870 it does something that we're actually pretty familiar with. 1099 00:46:05,870 --> 00:46:08,690 The next easiest one is probably dropout. 1100 00:46:08,690 --> 00:46:11,720 And this plays a role in something that we've already seen. 1101 00:46:11,720 --> 00:46:14,990 This plays a role in basically overfitting. 1102 00:46:14,990 --> 00:46:16,760 So we've talked about this term before. 1103 00:46:16,760 --> 00:46:19,110 We've taught a toddler a bunch of facts. 1104 00:46:19,110 --> 00:46:21,020 And that toddler knows those facts. 1105 00:46:21,020 --> 00:46:23,940 It knows what a brachiosaurus is. 1106 00:46:23,940 --> 00:46:24,640 That's it. 1107 00:46:24,640 --> 00:46:27,055 And so now what we want to do in our model 1108 00:46:27,055 --> 00:46:28,930 is make sure that our model isn't doing that. 1109 00:46:28,930 --> 00:46:31,320 It's not just going, the answer is a, the next answer 1110 00:46:31,320 --> 00:46:33,340 is b, the next one is c, and so on. 1111 00:46:33,340 --> 00:46:36,130 We want our model to pick up on patterns and say, well, 1112 00:46:36,130 --> 00:46:39,740 according to how those patterns work, that should be this. 1113 00:46:39,740 --> 00:46:40,980 That's a much better model. 1114 00:46:40,980 --> 00:46:43,720 And so in this case, what we do is we introduce dropout. 1115 00:46:43,720 --> 00:46:46,300 And you could think of that as every once in a 1116 00:46:46,300 --> 00:46:49,840 while we just kind of randomly kick out some data.
1117 00:46:49,840 --> 00:46:52,910 It's with 50% probability, supposedly. 1118 00:46:52,910 --> 00:46:55,470 So the fraction there is to tell it how much data 1119 00:46:55,470 --> 00:46:59,330 to kick out, not the probability. 1120 00:46:59,330 --> 00:47:00,030 My mistake. 1121 00:47:00,030 --> 00:47:01,946 This is actually a fraction of the data that's 1122 00:47:01,946 --> 00:47:06,370 going to be kind of dropped in a given section, in this layer. 1123 00:47:06,370 --> 00:47:08,730 And so what that layer says is, like, all right, 1124 00:47:08,730 --> 00:47:09,990 we're going to just kind of every once in a while 1125 00:47:09,990 --> 00:47:11,520 not pick a piece of data. 1126 00:47:11,520 --> 00:47:14,350 And then we're going to move on and do something else. 1127 00:47:14,350 --> 00:47:18,660 And in that way, we don't give it the same data set every time. 1128 00:47:18,660 --> 00:47:19,927 We give it a little bit less. 1129 00:47:19,927 --> 00:47:22,260 We say, all right, here's the data you get to pick from. 1130 00:47:22,260 --> 00:47:24,260 We're actually only going to hand you this much. 1131 00:47:24,260 --> 00:47:24,972 Here you go. 1132 00:47:24,972 --> 00:47:27,930 And then the next time it comes around, it might be a different subset. 1133 00:47:27,930 --> 00:47:30,690 Maybe I'll hand you this subset instead of the previous one. 1134 00:47:30,690 --> 00:47:34,020 And in that way, we can avoid, to a degree, overfitting. 1135 00:47:34,020 --> 00:47:38,160 Now, 50% of the data being dropped out every time, pretty high. 1136 00:47:38,160 --> 00:47:40,652 And so I've introduced that here to kind of combat 1137 00:47:40,652 --> 00:47:42,360 the overfitting that will occur if we are 1138 00:47:42,360 --> 00:47:45,004 training something hundreds of times in a couple of minutes. 1139 00:47:45,004 --> 00:47:46,920 But you could lower that and see what happens. 1140 00:47:46,920 --> 00:47:50,520 It'll probably get very, very good at the kind of training data. 1141 00:47:50,520 --> 00:47:53,310 But it'll be pretty bad at the actual validation or testing data. 1142 00:47:53,310 --> 00:47:54,750 So maybe not ideal. 1143 00:47:54,750 --> 00:47:57,360 But you could also increase this so much that it never 1144 00:47:57,360 --> 00:48:02,194 gets good at the training data, and, well, the testing data will follow suit. 1145 00:48:02,194 --> 00:48:03,360 And that's not ideal either. 1146 00:48:03,360 --> 00:48:05,150 So there is some kind of give and take here. 1147 00:48:05,150 --> 00:48:07,150 You do have to mess around with it a little bit. 1148 00:48:07,150 --> 00:48:10,030 And you can add more or fewer of these layers as you see fit. 1149 00:48:10,030 --> 00:48:12,340 You'll notice there is some dimensionality that needs to line up. 1150 00:48:12,340 --> 00:48:14,465 For example, if you got rid of the flatten layer, 1151 00:48:14,465 --> 00:48:17,670 Keras will just be like, I don't understand what's going on. 1152 00:48:17,670 --> 00:48:19,290 And it'll kind of freak out on you. 1153 00:48:19,290 --> 00:48:23,740 But other than that, you can play around with these more or less as you see fit. 1154 00:48:23,740 --> 00:48:29,340 The other kind of major one, before we go into max pooling 2D, is dense. 1155 00:48:29,340 --> 00:48:31,140 And dense has some activation.
1156 00:48:31,140 --> 00:48:35,190 Imagine dense as being some distribution of weights, 1157 00:48:35,190 --> 00:48:38,400 as they're called, or numbers that tell the computer what 1158 00:48:38,400 --> 00:48:42,930 the value of the decision it makes is. So if I tell you that you touching 1159 00:48:42,930 --> 00:48:45,480 a stove has a value of 100, and you touching 1160 00:48:45,480 --> 00:48:48,960 the ground has a value of like 40, and you touching your own skin has 1161 00:48:48,960 --> 00:48:51,390 a value of like 0, then you can pretty easily tell 1162 00:48:51,390 --> 00:48:53,040 where my value system is going there. 1163 00:48:53,040 --> 00:48:54,840 Which one is more dangerous. 1164 00:48:54,840 --> 00:48:57,270 And that sort of value system is at play here, 1165 00:48:57,270 --> 00:49:00,330 but the activation tells it how we want to just 1166 00:49:00,330 --> 00:49:02,850 kind of pre-weight things, to a degree. 1167 00:49:02,850 --> 00:49:06,900 And [INAUDIBLE] happens to just be a common one that people use with images. 1168 00:49:06,900 --> 00:49:08,070 There are a bunch of them. 1169 00:49:08,070 --> 00:49:09,400 You're free to look them up. 1170 00:49:09,400 --> 00:49:11,760 Keras comes built in with a ton of them. 1171 00:49:11,760 --> 00:49:12,750 There's like softmax. 1172 00:49:12,750 --> 00:49:13,637 There's tanh. 1173 00:49:13,637 --> 00:49:14,970 There's all sorts of other ones. 1174 00:49:14,970 --> 00:49:16,590 And they all mean varying things. 1175 00:49:16,590 --> 00:49:17,940 That can be very technical. 1176 00:49:17,940 --> 00:49:21,251 Sometimes you can just play around with them and see which ones work better. 1177 00:49:21,251 --> 00:49:23,750 You can just swap them out every once in a while and try it. 1178 00:49:23,750 --> 00:49:24,416 Which one works? 1179 00:49:24,416 --> 00:49:25,260 Which one doesn't? 1180 00:49:25,260 --> 00:49:29,070 And you'll often find that [INAUDIBLE] works particularly well with images 1181 00:49:29,070 --> 00:49:30,940 just because of the math underneath. 1182 00:49:30,940 --> 00:49:33,390 And if you're interested in that math, feel free to talk to me afterward. 1183 00:49:33,390 --> 00:49:35,556 But if you're not, we're going to just kind of leave 1184 00:49:35,556 --> 00:49:38,490 that as an activation that tells it the value of its decisions. 1185 00:49:38,490 --> 00:49:40,650 And that's the starting value of its decisions. 1186 00:49:40,650 --> 00:49:42,022 Later, it re-weights itself. 1187 00:49:42,022 --> 00:49:43,980 It says, oh, yeah, no, that was a bad decision. 1188 00:49:43,980 --> 00:49:45,480 We're rechanging that. 1189 00:49:45,480 --> 00:49:48,750 And then dense is the actual thing being added to this layer. 1190 00:49:48,750 --> 00:49:50,460 That is the name of this layer. 1191 00:49:50,460 --> 00:49:54,220 And dense in and of itself is really just saying, 1192 00:49:54,220 --> 00:49:57,090 hey, we're going to have a bunch of nodes or neurons. 1193 00:49:57,090 --> 00:49:59,400 We're going to have 16 of them specifically. 1194 00:49:59,400 --> 00:50:02,650 And we're going to have them all be able to communicate with each other. 1195 00:50:02,650 --> 00:50:04,776 And so what that just says is if I make a decision, 1196 00:50:04,776 --> 00:50:06,566 I'm going to tell everyone around me that's 1197 00:50:06,566 --> 00:50:08,200 the decision I made and it was bad. 1198 00:50:08,200 --> 00:50:08,960 Don't do that one. 1199 00:50:08,960 --> 00:50:10,230 That was a terrible decision.
1200 00:50:10,230 --> 00:50:14,520 It's like if you get a little bit too drunk on water one night 1201 00:50:14,520 --> 00:50:17,970 and you just go around to everyone the next day like, hey, guys, bad plan. 1202 00:50:17,970 --> 00:50:18,990 Don't do that. 1203 00:50:18,990 --> 00:50:20,370 You do not make that decision. 1204 00:50:20,370 --> 00:50:21,840 Very easy way of dealing with that. 1205 00:50:21,840 --> 00:50:24,840 And in that layer you have a bunch of neurons all talking to each other. 1206 00:50:24,840 --> 00:50:29,430 And some people's immediate solution is, well, I could just add more neurons. 1207 00:50:29,430 --> 00:50:30,330 Sometimes. 1208 00:50:30,330 --> 00:50:31,140 Sometimes not. 1209 00:50:31,140 --> 00:50:33,610 And you'll notice that it makes your computer a lot slower. 1210 00:50:33,610 --> 00:50:36,270 So there is always a trade off. 1211 00:50:36,270 --> 00:50:39,600 And then we have our max pooling 2D, which 1212 00:50:39,600 --> 00:50:42,990 is actually pretty intuitively named if you know what's going on underneath. 1213 00:50:42,990 --> 00:50:45,780 But if you don't, it's just like, what the fudge. 1214 00:50:45,780 --> 00:50:49,170 So what ends up going on here is I gave it a pool size. 1215 00:50:49,170 --> 00:50:50,370 I said 2 by 2. 1216 00:50:50,370 --> 00:50:52,380 And so if you imagine that in your image you 1217 00:50:52,380 --> 00:50:56,430 have 2 by 2 sections of pixels, basically a square that has four 1218 00:50:56,430 --> 00:50:58,890 pixels in it sliding across the image, 1219 00:50:58,890 --> 00:51:02,550 then really what I'm doing here is I'm kind of pooling them all together 1220 00:51:02,550 --> 00:51:03,810 and taking the max. 1221 00:51:03,810 --> 00:51:04,790 That's it. 1222 00:51:04,790 --> 00:51:07,290 And I'm taking that max one and saying that that is probably 1223 00:51:07,290 --> 00:51:09,000 the feature that determines things. 1224 00:51:09,000 --> 00:51:11,294 And in images, that can sometimes be the case. 1225 00:51:11,294 --> 00:51:13,710 Particularly for this kind of image, it works pretty well. 1226 00:51:13,710 --> 00:51:14,935 There's also min pooling. 1227 00:51:14,935 --> 00:51:15,810 You take the minimum. 1228 00:51:15,810 --> 00:51:17,674 That's the one that matters. 1229 00:51:17,674 --> 00:51:20,090 There are cases where that might be particularly relevant. 1230 00:51:20,090 --> 00:51:22,150 What if you're looking at the negative of images? 1231 00:51:22,150 --> 00:51:23,160 Maybe that applies here. 1232 00:51:23,160 --> 00:51:24,180 Maybe it doesn't. 1233 00:51:24,180 --> 00:51:25,900 And so that's something to keep in mind. 1234 00:51:25,900 --> 00:51:27,900 And there's, I believe, also average pooling. 1235 00:51:27,900 --> 00:51:30,360 It might be called mean pooling in Keras. 1236 00:51:30,360 --> 00:51:33,750 But it does the same thing that you would just think of: 1237 00:51:33,750 --> 00:51:36,600 takes that, averages it, and then does that as it goes. 1238 00:51:36,600 --> 00:51:40,200 And the pool size can change if you think that's appropriate. 1239 00:51:40,200 --> 00:51:43,740 2 by 2 is pretty fitting here because we aren't really saying, yeah, 1240 00:51:43,740 --> 00:51:46,602 this whole thing, if you just take the biggest point there, 1241 00:51:46,602 --> 00:51:48,352 that determines whether it's happy or not. 1242 00:51:48,352 --> 00:51:49,590 That's it. 1243 00:51:49,590 --> 00:51:50,640 That's not very accurate.
1244 00:51:50,640 --> 00:51:52,380 We wouldn't be able to get far from that. 1245 00:51:52,380 --> 00:51:54,450 So this helps us condense our data a little bit. 1246 00:51:54,450 --> 00:51:56,280 We kind of just take the information we're looking at 1247 00:51:56,280 --> 00:51:57,730 and throw out some of the fluff. 1248 00:51:57,730 --> 00:51:59,640 And you do this a couple of times. 1249 00:51:59,640 --> 00:52:03,010 And then at the end we spit out our output. 1250 00:52:03,010 --> 00:52:08,400 So that is kind of your very topical overview into machine learning, 1251 00:52:08,400 --> 00:52:10,690 and hopefully an introduction to the idea 1252 00:52:10,690 --> 00:52:14,500 that it is accessible as a final project for CS50, specifically. 1253 00:52:14,500 --> 00:52:18,626 But even in the kind of real world, outside of CS50 and outside of classes, 1254 00:52:18,626 --> 00:52:20,500 if you wanted to tinker around with this, that 1255 00:52:20,500 --> 00:52:22,940 is totally within your capabilities. 1256 00:52:22,940 --> 00:52:26,260 And I mean "your" not as someone who has done a year of CS 1257 00:52:26,260 --> 00:52:28,450 and is now teaching a course, but as someone 1258 00:52:28,450 --> 00:52:31,870 who started where you all started, or worse. 1259 00:52:31,870 --> 00:52:35,480 I started with no experience of this and this was where I went. 1260 00:52:35,480 --> 00:52:37,030 This was the direction I chose. 1261 00:52:37,030 --> 00:52:38,300 And it's totally accessible. 1262 00:52:38,300 --> 00:52:39,200 You can do that. 1263 00:52:39,200 --> 00:52:41,591 That is entirely within your grasp. 1264 00:52:41,591 --> 00:52:44,590 And so if you're at all interested in it, I would recommend pursuing it. 1265 00:52:44,590 --> 00:52:47,715 You'll find that it is difficult. There are points where it is frustrating. 1266 00:52:47,715 --> 00:52:50,864 But that is the case with anything that you are going to do in CS. 1267 00:52:50,864 --> 00:52:52,780 There are points where they will be difficult, 1268 00:52:52,780 --> 00:52:54,200 where they will be frustrating. 1269 00:52:54,200 --> 00:52:56,800 So I would encourage you to not give up but rather think 1270 00:52:56,800 --> 00:52:58,420 that that is basically the right path. 1271 00:52:58,420 --> 00:52:59,830 You're going down the street. 1272 00:52:59,830 --> 00:53:01,150 Just keep going. 1273 00:53:01,150 --> 00:53:04,060 Because you might as well, if you're going to do this on any project, 1274 00:53:04,060 --> 00:53:05,844 do it on one you're interested in. 1275 00:53:05,844 --> 00:53:08,260 And that's more of a piece of advice specifically directed 1276 00:53:08,260 --> 00:53:10,210 at the final project for CS50. 1277 00:53:10,210 --> 00:53:12,400 Don't waste your time for three weeks. 1278 00:53:12,400 --> 00:53:13,660 Build something cool. 1279 00:53:13,660 --> 00:53:17,490 If it's hard and it takes a lot of time and it's very annoying to debug 1280 00:53:17,490 --> 00:53:20,649 and there's things that don't work up until the last possible minute, 1281 00:53:20,649 --> 00:53:21,940 you're probably doing it right. 1282 00:53:21,940 --> 00:53:23,560 That's probably about right. 1283 00:53:23,560 --> 00:53:26,800 A lot of the best work in CS happens at the last possible minute. 1284 00:53:26,800 --> 00:53:30,085 It's that moment where you're like, I got it. 1285 00:53:30,085 --> 00:53:30,960 And then you're good.
1286 00:53:30,960 --> 00:53:34,000 And that sense of relief is why a lot of us are still in CS, 1287 00:53:34,000 --> 00:53:37,650 is we like that feeling of being satisfied with what we produced. 1288 00:53:37,650 --> 00:53:41,580 So if you ever think that CS is not for you because it's too difficult 1289 00:53:41,580 --> 00:53:46,320 or because everyone seems to get it but you, that is 100% not the case. 1290 00:53:46,320 --> 00:53:47,550 Machine learning is hard. 1291 00:53:47,550 --> 00:53:48,660 Computer vision is hard. 1292 00:53:48,660 --> 00:53:50,220 Computer science is hard. 1293 00:53:50,220 --> 00:53:54,780 Learning is difficult. All of this, we can do. 1294 00:53:54,780 --> 00:53:56,890 So I would recommend always pursuing it. 1295 00:53:56,890 --> 00:53:59,330 What are some questions you guys have about machine learning, computer 1296 00:53:59,330 --> 00:53:59,830 vision? 1297 00:53:59,830 --> 00:54:02,610 I figure in my last like seven or so minutes I'll 1298 00:54:02,610 --> 00:54:04,604 open it up to any questions. 1299 00:54:04,604 --> 00:54:07,390 1300 00:54:07,390 --> 00:54:08,015 Sure. 1301 00:54:08,015 --> 00:54:10,098 SPEAKER 2: I was wondering if you could talk maybe 1302 00:54:10,098 --> 00:54:14,720 about the min pooling and max pooling, and when to use a 2 by 2. 1303 00:54:14,720 --> 00:54:18,455 Like what are some circumstances where you'd use like 100 by 100 1304 00:54:18,455 --> 00:54:19,381 or [INAUDIBLE] 1305 00:54:19,381 --> 00:54:22,340 SPEAKER 1: You can imagine-- I'm not really going to pull up a new image. 1306 00:54:22,340 --> 00:54:24,423 But maybe I'll keep the cat one up here; it's cute. 1307 00:54:24,423 --> 00:54:27,250 1308 00:54:27,250 --> 00:54:29,870 So this image has a set amount of pixels in it. 1309 00:54:29,870 --> 00:54:34,454 So the question being, why min pooling versus max pooling versus mean pooling, 1310 00:54:34,454 --> 00:54:36,620 and what does it mean to have a different pool size. 1311 00:54:36,620 --> 00:54:38,632 Why is that relevant really? 1312 00:54:38,632 --> 00:54:41,840 And so we're going to talk about this image in particular, because it's cute. 1313 00:54:41,840 --> 00:54:43,520 And I think it's brown or red. 1314 00:54:43,520 --> 00:54:44,444 I can't really tell. 1315 00:54:44,444 --> 00:54:46,610 But it's an Egyptian cat and they have the beautiful 1316 00:54:46,610 --> 00:54:47,980 like wide eyes and big ears. 1317 00:54:47,980 --> 00:54:48,950 They're awesome. 1318 00:54:48,950 --> 00:54:52,130 And basically this image has a set number of pixels. 1319 00:54:52,130 --> 00:54:54,440 Even though I'm displaying it on some number of pixels, 1320 00:54:54,440 --> 00:54:59,300 the image itself is, let's say, I don't know, 400 by 200. 1321 00:54:59,300 --> 00:55:01,430 Not quite right but close enough. 1322 00:55:01,430 --> 00:55:08,810 So if it's 400 by 200, then in a given, we'll say, 20 by 20 box, 1323 00:55:08,810 --> 00:55:10,940 we can only get so much data. 1324 00:55:10,940 --> 00:55:13,910 Let's say that's just the tip of the ear, 20 by 20. 1325 00:55:13,910 --> 00:55:16,340 Well, if I take just the max of that, then you 1326 00:55:16,340 --> 00:55:21,170 can think of it as the actual 20 by 20 section-- if I do a max of 20 1327 00:55:21,170 --> 00:55:26,840 by 20, well, then the entire tip of this ear becomes one point. 1328 00:55:26,840 --> 00:55:29,060 I have one data point that is the tip of the ear. 1329 00:55:29,060 --> 00:55:32,500 And same thing as we iterate through the entire image.
1330 00:55:32,500 --> 00:55:35,360 So this image gets significantly condensed. 1331 00:55:35,360 --> 00:55:38,300 If it was 400 by 200, well, you can think of it 1332 00:55:38,300 --> 00:55:42,520 as now being reduced by a factor of 20, which might be appropriate. 1333 00:55:42,520 --> 00:55:46,430 Maybe all you really care about is the general shape. 1334 00:55:46,430 --> 00:55:48,950 Is it a cat or is it a doorknob? 1335 00:55:48,950 --> 00:55:50,870 That's a pretty easy classifier to build. 1336 00:55:50,870 --> 00:55:55,760 All I have to really care about is, it's not a circle, not a doorknob, good. 1337 00:55:55,760 --> 00:55:57,830 But maybe your class of doorknobs is different. 1338 00:55:57,830 --> 00:55:59,700 It can get more complicated from there. 1339 00:55:59,700 --> 00:56:02,810 But in this case, to preserve detail, you'd probably 1340 00:56:02,810 --> 00:56:05,012 want to use a pretty small pool. 1341 00:56:05,012 --> 00:56:07,220 We're just trying to condense our image a little bit. 1342 00:56:07,220 --> 00:56:10,010 We're trying to get rid of some of the fluff, some of the noise. 1343 00:56:10,010 --> 00:56:11,940 Like there's some fur here. 1344 00:56:11,940 --> 00:56:14,600 But it doesn't really matter what that fur actually 1345 00:56:14,600 --> 00:56:17,624 does, unless you're looking for a very particular machine 1346 00:56:17,624 --> 00:56:19,790 classifier, in which case you're probably not looking 1347 00:56:19,790 --> 00:56:21,590 at pictures of whole animals. 1348 00:56:21,590 --> 00:56:24,350 So if we're looking at, is it a cat versus a dog, well, 1349 00:56:24,350 --> 00:56:26,600 does it really matter if there's a speck of fur here 1350 00:56:26,600 --> 00:56:30,560 or some extra noise captured by the camera that took the picture? 1351 00:56:30,560 --> 00:56:31,220 Not really. 1352 00:56:31,220 --> 00:56:34,040 So we can just kind of ignore that and average over it. 1353 00:56:34,040 --> 00:56:35,874 Or use max pooling over it and just say, you 1354 00:56:35,874 --> 00:56:38,831 know what, we're just going to pool all of our details, all the biggest 1355 00:56:38,831 --> 00:56:40,190 details, together. 1356 00:56:40,190 --> 00:56:45,320 And while that can be appropriate in this case, what if this picture was 1357 00:56:45,320 --> 00:56:49,470 4 million pixels by 2 million pixels? 1358 00:56:49,470 --> 00:56:51,770 Now your pool size might want to be scaled up a lot. 1359 00:56:51,770 --> 00:56:54,000 We don't need all of that extra information, 1360 00:56:54,000 --> 00:56:56,090 especially if it's the same picture. 1361 00:56:56,090 --> 00:56:58,670 We can just say, you know what, we're going to reduce that 1362 00:56:58,670 --> 00:57:01,600 by a factor of, like, a million. 1363 00:57:01,600 --> 00:57:04,750 And now you have a 4 by 2, which might be a little too much. 1364 00:57:04,750 --> 00:57:07,450 Now you've basically just got four pixels down and two across 1365 00:57:07,450 --> 00:57:08,770 and hopefully it's still a cat. 1366 00:57:08,770 --> 00:57:10,300 But you can play around with that. 1367 00:57:10,300 --> 00:57:12,340 And that might be a case in which you would 1368 00:57:12,340 --> 00:57:15,220 need to change whether you're doing a min or a max or even 1369 00:57:15,220 --> 00:57:17,440 just how you're analyzing this image. 1370 00:57:17,440 --> 00:57:20,830 Is it appropriate to take just one image and do this? 1371 00:57:20,830 --> 00:57:23,980 Or is only one image in your data set extra 1372 00:57:23,980 --> 00:57:27,757 large and then all of the rest of them are like 150 by 150? 1373 00:57:27,757 --> 00:57:29,215 Then you might want to change that.
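To put rough numbers on that, here's a tiny Keras sketch of how the pool size changes the shape coming out (the 400 by 200 size is the hypothetical cat picture from above, with three color channels):

    from keras.models import Sequential
    from keras.layers import MaxPooling2D

    m = Sequential([MaxPooling2D(pool_size=(20, 20), input_shape=(400, 200, 3))])
    print(m.output_shape)  # (None, 20, 10, 3): every 20 by 20 block becomes one max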
1374 00:57:29,215 --> 00:57:32,742 SPEAKER 3: So like [INAUDIBLE] like if you 1375 00:57:32,742 --> 00:57:36,750 had one image that was say like 4 million pixels long, 1376 00:57:36,750 --> 00:57:39,085 it would probably make more sense then to preprocess 1377 00:57:39,085 --> 00:57:42,190 that data before you go in, [INAUDIBLE] size 1378 00:57:42,190 --> 00:57:43,850 to like a certain value [INAUDIBLE]. 1379 00:57:43,850 --> 00:57:44,960 SPEAKER 1: Yes. 1380 00:57:44,960 --> 00:57:47,980 Actually there is a little bit of that in the sample code. 1381 00:57:47,980 --> 00:57:50,480 There is a bit of what was just brought up, 1382 00:57:50,480 --> 00:57:52,294 which is pre-processing of an image. 1383 00:57:52,294 --> 00:57:55,210 It's another kind of fun little word that people will throw out there, 1384 00:57:55,210 --> 00:57:57,910 just like, oh, yeah, preprocess your images before you use them. 1385 00:57:57,910 --> 00:58:00,243 And they like turn around and just ignore that they just 1386 00:58:00,243 --> 00:58:02,080 dropped a whole thing on you. 1387 00:58:02,080 --> 00:58:05,350 Pre-processing is really just basically what was just mentioned: 1388 00:58:05,350 --> 00:58:09,076 you want to take your images and kind of normalize them a little bit. 1389 00:58:09,076 --> 00:58:11,950 You don't want to have this outlier in your data set that's 4 million 1390 00:58:11,950 --> 00:58:14,950 by 2 million when the rest are like 100 across. 1391 00:58:14,950 --> 00:58:17,350 You want to take those and maybe resize them, scale them 1392 00:58:17,350 --> 00:58:18,642 down using appropriate methods. 1393 00:58:18,642 --> 00:58:21,350 And whatever those methods are might change depending on the data 1394 00:58:21,350 --> 00:58:23,800 you're looking at or depending on how you want to do it, 1395 00:58:23,800 --> 00:58:26,196 but being able to normalize across them is 1396 00:58:26,196 --> 00:58:27,820 going to be some sort of preprocessing. 1397 00:58:27,820 --> 00:58:31,449 And it's called preprocessing because, if you think of the processing part as 1398 00:58:31,449 --> 00:58:33,490 throwing it into your machine learning algorithm, 1399 00:58:33,490 --> 00:58:35,861 you do this beforehand-- preprocessing. 1400 00:58:35,861 --> 00:58:38,110 And so that's where that terminology kind of comes in. 1401 00:58:38,110 --> 00:58:40,390 And it comes up a lot with images in particular 1402 00:58:40,390 --> 00:58:43,420 because images can be taken by cameras, which are used by people, 1403 00:58:43,420 --> 00:58:45,710 and people are pretty stochastic. 1404 00:58:45,710 --> 00:58:49,334 I might take the same picture 400 times and it might look different every time. 1405 00:58:49,334 --> 00:58:52,250 And that's kind of a problem with how people take pictures, especially 1406 00:58:52,250 --> 00:58:54,291 for real world scenarios where you're applying it 1407 00:58:54,291 --> 00:58:57,860 to some sort of pictures of living animals or people's faces 1408 00:58:57,860 --> 00:58:58,880 or things like that. 1409 00:58:58,880 --> 00:59:02,046 You'll probably want to find a way to preprocess your images so that they're 1410 00:59:02,046 --> 00:59:06,830 roughly the right size and, give or take, roughly the right thing that you're 1411 00:59:06,830 --> 00:59:08,000 looking for.
1412 00:59:08,000 --> 00:59:11,390 Maybe a picture of someone's face is zoomed all the way out here 1413 00:59:11,390 --> 00:59:14,980 and maybe every other person's picture is like zoomed in super close 1414 00:59:14,980 --> 00:59:16,940 so you just have their face. 1415 00:59:16,940 --> 00:59:19,520 Harvard IT uses that to identify you. 1416 00:59:19,520 --> 00:59:23,060 They preprocess all of the images they take of you so that it's just your face 1417 00:59:23,060 --> 00:59:24,329 and they can identify you. 1418 00:59:24,329 --> 00:59:26,870 And that's something that comes up a lot in machine learning. 1419 00:59:26,870 --> 00:59:29,240 It's part of the project. 1420 00:59:29,240 --> 00:59:32,550 And I think I'm right about out of time, but I'll 1421 00:59:32,550 --> 00:59:34,660 be hanging around afterward for any questions. 1422 00:59:34,660 --> 00:59:37,076 But as far as the livestream goes, thank you for watching. 1423 00:59:37,076 --> 00:59:39,857 I'll be on campus kind of doing my own thing. 1424 00:59:39,857 --> 00:59:41,940 But I really appreciate you hanging in all the way 1425 00:59:41,940 --> 00:59:44,250 through the weird cat picture at the very end. 1426 00:59:44,250 --> 00:59:45,990 So thank you very much. 1427 00:59:45,990 --> 00:59:48,840 Thanks for showing up, you guys. 1428 00:59:48,840 --> 00:59:51,034