1
00:00:00,000 --> 00:00:02,570
[Section 3 - More Comfortable]

2
00:00:02,570 --> 00:00:05,070
[Rob Bowden - Harvard University]

3
00:00:05,070 --> 00:00:07,310
>> [This is CS50. - CS50.TV]

4
00:00:07,310 --> 00:00:12,700
>> So the first question is strangely worded.

5
00:00:12,700 --> 00:00:17,480
GDB lets you "debug" a program, but, more specifically, what does it let you do?

6
00:00:17,480 --> 00:00:22,590
I'll answer that one, and I don't know what's exactly expected,

7
00:00:22,590 --> 00:00:27,910
so I'm guessing it's something along the lines of it lets you step by step

8
00:00:27,910 --> 00:00:31,540
walk through the program, interact with it, change variables, do all these things--

9
00:00:31,540 --> 00:00:34,270
basically completely control execution of a program

10
00:00:34,270 --> 00:00:38,410
and inspect any given part of the execution of the program.

11
00:00:38,410 --> 00:00:43,030
So those features let you debug things.

12
00:00:43,030 --> 00:00:44,830
Okay.

13
00:00:44,830 --> 00:00:50,520
Why does binary search require that an array be sorted?

14
00:00:50,520 --> 00:00:53,360
Who wants to answer that?

15
00:00:56,120 --> 00:01:00,070
[student] Because it doesn't work if it's not sorted. >>Yeah. [laughter]

16
00:01:00,070 --> 00:01:04,910
If it's not sorted, then it's impossible to split it in half

17
00:01:04,910 --> 00:01:07,850
and know that everything to the left is less and everything to the right

18
00:01:07,850 --> 00:01:10,490
is greater than the middle value.

19
00:01:10,490 --> 00:01:12,790
So it needs to be sorted.

20
00:01:12,790 --> 00:01:14,170
Okay.

21
00:01:14,170 --> 00:01:17,570
Why is bubble sort in O of n squared?

22
00:01:17,570 --> 00:01:23,230
Does anyone first want to give a very quick high-level overview of what bubble sort is?

23
00:01:25,950 --> 00:01:33,020
[student] You basically go through each element and you check the first few elements.

24
00:01:33,020 --> 00:01:37,150
If they're out of where you swap them, then you check the next few elements and so on.

25
00:01:37,150 --> 00:01:40,770
When you reach the end, then you know that the largest element is placed at the end,

26
00:01:40,770 --> 00:01:42,750
so you ignore that one then you keep on going through,

27
00:01:42,750 --> 00:01:48,490
and each time you have to check one less element until you make no changes. >>Yeah.

28
00:01:48,490 --> 00:01:58,470
It's called bubble sort because if you flip the array on its side so it's up and down, vertical,

29
00:01:58,470 --> 00:02:03,100
then the large values will sink to the bottom and the small values will bubble up to the top.

30
00:02:03,100 --> 00:02:05,210
That's how it got its name.

31
00:02:05,210 --> 00:02:08,220
And yeah, you just go through.

32
00:02:08,220 --> 00:02:11,190
You keep going through the array, swapping the larger value

33
00:02:11,190 --> 00:02:14,040
to get the largest values to the bottom.

34
00:02:14,040 --> 00:02:17,500
>> Why is it O of n squared?

35
00:02:18,690 --> 00:02:24,620
First, does anyone want to say why it's O of n squared?

36
00:02:24,620 --> 00:02:28,760
[student] Because for each run it goes n times.

37
00:02:28,760 --> 00:02:32,100
You can only be sure that you've taken the biggest element all the way down,

38
00:02:32,100 --> 00:02:35,230
then you have to repeat that for as many elements. >>Yeah.

39
00:02:35,230 --> 00:02:41,800
So keep in mind what big O means and what big Omega means.

40
00:02:41,800 --> 00:02:50,560
The big O is like the upper bound on how slow it can actually run.

41
00:02:50,560 --> 00:02:58,990
So by saying it's O of n squared, it is not O of n or else it would be able to run

42
00:02:58,990 --> 00:03:02,640
in linear time, but it is O of n cubed

43
00:03:02,640 --> 00:03:06,390
because it is bounded by O of n cubed.

44
00:03:06,390 --> 00:03:12,300
If it's bounded by O of n squared, then it's bounded also by n cubed.

45
00:03:12,300 --> 00:03:20,280
So it is n squared, and in the absolute worst case it can't do better than n squared,

46
00:03:20,280 --> 00:03:22,830
which is why it's O of n squared.

47
00:03:22,830 --> 00:03:31,200
So to see slight math at how it comes out to be n squared,

48
00:03:31,200 --> 00:03:40,530
if we have five things in our list, the first time how many swaps could we potentially need to make

49
00:03:40,530 --> 00:03:47,170
in order to get this? Let's actually just--

50
00:03:47,170 --> 00:03:52,040
How many swaps are we going to have to make in the first run of bubble sort through the array?

51
00:03:52,040 --> 00:03:53,540
[student] n - 1. >>Yeah.

52
00:03:53,540 --> 00:03:58,340
>> If there are 5 elements, we're going to need to make n - 1.

53
00:03:58,340 --> 00:04:01,100
Then on the second one how many swaps are we going to have to make?

54
00:04:01,100 --> 00:04:03,440
[student] n - 2. >>Yeah.

55
00:04:03,440 --> 00:04:11,640
And the third is going to be n - 3, and then for convenience I will write the last two

56
00:04:11,640 --> 00:04:15,270
as then we're going to need to make 2 swaps and 1 swap.

57
00:04:15,270 --> 00:04:19,899
I guess the last one may or may not actually need to happen.

58
00:04:19,899 --> 00:04:22,820
Is it a swap? I don't know.

59
00:04:22,820 --> 00:04:26,490
So these are the total amounts of swaps or at least comparisons you have to make.

60
00:04:26,490 --> 00:04:29,910
Even if you don't swap, you still need to compare the values.

61
00:04:29,910 --> 00:04:33,910
So there are n - 1 comparisons in the first run through the array.

62
00:04:33,910 --> 00:04:42,050
If you rearrange these things, let's actually make it six things so things stack up nicely,

63
00:04:42,050 --> 00:04:44,790
and then I'll do 3, 2, 1.

64
00:04:44,790 --> 00:04:49,910
So just rearranging these sums, we want to see how many comparisons we make

65
00:04:49,910 --> 00:04:52,700
in the entire algorithm.

66
00:04:52,700 --> 00:04:56,550
So if we bring these guys down here,

67
00:04:56,550 --> 00:05:07,470
then we're still just summing however many comparisons there were.

68
00:05:07,470 --> 00:05:13,280
But if we sum these and we sum these and we sum these,

69
00:05:13,280 --> 00:05:18,130
it's still the same problem. We just sum those particular groups.

70
00:05:18,130 --> 00:05:22,400
>> So now we're summing 3 n's. It's not just 3 n's.

71
00:05:22,400 --> 00:05:27,650
It's always going to be n/2 n's.

72
00:05:27,650 --> 00:05:29,430
So here we happen to have 6.

73
00:05:29,430 --> 00:05:34,830
If we had 10 things, then we could do this grouping for 5 different pairs of things

74
00:05:34,830 --> 00:05:37,180
and end up with n + n + n + n + n.

75
00:05:37,180 --> 00:05:45,840
So you're always going to get n/2 n's, and so this we'll jot it out to n squared/2.

76
00:05:45,840 --> 00:05:48,890
And so even though it's the factor of half, which happens to come in

77
00:05:48,890 --> 00:05:54,190
because of the fact that through each iteration over the array we compare 1 less

78
00:05:54,190 --> 00:05:58,040
so that's how we get the over 2, but it is still n squared.

79
00:05:58,040 --> 00:06:01,650
We don't care about the constant factor of a half.

80
00:06:01,650 --> 00:06:07,760
So a lot of big O stuff like this relies on just kind of doing this sort of math,

81
00:06:07,760 --> 00:06:12,260
doing arithmetic sums and geometric series stuff,

82
00:06:12,260 --> 00:06:17,750
but most of them in this course are pretty straightforward.

83
00:06:17,750 --> 00:06:19,290
Okay.

84
00:06:19,290 --> 00:06:24,430
Why is insertion sort in Omega of n? What does omega mean?

85
00:06:24,430 --> 00:06:27,730
[two students speaking at once - unintelligible]  >>Yeah.

86
00:06:27,730 --> 00:06:30,630
Omega you can think of as the lower bound.

87
00:06:30,630 --> 00:06:36,550
>> So no matter how efficient your insertion sort algorithm is,

88
00:06:36,550 --> 00:06:41,810
regardless of the list that's passed in, it always has to compare at least n things

89
00:06:41,810 --> 00:06:44,620
or it has to iterate over n things.

90
00:06:44,620 --> 00:06:47,280
Why is that?

91
00:06:47,280 --> 00:06:51,190
[student] Because if the list is already sorted, then through the first iteration

92
00:06:51,190 --> 00:06:54,080
you can only guarantee that the very first element is the least,

93
00:06:54,080 --> 00:06:56,490
and the second iteration you can only guarantee the first two are

94
00:06:56,490 --> 00:07:00,010
because you don't know that the rest of the list is sorted. >>Yeah.

95
00:07:00,010 --> 00:07:08,910
If you pass in a completely sorted list, at the very least you have to go over all the elements

96
00:07:08,910 --> 00:07:12,180
to see that nothing needs to be moved around.

97
00:07:12,180 --> 00:07:14,720
So passing over the list and saying oh, this is already sorted,

98
00:07:14,720 --> 00:07:18,240
it's impossible for you to know it's sorted until you check each element

99
00:07:18,240 --> 00:07:20,660
to see that they are in sorted order.

100
00:07:20,660 --> 00:07:25,290
So the lower bound on insertion sort is Omega of n.

101
00:07:25,290 --> 00:07:28,210
What's the worst case running time of merge sort,

102
00:07:28,210 --> 00:07:31,390
worst case being big O again?

103
00:07:31,390 --> 00:07:37,660
So in the worst case scenario, how does merge sort run?

104
00:07:42,170 --> 00:07:43,690
[student] N log n. >>Yeah.

105
00:07:43,690 --> 00:07:49,990
The fastest general sorting algorithms are n log n. You can't do better.

106
00:07:51,930 --> 00:07:55,130
>> There are special cases, and if we have time today--but we probably won't--

107
00:07:55,130 --> 00:07:59,330
we could see one that does better than n log n.

108
00:07:59,330 --> 00:08:04,050
But in the general case, you cannot do better than n log n.

109
00:08:04,050 --> 00:08:09,680
And merge sort happens to be the one you should know for this course that is n log n.

110
00:08:09,680 --> 00:08:13,260
And so we'll actually be implementing that today.

111
00:08:13,260 --> 00:08:18,070
And finally, in no more than three sentences, how does selection sort work?

112
00:08:18,070 --> 00:08:20,370
Does someone want to answer, and I'll count your sentences

113
00:08:20,370 --> 00:08:22,390
because if you go over 3--

114
00:08:25,530 --> 00:08:28,330
Does anyone remember selection sort?

115
00:08:31,280 --> 00:08:37,809
Selection sort is usually pretty easy to remember just from the name.

116
00:08:37,809 --> 00:08:45,350
You just iterate over the array, find whatever the largest value is or smallest--

117
00:08:45,350 --> 00:08:47,290
whatever order you're sorting in.

118
00:08:47,290 --> 00:08:50,750
So let's say we're sorting from smallest to greatest.

119
00:08:50,750 --> 00:08:55,250
You iterate over the array, looking for whatever the minimum element is,

120
00:08:55,250 --> 00:08:59,750
select it, and then just swap it whatever is in the first place.

121
00:08:59,750 --> 00:09:04,090
And then on the second pass over the array, look for the minimum element again,

122
00:09:04,090 --> 00:09:07,490
select it, and then swap it with what's in the second position.

123
00:09:07,490 --> 00:09:10,650
So we are just picking and choosing the minimum values

124
00:09:10,650 --> 00:09:16,050
and inserting them into the front of the array until it is sorted.

125
00:09:19,210 --> 00:09:21,560
Questions on that?

126
00:09:21,560 --> 00:09:25,710
>> These inevitably appear in the forms you have to fill out when you're submitting the pset.

127
00:09:29,010 --> 00:09:32,360
Those are basically the answers to those.

128
00:09:32,360 --> 00:09:34,230
Okay, so now coding problems.

129
00:09:34,230 --> 00:09:40,140
I already sent out over email-- Did anyone not get that email? Okay.

130
00:09:40,140 --> 00:09:46,630
I already sent out over email the space that we're going to be using,

131
00:09:46,630 --> 00:09:52,120
and if you click on my name--so I think I'm going to be at the bottom

132
00:09:52,120 --> 00:09:57,170
because of the backwards r--but if you click on my name you'll see 2 revisions.

133
00:09:57,170 --> 00:10:02,650
Revision 1 is going to be I already copied and pasted the code into Spaces

134
00:10:02,650 --> 00:10:06,900
for the search thing you're going to have to implement.

135
00:10:06,900 --> 00:10:10,540
And Revision 2 will be the sort thing that we implement after that.

136
00:10:10,540 --> 00:10:15,770
So you can click on my Revision 1 and work from there.

137
00:10:17,350 --> 00:10:22,060
And now we want to implement binary search.

138
00:10:22,060 --> 00:10:26,470
>> Does anyone want to just give a pseudocode high-level explanation

139
00:10:26,470 --> 00:10:31,440
of what we're going to have to do for search? Yeah.

140
00:10:31,440 --> 00:10:36,170
[student] You just take the middle of the array and see if what you're looking for

141
00:10:36,170 --> 00:10:38,650
is less than that or more than that.

142
00:10:38,650 --> 00:10:41,080
And if it's less, you go to the half that's less,

143
00:10:41,080 --> 00:10:44,750
and if it's more, you go to the half that's more and you repeat that until you just get one thing.

144
00:10:44,750 --> 00:10:46,570
[Bowden] Yeah.

145
00:10:46,570 --> 00:10:51,320
Notice that our numbers array is already sorted,

146
00:10:51,320 --> 00:10:57,190
and so that means that we can take advantage of that and we could first check,

147
00:10:57,190 --> 00:11:00,390
okay, I'm looking for the number 50.

148
00:11:00,390 --> 00:11:03,720
So I can go into the middle.

149
00:11:03,720 --> 00:11:07,380
Middle is hard to define when it's an even number of things,

150
00:11:07,380 --> 00:11:10,820
but let's just say we'll always truncate to the middle.

151
00:11:10,820 --> 00:11:14,420
So here we have 8 things, so the middle would be 16.

152
00:11:14,420 --> 00:11:17,330
I'm looking for 50, so 50 is greater than 16.

153
00:11:17,330 --> 00:11:21,310
So now I can basically treat my array as these elements.

154
00:11:21,310 --> 00:11:23,450
I can throw away everything from 16 over.

155
00:11:23,450 --> 00:11:27,440
Now my array is just these 4 elements, and I repeat.

156
00:11:27,440 --> 00:11:31,910
So then I want to find the middle again, which is going to be 42.

157
00:11:31,910 --> 00:11:34,730
42 is less than 50, so I can throw away these two elements.

158
00:11:34,730 --> 00:11:36,890
This is my remaining array.

159
00:11:36,890 --> 00:11:38,430
I'm going to find the middle again.

160
00:11:38,430 --> 00:11:42,100
I guess 50 was a bad example because I was always throwing away things to the left,

161
00:11:42,100 --> 00:11:48,280
but by the same measure, if I'm looking for something

162
00:11:48,280 --> 00:11:52,100
and it's less than the element I'm currently looking at,

163
00:11:52,100 --> 00:11:55,080
then I'm going to throw away everything to the right.

164
00:11:55,080 --> 00:11:58,150
So now we need to implement that.

165
00:11:58,150 --> 00:12:02,310
Notice that we need to pass in size.

166
00:12:02,310 --> 00:12:06,730
We can also not need to hard-code size.

167
00:12:06,730 --> 00:12:11,640
So if we get rid of that #define--

168
00:12:19,630 --> 00:12:21,430
Okay.

169
00:12:21,430 --> 00:12:27,180
How can I nicely figure out what the size of the numbers array currently is?

170
00:12:27,180 --> 00:12:30,950
>> How many elements are in the numbers array?

171
00:12:30,950 --> 00:12:33,630
[student] Numbers, brackets, .length?

172
00:12:33,630 --> 00:12:36,600
[Bowden] That doesn't exist in C.

173
00:12:36,600 --> 00:12:38,580
Need .length.

174
00:12:38,580 --> 00:12:42,010
Arrays don't have properties, so there is no length property of arrays

175
00:12:42,010 --> 00:12:45,650
that will just give you however long it happens to be.

176
00:12:48,180 --> 00:12:51,620
[student] See how much memory it has and divide by how much-- >>Yeah.

177
00:12:51,620 --> 00:12:55,810
So how can we see how much memory it has? >>[student] Sizeof. >>Yeah, sizeof.

178
00:12:55,810 --> 00:13:01,680
Sizeof is the operator that's going to return the size of the numbers array.

179
00:13:01,680 --> 00:13:10,060
And that's going to be however many integers there are times the size of an integer

180
00:13:10,060 --> 00:13:14,050
since that's how much memory it's actually taking up.

181
00:13:14,050 --> 00:13:17,630
So if I want the number of things in the array,

182
00:13:17,630 --> 00:13:20,560
then I'm going to want to divide by the size of an integer.

183
00:13:22,820 --> 00:13:26,010
Okay. So that lets me pass in size here.

184
00:13:26,010 --> 00:13:29,430
Why do I need to pass in size at all?

185
00:13:29,430 --> 00:13:38,570
Why can't I just do up here int size = sizeof(haystack) / sizeof(int)?

186
00:13:38,570 --> 00:13:41,490
Why does this not work?

187
00:13:41,490 --> 00:13:44,470
[student] It's not a global variable.

188
00:13:44,470 --> 00:13:51,540
[Bowden] Haystack exists and we're passing in numbers as haystack,

189
00:13:51,540 --> 00:13:54,700
and this is kind of a foreshadowing of what's to come. Yeah.

190
00:13:54,700 --> 00:14:00,170
[student] Haystack is just the reference to it, so it would return how big that reference is.

191
00:14:00,170 --> 00:14:02,150
Yeah.

192
00:14:02,150 --> 00:14:09,000
I doubt in lecture that you've seen the stack yet really, right?

193
00:14:09,000 --> 00:14:11,270
We've just spoken about it.

194
00:14:11,270 --> 00:14:16,090
So the stack is where all of your variables are going to be stored.

195
00:14:16,090 --> 00:14:19,960
>> Any memory that's allocated for local variables is going in the stack,

196
00:14:19,960 --> 00:14:24,790
and each function gets its own space on the stack, its own stack frame is what it's called.

197
00:14:24,790 --> 00:14:31,590
So main has its stack frame, and inside of it is going to exist this numbers array,

198
00:14:31,590 --> 00:14:34,270
and it's going to be of size sizeof(numbers).

199
00:14:34,270 --> 00:14:38,180
It's going to have size of numbers divided by size of elements,

200
00:14:38,180 --> 00:14:42,010
but that all lives within main's stack frame.

201
00:14:42,010 --> 00:14:45,190
When we call search, search gets its own stack frame,

202
00:14:45,190 --> 00:14:48,840
its own space to store all of its local variables.

203
00:14:48,840 --> 00:14:56,420
But these arguments--so haystack isn't a copy of this entire array.

204
00:14:56,420 --> 00:15:00,990
We don't pass in the entire array as a copy into search.

205
00:15:00,990 --> 00:15:04,030
It just passes a reference to that array.

206
00:15:04,030 --> 00:15:11,470
So search can access these numbers through this reference.

207
00:15:11,470 --> 00:15:17,100
It's still accessing the things that live inside of main's stack frame,

208
00:15:17,100 --> 00:15:22,990
but basically, when we get to pointers, which should be soon,

209
00:15:22,990 --> 00:15:24,980
this is what pointers are.

210
00:15:24,980 --> 00:15:29,400
Pointers are just references to things, and you can use pointers to access things

211
00:15:29,400 --> 00:15:32,030
that are in other things' stack frames.

212
00:15:32,030 --> 00:15:37,660
So even though numbers is local to main, we can still access it through this pointer.

213
00:15:37,660 --> 00:15:41,770
But since it's just a pointer and it's just a reference,

214
00:15:41,770 --> 00:15:45,040
sizeof(haystack) just returns the size of the reference itself.

215
00:15:45,040 --> 00:15:47,950
It doesn't return the size of the thing it's pointing to.

216
00:15:47,950 --> 00:15:51,110
It doesn't return the actual size of numbers.

217
00:15:51,110 --> 00:15:55,660
And so this isn't going to work as we want it to.

218
00:15:55,660 --> 00:15:57,320
>> Questions on that?

219
00:15:57,320 --> 00:16:03,250
Pointers will be gone into in significantly more gory detail in weeks to come.

220
00:16:06,750 --> 00:16:13,740
And this is why a lot of things that you see, most search things or sort things,

221
00:16:13,740 --> 00:16:16,990
they're almost all going to need to take the actual size of the array,

222
00:16:16,990 --> 00:16:20,440
because in C, we have no idea what the size of the array is.

223
00:16:20,440 --> 00:16:22,720
You need to manually pass it in.

224
00:16:22,720 --> 00:16:27,190
And you can't manually pass in the entire array because you're just passing in the reference

225
00:16:27,190 --> 00:16:30,390
and it can't get the size from the reference.

226
00:16:30,390 --> 00:16:32,300
Okay.

227
00:16:32,300 --> 00:16:38,160
So now we want to implement what was explained before.

228
00:16:38,160 --> 00:16:41,530
You can work on it for a minute,

229
00:16:41,530 --> 00:16:45,250
and you don't have to worry about getting everything perfectly 100% working.

230
00:16:45,250 --> 00:16:51,410
Just write up the half pseudocode for how you think it should work.

231
00:16:52,000 --> 00:16:53,630
Okay.

232
00:16:53,630 --> 00:16:56,350
No need to be completely done with this yet.

233
00:16:56,350 --> 00:17:02,380
But does anyone feel comfortable with what they have so far,

234
00:17:02,380 --> 00:17:05,599
like something we can work with together?

235
00:17:05,599 --> 00:17:09,690
Does anyone want to volunteer? Or I will randomly pick.

236
00:17:12,680 --> 00:17:18,599
It doesn't have to be right by any measure but something we can modify into a working state.

237
00:17:18,599 --> 00:17:20,720
[student] Sure. >>Okay.

238
00:17:20,720 --> 00:17:27,220
So can you save the revision by clicking on the little Save icon.

239
00:17:27,220 --> 00:17:29,950
You're Ramya, right? >>[student] Yeah. >>[Bowden] Okay.

240
00:17:29,950 --> 00:17:35,140
So now I can view your revision and everyone can pull up the revision.

241
00:17:35,140 --> 00:17:38,600
And here we have-- Okay.

242
00:17:38,600 --> 00:17:43,160
So Ramya went with the recursive solution, which is definitely a valid solution.

243
00:17:43,160 --> 00:17:44,970
There are two ways you can do this problem.

244
00:17:44,970 --> 00:17:48,060
You can either do it iteratively or recursively.

245
00:17:48,060 --> 00:17:53,860
Most problems you encounter that can be done recursively can also be done iteratively.

246
00:17:53,860 --> 00:17:58,510
So here we've done it recursively.

247
00:17:58,510 --> 00:18:03,730
>> Does someone want to define what it means to make a function recursive?

248
00:18:07,210 --> 00:18:08,920
[student] When you have a function call itself

249
00:18:08,920 --> 00:18:13,470
and then call itself until it comes out with true and true. >>Yeah.

250
00:18:13,470 --> 00:18:17,680
A recursive function is just a function which calls itself.

251
00:18:17,680 --> 00:18:24,140
There are three big things that a recursive function must have.

252
00:18:24,140 --> 00:18:27,310
The first is obviously, it calls itself.

253
00:18:27,310 --> 00:18:29,650
The second is the base case.

254
00:18:29,650 --> 00:18:33,390
So at some point the function needs to stop calling itself,

255
00:18:33,390 --> 00:18:35,610
and that's what the base case is for.

256
00:18:35,610 --> 00:18:43,860
So here we know that we should stop, we should give up in our search

257
00:18:43,860 --> 00:18:48,150
when start equals end--and we'll go over what that means.

258
00:18:48,150 --> 00:18:52,130
But finally, the last thing that's important for recursive functions:

259
00:18:52,130 --> 00:18:59,250
the functions must somehow approach the base case.

260
00:18:59,250 --> 00:19:04,140
Like if you're not actually updating anything when you make the second recursive call,

261
00:19:04,140 --> 00:19:07,880
if you're literally just calling the function again with the same arguments

262
00:19:07,880 --> 00:19:10,130
and no global variables have changed or anything,

263
00:19:10,130 --> 00:19:14,430
you will never reach the base case, in which case that's bad.

264
00:19:14,430 --> 00:19:17,950
It will be an infinite recursion and a stack overflow.

265
00:19:17,950 --> 00:19:24,330
But here we see that the update is happening since we are updating start + end/2,

266
00:19:24,330 --> 00:19:28,180
we're updating the end argument here, we're updating the start argument here.

267
00:19:28,180 --> 00:19:32,860
So in all recursive calls we are updating something. Okay.

268
00:19:32,860 --> 00:19:38,110
Do you want to walk us through your solution? >>Sure.

269
00:19:38,110 --> 00:19:44,270
I'm using SearchHelp so that every time I make this function call

270
00:19:44,270 --> 00:19:47,910
I have the start of where I'm looking for in the array and the end

271
00:19:47,910 --> 00:19:49,380
of where I'm looking the array.

272
00:19:49,380 --> 00:19:55,330
>> At every step where it's saying it's the middle element, which is start + end/2,

273
00:19:55,330 --> 00:19:58,850
is that equal to what we're looking for?

274
00:19:58,850 --> 00:20:04,650
And if it is, then we found it, and I guess that gets passed up the levels of recursion.

275
00:20:04,650 --> 00:20:12,540
And if that's not true, then we check whether that middle value of the array is too big,

276
00:20:12,540 --> 00:20:19,320
in which case we look at the left half of the array by going from start to the middle index.

277
00:20:19,320 --> 00:20:22,710
And otherwise we do the end half.

278
00:20:22,710 --> 00:20:24,740
[Bowden] Okay.

279
00:20:24,740 --> 00:20:27,730
That sounds good.

280
00:20:27,730 --> 00:20:36,640
Okay, so a couple things, and actually, this is a very high-level thing

281
00:20:36,640 --> 00:20:41,270
that you will never need to know for this course, but it is true.

282
00:20:41,270 --> 00:20:46,080
Recursive functions, you always hear that they're a bad deal

283
00:20:46,080 --> 00:20:51,160
because if you recursively call yourself too many times, you get stack overflow

284
00:20:51,160 --> 00:20:54,990
since, as I said before, every function gets its own stack frame.

285
00:20:54,990 --> 00:20:59,500
So each call of the recursive function gets its own stack frame.

286
00:20:59,500 --> 00:21:04,140
So if you make 1,000 recursive calls, you get 1,000 stack frames,

287
00:21:04,140 --> 00:21:08,390
and quickly you lead to having too many stack frames and things just break.

288
00:21:08,390 --> 00:21:13,480
So that's why recursive functions are generally bad.

289
00:21:13,480 --> 00:21:19,370
But there is a nice subset of recursive functions called tail-recursive functions,

290
00:21:19,370 --> 00:21:26,120
and this happens to be an example of one where if the compiler notices this

291
00:21:26,120 --> 00:21:29,920
and it should, I think--in Clang if you pass it the -O2 flag

292
00:21:29,920 --> 00:21:33,250
then it will notice this is tail recursive and make things good.

293
00:21:33,250 --> 00:21:40,050
>> It will reuse the same stack frame over and over again for each recursive call.

294
00:21:40,050 --> 00:21:47,010
And so since you're using the same stack frame, you do not need to worry about

295
00:21:47,010 --> 00:21:51,690
ever stack overflowing, and at the same time, like you said before,

296
00:21:51,690 --> 00:21:56,380
where once you return true, then it has to return up all of these stack frames

297
00:21:56,380 --> 00:22:01,740
and the 10th call to SearchHelp has to return to the 9th, has to return to the 8th.

298
00:22:01,740 --> 00:22:05,360
So that doesn't need to happen when functions are tail recursive.

299
00:22:05,360 --> 00:22:13,740
And so what makes this function tail recursive is notice that for any given call to searchHelp

300
00:22:13,740 --> 00:22:18,470
the recursive call that it's making is what it's returning.

301
00:22:18,470 --> 00:22:25,290
So in the first call to SearchHelp, we either immediately return false,

302
00:22:25,290 --> 00:22:29,590
immediately return true, or we make a recursive call to SearchHelp

303
00:22:29,590 --> 00:22:33,810
where what we're returning is what that call is returning.

304
00:22:33,810 --> 00:22:51,560
And we could not do this if we did something like int x = SearchHelp, return x * 2,

305
00:22:51,560 --> 00:22:55,440
just some random change.

306
00:22:55,440 --> 00:23:01,470
>> So now this recursive call, this int x = SearchHelp recursive call,

307
00:23:01,470 --> 00:23:05,740
is no longer tail recursive because it actually does have to return

308
00:23:05,740 --> 00:23:10,400
back to a previous stack frame so that that previous call to the function

309
00:23:10,400 --> 00:23:13,040
can then do something with the return value.

310
00:23:13,040 --> 00:23:22,190
So this is not tail recursive, but what we had before is nicely tail recursive. Yeah.

311
00:23:22,190 --> 00:23:27,000
[student] Shouldn't the second base case be checked first

312
00:23:27,000 --> 00:23:30,640
because there could be a situation where when you pass it the argument

313
00:23:30,640 --> 00:23:35,770
you have start = end, but they are the needle value.

314
00:23:35,770 --> 00:23:47,310
The question was can't we run into the case where end is the needle value

315
00:23:47,310 --> 00:23:52,000
or start = end, appropriately, start = end

316
00:23:52,000 --> 00:23:59,480
and you haven't actually checked that particular value yet,

317
00:23:59,480 --> 00:24:03,910
then start + end/2 is just going to be the same value.

318
00:24:03,910 --> 00:24:07,890
But we've already returned false and we never actually checked the value.

319
00:24:07,890 --> 00:24:14,240
So at the very least, in the first call, if size is 0, then we want to return false.

320
00:24:14,240 --> 00:24:17,710
But if size is 1, then start isn't going to equal end,

321
00:24:17,710 --> 00:24:19,820
and we'll at least check the one element.

322
00:24:19,820 --> 00:24:26,750
But I think you are right in that we can end up in a case where start + end/2,

323
00:24:26,750 --> 00:24:31,190
start ends up being the same as start + end/2,

324
00:24:31,190 --> 00:24:35,350
but we never actually checked that element.

325
00:24:35,350 --> 00:24:44,740
>> So if we first check is the middle element the value we're looking for,

326
00:24:44,740 --> 00:24:47,110
then we can immediately return true.

327
00:24:47,110 --> 00:24:50,740
Else if they're equal, then there's no point in continuing

328
00:24:50,740 --> 00:24:58,440
since we're just going to update to a case where we are on a single-element array.

329
00:24:58,440 --> 00:25:01,110
If that single element is not the one we're looking for,

330
00:25:01,110 --> 00:25:03,530
then everything is wrong. Yeah.

331
00:25:03,530 --> 00:25:08,900
[student] The thing is that since size is actually larger than the number of elements in the array,

332
00:25:08,900 --> 00:25:13,070
there is already an offset-- >>So will size--

333
00:25:13,070 --> 00:25:19,380
[student] Say if the array was size 0, then the SearchHelp will actually check haystack of 0

334
00:25:19,380 --> 00:25:21,490
on the first call.

335
00:25:21,490 --> 00:25:25,300
The array has size 0, so the 0 is-- >>Yeah.

336
00:25:25,300 --> 00:25:30,750
There's another thing that--it might be good. Let's think.

337
00:25:30,750 --> 00:25:40,120
So if the array had 10 elements and the middle one we're going to check is index 5,

338
00:25:40,120 --> 00:25:45,700
so we're checking 5, and let's say that the value is less.

339
00:25:45,700 --> 00:25:50,720
So we're throwing everything away from 5 onward.

340
00:25:50,720 --> 00:25:54,030
So start + end/2 is going to be our new end,

341
00:25:54,030 --> 00:25:57,450
so yeah, it's always going to stay beyond the end of the array.

342
00:25:57,450 --> 00:26:03,570
If it's a case if it was even or odd, then we would check, say, 4,

343
00:26:03,570 --> 00:26:05,770
but we're still throwing away--

344
00:26:05,770 --> 00:26:13,500
So yeah, end is always going to be beyond the actual end of the array.

345
00:26:13,500 --> 00:26:18,350
So the elements we are focusing on, end is always going to be the one after that.

346
00:26:18,350 --> 00:26:24,270
And so if start does ever equal end, we are in an array of size 0.

347
00:26:24,270 --> 00:26:35,600
>> The other thing I was thinking is we are updating start to be start + end/2,

348
00:26:35,600 --> 00:26:44,020
so this is the case that I'm having trouble with, where start + end/2

349
00:26:44,020 --> 00:26:46,820
is the element we're checking.

350
00:26:46,820 --> 00:26:58,150
Let's say we had this 10-element array. Whatever.

351
00:26:58,150 --> 00:27:03,250
So start + end/2 is going to be something like this one,

352
00:27:03,250 --> 00:27:07,060
and if that's not the value, say we want to update.

353
00:27:07,060 --> 00:27:10,060
The value is greater, so we want to look at this half of the array.

354
00:27:10,060 --> 00:27:15,910
So how we're updating start, we're updating start to now be this element.

355
00:27:15,910 --> 00:27:23,790
But this may still work, or at the very least, you can do start + end/2 + 1.

356
00:27:23,790 --> 00:27:27,850
[student] You don't need to be start + end [inaudible] >>Yeah.

357
00:27:27,850 --> 00:27:33,240
We have already checked this element and know it's not the one we're looking for.

358
00:27:33,240 --> 00:27:36,800
So we don't need to update start to be this element.

359
00:27:36,800 --> 00:27:39,560
We can just skip it and update start to be this element.

360
00:27:39,560 --> 00:27:46,060
And is there ever a case, let's say, that this were end,

361
00:27:46,060 --> 00:27:53,140
so then start would be this, start + end/2 would be this,

362
00:27:53,140 --> 00:28:00,580
start + end-- Yeah, I think it can end up in infinite recursion.

363
00:28:00,580 --> 00:28:12,690
Let's say it's just an array of size 2 or an array of size 1. I think this will work.

364
00:28:12,690 --> 00:28:19,490
So currently, start is that element and end is 1 beyond it.

365
00:28:19,490 --> 00:28:24,110
So the element that we're going to check is this one,

366
00:28:24,110 --> 00:28:29,400
and then when we update start, we're updating start to be 0 + 1/2,

367
00:28:29,400 --> 00:28:33,160
which is going to end us back with start being this element.

368
00:28:33,160 --> 00:28:36,210
>> So we're checking the same element over and over again.

369
00:28:36,210 --> 00:28:43,310
So this is the case where every recursive call must actually update something.

370
00:28:43,310 --> 00:28:48,370
So we need to do start + end/2 + 1, or else there's a case

371
00:28:48,370 --> 00:28:50,710
where we're not actually updating start.

372
00:28:50,710 --> 00:28:53,820
Everyone see that?

373
00:28:53,820 --> 00:28:56,310
Okay.

374
00:28:56,310 --> 00:29:03,860
Does anyone have questions on this solution or any more comments?

375
00:29:05,220 --> 00:29:10,280
Okay. Does anyone have an iterative solution that we can all look at?

376
00:29:17,400 --> 00:29:20,940
Did we all do it recursively?

377
00:29:20,940 --> 00:29:25,950
Or also I guess if you opened hers, then you might have overridden your previous one.

378
00:29:25,950 --> 00:29:28,810
Does it automatically save? I'm not positive.

379
00:29:35,090 --> 00:29:39,130
Does anyone have iterative?

380
00:29:39,130 --> 00:29:42,430
We can walk through it together if not.

381
00:29:46,080 --> 00:29:48,100
The idea is going to be the same.

382
00:30:00,260 --> 00:30:02,830
Iterative solution.

383
00:30:02,830 --> 00:30:07,140
We're going to want to basically do the same idea

384
00:30:07,140 --> 00:30:16,530
where we want to keep track of the new end of the array and the new start of the array

385
00:30:16,530 --> 00:30:18,510
and do that over and over.

386
00:30:18,510 --> 00:30:22,430
And if what we're keeping track of as the start and the end ever intersect,

387
00:30:22,430 --> 00:30:29,020
then we did not find it and we can return false.

388
00:30:29,020 --> 00:30:37,540
So how do I do that? Anyone have suggestions or code for me to pull up?

389
00:30:42,190 --> 00:30:47,450
[student] Do a while loop. >>Yeah.

390
00:30:47,450 --> 00:30:49,450
You are going to want to do a loop.

391
00:30:49,450 --> 00:30:51,830
>> Did you have code I could pull up, or what were you going to suggest?

392
00:30:51,830 --> 00:30:56,340
[student] I think so. >>All right. This makes things easier. What was your name?

393
00:30:56,340 --> 00:30:57,890
[student] Lucas.

394
00:31:00,140 --> 00:31:04,190
Revision 1. Okay.

395
00:31:04,190 --> 00:31:13,200
Low is what we called start before.

396
00:31:13,200 --> 00:31:17,080
Up is not quite what we called end before.

397
00:31:17,080 --> 00:31:22,750
Actually, end is now within the array. It's an element we should consider.

398
00:31:22,750 --> 00:31:26,890
So low is 0, up is the size of the array - 1,

399
00:31:26,890 --> 00:31:34,780
and now we are looping, and we are checking--

400
00:31:34,780 --> 00:31:37,340
I guess you can walk through it.

401
00:31:37,340 --> 00:31:41,420
What was your thinking through this? Walk us through your code.

402
00:31:41,420 --> 00:31:49,940
[student] Sure. Look at the haystack value in the middle and compare it to needle.

403
00:31:49,940 --> 00:31:58,520
So if it's greater than your needle, then you want to--oh, actually, that should be backwards.

404
00:31:58,520 --> 00:32:05,180
You're going to want to throw away the right half, and so yeah, that should be the way.

405
00:32:05,180 --> 00:32:08,830
[Bowden] So this should be less? Is that what you said? >>[student] Yeah.

406
00:32:08,830 --> 00:32:10,390
[Bowden] Okay. Less.

407
00:32:10,390 --> 00:32:15,700
So if what we're looking at is smaller than what we want,

408
00:32:15,700 --> 00:32:19,410
then yeah, we want to throw away the left half,

409
00:32:19,410 --> 00:32:22,210
which means we are updating everything we're considering

410
00:32:22,210 --> 00:32:26,610
by moving low to the right of the array.

411
00:32:26,610 --> 00:32:30,130
This looks good.

412
00:32:30,130 --> 00:32:34,550
I think it has the same issue that we said on the previous one,

413
00:32:34,550 --> 00:32:49,760
where if low is 0 and up is 1, then low + up/2 is going to set up to be the same thing again.

414
00:32:49,760 --> 00:32:53,860
>> And even if that's not the case, it is still more efficient at the very least

415
00:32:53,860 --> 00:32:57,630
to just throw away the element we just looked at which we know is wrong.

416
00:32:57,630 --> 00:33:03,240
So low + up/2 + 1-- >>[student] That should be the other way.

417
00:33:03,240 --> 00:33:05,900
[Bowden] Or this should be - 1 and the other one should be + 1.

418
00:33:05,900 --> 00:33:09,580
[student] And there should be a double equals sign. >>[Bowden] Yeah.

419
00:33:09,580 --> 00:33:11,340
[student] Yeah.

420
00:33:14,540 --> 00:33:15,910
Okay.

421
00:33:15,910 --> 00:33:21,280
And finally, now that we have this + 1 - 1 thing,

422
00:33:21,280 --> 00:33:31,520
is it--it might not be--is it ever possible for low to end up with a value greater than up?

423
00:33:35,710 --> 00:33:40,320
I think the only way that can happen-- Is it possible? >>[student] I don't know.

424
00:33:40,320 --> 00:33:45,220
But if it gets truncated and then gets minus that 1 and then-- >>Yeah.

425
00:33:45,220 --> 00:33:47,480
[student] It would possibly get messed up.

426
00:33:49,700 --> 00:33:53,940
I think it should be good only because

427
00:33:53,940 --> 00:33:58,800
for it to end up lower they would have to be equal, I think.

428
00:33:58,800 --> 00:34:03,070
But if they are equal, then we wouldn't have done the while loop to begin with

429
00:34:03,070 --> 00:34:06,670
and we just would have returned the value. So I think we're good now.

430
00:34:06,670 --> 00:34:11,530
Notice that even though this problem is no longer recursive,

431
00:34:11,530 --> 00:34:17,400
the same kind of ideas apply where we can see how this so easily lends itself

432
00:34:17,400 --> 00:34:23,659
to a recursive solution by the fact that we're just updating the indices over and over again,

433
00:34:23,659 --> 00:34:29,960
we're making the problem smaller and smaller, we're focusing on a subset of the array.

434
00:34:29,960 --> 00:34:40,860
[student] If low is 0 and up is 1, they would both be 0 + 1/2, which would go to 0,

435
00:34:40,860 --> 00:34:44,429
and then one would be + 1, one would be - 1.

436
00:34:47,000 --> 00:34:50,870
[student] Where are we checking equality?

437
00:34:50,870 --> 00:34:55,100
Like if the middle one is actually needle?

438
00:34:55,100 --> 00:34:58,590
We're not currently doing that? Oh!

439
00:35:00,610 --> 00:35:02,460
If it's--

440
00:35:05,340 --> 00:35:13,740
Yes. We can't just do the test down here because let's say the first middle--

441
00:35:13,740 --> 00:35:16,430
[student] It's actually like not throw away the bound.

442
00:35:16,430 --> 00:35:20,220
So if you throw away the bound, you have to check it first or whatever.

443
00:35:20,220 --> 00:35:23,350
Ah. Yeah. >>[student] Yeah.

444
00:35:23,350 --> 00:35:29,650
So now we have thrown away the one we currently looked at,

445
00:35:29,650 --> 00:35:33,260
which means we now need to also have

446
00:35:33,260 --> 00:35:44,810
if (haystack[(low+up)/2] == needle), then we can return true.

447
00:35:44,810 --> 00:35:52,070
And whether I put else or just if, it means literally the same thing

448
00:35:52,070 --> 00:35:57,110
because this would have returned true.

449
00:35:57,110 --> 00:36:01,450
So I'll put else if, but it doesn't matter.

450
00:36:01,450 --> 00:36:10,440
>> So else if this, else this, and this is a common thing I do

451
00:36:10,440 --> 00:36:14,340
where even if it is the case where everything is good here,

452
00:36:14,340 --> 00:36:22,780
like low can never be greater than up, it's not worth reasoning about whether that's true.

453
00:36:22,780 --> 00:36:28,010
So you may as well say while low is less than or equal to

454
00:36:28,010 --> 00:36:30,720
or while low is less than

455
00:36:30,720 --> 00:36:35,300
so if they are ever equal or low happens to pass up,

456
00:36:35,300 --> 00:36:40,130
then we can break out of this loop.

457
00:36:41,410 --> 00:36:44,630
Questions, concerns, comments?

458
00:36:47,080 --> 00:36:49,270
Okay. This looks good.

459
00:36:49,270 --> 00:36:52,230
Now we want to do sort.

460
00:36:52,230 --> 00:37:04,030
If we go to my second revision, we see these same numbers,

461
00:37:04,030 --> 00:37:07,550
but now they are no longer in sorted order.

462
00:37:07,550 --> 00:37:12,840
And we want to implement sort using any algorithm in O of n log n.

463
00:37:12,840 --> 00:37:17,240
So which algorithm do you think we should implement here? >>[student] Merge sort.

464
00:37:17,240 --> 00:37:23,810
[Bowden] Yeah. Merge sort is O(n log n), so that's what we're going to do.

465
00:37:23,810 --> 00:37:26,680
And the problem is going to be pretty similar,

466
00:37:26,680 --> 00:37:31,920
where it easily lends itself to a recursive solution.

467
00:37:31,920 --> 00:37:35,580
We can also come up with an iterative solution if we want,

468
00:37:35,580 --> 00:37:42,540
but recursion will be easier here and we should do recursion.

469
00:37:45,120 --> 00:37:49,530
I guess we will walk through merge sort first,

470
00:37:49,530 --> 00:37:54,280
although there is a lovely video on merge sort already. [laughter]

471
00:37:54,280 --> 00:37:59,780
So merge sort there are--I am wasting so much of this paper.

472
00:37:59,780 --> 00:38:02,080
Oh, there's only one left.

473
00:38:02,080 --> 00:38:03,630
So merge.

474
00:38:08,190 --> 00:38:12,470
Oh, 1, 3, 5.

475
00:38:26,090 --> 00:38:27,440
Okay.

476
00:38:29,910 --> 00:38:33,460
Merge takes two separate arrays.

477
00:38:33,460 --> 00:38:36,780
Individually those two arrays are both sorted.

478
00:38:36,780 --> 00:38:40,970
So this array, 1, 3, 5, sorted. This array, 0, 2, 4, sorted.

479
00:38:40,970 --> 00:38:46,710
Now what merge should do is combine them into a single array that is itself sorted.

480
00:38:46,710 --> 00:38:57,130
So we want an array of size 6 that is going to have these elements inside of it

481
00:38:57,130 --> 00:38:59,390
in sorted order.

482
00:38:59,390 --> 00:39:03,390
>> And so we can take advantage of the fact that these two arrays are sorted

483
00:39:03,390 --> 00:39:06,800
to do this in linear time,

484
00:39:06,800 --> 00:39:13,510
linear time meaning if this array is size x and this is size y,

485
00:39:13,510 --> 00:39:20,970
then the total algorithm should be O(x+y). Okay.

486
00:39:20,970 --> 00:39:23,150
So suggestions.

487
00:39:23,150 --> 00:39:26,030
[student] Could we start from the left?

488
00:39:26,030 --> 00:39:30,150
So you'll put the 0 down first and then the 1 and then here you're at the 2.

489
00:39:30,150 --> 00:39:33,320
So it's kind of like you have a tab that's moving to the right. >>[Bowden] Yeah.

490
00:39:33,320 --> 00:39:41,070
For both of these arrays if we just focus on the leftmost element.

491
00:39:41,070 --> 00:39:43,530
Because both arrays are sorted, we know that these 2 elements

492
00:39:43,530 --> 00:39:46,920
are the smallest elements in either array.

493
00:39:46,920 --> 00:39:53,500
So that means that 1 of those 2 elements must be the smallest element in our merged array.

494
00:39:53,500 --> 00:39:58,190
It just so happens that the smallest is the one on the right this time.

495
00:39:58,190 --> 00:40:02,580
So we take 0, insert it on the left because 0 is less than 1,

496
00:40:02,580 --> 00:40:08,210
so take 0, insert it into our first position, and then we update this

497
00:40:08,210 --> 00:40:12,070
to now focus on the first element.

498
00:40:12,070 --> 00:40:14,570
And now we repeat.

499
00:40:14,570 --> 00:40:20,670
So now we compare 2 and 1. 1 is smaller, so we'll insert 1.

500
00:40:20,670 --> 00:40:25,300
We update this pointer to point to this guy.

501
00:40:25,300 --> 00:40:33,160
Now we do it again, so 2. This will update, compare these 2, 3.

502
00:40:33,160 --> 00:40:37,770
This updates, then 4 and 5.

503
00:40:37,770 --> 00:40:42,110
So that is merge.

504
00:40:42,110 --> 00:40:49,010
>> It should be pretty obvious that it is linear time since we just go across each element once.

505
00:40:49,010 --> 00:40:55,980
And that is the biggest step to implementing merge sort is doing this.

506
00:40:55,980 --> 00:40:59,330
And it's not that hard.

507
00:40:59,330 --> 00:41:15,020
A couple things to worry about is let's say we were merging 1, 2, 3, 4, 5, 6.

508
00:41:15,020 --> 00:41:30,930
In this case we end up in the scenario where this one is going to be smaller,

509
00:41:30,930 --> 00:41:36,160
then we update this pointer, this one is going to be smaller, update this,

510
00:41:36,160 --> 00:41:41,280
this one's smaller, and now you have to recognize

511
00:41:41,280 --> 00:41:44,220
when you've actually run out of elements to compare with.

512
00:41:44,220 --> 00:41:49,400
Since we have already used this entire array,

513
00:41:49,400 --> 00:41:55,190
everything in this array is now just inserted into here.

514
00:41:55,190 --> 00:42:02,040
So if we ever run into the point where one of our arrays is completely merged already,

515
00:42:02,040 --> 00:42:06,510
then we just take all the elements of the other array and insert them into the end of the array.

516
00:42:06,510 --> 00:42:13,630
So we can just insert 4, 5, 6. Okay.

517
00:42:13,630 --> 00:42:18,070
That is one thing to watch out for.

518
00:42:22,080 --> 00:42:26,120
Implementing that should be step 1.

519
00:42:26,120 --> 00:42:32,600
Merge sort then based on that, it's 2 steps, 2 silly steps.

520
00:42:38,800 --> 00:42:42,090
Let's just give this array.

521
00:42:57,920 --> 00:43:05,680
So merge sort, step 1 is to recursively break the array into halves.

522
00:43:05,680 --> 00:43:09,350
So split this array into halves.

523
00:43:09,350 --> 00:43:22,920
We now have 4, 15, 16, 50 and 8, 23, 42, 108.

524
00:43:22,920 --> 00:43:25,800
And now we do it again and we split these into halves.

525
00:43:25,800 --> 00:43:27,530
I'll just do it on this side.

526
00:43:27,530 --> 00:43:34,790
So 4, 15 and 16, 50.

527
00:43:34,790 --> 00:43:37,440
We would do the same thing over here.

528
00:43:37,440 --> 00:43:40,340
And now we split it into halves again.

529
00:43:40,340 --> 00:43:51,080
And we have 4, 15, 16, 50.

530
00:43:51,080 --> 00:43:53,170
So that is our base case.

531
00:43:53,170 --> 00:44:00,540
Once the arrays are of size 1, then we stop with the splitting into halves.

532
00:44:00,540 --> 00:44:03,190
>> Now what do we do with this?

533
00:44:03,190 --> 00:44:15,730
We end up this will also break down into 8, 23, 42, and 108.

534
00:44:15,730 --> 00:44:24,000
So now that we're at this point, now step two of merge sort is just merging pairs to the lists.

535
00:44:24,000 --> 00:44:27,610
So we want to merge these. We just call merge.

536
00:44:27,610 --> 00:44:31,410
We know merge will return these in sorted order.

537
00:44:31,410 --> 00:44:33,920
4, 15.

538
00:44:33,920 --> 00:44:41,440
Now we want to merge these, and that will return a list with those in sorted order,

539
00:44:41,440 --> 00:44:44,160
16, 50.

540
00:44:44,160 --> 00:44:57,380
We merge those--I cannot write--8, 23 and 42, 108.

541
00:44:57,380 --> 00:45:02,890
So we have merged pairs once.

542
00:45:02,890 --> 00:45:05,140
Now we just merge again.

543
00:45:05,140 --> 00:45:10,130
Notice that each of these lists is sorted in itself,

544
00:45:10,130 --> 00:45:15,220
and then we can just merge these lists to get a list of size 4 which is sorted

545
00:45:15,220 --> 00:45:19,990
and merge these two lists to get a list of size 4 that is sorted.

546
00:45:19,990 --> 00:45:25,710
And finally, we can merge those two lists of size 4 to get one list of size 8 that is sorted.

547
00:45:25,710 --> 00:45:34,030
So to see that this is overall n log n, we already saw that merge is linear,

548
00:45:34,030 --> 00:45:40,390
so when we're dealing with merging these, so like the overall cost of merge

549
00:45:40,390 --> 00:45:43,410
for these two lists is just 2 because--

550
00:45:43,410 --> 00:45:49,610
Or well, it's O of n, but n here is just these 2 elements, so it's 2.

551
00:45:49,610 --> 00:45:52,850
And these 2 will be 2 and these 2 will be 2 and these 2 will be 2,

552
00:45:52,850 --> 00:45:58,820
so across all of the merges that we need to do, we end up doing n.

553
00:45:58,820 --> 00:46:03,210
Like 2 + 2 + 2 + 2 is 8, which is n,

554
00:46:03,210 --> 00:46:08,060
so the cost of merging in this set is n.

555
00:46:08,060 --> 00:46:10,810
And then the same thing here.

556
00:46:10,810 --> 00:46:16,980
We'll merge these 2, then these 2, and individually this merge will take four operations,

557
00:46:16,980 --> 00:46:23,610
this merge will take four operations, but once again, between all of these,

558
00:46:23,610 --> 00:46:29,030
we end up merging n total things, and so this step takes n.

559
00:46:29,030 --> 00:46:33,670
And so each level takes n elements being merged.

560
00:46:33,670 --> 00:46:36,110
>> And how many levels are there?

561
00:46:36,110 --> 00:46:40,160
At each level, our array grows by size 2.

562
00:46:40,160 --> 00:46:44,590
Here our arrays are of size 1, here they're of size 2, here they're of size 4,

563
00:46:44,590 --> 00:46:46,470
and finally, they're of size 8.

564
00:46:46,470 --> 00:46:56,450
So since it is doubling, there are going to be a total of log n of these levels.

565
00:46:56,450 --> 00:47:02,090
So with log n levels, each individual level taking n total operations,

566
00:47:02,090 --> 00:47:05,720
we get an n log n algorithm.

567
00:47:05,720 --> 00:47:07,790
Questions?

568
00:47:08,940 --> 00:47:13,320
Have people already made progress on how to implement this?

569
00:47:13,320 --> 00:47:18,260
Is anyone already in a state where I can just pull up their code?

570
00:47:20,320 --> 00:47:22,260
I can give a minute.

571
00:47:24,770 --> 00:47:27,470
This one is going to be longer.

572
00:47:27,470 --> 00:47:28,730
I highly recommend recur--

573
00:47:28,730 --> 00:47:30,510
You don't have to do recursion for merge

574
00:47:30,510 --> 00:47:33,750
because to do recursion for merge, you're going to have to pass a bunch of different sizes.

575
00:47:33,750 --> 00:47:37,150
You can, but it's annoying.

576
00:47:37,150 --> 00:47:43,720
But recursion for sort itself is pretty easy.

577
00:47:43,720 --> 00:47:49,190
You just literally call sort on left half, sort on right half. Okay.

578
00:47:51,770 --> 00:47:54,860
Anyone have anything I can pull up yet?

579
00:47:54,860 --> 00:47:57,540
Or else I'll give a minute.

580
00:47:58,210 --> 00:47:59,900
Okay.

581
00:47:59,900 --> 00:48:02,970
Anyone have something we can work with?

582
00:48:05,450 --> 00:48:09,680
Or else we'll just work with this and then expand from there.

583
00:48:09,680 --> 00:48:14,050
>> Anyone have more than this that I can pull up?

584
00:48:14,050 --> 00:48:17,770
[student] Yeah. You can pull up mine. >>All right.

585
00:48:17,770 --> 00:48:19,730
Yes!

586
00:48:22,170 --> 00:48:25,280
[student] There were a lot of conditions. >>Oh, shoot. Can you--

587
00:48:25,280 --> 00:48:28,110
[student] I have to save it. >>Yeah.

588
00:48:32,420 --> 00:48:35,730
So we did do the merge separately.

589
00:48:35,730 --> 00:48:38,570
Oh, but it's not that bad.

590
00:48:39,790 --> 00:48:41,650
Okay.

591
00:48:41,650 --> 00:48:47,080
So sort is itself just calling mergeSortHelp.

592
00:48:47,080 --> 00:48:49,530
Explain to us what mergeSortHelp does.

593
00:48:49,530 --> 00:48:55,700
[student] MergeSortHelp pretty much does the two main steps,

594
00:48:55,700 --> 00:49:01,270
which is to sort each half of the array and then to merge both of them.

595
00:49:04,960 --> 00:49:08,050
[Bowden] Okay, so give me a second.

596
00:49:10,850 --> 00:49:13,210
I think this-- >>[student] I need to--

597
00:49:17,100 --> 00:49:19,400
Yeah. I'm missing something.

598
00:49:19,400 --> 00:49:23,100
In merge, I realize that I need to create a new array

599
00:49:23,100 --> 00:49:26,530
because I couldn't do it in place. >>Yes. You cannot. Correct.

600
00:49:26,530 --> 00:49:28,170
[student] So I create a new array.

601
00:49:28,170 --> 00:49:31,510
I forgot at the end of merge to re-change.

602
00:49:31,510 --> 00:49:34,490
Okay. We need a new array.

603
00:49:34,490 --> 00:49:41,000
In merge sort, this is almost always true.

604
00:49:41,000 --> 00:49:44,340
Part of the cost of a better algorithm time-wise

605
00:49:44,340 --> 00:49:47,310
is almost always needing to use a bit more memory.

606
00:49:47,310 --> 00:49:51,570
So here, no matter how you do merge sort,

607
00:49:51,570 --> 00:49:54,780
you would inevitably need to use some extra memory.

608
00:49:54,780 --> 00:49:58,240
He or she created a new array.

609
00:49:58,240 --> 00:50:03,400
And then you say at the end we just need to copy new array into the original array.

610
00:50:03,400 --> 00:50:04,830
[student] I think so, yeah.

611
00:50:04,830 --> 00:50:08,210
I don't know if that works in terms of counting by reference or whatever--

612
00:50:08,210 --> 00:50:11,650
Yeah, it will work. >>[student] Okay.

613
00:50:20,620 --> 00:50:24,480
Did you try running this? >>[student] No, not yet. >>Okay.

614
00:50:24,480 --> 00:50:28,880
Try running it, and then I'll talk about it for a second.

615
00:50:28,880 --> 00:50:35,200
[student] I need to have all the function prototypes and everything, though, right?

616
00:50:37,640 --> 00:50:40,840
The function prototypes. Oh, you mean like-- Yes.

617
00:50:40,840 --> 00:50:43,040
Sort is calling mergeSortHelp.

618
00:50:43,040 --> 00:50:47,390
>> So in order for sort to call mergeSortHelp, mergeSortHelp must either have been defined

619
00:50:47,390 --> 00:50:56,370
before sort or we just need the prototype. Just copy and paste that.

620
00:50:56,370 --> 00:50:59,490
And similarly, mergeSortHelp is calling merge,

621
00:50:59,490 --> 00:51:03,830
but merge has not been defined yet, so we can just let mergeSortHelp know

622
00:51:03,830 --> 00:51:08,700
that that's what merge is going to look like, and that's that.

623
00:51:09,950 --> 00:51:15,730
So mergeSortHelp.

624
00:51:22,770 --> 00:51:32,660
We have an issue here where we have no base case.

625
00:51:32,660 --> 00:51:38,110
MergeSortHelp is recursive, so any recursive function

626
00:51:38,110 --> 00:51:42,610
is going to need some sort of base case to know when to stop recursively calling itself.

627
00:51:42,610 --> 00:51:45,590
What is our base case going to be here? Yeah.

628
00:51:45,590 --> 00:51:49,110
[student] If the size is 1? >>[Bowden] Yes.

629
00:51:49,110 --> 00:51:56,220
So like we saw right there, we stopped splitting arrays

630
00:51:56,220 --> 00:52:01,850
once we got into arrays of size 1, which inevitably are sorted themselves.

631
00:52:01,850 --> 00:52:09,530
So if size equals 1, we know the array is already sorted,

632
00:52:09,530 --> 00:52:12,970
so we can just return.

633
00:52:12,970 --> 00:52:16,880
>> Notice that's void, so we don't return anything particular, we just return.

634
00:52:16,880 --> 00:52:19,580
Okay. So that's our base case.

635
00:52:19,580 --> 00:52:27,440
I guess our base case could also be if we happen to be merging an array of size 0,

636
00:52:27,440 --> 00:52:30,030
we probably want to stop at some point,

637
00:52:30,030 --> 00:52:33,610
so we can just say size less than 2 or less than or equal to 1

638
00:52:33,610 --> 00:52:37,150
so that this will work for any array now.

639
00:52:37,150 --> 00:52:38,870
Okay.

640
00:52:38,870 --> 00:52:42,740
So that's our base case.

641
00:52:42,740 --> 00:52:45,950
Now do you want to walk us through merge?

642
00:52:45,950 --> 00:52:49,140
What do all these cases mean?

643
00:52:49,140 --> 00:52:54,480
Up here, we're just doing the same idea, the--

644
00:52:56,970 --> 00:53:02,470
[student] I need to be passing size with all the mergeSortHelp calls.

645
00:53:02,470 --> 00:53:10,080
I added size as an additional primary and it's not there, like size/2.

646
00:53:10,080 --> 00:53:16,210
[Bowden] Oh, size/2, size/2. >>[student] Yeah, and also in the above function as well.

647
00:53:16,210 --> 00:53:21,320
[Bowden] Here? >>[student] Just size. >>[Bowden] Oh. Size, size? >>[student] Yeah.

648
00:53:21,320 --> 00:53:23,010
[Bowden] Okay.

649
00:53:23,010 --> 00:53:26,580
Let me think for a second.

650
00:53:26,580 --> 00:53:28,780
Do we run into an issue?

651
00:53:28,780 --> 00:53:33,690
We're always treating the left as 0. >>[student] No.

652
00:53:33,690 --> 00:53:36,340
That's wrong too. Sorry. It should be start. Yeah.

653
00:53:36,340 --> 00:53:39,230
[Bowden] Okay. I like that better.

654
00:53:39,230 --> 00:53:43,880
And end. Okay.

655
00:53:43,880 --> 00:53:47,200
So now do you want to walk us through merge? >>[student] Okay.

656
00:53:47,200 --> 00:53:52,150
I'm just walking through this new array that I've created.

657
00:53:52,150 --> 00:53:57,420
Its size is the size of the portion of the array that we want to be sorted

658
00:53:57,420 --> 00:54:03,460
and trying to find the element that I should put in the new array step.

659
00:54:03,460 --> 00:54:10,140
So to do that, first I'm checking if the left half of the array continues to have any more elements,

660
00:54:10,140 --> 00:54:14,260
and if it doesn't, then you go down to that else condition, which just says

661
00:54:14,260 --> 00:54:20,180
okay, it must be in the right array, and we'll put that in the current index of newArray.

662
00:54:20,180 --> 00:54:27,620
>> And then otherwise, I'm checking if the right side of the array is also finished,

663
00:54:27,620 --> 00:54:30,630
in which case I just put in the left.

664
00:54:30,630 --> 00:54:34,180
That might not actually be necessary. I'm not sure.

665
00:54:34,180 --> 00:54:40,970
But anyway, the other two check which of the two are smaller in the left or the right.

666
00:54:40,970 --> 00:54:49,770
And also in each case, I'm incrementing whichever placeholder I increment.

667
00:54:49,770 --> 00:54:52,040
[Bowden] Okay.

668
00:54:52,040 --> 00:54:53,840
That looks good.

669
00:54:53,840 --> 00:54:58,800
Does anyone have comments or concerns or questions?

670
00:55:00,660 --> 00:55:07,720
So the four cases that we have to bring things into just being--or it looks like five--

671
00:55:07,720 --> 00:55:13,100
but we have to consider whether the left array has run out of things we need to merge,

672
00:55:13,100 --> 00:55:16,390
whether the right array has run out of things we need to merge--

673
00:55:16,390 --> 00:55:18,400
I'm pointing at nothing.

674
00:55:18,400 --> 00:55:21,730
So whether the left array has run out of things or the right array has run out of things.

675
00:55:21,730 --> 00:55:24,320
Those are two cases.

676
00:55:24,320 --> 00:55:30,920
We also need the trivial case of whether the left thing is less than the right thing.

677
00:55:30,920 --> 00:55:33,910
Then we want to choose the left thing.

678
00:55:33,910 --> 00:55:37,630
Those are the cases.

679
00:55:37,630 --> 00:55:40,990
So this was right, so that's that.

680
00:55:40,990 --> 00:55:46,760
Array left. It's 1, 2, 3. Okay. So yeah, those are the four things we might want to do.

681
00:55:50,350 --> 00:55:54,510
And we won't go over an iterative solution.

682
00:55:54,510 --> 00:55:55,980
I would not recommend--

683
00:55:55,980 --> 00:56:03,070
Merge sort is an example of a function that is both not tail recursive,

684
00:56:03,070 --> 00:56:07,040
it's not easy to make it tail recursive,

685
00:56:07,040 --> 00:56:13,450
but also it's not very easy to make it iterative.

686
00:56:13,450 --> 00:56:16,910
This is very easy.

687
00:56:16,910 --> 00:56:19,170
This implementation of merge sort,

688
00:56:19,170 --> 00:56:22,140
merge, no matter what you do, you're going to build merge.

689
00:56:22,140 --> 00:56:29,170
>> So merge sort built on top of merge recursively is just these three lines.

690
00:56:29,170 --> 00:56:34,700
Iteratively, it is more annoying and more difficult to think about.

691
00:56:34,700 --> 00:56:41,860
But notice that it's not tail recursive since mergeSortHelp--when it calls itself--

692
00:56:41,860 --> 00:56:46,590
it still needs to do things after this recursive call returns.

693
00:56:46,590 --> 00:56:50,830
So this stack frame must continue to exist even after calling this.

694
00:56:50,830 --> 00:56:54,170
And then when you call this, the stack frame must continue to exist

695
00:56:54,170 --> 00:56:57,780
because even after that call, we still need to merge.

696
00:56:57,780 --> 00:57:01,920
And it is nontrivial to make this tail recursive.

697
00:57:04,070 --> 00:57:06,270
Questions?

698
00:57:08,300 --> 00:57:09,860
All right.

699
00:57:09,860 --> 00:57:13,400
So going back to sort--oh, there's two things I want to show. Okay.

700
00:57:13,400 --> 00:57:17,840
Going back to sort, we'll do this quickly.

701
00:57:17,840 --> 00:57:21,030
Or search. Sort? Sort. Yeah.

702
00:57:21,030 --> 00:57:22,730
Going back to the beginnings of sort.

703
00:57:22,730 --> 00:57:29,870
We want to create an algorithm that sorts the array using any algorithm

704
00:57:29,870 --> 00:57:33,660
in O of n.

705
00:57:33,660 --> 00:57:40,860
So how is this possible? Does anyone have any sort of--

706
00:57:40,860 --> 00:57:44,300
I hinted before at--

707
00:57:44,300 --> 00:57:48,300
If we're about to improve from n log n to O of n,

708
00:57:48,300 --> 00:57:51,450
we have improved our algorithm time-wise,

709
00:57:51,450 --> 00:57:55,250
which means what are we going to need to do to make up for that?

710
00:57:55,250 --> 00:57:59,520
[student] Space. >>Yeah. We're going to be using more space.

711
00:57:59,520 --> 00:58:04,490
And not even just more space, it's exponentially more space.

712
00:58:04,490 --> 00:58:14,320
So I think this type of algorithm is pseudo something, pseudo polynom--

713
00:58:14,320 --> 00:58:18,980
pseudo-- I can't remember. Pseudo something.

714
00:58:18,980 --> 00:58:22,210
But it's because we need to use so much space

715
00:58:22,210 --> 00:58:28,610
that this is achievable but not realistic.

716
00:58:28,610 --> 00:58:31,220
>> And how do we achieve this?

717
00:58:31,220 --> 00:58:36,810
We can achieve this if we guarantee that any particular element in the array

718
00:58:36,810 --> 00:58:39,600
is below a certain size.

719
00:58:42,070 --> 00:58:44,500
So let's just say that size is 200,

720
00:58:44,500 --> 00:58:48,130
any element in an array is below size 200.

721
00:58:48,130 --> 00:58:51,080
And this is actually very realistic.

722
00:58:51,080 --> 00:58:58,660
You can very easily have an array that you know everything in it

723
00:58:58,660 --> 00:59:00,570
is going to be less than some number.

724
00:59:00,570 --> 00:59:07,400
Like if you have some absolutely massive vector or something

725
00:59:07,400 --> 00:59:11,810
but you know everything is going to be between 0 and 5,

726
00:59:11,810 --> 00:59:14,790
then it's going to be significantly faster to do this.

727
00:59:14,790 --> 00:59:17,930
And the bound on any of the elements is 5,

728
00:59:17,930 --> 00:59:21,980
so this bound, that is how much memory you're going to be using.

729
00:59:21,980 --> 00:59:26,300
So the bound is 200.

730
00:59:26,300 --> 00:59:32,960
In theory there is always a bound since an integer can only be up to 4 billion,

731
00:59:32,960 --> 00:59:40,600
but that's unrealistic since then we'd be using space

732
00:59:40,600 --> 00:59:44,400
on the order of 4 billion. So that's unrealistic.

733
00:59:44,400 --> 00:59:47,060
But here we'll say our bound is 200.

734
00:59:47,060 --> 00:59:59,570
The trick to doing it in O of n is we make another array called counts of size BOUND.

735
00:59:59,570 --> 01:00:10,470
So actually, this is a shortcut for--I actually don't know if Clang does this.

736
01:00:11,150 --> 01:00:15,330
But in GCC at the very least--I'm assuming Clang does it too--

737
01:00:15,330 --> 01:00:18,180
this will just initialize the entire array to be 0s.

738
01:00:18,180 --> 01:00:25,320
So if I don't want to do that, then I could separately do for (int i = 0;

739
01:00:25,320 --> 01:00:31,500
i < BOUND; i++) and counts[i] = 0;

740
01:00:31,500 --> 01:00:35,260
So now everything is initialized to 0.

741
01:00:35,260 --> 01:00:39,570
I iterate over my array,

742
01:00:39,570 --> 01:00:51,920
and what I'm doing is I'm counting the number of each-- Let's go down here.

743
01:00:51,920 --> 01:00:55,480
We have 4, 15, 16, 50, 8, 23, 42, 108.

744
01:00:55,480 --> 01:01:00,010
What I'm counting is the number of occurrences of each of those elements.

745
01:01:00,010 --> 01:01:03,470
Let's actually add a couple more in here with some repeats.

746
01:01:03,470 --> 01:01:11,070
So the value we have here, the value of that is going to be array[i].

747
01:01:11,070 --> 01:01:14,850
So val could be 4 or 8 or whatever.

748
01:01:14,850 --> 01:01:18,870
And now I'm counting how many of that value I've seen,

749
01:01:18,870 --> 01:01:21,230
so counts[val]++;

750
01:01:21,230 --> 01:01:29,430
After this is done, counts is going to look something like 0001.

751
01:01:29,430 --> 01:01:42,190
Let's do counts[val]-- BOUND + 1.

752
01:01:42,190 --> 01:01:48,230
>> Now that's just to account for the fact that we're starting from 0.

753
01:01:48,230 --> 01:01:50,850
So if 200 is going to be our largest number,

754
01:01:50,850 --> 01:01:54,720
then 0 to 200 is 201 things.

755
01:01:54,720 --> 01:02:01,540
So counts, it'll look like 00001 because we have one 4.

756
01:02:01,540 --> 01:02:10,210
Then we'll have 0001 where we'll have a 1 in the 8th index of count.

757
01:02:10,210 --> 01:02:14,560
We'll have a 2 in the 23rd index of count.

758
01:02:14,560 --> 01:02:17,630
We'll have a 2 in the 42nd index of count.

759
01:02:17,630 --> 01:02:21,670
So we can use count.

760
01:02:34,270 --> 01:02:44,920
So num_of_item = counts[i].

761
01:02:44,920 --> 01:02:52,540
And so if num_of_item is 2, that means we want to insert 2 of the number i

762
01:02:52,540 --> 01:02:55,290
into our sorted array.

763
01:02:55,290 --> 01:03:02,000
So we need to keep track of how far we are into the array.

764
01:03:02,000 --> 01:03:05,470
So index = 0.

765
01:03:05,470 --> 01:03:09,910
Array-- I'll just write it.

766
01:03:16,660 --> 01:03:18,020
Counts--

767
01:03:19,990 --> 01:03:28,580
array[index++] = i;

768
01:03:28,580 --> 01:03:32,490
Is that what I want? I think that's what I want.

769
01:03:35,100 --> 01:03:38,290
Yes, this looks good. Okay.

770
01:03:38,290 --> 01:03:43,050
So does everyone understand what the purpose of my counts array is?

771
01:03:43,050 --> 01:03:48,140
It is counting the number of occurrences of each of these numbers.

772
01:03:48,140 --> 01:03:51,780
Then I am iterating over that counts array,

773
01:03:51,780 --> 01:03:57,190
and the ith position in the counts array

774
01:03:57,190 --> 01:04:01,930
is the number of i's that should appear in my sorted array.

775
01:04:01,930 --> 01:04:06,840
That's why counts of 4 is going to be 1

776
01:04:06,840 --> 01:04:11,840
and counts of 8 is going to be 1, counts of 23 is going to be 2.

777
01:04:11,840 --> 01:04:16,900
So that's how many of them I want to insert into my sorted array.

778
01:04:16,900 --> 01:04:19,200
Then I just do that.

779
01:04:19,200 --> 01:04:28,960
I am inserting num_of_item i's into my sorted array.

780
01:04:28,960 --> 01:04:31,670
>> Questions?

781
01:04:32,460 --> 01:04:43,100
And so again, this is linear time since we are just iterating over this once,

782
01:04:43,100 --> 01:04:47,470
but it's also linear in what this number happens to be,

783
01:04:47,470 --> 01:04:50,730
and so it heavily depends on what your bound is.

784
01:04:50,730 --> 01:04:53,290
With a bound of 200, that's not that bad.

785
01:04:53,290 --> 01:04:58,330
If your bound is going to be 10,000, then that's a little worse,

786
01:04:58,330 --> 01:05:01,360
but if your bound is going to be 4 billion, that's completely unrealistic

787
01:05:01,360 --> 01:05:07,720
and this array is going to have to be of size 4 billion, which is unrealistic.

788
01:05:07,720 --> 01:05:10,860
So that's that. Questions?

789
01:05:10,860 --> 01:05:13,270
[inaudible student response] >>Okay.

790
01:05:13,270 --> 01:05:15,710
I realized one other thing when we were going through.

791
01:05:17,980 --> 01:05:23,720
I think the problem was in Lucas's and probably every single one we've seen.

792
01:05:23,720 --> 01:05:26,330
I completely forgot.

793
01:05:26,330 --> 01:05:31,040
The only thing I wanted to comment on is that when you're dealing with things like indices,

794
01:05:31,040 --> 01:05:38,320
you never really see this when you're writing a for loop,

795
01:05:38,320 --> 01:05:41,120
but technically, whenever you're dealing with these indices,

796
01:05:41,120 --> 01:05:45,950
you should pretty much always deal with unsigned integers.

797
01:05:45,950 --> 01:05:53,850
The reason for this is when you're dealing with signed integers,

798
01:05:53,850 --> 01:05:56,090
so if you have 2 signed integers and you add them together

799
01:05:56,090 --> 01:06:00,640
and they end up too big, then you end up with a negative number.

800
01:06:00,640 --> 01:06:03,410
So that's what integer overflow is.

801
01:06:03,410 --> 01:06:10,500
>> If I add 2 billion and 1 billion, I end up with negative 1 billion.

802
01:06:10,500 --> 01:06:15,480
That's how integers work on computers.

803
01:06:15,480 --> 01:06:17,510
So the problem with using--

804
01:06:17,510 --> 01:06:23,500
That's fine except if low happens to be 2 billion and up happens to be 1 billion,

805
01:06:23,500 --> 01:06:27,120
then this is going to be negative 1 billion and then we're going to divide that by 2

806
01:06:27,120 --> 01:06:29,730
and end up with negative 500 million.

807
01:06:29,730 --> 01:06:33,760
So this is only an issue if you happen to be searching through an array

808
01:06:33,760 --> 01:06:38,070
of billions of things.

809
01:06:38,070 --> 01:06:44,050
But if low + up happens to overflow, then that's a problem.

810
01:06:44,050 --> 01:06:47,750
As soon as we make them unsigned, then 2 billion plus 1 billion is 3 billion.

811
01:06:47,750 --> 01:06:51,960
3 billion divided by 2 is 1.5 billion.

812
01:06:51,960 --> 01:06:55,670
So as soon as they're unsigned, everything is perfect.

813
01:06:55,670 --> 01:06:59,900
And so that's also an issue when you're writing your for loops,

814
01:06:59,900 --> 01:07:03,940
and actually, it probably does it automatically.

815
01:07:09,130 --> 01:07:12,330
It will actually just yell at you.

816
01:07:12,330 --> 01:07:21,610
So if this number is too big to be in just an integer but it would fit in an unsigned integer,

817
01:07:21,610 --> 01:07:24,970
it will yell at you, so that's why you never really run into the issue.

818
01:07:29,150 --> 01:07:34,820
You can see that an index is never going to be negative,

819
01:07:34,820 --> 01:07:39,220
and so when you're iterating over an array,

820
01:07:39,220 --> 01:07:43,970
you can almost always say unsigned int i, but you don't really have to.

821
01:07:43,970 --> 01:07:47,110
Things are going to work pretty much just as well.

822
01:07:48,740 --> 01:07:50,090
Okay. [whispers] What time is it?

823
01:07:50,090 --> 01:07:54,020
The last thing I wanted to show--and I'll just do it really quick.

824
01:07:54,020 --> 01:08:03,190
You know how we have #define so we can #define MAX as 5 or something?

825
01:08:03,190 --> 01:08:05,940
Let's not do MAX. #define BOUND as 200. That's what we did before.

826
01:08:05,940 --> 01:08:10,380
That defines a constant, which is just going to be copied and pasted

827
01:08:10,380 --> 01:08:13,010
wherever we happen to write BOUND.

828
01:08:13,010 --> 01:08:18,189
>> So we can actually do more with #defines.

829
01:08:18,189 --> 01:08:21,170
We can #define functions.

830
01:08:21,170 --> 01:08:23,410
They're not really functions, but we'll call them functions.

831
01:08:23,410 --> 01:08:36,000
An example would be something like MAX(x, y) is defined as (x < y ? y : x). Okay.

832
01:08:36,000 --> 01:08:40,660
So you should get used to ternary operator syntax,

833
01:08:40,660 --> 01:08:49,029
but is x less than y? Return y, else return x.

834
01:08:49,029 --> 01:08:54,390
So you can see you can make this a separate function,

835
01:08:54,390 --> 01:09:01,399
and the function could be like bool MAX takes 2 arguments, return this.

836
01:09:01,399 --> 01:09:08,340
This is one of the most common ones I see done like this. We call them macros.

837
01:09:08,340 --> 01:09:11,790
This is a macro.

838
01:09:11,790 --> 01:09:15,859
This is just the syntax for it.

839
01:09:15,859 --> 01:09:18,740
You can write a macro to do whatever you want.

840
01:09:18,740 --> 01:09:22,649
You frequently see macros for debugging printfs and stuff.

841
01:09:22,649 --> 01:09:29,410
So a type of printf, there are special constants in C like underscore LINE underscore,

842
01:09:29,410 --> 01:09:31,710
2 underscores LINE underscore,

843
01:09:31,710 --> 01:09:37,550
and there's also I think 2 underscores FUNC. That might be it. Something like that.

844
01:09:37,550 --> 01:09:40,880
Those things will be replaced with the name of the function

845
01:09:40,880 --> 01:09:42,930
or the number of the line that you're on.

846
01:09:42,930 --> 01:09:48,630
Frequently, you write debugging printfs that down here I could then just write

847
01:09:48,630 --> 01:09:54,260
DEBUG and it will print the line number and function that I happen to be in

848
01:09:54,260 --> 01:09:57,020
that it encountered that DEBUG statement.

849
01:09:57,020 --> 01:09:59,550
And you can also print other things.

850
01:09:59,550 --> 01:10:05,990
So one thing you should watch out for is if I happen to #define DOUBLE_MAX

851
01:10:05,990 --> 01:10:11,380
as something like 2 * y and 2 * x.

852
01:10:11,380 --> 01:10:14,310
So for whatever reason, you happen to do that a lot.

853
01:10:14,310 --> 01:10:16,650
So make it a macro.

854
01:10:16,650 --> 01:10:18,680
This is actually broken.

855
01:10:18,680 --> 01:10:23,050
I would call it by doing something like DOUBLE_MAX(3, 6).

856
01:10:23,050 --> 01:10:27,530
So what should be returned?

857
01:10:28,840 --> 01:10:30,580
[student] 12.

858
01:10:30,580 --> 01:10:34,800
Yes, 12 should be returned, and 12 is returned.

859
01:10:34,800 --> 01:10:43,350
3 gets replaced for x, 6 gets replaced for y, and we return 2 * 6, which is 12.

860
01:10:43,350 --> 01:10:47,710
Now what about this? What should be returned?

861
01:10:47,710 --> 01:10:50,330
[student] 14. >>Ideally, 14.

862
01:10:50,330 --> 01:10:55,290
The issue is that how hash defines work, remember it's a literal copy and paste

863
01:10:55,290 --> 01:11:00,160
of pretty much everything, so what this is going to be interpreted as

864
01:11:00,160 --> 01:11:11,270
is 3 less than 1 plus 6, 2 times 1 plus 6, 2 times 3.

865
01:11:11,270 --> 01:11:19,780
>> So for this reason you almost always wrap everything in parentheses.

866
01:11:22,180 --> 01:11:25,050
Any variable you almost always wrap in parentheses.

867
01:11:25,050 --> 01:11:29,570
There are cases where you don't have to, like I know that I don't need to do that here

868
01:11:29,570 --> 01:11:32,110
because less than is pretty much always just going to work,

869
01:11:32,110 --> 01:11:34,330
although that might not even be true.

870
01:11:34,330 --> 01:11:41,870
If there is something ridiculous like DOUBLE_MAX(1 == 2),

871
01:11:41,870 --> 01:11:49,760
then that's going to get replaced with 3 less than 1 equals equals 2,

872
01:11:49,760 --> 01:11:53,460
and so then it's going to do 3 less than 1, does that equal 2,

873
01:11:53,460 --> 01:11:55,620
which is not what we want.

874
01:11:55,620 --> 01:12:00,730
So in order to prevent any operator precedence problems,

875
01:12:00,730 --> 01:12:02,870
always wrap in parentheses.

876
01:12:03,290 --> 01:12:07,700
Okay. And that's it, 5:30.

877
01:12:08,140 --> 01:12:12,470
If you have any questions on the pset, let us know.

878
01:12:12,470 --> 01:12:18,010
It should be fun, and the hacker edition also is much more realistic

879
01:12:18,010 --> 01:12:22,980
than the hacker edition of last year's, so we hope that a lot of you try it.

880
01:12:22,980 --> 01:12:26,460
Last year's was very overwhelming.

881
01:12:28,370 --> 01:12:30,000
>> [CS50.TV]