1
00:00:00,000 --> 00:00:02,640
[Seminar: Technical Interviews]

2
00:00:02,640 --> 00:00:04,630
[Kenny Yu, Harvard University]

3
00:00:04,630 --> 00:00:08,910
[This is CS50.] [CS50.TV]

4
00:00:08,910 --> 00:00:12,420
Hi everyone, I'm Kenny. I am currently a junior studying computer science. 

5
00:00:12,420 --> 00:00:17,310
I'm a former CS TF, and I wish I had this when I was an underclassman, 

6
00:00:17,310 --> 00:00:19,380
and that's why I'm giving this seminar.

7
00:00:19,380 --> 00:00:21,370
So I hope you enjoy it. 

8
00:00:21,370 --> 00:00:23,470
This seminar is about technical interviews, 

9
00:00:23,470 --> 00:00:26,650
and all my resources can be found at this link, 

10
00:00:26,650 --> 00:00:32,350
this link right here, a couple of resources.

11
00:00:32,350 --> 00:00:36,550
So I made a list of problems, actually, quite a few problems.

12
00:00:36,550 --> 00:00:40,800
Also a general resources page where we can find tips 

13
00:00:40,800 --> 00:00:42,870
on how to prepare for an interview, 

14
00:00:42,870 --> 00:00:46,470
tips on what you should do during an actual interview,

15
00:00:46,470 --> 00:00:51,910
as well as how to approach problems and resources for future reference. 

16
00:00:51,910 --> 00:00:53,980
It's all online. 

17
00:00:53,980 --> 00:00:58,290
And just to preface this seminar, a disclaimer, 

18
00:00:58,290 --> 00:01:00,690
like this should not--your interview preparation 

19
00:01:00,690 --> 00:01:02,800
should not be limited to this list. 

20
00:01:02,800 --> 00:01:04,750
This is only meant to be a guide, 

21
00:01:04,750 --> 00:01:08,890
and you should definitely take everything I say with a grain of salt, 

22
00:01:08,890 --> 00:01:14,620
but also use everything I used to help you in your interview preparation. 

23
00:01:14,620 --> 00:01:16,400
>> I'm going to speed through the next few slides

24
00:01:16,400 --> 00:01:18,650
so we can get to the actual case studies. 

25
00:01:18,650 --> 00:01:23,630
The structure of an interview for a software engineering postion, 

26
00:01:23,630 --> 00:01:26,320
typically it is 30 to 45 minutes, 

27
00:01:26,320 --> 00:01:29,810
multiple rounds, depending on the company. 

28
00:01:29,810 --> 00:01:33,090
Often you'll be coding on a white board. 

29
00:01:33,090 --> 00:01:35,960
So a white board like this, but often on a smaller scale. 

30
00:01:35,960 --> 00:01:38,540
If you're having a phone interview, you'll probably be using

31
00:01:38,540 --> 00:01:44,030
either collabedit or a Google Doc so they can see you live coding 

32
00:01:44,030 --> 00:01:46,500
while you're being interviewed over the phone. 

33
00:01:46,500 --> 00:01:48,490
An interview itself is typically 2 or 3 problems 

34
00:01:48,490 --> 00:01:50,620
testing your computer science knowledge. 

35
00:01:50,620 --> 00:01:54,490
And it will almost definitely involve coding. 

36
00:01:54,490 --> 00:01:59,540
The types of questions that you'll see are usually data structures and algorithms. 

37
00:01:59,540 --> 00:02:02,210
And in doing these kinds of problems, 

38
00:02:02,210 --> 00:02:07,830
they will ask you, like, what is the time and space complexity, big O?

39
00:02:07,830 --> 00:02:09,800
Often they also ask higher-level questions, 

40
00:02:09,800 --> 00:02:12,530
so, designing a system, 

41
00:02:12,530 --> 00:02:14,770
how would you lay out your code? 

42
00:02:14,770 --> 00:02:18,370
What interfaces, what classes, what modules do you have in your system, 

43
00:02:18,370 --> 00:02:20,900
and how do these interact together?

44
00:02:20,900 --> 00:02:26,130
So data structures and algorithms as well as designing systems. 

45
00:02:26,130 --> 00:02:29,180
>> Some general tips before we dive in to our case studies. 

46
00:02:29,180 --> 00:02:32,300
I think the most important rule is always be thinking out loud. 

47
00:02:32,300 --> 00:02:36,980
The interview is supposed to be your chance to show off your thinking process. 

48
00:02:36,980 --> 00:02:39,820
The point of the interview is for the interviewer to gauge 

49
00:02:39,820 --> 00:02:42,660
how you think and how you go through a problem. 

50
00:02:42,660 --> 00:02:45,210
The worst thing you can do is be silent throughout the whole interview. 

51
00:02:45,210 --> 00:02:50,090
That's just no good. 

52
00:02:50,090 --> 00:02:53,230
When you are given a question, you also want to make sure you understand the question. 

53
00:02:53,230 --> 00:02:55,350
So repeat the question back in your own words 

54
00:02:55,350 --> 00:02:58,920
and attempt to work thorough a few simple test cases 

55
00:02:58,920 --> 00:03:01,530
to make sure you understand the question. 

56
00:03:01,530 --> 00:03:05,510
Working through a few test cases will also give you an intuition on how to solve this problem. 

57
00:03:05,510 --> 00:03:11,210
You might even discover a few patterns to help you solve the problem. 

58
00:03:11,210 --> 00:03:14,500
Their big tip is to not get frustrated. 

59
00:03:14,500 --> 00:03:17,060
Don't get frustrated. 

60
00:03:17,060 --> 00:03:19,060
Interviews are challenging, but the worst thing you can do, 

61
00:03:19,060 --> 00:03:23,330
in addition to being silent, is to be visibly frustrated. 

62
00:03:23,330 --> 00:03:27,410
You do not want to give that impression to an interviewer. 

63
00:03:27,410 --> 00:03:33,960
One thing that you--so, many people, when they go into an interview, 

64
00:03:33,960 --> 00:03:37,150
they attempt to try to find the best solution first, 

65
00:03:37,150 --> 00:03:39,950
when really, there's usually a glaringly obvious solution. 

66
00:03:39,950 --> 00:03:43,500
It might be slow, it might be inefficient, but you should just state it, 

67
00:03:43,500 --> 00:03:46,210
just so you have a starting point from which to work better. 

68
00:03:46,210 --> 00:03:48,270
Also, pointing out the solution is slow, in terms of 

69
00:03:48,270 --> 00:03:52,160
big O time complexity or space complexity, 

70
00:03:52,160 --> 00:03:54,450
will demonstrate to the interviewer that you understand 

71
00:03:54,450 --> 00:03:57,510
these issues when writing code. 

72
00:03:57,510 --> 00:04:01,440
So don't be afraid to come up with the simplest algorithm first

73
00:04:01,440 --> 00:04:04,950
and then work better from there. 

74
00:04:04,950 --> 00:04:09,810
Any questions so far? Okay. 

75
00:04:09,810 --> 00:04:11,650
>> So let's dive into our first problem. 

76
00:04:11,650 --> 00:04:14,790
"Given an array of n integers, write a function that shuffles the array

77
00:04:14,790 --> 00:04:20,209
in place such that all permutations of the n integers are equally likely."

78
00:04:20,209 --> 00:04:23,470
And assume you have available a random integer generator

79
00:04:23,470 --> 00:04:30,980
that generates an integer in a range from 0 to i, half range. 

80
00:04:30,980 --> 00:04:32,970
Does everyone understand this question? 

81
00:04:32,970 --> 00:04:39,660
I give you an array of n integers, and I want you to shuffle it. 

82
00:04:39,660 --> 00:04:46,050
In my directory, I wrote a few programs to demonstrate what I mean. 

83
00:04:46,050 --> 00:04:48,910
I'm going to shuffle an array of 20 elements, 

84
00:04:48,910 --> 00:04:52,490
from -10 to +9, 

85
00:04:52,490 --> 00:04:57,050
and I want you to output a list like this. 

86
00:04:57,050 --> 00:05:02,890
So this is my sorted input array, and I want you to shuffle it. 

87
00:05:02,890 --> 00:05:07,070
We'll do it again. 

88
00:05:07,070 --> 00:05:13,780
Does everyone understand the question? Okay. 

89
00:05:13,780 --> 00:05:16,730
So it's up to you. 

90
00:05:16,730 --> 00:05:21,220
What are some ideas? Can you do it as n^2, n log n, n?

91
00:05:21,220 --> 00:05:34,400
Open to suggestions. 

92
00:05:34,400 --> 00:05:37,730
Okay. So one idea, suggested by Emmy, 

93
00:05:37,730 --> 00:05:45,300
is first compute a random number, random integer, in a range from 0 to 20. 

94
00:05:45,300 --> 00:05:49,840
So assume our array has a length of 20. 

95
00:05:49,840 --> 00:05:54,800
In our diagram of 20 elements, 

96
00:05:54,800 --> 00:05:58,560
this is our input array. 

97
00:05:58,560 --> 00:06:04,590
And now, her suggestion is to create a new array, 

98
00:06:04,590 --> 00:06:08,440
so this will be the output array. 

99
00:06:08,440 --> 00:06:12,880
And based on the i returned by rand--

100
00:06:12,880 --> 00:06:17,580
so if i was, let's say, 17, 

101
00:06:17,580 --> 00:06:25,640
copy the 17th element into the first position. 

102
00:06:25,640 --> 00:06:30,300
Now we need to delete--we need to shift all the elements here 

103
00:06:30,300 --> 00:06:36,920
over so that we have a gap at the end and no holes in the middle. 

104
00:06:36,920 --> 00:06:39,860
And now we repeat the process. 

105
00:06:39,860 --> 00:06:46,360
Now we pick a new random integer between 0 and 19. 

106
00:06:46,360 --> 00:06:52,510
We have a new i here, and we copy this element into this position. 

107
00:06:52,510 --> 00:07:00,960
Then we shift items over and we repeat the process until we have our full new array. 

108
00:07:00,960 --> 00:07:05,890
What is the run time of this algorithm? 

109
00:07:05,890 --> 00:07:08,110
Well, let's consider the impact of this. 

110
00:07:08,110 --> 00:07:10,380
We are shifting every element. 

111
00:07:10,380 --> 00:07:16,800
When we remove this i, we are shifting all the elements after it to the left. 

112
00:07:16,800 --> 00:07:21,600
And that is an O(n) cost

113
00:07:21,600 --> 00:07:26,100
because what if we remove the first element?

114
00:07:26,100 --> 00:07:29,670
So for each removal, we remove--

115
00:07:29,670 --> 00:07:32,170
each removal incurs an O(n) operation, 

116
00:07:32,170 --> 00:07:41,520
and since we have n removals, this leads to an O(n^2) shuffle. 

117
00:07:41,520 --> 00:07:49,550
Okay. So good start. Good start. 

118
00:07:49,550 --> 00:07:55,290
>> Another suggestion is to use something known as the Knuth shuffle, 

119
00:07:55,290 --> 00:07:57,540
or the Fisher-Yates shuffle. 

120
00:07:57,540 --> 00:07:59,630
And it's actually a linear time shuffle. 

121
00:07:59,630 --> 00:08:02,200
And the idea is very similar.

122
00:08:02,200 --> 00:08:05,160
Again, we have our input array, 

123
00:08:05,160 --> 00:08:08,580
but instead of using two arrays for our input/output, 

124
00:08:08,580 --> 00:08:13,590
we use the first portion of the array to keep track of our shuffled portion, 

125
00:08:13,590 --> 00:08:18,400
and we keep track, and then we leave the rest of our array for the unshuffled portion. 

126
00:08:18,400 --> 00:08:24,330
So here's what I mean. We start off with--we choose an i, 

127
00:08:24,330 --> 00:08:30,910
an array from 0 to 20. 

128
00:08:30,910 --> 00:08:36,150
Our current pointer is pointing to the first index.

129
00:08:36,150 --> 00:08:39,590
We choose some i here

130
00:08:39,590 --> 00:08:42,740
and now we swap. 

131
00:08:42,740 --> 00:08:47,690
So if this was 5 and this was 4, 

132
00:08:47,690 --> 00:08:57,150
the resulting array will have 5 here and 4 here. 

133
00:08:57,150 --> 00:09:00,390
And now we note a marker here. 

134
00:09:00,390 --> 00:09:05,770
Everything to the left is shuffled, 

135
00:09:05,770 --> 00:09:15,160
and everything to the right is unshuffled. 

136
00:09:15,160 --> 00:09:17,260
And now we can repeat the process. 

137
00:09:17,260 --> 00:09:25,210
We choose a random index between 1 and 20 now. 

138
00:09:25,210 --> 00:09:30,650
So suppose our new i is here. 

139
00:09:30,650 --> 00:09:39,370
Now we swap this i with our current new position here. 

140
00:09:39,370 --> 00:09:44,790
So we do a swapping back and forth like this. 

141
00:09:44,790 --> 00:09:51,630
Let me bring up the code to make it more concrete. 

142
00:09:51,630 --> 00:09:55,290
We start with our choice of i--

143
00:09:55,290 --> 00:09:58,370
we start with i equal to 0, we pick a random location j 

144
00:09:58,370 --> 00:10:02,420
in the unshuffled portion of the array, i to n-1. 

145
00:10:02,420 --> 00:10:07,280
So if I'm here, choose a random index between here and the rest of the array, 

146
00:10:07,280 --> 00:10:12,410
and we swap. 

147
00:10:12,410 --> 00:10:17,550
This is all the code necessary to shuffle your array. 

148
00:10:17,550 --> 00:10:21,670
Any questions? 

149
00:10:21,670 --> 00:10:25,530
>> Well, one needed question is, why is this correct? 

150
00:10:25,530 --> 00:10:28,360
Why is every permutation equally likely? 

151
00:10:28,360 --> 00:10:30,410
And I won't go through the proof of this, 

152
00:10:30,410 --> 00:10:35,970
but many problems in computer science can be proven through induction. 

153
00:10:35,970 --> 00:10:38,520
How many of you are familiar with induction? 

154
00:10:38,520 --> 00:10:40,590
Okay. Cool. 

155
00:10:40,590 --> 00:10:43,610
So you can prove the correctness of this algorithm by simple induction, 

156
00:10:43,610 --> 00:10:49,540
where your induction hypothesis would be, assume that

157
00:10:49,540 --> 00:10:53,290
my shuffle returns every permutation equally likely 

158
00:10:53,290 --> 00:10:56,120
up to the first i elements. 

159
00:10:56,120 --> 00:10:58,300
Now, consider i + 1. 

160
00:10:58,300 --> 00:11:02,550
And by the way we choose our index j to swap, 

161
00:11:02,550 --> 00:11:05,230
this leads to--and then you work out the details, 

162
00:11:05,230 --> 00:11:07,390
at least a full proof of why this algorithm returns

163
00:11:07,390 --> 00:11:12,800
every permutation with equally likely probability. 

164
00:11:12,800 --> 00:11:15,940
>> All right, next problem. 

165
00:11:15,940 --> 00:11:19,170
So "given an array of integers, postive, zero, negative, 

166
00:11:19,170 --> 00:11:21,290
write a function that calculates the maximum sum

167
00:11:21,290 --> 00:11:24,720
of any continueous subarray of the input array." 

168
00:11:24,720 --> 00:11:28,370
An example here is, in the case where all numbers are positive, 

169
00:11:28,370 --> 00:11:31,320
then currently the best choice is to take the whole array. 

170
00:11:31,320 --> 00:11:34,690
1, 2, 3, 4, equals 10. 

171
00:11:34,690 --> 00:11:36,780
When you have some negatives in there, 

172
00:11:36,780 --> 00:11:38,690
in this case we just want the first two

173
00:11:38,690 --> 00:11:44,590
because choosing -1 and/or -3 will bring our sum down. 

174
00:11:44,590 --> 00:11:48,120
Sometimes we might have to start in the middle of the array. 

175
00:11:48,120 --> 00:11:53,500
Sometimes we want to choose nothing at all; it's best to not take anything. 

176
00:11:53,500 --> 00:11:56,490
And sometimes it's better to take the fall, 

177
00:11:56,490 --> 00:12:07,510
because the thing after it is super big. So any ideas? 

178
00:12:07,510 --> 00:12:10,970
(student, unintelligible) >>Yeah. 

179
00:12:10,970 --> 00:12:13,560
Suppose I don't take -1. 

180
00:12:13,560 --> 00:12:16,170
Then either I choose 1,000 and 20,000, 

181
00:12:16,170 --> 00:12:18,630
or I just choose the 3 billion. 

182
00:12:18,630 --> 00:12:20,760
Well, the best choice is to take all the numbers. 

183
00:12:20,760 --> 00:12:24,350
This -1, despite being negative, 

184
00:12:24,350 --> 00:12:31,340
the whole sum is better than were I not to take -1. 

185
00:12:31,340 --> 00:12:36,460
So one of the tips I mentioned earlier was to state the clearly obvious

186
00:12:36,460 --> 00:12:40,540
and the brute force solution first.

187
00:12:40,540 --> 00:12:44,340
What is the brute force solution in this problem? Yeah? 

188
00:12:44,340 --> 00:12:46,890
[Jane] Well, I think the brute force solution 

189
00:12:46,890 --> 00:12:52,600
would be to add up all the possible combinations (unintelligible). 

190
00:12:52,600 --> 00:12:58,250
[Yu] Okay. So Jane's idea is to take every possible--

191
00:12:58,250 --> 00:13:01,470
I'm paraphrasing--is to take every possible continuous subarray, 

192
00:13:01,470 --> 00:13:07,840
compute its sum, and then take the maximum of all the possible continuous subarrays. 

193
00:13:07,840 --> 00:13:13,310
What uniquely identifies a subarray in my input array?

194
00:13:13,310 --> 00:13:17,380
Like, what two things do I need? Yeah? 

195
00:13:17,380 --> 00:13:19,970
(student, unintelligible) >>Right. 

196
00:13:19,970 --> 00:13:22,130
A lower bound on the index and an upper bound index 

197
00:13:22,130 --> 00:13:28,300
uniquely determines a continuous subarray. 

198
00:13:28,300 --> 00:13:31,400
[Female student] Are we estimating it's an array of unique numbers? 

199
00:13:31,400 --> 00:13:34,280
[Yu] No. So her question is, are we assuming our array--

200
00:13:34,280 --> 00:13:39,000
is our array all unique numbers, and the answer is no. 

201
00:13:39,000 --> 00:13:43,390
>> If we use our brute force solution, then the start/end indices

202
00:13:43,390 --> 00:13:47,200
uniquely determines our continuous subarray. 

203
00:13:47,200 --> 00:13:51,680
So if we iterate for all possible start entries, 

204
00:13:51,680 --> 00:13:58,320
and for all end entries > or = to start, and < n, 

205
00:13:58,320 --> 00:14:05,570
you compute the sum, and then we take the maximum sum we've seen so far. 

206
00:14:05,570 --> 00:14:07,880
Is this clear? 

207
00:14:07,880 --> 00:14:12,230
What is the big O of this solution? 

208
00:14:12,230 --> 00:14:16,660
Timewise. 

209
00:14:16,660 --> 00:14:18,860
Not quite n^2. 

210
00:14:18,860 --> 00:14:25,250
Note that we iterate from 0 to n, 

211
00:14:25,250 --> 00:14:27,520
so that's one for loop. 

212
00:14:27,520 --> 00:14:35,120
We iterate again from almost the beginning to the end, another for loop. 

213
00:14:35,120 --> 00:14:37,640
And now, within that, we have to sum up every entry, 

214
00:14:37,640 --> 00:14:43,810
so that's another for loop. So we have three nested for loops, n^3. 

215
00:14:43,810 --> 00:14:46,560
Okay. This goes as a starting point. 

216
00:14:46,560 --> 00:14:53,180
Our solution is no worse than n^3. 

217
00:14:53,180 --> 00:14:55,480
Does everyone understand the brute force solution? 

218
00:14:55,480 --> 00:14:59,950
>> Okay. A better solution is using an idea called dynamic programming. 

219
00:14:59,950 --> 00:15:03,040
If you take CS124, which is Algorithms and Data Structures, 

220
00:15:03,040 --> 00:15:05,680
you will become very familiar with this technique. 

221
00:15:05,680 --> 00:15:12,190
And the idea is, try to build up solutions to smaller problems first. 

222
00:15:12,190 --> 00:15:17,990
What I mean by this is, we currently have to worry about two things: start and end. 

223
00:15:17,990 --> 00:15:29,340
And that's annoying. What if we could get rid of one of those parameters? 

224
00:15:29,340 --> 00:15:32,650
One idea is to--we're given our original problem, 

225
00:15:32,650 --> 00:15:37,470
find the maximum sum of any subarray in a range [O,n-1]. 

226
00:15:37,470 --> 00:15:47,400
And now we have a new subproblem, where we know, in our current index i, 

227
00:15:47,400 --> 00:15:52,520
we know we must conclude there. Our subarray must end at the current index. 

228
00:15:52,520 --> 00:15:57,640
So the remaining problem is, where should we start our subarray? 

229
00:15:57,640 --> 00:16:05,160
Does this make sense? Okay. 

230
00:16:05,160 --> 00:16:12,030
So I've coded this up, and let's look at what this means. 

231
00:16:12,030 --> 00:16:16,230
In the codirectory, there's a program called subarray, 

232
00:16:16,230 --> 00:16:19,470
and it takes number of items, 

233
00:16:19,470 --> 00:16:25,550
and it returns the maximal subarray sum in my shuffled list. 

234
00:16:25,550 --> 00:16:29,920
So in this case, our maximal subarray is 3. 

235
00:16:29,920 --> 00:16:34,850
And that's taken by just using 2 and 1. 

236
00:16:34,850 --> 00:16:38,050
Let's run it again. It's also 3. 

237
00:16:38,050 --> 00:16:40,950
But this time, note how we got the 3. 

238
00:16:40,950 --> 00:16:46,690
We took the--we just take the 3 itself

239
00:16:46,690 --> 00:16:49,980
because it's surrounded by negatives on both sides, 

240
00:16:49,980 --> 00:16:55,080
which will bring a sum < 3. 

241
00:16:55,080 --> 00:16:57,820
Let's run on 10 items. 

242
00:16:57,820 --> 00:17:03,200
This time it's 7, we take the leading 3 and 4.

243
00:17:03,200 --> 00:17:09,450
This time it's 8, and we obtain that by taking 1, 4 and 3. 

244
00:17:09,450 --> 00:17:16,310
So to give you an intuition on how we can solve this transformed problem, 

245
00:17:16,310 --> 00:17:18,890
let's take a look at this subarray. 

246
00:17:18,890 --> 00:17:23,460
We're given this input array, and we know the answer is 8. 

247
00:17:23,460 --> 00:17:26,359
We take the 1, 4, and 3. 

248
00:17:26,359 --> 00:17:29,090
But let's look at how we actually got that answer. 

249
00:17:29,090 --> 00:17:34,160
Let's look at the maximal subarray that ended at each of these indices. 

250
00:17:34,160 --> 00:17:40,780
What's the maximal subarray that has to end at the first position? 

251
00:17:40,780 --> 00:17:46,310
[Student] Zero. >>Zero. Just don't take the -5. 

252
00:17:46,310 --> 00:17:50,210
Here it's going to be 0 as well. Yeah? 

253
00:17:50,210 --> 00:17:56,470
(student, unintelligible)

254
00:17:56,470 --> 00:17:58,960
[Yu] Oh, sorry, it is a -3. 

255
00:17:58,960 --> 00:18:03,220
So this is a 2, this is a -3. 

256
00:18:03,220 --> 00:18:08,690
Okay. So -4, what's the maximal subarray to end that position 

257
00:18:08,690 --> 00:18:12,910
where -4 is at? Zero. 

258
00:18:12,910 --> 00:18:22,570
One? 1, 5, 8.  

259
00:18:22,570 --> 00:18:28,060
Now, I must end at the location where -2 is at. 

260
00:18:28,060 --> 00:18:39,330
So 6, 5, 7, and the last one is 4. 

261
00:18:39,330 --> 00:18:43,480
Knowing that these are my entries

262
00:18:43,480 --> 00:18:48,130
for the transformed problem where I must end at each of these indices, 

263
00:18:48,130 --> 00:18:51,410
then my final answer is just, take a sweep across, 

264
00:18:51,410 --> 00:18:53,580
and take the maximum number. 

265
00:18:53,580 --> 00:18:55,620
So in this case it's 8. 

266
00:18:55,620 --> 00:19:00,010
This implies that the maximal subarray ends at this index, 

267
00:19:00,010 --> 00:19:04,970
and started somewhere before it. 

268
00:19:04,970 --> 00:19:09,630
Does everyone understand this transformed subarray?

269
00:19:09,630 --> 00:19:22,160
>> Okay. Well, let's figure out the recurrence for this. 

270
00:19:22,160 --> 00:19:27,990
Let's consider just the first few entries. 

271
00:19:27,990 --> 00:19:35,930
So here it was 0, 0, 0, 1, 5, 8. 

272
00:19:35,930 --> 00:19:39,790
And then there was a -2 here, and that brought it down to 6. 

273
00:19:39,790 --> 00:19:50,800
So if I call the entry at position i subproblem(i), 

274
00:19:50,800 --> 00:19:54,910
how can I use the answer to a previous subproblem

275
00:19:54,910 --> 00:19:59,360
to answer this subproblem? 

276
00:19:59,360 --> 00:20:04,550
If I look at, let's say, this entry. 

277
00:20:04,550 --> 00:20:09,190
How can I compute the answer 6 by looking at 

278
00:20:09,190 --> 00:20:18,780
a combination of this array and the answers to previous subproblems in this array? Yes?

279
00:20:18,780 --> 00:20:22,800
[Female student] You take the array of sums

280
00:20:22,800 --> 00:20:25,430
in the position right before it, so the 8, 

281
00:20:25,430 --> 00:20:32,170
and then you add the current subproblem.

282
00:20:32,170 --> 00:20:36,460
[Yu] So her suggestion is to look at these two numbers, 

283
00:20:36,460 --> 00:20:40,090
this number and this number. 

284
00:20:40,090 --> 00:20:50,130
So this 8 refers to the answer for the subproblem (i - 1). 

285
00:20:50,130 --> 00:20:57,300
And let's call my input array A. 

286
00:20:57,300 --> 00:21:01,470
In order to find a maximal subarray that ends at position i, 

287
00:21:01,470 --> 00:21:03,980
I have two choices: I can either continue the subarray 

288
00:21:03,980 --> 00:21:09,790
that ended at the previous index, or start a new array. 

289
00:21:09,790 --> 00:21:14,190
If I were to continue the subarray that started at the previous index, 

290
00:21:14,190 --> 00:21:19,260
then the maximum sum I can achieve is the answer to the previous subproblem

291
00:21:19,260 --> 00:21:24,120
plus the current array entry. 

292
00:21:24,120 --> 00:21:27,550
But, I also have the choice of starting a new subarray, 

293
00:21:27,550 --> 00:21:30,830
in which case the sum is 0. 

294
00:21:30,830 --> 00:21:42,860
So the answer is max of 0, subproblem i - 1, plus the current array entry. 

295
00:21:42,860 --> 00:21:46,150
Does this recurrence make sense? 

296
00:21:46,150 --> 00:21:50,840
Our recurrence, as we just discovered, is subproblem i 

297
00:21:50,840 --> 00:21:54,740
is equal to the maximum of the previous subproblem plus my current array entry, 

298
00:21:54,740 --> 00:22:01,490
which means continue the previous subarray,

299
00:22:01,490 --> 00:22:07,250
or 0, start a new subarray at my current index. 

300
00:22:07,250 --> 00:22:10,060
And once we have built up this table of solutions, then our final answer, 

301
00:22:10,060 --> 00:22:13,950
just do a linear sweep across the subproblem array

302
00:22:13,950 --> 00:22:19,890
and take the maximum number. 

303
00:22:19,890 --> 00:22:23,330
This is an exact implementation of what I just said. 

304
00:22:23,330 --> 00:22:27,320
So we create a new subproblem array, subproblems. 

305
00:22:27,320 --> 00:22:32,330
The first entry is either 0 or the first entry, the maximum of those two. 

306
00:22:32,330 --> 00:22:35,670
And for the rest of the subproblems 

307
00:22:35,670 --> 00:22:39,810
we use the exact recurrence we just discovered. 

308
00:22:39,810 --> 00:22:49,960
Now we compute the maximum of our subproblems array, and that's our final answer. 

309
00:22:49,960 --> 00:22:54,130
>> So how much space are we using in this algorithm?

310
00:22:54,130 --> 00:23:01,470
If you've only taken CS50, then you might not have discussed space very much. 

311
00:23:01,470 --> 00:23:07,750
Well, one thing to note is that I called malloc here with size n.  

312
00:23:07,750 --> 00:23:13,590
What does that suggest to you?

313
00:23:13,590 --> 00:23:17,450
This algorithm uses linear space. 

314
00:23:17,450 --> 00:23:21,030
Can we do better?

315
00:23:21,030 --> 00:23:30,780
Is there anything that you notice that is unnecessary to compute the final answer? 

316
00:23:30,780 --> 00:23:33,290
I guess a better question is, what information

317
00:23:33,290 --> 00:23:40,680
do we not need to carry all the way through to the end?

318
00:23:40,680 --> 00:23:44,280
Now, if we look at these two lines, 

319
00:23:44,280 --> 00:23:47,720
we only care about the previous subproblem, 

320
00:23:47,720 --> 00:23:50,910
and we only care about the maximum we've ever seen so far. 

321
00:23:50,910 --> 00:23:53,610
To compute our final answer, we don't need the entire array. 

322
00:23:53,610 --> 00:23:57,450
We only need the last number, last two numbers. 

323
00:23:57,450 --> 00:24:02,630
Last number for the subproblem array, and last number for the maximum. 

324
00:24:02,630 --> 00:24:07,380
So, in fact, we can fuse these loops together

325
00:24:07,380 --> 00:24:10,460
and go from linear space to constant space. 

326
00:24:10,460 --> 00:24:15,830
Current sum so far, here, replaces the role of subproblem, our subproblem array. 

327
00:24:15,830 --> 00:24:20,020
So current sum, so far, is the answer to the previous subproblem. 

328
00:24:20,020 --> 00:24:23,450
And that sum, so far, takes the place of our max. 

329
00:24:23,450 --> 00:24:28,100
We compute the maximum as we go along. 

330
00:24:28,100 --> 00:24:30,890
And so we go from linear space to constant space, 

331
00:24:30,890 --> 00:24:36,650
and we also have a linear solution to our subarray problem. 

332
00:24:36,650 --> 00:24:40,740
>> These kinds of questions you will get during an interview. 

333
00:24:40,740 --> 00:24:44,450
What is the time complexity; what is the space complexity? 

334
00:24:44,450 --> 00:24:50,600
Can you do better? Are there things that are unnecessary to keep around? 

335
00:24:50,600 --> 00:24:55,270
I did this to highlight analyses that you should take on your own 

336
00:24:55,270 --> 00:24:57,400
as you're working through these problems. 

337
00:24:57,400 --> 00:25:01,710
Always be asking yourself, "Can I do better?" 

338
00:25:01,710 --> 00:25:07,800
In fact, can we do better than this? 

339
00:25:07,800 --> 00:25:10,730
Sort of a trick question. You can't, because you need to 

340
00:25:10,730 --> 00:25:13,590
at least read the input to the problem. 

341
00:25:13,590 --> 00:25:15,570
So the fact that you need to at least read the input to the problem 

342
00:25:15,570 --> 00:25:19,580
means that you can't do better than linear time, 

343
00:25:19,580 --> 00:25:22,870
and you can't do better than constant space. 

344
00:25:22,870 --> 00:25:27,060
So this is, in fact, the best solution to this problem. 

345
00:25:27,060 --> 00:25:33,040
Questions? Okay. 

346
00:25:33,040 --> 00:25:35,190
>> Stock market problem: 

347
00:25:35,190 --> 00:25:38,350
"Given an array of n integers, positive, zero, or negative, 

348
00:25:38,350 --> 00:25:41,680
that represent the price of a stock over n days, 

349
00:25:41,680 --> 00:25:44,080
write a function to compute the maximum profit you can make

350
00:25:44,080 --> 00:25:49,350
given that you buy and sell exactly 1 stock within these n days."

351
00:25:49,350 --> 00:25:51,690
Essentially, we want to buy low, sell high. 

352
00:25:51,690 --> 00:25:58,580
And we want to figure out the best profit we can make. 

353
00:25:58,580 --> 00:26:11,500
Going back to my tip, what is the immediately clear, simplest answer, but it's slow?

354
00:26:11,500 --> 00:26:17,690
Yes? (student, unintelligible) >>Yes. 

355
00:26:17,690 --> 00:26:21,470
>>So you would just go though and look at the stock prices 

356
00:26:21,470 --> 00:26:30,550
at each point in time, (unintelligible).

357
00:26:30,550 --> 00:26:33,990
[Yu] Okay, so her solution--her suggestion of computing 

358
00:26:33,990 --> 00:26:37,380
the lowest and computing the highest doesn't necessarily work

359
00:26:37,380 --> 00:26:42,470
because the highest might occur before the lowest. 

360
00:26:42,470 --> 00:26:47,340
So what is the brute force solution to this problem? 

361
00:26:47,340 --> 00:26:53,150
What are the two things that I need to uniquely determine the profit I make? Right. 

362
00:26:53,150 --> 00:26:59,410
The brute force solution is--oh, so, George's suggestion is we only need two days

363
00:26:59,410 --> 00:27:02,880
to uniquely determine the profit of those two days. 

364
00:27:02,880 --> 00:27:06,660
So we compute every pair, like buy/sell, 

365
00:27:06,660 --> 00:27:12,850
compute the profit, which could be negative or positive or zero. 

366
00:27:12,850 --> 00:27:18,000
Compute the maximum profit that we make after iterating over all pairs of days. 

367
00:27:18,000 --> 00:27:20,330
That will be our final answer. 

368
00:27:20,330 --> 00:27:25,730
And that solution will be O(n^2), because there is n choose two pairs--

369
00:27:25,730 --> 00:27:30,270
of days that you can choose among end days. 

370
00:27:30,270 --> 00:27:32,580
Okay, so I'm not going to go over the brute force solution here. 

371
00:27:32,580 --> 00:27:37,420
I'm going to tell you that there's an n log n solution. 

372
00:27:37,420 --> 00:27:45,550
What algorithm do you currently know that is n log n?

373
00:27:45,550 --> 00:27:50,730
It's not a trick question. 

374
00:27:50,730 --> 00:27:54,790
>> Merge sort. Merge sort is n log n, 

375
00:27:54,790 --> 00:27:57,760
and in fact, one way of solving this problem is to use

376
00:27:57,760 --> 00:28:04,400
a merge sort kind of idea called, in general, divide and conquer. 

377
00:28:04,400 --> 00:28:07,570
And the idea is as follows. 

378
00:28:07,570 --> 00:28:12,400
You want to compute the best buy/sell pair in the left half. 

379
00:28:12,400 --> 00:28:16,480
Find the best profit you can make, just with the first n over two days. 

380
00:28:16,480 --> 00:28:19,780
Then you want to oompute the best buy/sell pair on the right half, 

381
00:28:19,780 --> 00:28:23,930
so the last n over two days. 

382
00:28:23,930 --> 00:28:32,400
And now the question is, how do we merge these solutions back together? 

383
00:28:32,400 --> 00:28:36,320
Yes? (student, unintelligible)

384
00:28:36,320 --> 00:28:49,890
>>Okay. So let me draw a picture. 

385
00:28:49,890 --> 00:29:03,870
Yes? (George, unintelligible)

386
00:29:03,870 --> 00:29:06,450
>>Exactly. George's solution is exactly right. 

387
00:29:06,450 --> 00:29:10,040
So his suggestion is, first compute the best buy/sell pair, 

388
00:29:10,040 --> 00:29:16,050
and that occurs in the left half, so let's call that left, left. 

389
00:29:16,050 --> 00:29:20,790
Best buy/sell pair that occurs in the right half. 

390
00:29:20,790 --> 00:29:25,180
But if we only compared these two numbers, we're missing the case

391
00:29:25,180 --> 00:29:30,460
where we buy here and sell somewhere in the right half. 

392
00:29:30,460 --> 00:29:33,810
We buy in the left half, sell in the right half. 

393
00:29:33,810 --> 00:29:38,490
And the best way to compute the best buy/sell pair that spans both halves

394
00:29:38,490 --> 00:29:43,480
is to compute the minimum here and compute the maximum here

395
00:29:43,480 --> 00:29:45,580
and take their difference. 

396
00:29:45,580 --> 00:29:50,850
So the two cases where the buy/sell pair occurs only here, 

397
00:29:50,850 --> 00:30:01,910
only here, or on both halves is defined by these three numbers. 

398
00:30:01,910 --> 00:30:06,450
So our algorithm to merge our solutions back together, 

399
00:30:06,450 --> 00:30:08,350
we want to compute the best buy/sell pair 

400
00:30:08,350 --> 00:30:13,120
where we buy on the left half and sell on the right half. 

401
00:30:13,120 --> 00:30:16,740
And the best way to do that is to compute the lowest price in the first half, 

402
00:30:16,740 --> 00:30:20,360
the highest price in the right half, and take their difference. 

403
00:30:20,360 --> 00:30:25,390
The resulting three profits, these three numbers, you take the maximum of the three, 

404
00:30:25,390 --> 00:30:32,720
and that's the best profit you can make over these first and end days. 

405
00:30:32,720 --> 00:30:36,940
Here the important lines are in red. 

406
00:30:36,940 --> 00:30:41,160
This is a recursive call to compute the answer in the left half. 

407
00:30:41,160 --> 00:30:44,760
This is a recursive call to compute the answer in the right half. 

408
00:30:44,760 --> 00:30:50,720
These two for loops compute the min and the max on the left and right half, respectively.            

409
00:30:50,720 --> 00:30:54,970
Now I compute the profit that spans both halves, 

410
00:30:54,970 --> 00:31:00,530
and the final answer is the maximum of these three. 

411
00:31:00,530 --> 00:31:04,120
Okay. 

412
00:31:04,120 --> 00:31:06,420
>> So, sure, we have an algorithm, but the bigger question is, 

413
00:31:06,420 --> 00:31:08,290
what is the time complexity of this? 

414
00:31:08,290 --> 00:31:16,190
And the reason why I mentioned merge sort is that this form of divide the answer

415
00:31:16,190 --> 00:31:19,200
into two and then merging our solutions back together

416
00:31:19,200 --> 00:31:23,580
is exactly the form of merge sort. 

417
00:31:23,580 --> 00:31:33,360
So let me go through the duration.

418
00:31:33,360 --> 00:31:41,340
If we defined a function t(n) to be the number of steps 

419
00:31:41,340 --> 00:31:50,010
for n days, 

420
00:31:50,010 --> 00:31:54,350
our two recursive calls 

421
00:31:54,350 --> 00:32:00,460
are each going to cost t(n/2), 

422
00:32:00,460 --> 00:32:03,540
and there's two of these calls. 

423
00:32:03,540 --> 00:32:10,020
Now I need to compute the minimum of the left half, 

424
00:32:10,020 --> 00:32:17,050
which I can do in n/2 time, plus the maximum of the right half. 

425
00:32:17,050 --> 00:32:20,820
So this is just n. 

426
00:32:20,820 --> 00:32:25,050
And then plus some constant work. 

427
00:32:25,050 --> 00:32:27,770
And this recurrence equation 

428
00:32:27,770 --> 00:32:35,560
is exactly the recurrence equation for merge sort. 

429
00:32:35,560 --> 00:32:39,170
And we all know that merge sort is n log n time. 

430
00:32:39,170 --> 00:32:46,880
Therefore, our algorithm is also n log n time. 

431
00:32:46,880 --> 00:32:52,220
Does this iteration make sense? 

432
00:32:52,220 --> 00:32:55,780
Just a brief recap of this: 

433
00:32:55,780 --> 00:32:59,170
T(n) is the number of steps to compute the maximum profit

434
00:32:59,170 --> 00:33:02,750
over the course of n days. 

435
00:33:02,750 --> 00:33:06,010
The way we split up our recursive calls 

436
00:33:06,010 --> 00:33:11,980
is by calling our solution on the first n/2 days, 

437
00:33:11,980 --> 00:33:14,490
so that's one call, 

438
00:33:14,490 --> 00:33:16,940
and then we call again on the second half. 

439
00:33:16,940 --> 00:33:20,440
So that's two calls. 

440
00:33:20,440 --> 00:33:25,310
And then we find a minimum on the left half, which we can do in linear time,

441
00:33:25,310 --> 00:33:29,010
find the maximum of the right half, which we can do in linear time. 

442
00:33:29,010 --> 00:33:31,570
So n/2 + n/2 is just n. 

443
00:33:31,570 --> 00:33:36,020
Then we have some constant work, which is like doing arithmetic. 

444
00:33:36,020 --> 00:33:39,860
This recurrence equation is exactly the recurrence equation for merge sort. 

445
00:33:39,860 --> 00:33:55,490
Hence, our shuffle algorithm is also n log n. 

446
00:33:55,490 --> 00:33:58,520
So how much space are we using? 

447
00:33:58,520 --> 00:34:04,910
Let's go back to the code. 

448
00:34:04,910 --> 00:34:09,420
>> A better question is, how many stack frames do we ever have at any given moment? 

449
00:34:09,420 --> 00:34:11,449
Since we're using recursion, 

450
00:34:11,449 --> 00:34:23,530
the number of stack frames determines our space usage. 

451
00:34:23,530 --> 00:34:29,440
Let's consider n = 8. 

452
00:34:29,440 --> 00:34:36,889
We call shuffle on 8, 

453
00:34:36,889 --> 00:34:41,580
which will call shuffle on the first four entries, 

454
00:34:41,580 --> 00:34:46,250
which will call a shuffle on the first two entries. 

455
00:34:46,250 --> 00:34:51,550
So our stack is--this is our stack. 

456
00:34:51,550 --> 00:34:54,980
And then we call shuffle again on 1, 

457
00:34:54,980 --> 00:34:58,070
and that's what our base case is, so we return immediately. 

458
00:34:58,070 --> 00:35:04,700
Do we ever have more than this many stack frames? 

459
00:35:04,700 --> 00:35:08,880
No. Because each time we do an invocation, 

460
00:35:08,880 --> 00:35:10,770
a recursive invocation to shuffle, 

461
00:35:10,770 --> 00:35:13,950
we divide our size in half. 

462
00:35:13,950 --> 00:35:17,020
So the maximum number of stack frames we ever have at any given moment 

463
00:35:17,020 --> 00:35:28,460
is on the order of log n stack frames. 

464
00:35:28,460 --> 00:35:42,460
Each stack frame has constant space, 

465
00:35:42,460 --> 00:35:44,410
and therefore the total amount of space, 

466
00:35:44,410 --> 00:35:49,240
the maximum amount of space we ever use is O(log n) space

467
00:35:49,240 --> 00:36:03,040
where n is the number of days. 

468
00:36:03,040 --> 00:36:07,230
>> Now, always ask yourself, "Can we do better?"

469
00:36:07,230 --> 00:36:12,390
And in particular, can we reduce this to a problem we've already solved? 

470
00:36:12,390 --> 00:36:20,040
A hint: we only discussed two other problems before this, and it's not going to be shuffle. 

471
00:36:20,040 --> 00:36:26,200
We can convert this stock market problem into the maximal subarray problem. 

472
00:36:26,200 --> 00:36:40,100
How can we do this?

473
00:36:40,100 --> 00:36:42,570
One of you? Emmy? 

474
00:36:42,570 --> 00:36:47,680
(Emmy, unintelligible)

475
00:36:47,680 --> 00:36:53,860
[Yu] Exactly. 

476
00:36:53,860 --> 00:36:59,940
So the maximal subarray problem, 

477
00:36:59,940 --> 00:37:10,610
we're looking for a sum over a continuous subarray. 

478
00:37:10,610 --> 00:37:16,230
And Emmy's suggestion for the stocks problem, 

479
00:37:16,230 --> 00:37:30,720
consider the changes, or the deltas. 

480
00:37:30,720 --> 00:37:37,440
And a picture of this is--this is the price of a stock, 

481
00:37:37,440 --> 00:37:42,610
but if we took the difference between each consecutive day--

482
00:37:42,610 --> 00:37:45,200
so we see that the maximum price, maximum profit we could make 

483
00:37:45,200 --> 00:37:50,070
is if we buy here and sell here. 

484
00:37:50,070 --> 00:37:54,240
But let's look at the continuous--let's look at the subarray problem. 

485
00:37:54,240 --> 00:38:02,510
So here, we can make--going from here to here, 

486
00:38:02,510 --> 00:38:08,410
we have a positive change, and then going from here to here we have a negative change. 

487
00:38:08,410 --> 00:38:14,220
But then, going from here to here we have a huge positive change. 

488
00:38:14,220 --> 00:38:20,930
And these are the changes that we want to sum up to get our final profit. 

489
00:38:20,930 --> 00:38:25,160
Then we have more negative changes here. 

490
00:38:25,160 --> 00:38:29,990
The key to reducing our stock problem into our maximal subarray problem

491
00:38:29,990 --> 00:38:36,630
is to consider the deltas between days. 

492
00:38:36,630 --> 00:38:40,630
So we create a new array called deltas, 

493
00:38:40,630 --> 00:38:43,000
initialize the first entry to be 0, 

494
00:38:43,000 --> 00:38:46,380
and then for each delta(i), let that be the difference

495
00:38:46,380 --> 00:38:52,830
of my input array(i), and array(i - 1). 

496
00:38:52,830 --> 00:38:55,530
Then we call our routine procedure for a maximal subarray

497
00:38:55,530 --> 00:39:01,500
passing in a delta's array. 

498
00:39:01,500 --> 00:39:06,440
And because maximal subarray is linear time, 

499
00:39:06,440 --> 00:39:09,370
and this reduction, this process of creating this delta array,

500
00:39:09,370 --> 00:39:11,780
is also linear time,

501
00:39:11,780 --> 00:39:19,060
then the final solution for stocks is O(n) work plus O(n) work, is still O(n) work. 

502
00:39:19,060 --> 00:39:23,900
So we have a linear time solution to our problem. 

503
00:39:23,900 --> 00:39:29,610
Does everyone understand this transformation? 

504
00:39:29,610 --> 00:39:32,140
>> In general, a good idea that you should always have 

505
00:39:32,140 --> 00:39:34,290
is try to reduce a new problem that you're seeing.

506
00:39:34,290 --> 00:39:37,700
If it looks familiar to an old problem, 

507
00:39:37,700 --> 00:39:39,590
try reducing it to an old problem. 

508
00:39:39,590 --> 00:39:41,950
And if you can use all the tools that you've used on the old problem 

509
00:39:41,950 --> 00:39:46,450
to solve the new problem. 

510
00:39:46,450 --> 00:39:49,010
So to wrap up, technical interviews are challenging. 

511
00:39:49,010 --> 00:39:52,280
These problems are probably some of the more difficult problems

512
00:39:52,280 --> 00:39:54,700
that you might see in an interview, 

513
00:39:54,700 --> 00:39:57,690
so if you don't understand all the problems that I just covered, it's okay. 

514
00:39:57,690 --> 00:40:01,080
These are some of the more challenging problems. 

515
00:40:01,080 --> 00:40:03,050
Practice, practice, practice. 

516
00:40:03,050 --> 00:40:08,170
I gave a lot of problems in the handout, so definitely check those out. 

517
00:40:08,170 --> 00:40:11,690
And good luck on your interviews. All my resources are posted at this link, 

518
00:40:11,690 --> 00:40:15,220
and one of my senior friends has offered to do mock technical interviews, 

519
00:40:15,220 --> 00:40:22,050
so if you're interested, email Will Yao at that email address. 

520
00:40:22,050 --> 00:40:26,070
If you have some questions, you can ask me. 

521
00:40:26,070 --> 00:40:28,780
Do you guys have specific questions related to technical interviews

522
00:40:28,780 --> 00:40:38,440
or any problems we've seen so far? 

523
00:40:38,440 --> 00:40:49,910
Okay. Well, good luck on your interviews. 

524
00:40:49,910 --> 00:40:52,910
[CS50.TV]