1 00:00:00,000 --> 00:00:02,640 [Seminar: Technical Interviews] 2 00:00:02,640 --> 00:00:04,630 [Kenny Yu, Harvard University] 3 00:00:04,630 --> 00:00:08,910 [This is CS50.] [CS50.TV] 4 00:00:08,910 --> 00:00:12,420 Hi everyone, I'm Kenny. I am currently a junior studying computer science. 5 00:00:12,420 --> 00:00:17,310 I'm a former CS TF, and I wish I had this when I was an underclassman, 6 00:00:17,310 --> 00:00:19,380 and that's why I'm giving this seminar. 7 00:00:19,380 --> 00:00:21,370 So I hope you enjoy it. 8 00:00:21,370 --> 00:00:23,470 This seminar is about technical interviews, 9 00:00:23,470 --> 00:00:26,650 and all my resources can be found at this link, 10 00:00:26,650 --> 00:00:32,350 this link right here, a couple of resources. 11 00:00:32,350 --> 00:00:36,550 So I made a list of problems, actually, quite a few problems. 12 00:00:36,550 --> 00:00:40,800 Also a general resources page where we can find tips 13 00:00:40,800 --> 00:00:42,870 on how to prepare for an interview, 14 00:00:42,870 --> 00:00:46,470 tips on what you should do during an actual interview, 15 00:00:46,470 --> 00:00:51,910 as well as how to approach problems and resources for future reference. 16 00:00:51,910 --> 00:00:53,980 It's all online. 17 00:00:53,980 --> 00:00:58,290 And just to preface this seminar, a disclaimer, 18 00:00:58,290 --> 00:01:00,690 like this should not--your interview preparation 19 00:01:00,690 --> 00:01:02,800 should not be limited to this list. 20 00:01:02,800 --> 00:01:04,750 This is only meant to be a guide, 21 00:01:04,750 --> 00:01:08,890 and you should definitely take everything I say with a grain of salt, 22 00:01:08,890 --> 00:01:14,620 but also use everything I used to help you in your interview preparation. 23 00:01:14,620 --> 00:01:16,400 >> I'm going to speed through the next few slides 24 00:01:16,400 --> 00:01:18,650 so we can get to the actual case studies. 25 00:01:18,650 --> 00:01:23,630 The structure of an interview for a software engineering postion, 26 00:01:23,630 --> 00:01:26,320 typically it is 30 to 45 minutes, 27 00:01:26,320 --> 00:01:29,810 multiple rounds, depending on the company. 28 00:01:29,810 --> 00:01:33,090 Often you'll be coding on a white board. 29 00:01:33,090 --> 00:01:35,960 So a white board like this, but often on a smaller scale. 30 00:01:35,960 --> 00:01:38,540 If you're having a phone interview, you'll probably be using 31 00:01:38,540 --> 00:01:44,030 either collabedit or a Google Doc so they can see you live coding 32 00:01:44,030 --> 00:01:46,500 while you're being interviewed over the phone. 33 00:01:46,500 --> 00:01:48,490 An interview itself is typically 2 or 3 problems 34 00:01:48,490 --> 00:01:50,620 testing your computer science knowledge. 35 00:01:50,620 --> 00:01:54,490 And it will almost definitely involve coding. 36 00:01:54,490 --> 00:01:59,540 The types of questions that you'll see are usually data structures and algorithms. 37 00:01:59,540 --> 00:02:02,210 And in doing these kinds of problems, 38 00:02:02,210 --> 00:02:07,830 they will ask you, like, what is the time and space complexity, big O? 39 00:02:07,830 --> 00:02:09,800 Often they also ask higher-level questions, 40 00:02:09,800 --> 00:02:12,530 so, designing a system, 41 00:02:12,530 --> 00:02:14,770 how would you lay out your code? 42 00:02:14,770 --> 00:02:18,370 What interfaces, what classes, what modules do you have in your system, 43 00:02:18,370 --> 00:02:20,900 and how do these interact together? 44 00:02:20,900 --> 00:02:26,130 So data structures and algorithms as well as designing systems. 45 00:02:26,130 --> 00:02:29,180 >> Some general tips before we dive in to our case studies. 46 00:02:29,180 --> 00:02:32,300 I think the most important rule is always be thinking out loud. 47 00:02:32,300 --> 00:02:36,980 The interview is supposed to be your chance to show off your thinking process. 48 00:02:36,980 --> 00:02:39,820 The point of the interview is for the interviewer to gauge 49 00:02:39,820 --> 00:02:42,660 how you think and how you go through a problem. 50 00:02:42,660 --> 00:02:45,210 The worst thing you can do is be silent throughout the whole interview. 51 00:02:45,210 --> 00:02:50,090 That's just no good. 52 00:02:50,090 --> 00:02:53,230 When you are given a question, you also want to make sure you understand the question. 53 00:02:53,230 --> 00:02:55,350 So repeat the question back in your own words 54 00:02:55,350 --> 00:02:58,920 and attempt to work thorough a few simple test cases 55 00:02:58,920 --> 00:03:01,530 to make sure you understand the question. 56 00:03:01,530 --> 00:03:05,510 Working through a few test cases will also give you an intuition on how to solve this problem. 57 00:03:05,510 --> 00:03:11,210 You might even discover a few patterns to help you solve the problem. 58 00:03:11,210 --> 00:03:14,500 Their big tip is to not get frustrated. 59 00:03:14,500 --> 00:03:17,060 Don't get frustrated. 60 00:03:17,060 --> 00:03:19,060 Interviews are challenging, but the worst thing you can do, 61 00:03:19,060 --> 00:03:23,330 in addition to being silent, is to be visibly frustrated. 62 00:03:23,330 --> 00:03:27,410 You do not want to give that impression to an interviewer. 63 00:03:27,410 --> 00:03:33,960 One thing that you--so, many people, when they go into an interview, 64 00:03:33,960 --> 00:03:37,150 they attempt to try to find the best solution first, 65 00:03:37,150 --> 00:03:39,950 when really, there's usually a glaringly obvious solution. 66 00:03:39,950 --> 00:03:43,500 It might be slow, it might be inefficient, but you should just state it, 67 00:03:43,500 --> 00:03:46,210 just so you have a starting point from which to work better. 68 00:03:46,210 --> 00:03:48,270 Also, pointing out the solution is slow, in terms of 69 00:03:48,270 --> 00:03:52,160 big O time complexity or space complexity, 70 00:03:52,160 --> 00:03:54,450 will demonstrate to the interviewer that you understand 71 00:03:54,450 --> 00:03:57,510 these issues when writing code. 72 00:03:57,510 --> 00:04:01,440 So don't be afraid to come up with the simplest algorithm first 73 00:04:01,440 --> 00:04:04,950 and then work better from there. 74 00:04:04,950 --> 00:04:09,810 Any questions so far? Okay. 75 00:04:09,810 --> 00:04:11,650 >> So let's dive into our first problem. 76 00:04:11,650 --> 00:04:14,790 "Given an array of n integers, write a function that shuffles the array 77 00:04:14,790 --> 00:04:20,209 in place such that all permutations of the n integers are equally likely." 78 00:04:20,209 --> 00:04:23,470 And assume you have available a random integer generator 79 00:04:23,470 --> 00:04:30,980 that generates an integer in a range from 0 to i, half range. 80 00:04:30,980 --> 00:04:32,970 Does everyone understand this question? 81 00:04:32,970 --> 00:04:39,660 I give you an array of n integers, and I want you to shuffle it. 82 00:04:39,660 --> 00:04:46,050 In my directory, I wrote a few programs to demonstrate what I mean. 83 00:04:46,050 --> 00:04:48,910 I'm going to shuffle an array of 20 elements, 84 00:04:48,910 --> 00:04:52,490 from -10 to +9, 85 00:04:52,490 --> 00:04:57,050 and I want you to output a list like this. 86 00:04:57,050 --> 00:05:02,890 So this is my sorted input array, and I want you to shuffle it. 87 00:05:02,890 --> 00:05:07,070 We'll do it again. 88 00:05:07,070 --> 00:05:13,780 Does everyone understand the question? Okay. 89 00:05:13,780 --> 00:05:16,730 So it's up to you. 90 00:05:16,730 --> 00:05:21,220 What are some ideas? Can you do it as n^2, n log n, n? 91 00:05:21,220 --> 00:05:34,400 Open to suggestions. 92 00:05:34,400 --> 00:05:37,730 Okay. So one idea, suggested by Emmy, 93 00:05:37,730 --> 00:05:45,300 is first compute a random number, random integer, in a range from 0 to 20. 94 00:05:45,300 --> 00:05:49,840 So assume our array has a length of 20. 95 00:05:49,840 --> 00:05:54,800 In our diagram of 20 elements, 96 00:05:54,800 --> 00:05:58,560 this is our input array. 97 00:05:58,560 --> 00:06:04,590 And now, her suggestion is to create a new array, 98 00:06:04,590 --> 00:06:08,440 so this will be the output array. 99 00:06:08,440 --> 00:06:12,880 And based on the i returned by rand-- 100 00:06:12,880 --> 00:06:17,580 so if i was, let's say, 17, 101 00:06:17,580 --> 00:06:25,640 copy the 17th element into the first position. 102 00:06:25,640 --> 00:06:30,300 Now we need to delete--we need to shift all the elements here 103 00:06:30,300 --> 00:06:36,920 over so that we have a gap at the end and no holes in the middle. 104 00:06:36,920 --> 00:06:39,860 And now we repeat the process. 105 00:06:39,860 --> 00:06:46,360 Now we pick a new random integer between 0 and 19. 106 00:06:46,360 --> 00:06:52,510 We have a new i here, and we copy this element into this position. 107 00:06:52,510 --> 00:07:00,960 Then we shift items over and we repeat the process until we have our full new array. 108 00:07:00,960 --> 00:07:05,890 What is the run time of this algorithm? 109 00:07:05,890 --> 00:07:08,110 Well, let's consider the impact of this. 110 00:07:08,110 --> 00:07:10,380 We are shifting every element. 111 00:07:10,380 --> 00:07:16,800 When we remove this i, we are shifting all the elements after it to the left. 112 00:07:16,800 --> 00:07:21,600 And that is an O(n) cost 113 00:07:21,600 --> 00:07:26,100 because what if we remove the first element? 114 00:07:26,100 --> 00:07:29,670 So for each removal, we remove-- 115 00:07:29,670 --> 00:07:32,170 each removal incurs an O(n) operation, 116 00:07:32,170 --> 00:07:41,520 and since we have n removals, this leads to an O(n^2) shuffle. 117 00:07:41,520 --> 00:07:49,550 Okay. So good start. Good start. 118 00:07:49,550 --> 00:07:55,290 >> Another suggestion is to use something known as the Knuth shuffle, 119 00:07:55,290 --> 00:07:57,540 or the Fisher-Yates shuffle. 120 00:07:57,540 --> 00:07:59,630 And it's actually a linear time shuffle. 121 00:07:59,630 --> 00:08:02,200 And the idea is very similar. 122 00:08:02,200 --> 00:08:05,160 Again, we have our input array, 123 00:08:05,160 --> 00:08:08,580 but instead of using two arrays for our input/output, 124 00:08:08,580 --> 00:08:13,590 we use the first portion of the array to keep track of our shuffled portion, 125 00:08:13,590 --> 00:08:18,400 and we keep track, and then we leave the rest of our array for the unshuffled portion. 126 00:08:18,400 --> 00:08:24,330 So here's what I mean. We start off with--we choose an i, 127 00:08:24,330 --> 00:08:30,910 an array from 0 to 20. 128 00:08:30,910 --> 00:08:36,150 Our current pointer is pointing to the first index. 129 00:08:36,150 --> 00:08:39,590 We choose some i here 130 00:08:39,590 --> 00:08:42,740 and now we swap. 131 00:08:42,740 --> 00:08:47,690 So if this was 5 and this was 4, 132 00:08:47,690 --> 00:08:57,150 the resulting array will have 5 here and 4 here. 133 00:08:57,150 --> 00:09:00,390 And now we note a marker here. 134 00:09:00,390 --> 00:09:05,770 Everything to the left is shuffled, 135 00:09:05,770 --> 00:09:15,160 and everything to the right is unshuffled. 136 00:09:15,160 --> 00:09:17,260 And now we can repeat the process. 137 00:09:17,260 --> 00:09:25,210 We choose a random index between 1 and 20 now. 138 00:09:25,210 --> 00:09:30,650 So suppose our new i is here. 139 00:09:30,650 --> 00:09:39,370 Now we swap this i with our current new position here. 140 00:09:39,370 --> 00:09:44,790 So we do a swapping back and forth like this. 141 00:09:44,790 --> 00:09:51,630 Let me bring up the code to make it more concrete. 142 00:09:51,630 --> 00:09:55,290 We start with our choice of i-- 143 00:09:55,290 --> 00:09:58,370 we start with i equal to 0, we pick a random location j 144 00:09:58,370 --> 00:10:02,420 in the unshuffled portion of the array, i to n-1. 145 00:10:02,420 --> 00:10:07,280 So if I'm here, choose a random index between here and the rest of the array, 146 00:10:07,280 --> 00:10:12,410 and we swap. 147 00:10:12,410 --> 00:10:17,550 This is all the code necessary to shuffle your array. 148 00:10:17,550 --> 00:10:21,670 Any questions? 149 00:10:21,670 --> 00:10:25,530 >> Well, one needed question is, why is this correct? 150 00:10:25,530 --> 00:10:28,360 Why is every permutation equally likely? 151 00:10:28,360 --> 00:10:30,410 And I won't go through the proof of this, 152 00:10:30,410 --> 00:10:35,970 but many problems in computer science can be proven through induction. 153 00:10:35,970 --> 00:10:38,520 How many of you are familiar with induction? 154 00:10:38,520 --> 00:10:40,590 Okay. Cool. 155 00:10:40,590 --> 00:10:43,610 So you can prove the correctness of this algorithm by simple induction, 156 00:10:43,610 --> 00:10:49,540 where your induction hypothesis would be, assume that 157 00:10:49,540 --> 00:10:53,290 my shuffle returns every permutation equally likely 158 00:10:53,290 --> 00:10:56,120 up to the first i elements. 159 00:10:56,120 --> 00:10:58,300 Now, consider i + 1. 160 00:10:58,300 --> 00:11:02,550 And by the way we choose our index j to swap, 161 00:11:02,550 --> 00:11:05,230 this leads to--and then you work out the details, 162 00:11:05,230 --> 00:11:07,390 at least a full proof of why this algorithm returns 163 00:11:07,390 --> 00:11:12,800 every permutation with equally likely probability. 164 00:11:12,800 --> 00:11:15,940 >> All right, next problem. 165 00:11:15,940 --> 00:11:19,170 So "given an array of integers, postive, zero, negative, 166 00:11:19,170 --> 00:11:21,290 write a function that calculates the maximum sum 167 00:11:21,290 --> 00:11:24,720 of any continueous subarray of the input array." 168 00:11:24,720 --> 00:11:28,370 An example here is, in the case where all numbers are positive, 169 00:11:28,370 --> 00:11:31,320 then currently the best choice is to take the whole array. 170 00:11:31,320 --> 00:11:34,690 1, 2, 3, 4, equals 10. 171 00:11:34,690 --> 00:11:36,780 When you have some negatives in there, 172 00:11:36,780 --> 00:11:38,690 in this case we just want the first two 173 00:11:38,690 --> 00:11:44,590 because choosing -1 and/or -3 will bring our sum down. 174 00:11:44,590 --> 00:11:48,120 Sometimes we might have to start in the middle of the array. 175 00:11:48,120 --> 00:11:53,500 Sometimes we want to choose nothing at all; it's best to not take anything. 176 00:11:53,500 --> 00:11:56,490 And sometimes it's better to take the fall, 177 00:11:56,490 --> 00:12:07,510 because the thing after it is super big. So any ideas? 178 00:12:07,510 --> 00:12:10,970 (student, unintelligible) >>Yeah. 179 00:12:10,970 --> 00:12:13,560 Suppose I don't take -1. 180 00:12:13,560 --> 00:12:16,170 Then either I choose 1,000 and 20,000, 181 00:12:16,170 --> 00:12:18,630 or I just choose the 3 billion. 182 00:12:18,630 --> 00:12:20,760 Well, the best choice is to take all the numbers. 183 00:12:20,760 --> 00:12:24,350 This -1, despite being negative, 184 00:12:24,350 --> 00:12:31,340 the whole sum is better than were I not to take -1. 185 00:12:31,340 --> 00:12:36,460 So one of the tips I mentioned earlier was to state the clearly obvious 186 00:12:36,460 --> 00:12:40,540 and the brute force solution first. 187 00:12:40,540 --> 00:12:44,340 What is the brute force solution in this problem? Yeah? 188 00:12:44,340 --> 00:12:46,890 [Jane] Well, I think the brute force solution 189 00:12:46,890 --> 00:12:52,600 would be to add up all the possible combinations (unintelligible). 190 00:12:52,600 --> 00:12:58,250 [Yu] Okay. So Jane's idea is to take every possible-- 191 00:12:58,250 --> 00:13:01,470 I'm paraphrasing--is to take every possible continuous subarray, 192 00:13:01,470 --> 00:13:07,840 compute its sum, and then take the maximum of all the possible continuous subarrays. 193 00:13:07,840 --> 00:13:13,310 What uniquely identifies a subarray in my input array? 194 00:13:13,310 --> 00:13:17,380 Like, what two things do I need? Yeah? 195 00:13:17,380 --> 00:13:19,970 (student, unintelligible) >>Right. 196 00:13:19,970 --> 00:13:22,130 A lower bound on the index and an upper bound index 197 00:13:22,130 --> 00:13:28,300 uniquely determines a continuous subarray. 198 00:13:28,300 --> 00:13:31,400 [Female student] Are we estimating it's an array of unique numbers? 199 00:13:31,400 --> 00:13:34,280 [Yu] No. So her question is, are we assuming our array-- 200 00:13:34,280 --> 00:13:39,000 is our array all unique numbers, and the answer is no. 201 00:13:39,000 --> 00:13:43,390 >> If we use our brute force solution, then the start/end indices 202 00:13:43,390 --> 00:13:47,200 uniquely determines our continuous subarray. 203 00:13:47,200 --> 00:13:51,680 So if we iterate for all possible start entries, 204 00:13:51,680 --> 00:13:58,320 and for all end entries > or = to start, and < n, 205 00:13:58,320 --> 00:14:05,570 you compute the sum, and then we take the maximum sum we've seen so far. 206 00:14:05,570 --> 00:14:07,880 Is this clear? 207 00:14:07,880 --> 00:14:12,230 What is the big O of this solution? 208 00:14:12,230 --> 00:14:16,660 Timewise. 209 00:14:16,660 --> 00:14:18,860 Not quite n^2. 210 00:14:18,860 --> 00:14:25,250 Note that we iterate from 0 to n, 211 00:14:25,250 --> 00:14:27,520 so that's one for loop. 212 00:14:27,520 --> 00:14:35,120 We iterate again from almost the beginning to the end, another for loop. 213 00:14:35,120 --> 00:14:37,640 And now, within that, we have to sum up every entry, 214 00:14:37,640 --> 00:14:43,810 so that's another for loop. So we have three nested for loops, n^3. 215 00:14:43,810 --> 00:14:46,560 Okay. This goes as a starting point. 216 00:14:46,560 --> 00:14:53,180 Our solution is no worse than n^3. 217 00:14:53,180 --> 00:14:55,480 Does everyone understand the brute force solution? 218 00:14:55,480 --> 00:14:59,950 >> Okay. A better solution is using an idea called dynamic programming. 219 00:14:59,950 --> 00:15:03,040 If you take CS124, which is Algorithms and Data Structures, 220 00:15:03,040 --> 00:15:05,680 you will become very familiar with this technique. 221 00:15:05,680 --> 00:15:12,190 And the idea is, try to build up solutions to smaller problems first. 222 00:15:12,190 --> 00:15:17,990 What I mean by this is, we currently have to worry about two things: start and end. 223 00:15:17,990 --> 00:15:29,340 And that's annoying. What if we could get rid of one of those parameters? 224 00:15:29,340 --> 00:15:32,650 One idea is to--we're given our original problem, 225 00:15:32,650 --> 00:15:37,470 find the maximum sum of any subarray in a range [O,n-1]. 226 00:15:37,470 --> 00:15:47,400 And now we have a new subproblem, where we know, in our current index i, 227 00:15:47,400 --> 00:15:52,520 we know we must conclude there. Our subarray must end at the current index. 228 00:15:52,520 --> 00:15:57,640 So the remaining problem is, where should we start our subarray? 229 00:15:57,640 --> 00:16:05,160 Does this make sense? Okay. 230 00:16:05,160 --> 00:16:12,030 So I've coded this up, and let's look at what this means. 231 00:16:12,030 --> 00:16:16,230 In the codirectory, there's a program called subarray, 232 00:16:16,230 --> 00:16:19,470 and it takes number of items, 233 00:16:19,470 --> 00:16:25,550 and it returns the maximal subarray sum in my shuffled list. 234 00:16:25,550 --> 00:16:29,920 So in this case, our maximal subarray is 3. 235 00:16:29,920 --> 00:16:34,850 And that's taken by just using 2 and 1. 236 00:16:34,850 --> 00:16:38,050 Let's run it again. It's also 3. 237 00:16:38,050 --> 00:16:40,950 But this time, note how we got the 3. 238 00:16:40,950 --> 00:16:46,690 We took the--we just take the 3 itself 239 00:16:46,690 --> 00:16:49,980 because it's surrounded by negatives on both sides, 240 00:16:49,980 --> 00:16:55,080 which will bring a sum < 3. 241 00:16:55,080 --> 00:16:57,820 Let's run on 10 items. 242 00:16:57,820 --> 00:17:03,200 This time it's 7, we take the leading 3 and 4. 243 00:17:03,200 --> 00:17:09,450 This time it's 8, and we obtain that by taking 1, 4 and 3. 244 00:17:09,450 --> 00:17:16,310 So to give you an intuition on how we can solve this transformed problem, 245 00:17:16,310 --> 00:17:18,890 let's take a look at this subarray. 246 00:17:18,890 --> 00:17:23,460 We're given this input array, and we know the answer is 8. 247 00:17:23,460 --> 00:17:26,359 We take the 1, 4, and 3. 248 00:17:26,359 --> 00:17:29,090 But let's look at how we actually got that answer. 249 00:17:29,090 --> 00:17:34,160 Let's look at the maximal subarray that ended at each of these indices. 250 00:17:34,160 --> 00:17:40,780 What's the maximal subarray that has to end at the first position? 251 00:17:40,780 --> 00:17:46,310 [Student] Zero. >>Zero. Just don't take the -5. 252 00:17:46,310 --> 00:17:50,210 Here it's going to be 0 as well. Yeah? 253 00:17:50,210 --> 00:17:56,470 (student, unintelligible) 254 00:17:56,470 --> 00:17:58,960 [Yu] Oh, sorry, it is a -3. 255 00:17:58,960 --> 00:18:03,220 So this is a 2, this is a -3. 256 00:18:03,220 --> 00:18:08,690 Okay. So -4, what's the maximal subarray to end that position 257 00:18:08,690 --> 00:18:12,910 where -4 is at? Zero. 258 00:18:12,910 --> 00:18:22,570 One? 1, 5, 8. 259 00:18:22,570 --> 00:18:28,060 Now, I must end at the location where -2 is at. 260 00:18:28,060 --> 00:18:39,330 So 6, 5, 7, and the last one is 4. 261 00:18:39,330 --> 00:18:43,480 Knowing that these are my entries 262 00:18:43,480 --> 00:18:48,130 for the transformed problem where I must end at each of these indices, 263 00:18:48,130 --> 00:18:51,410 then my final answer is just, take a sweep across, 264 00:18:51,410 --> 00:18:53,580 and take the maximum number. 265 00:18:53,580 --> 00:18:55,620 So in this case it's 8. 266 00:18:55,620 --> 00:19:00,010 This implies that the maximal subarray ends at this index, 267 00:19:00,010 --> 00:19:04,970 and started somewhere before it. 268 00:19:04,970 --> 00:19:09,630 Does everyone understand this transformed subarray? 269 00:19:09,630 --> 00:19:22,160 >> Okay. Well, let's figure out the recurrence for this. 270 00:19:22,160 --> 00:19:27,990 Let's consider just the first few entries. 271 00:19:27,990 --> 00:19:35,930 So here it was 0, 0, 0, 1, 5, 8. 272 00:19:35,930 --> 00:19:39,790 And then there was a -2 here, and that brought it down to 6. 273 00:19:39,790 --> 00:19:50,800 So if I call the entry at position i subproblem(i), 274 00:19:50,800 --> 00:19:54,910 how can I use the answer to a previous subproblem 275 00:19:54,910 --> 00:19:59,360 to answer this subproblem? 276 00:19:59,360 --> 00:20:04,550 If I look at, let's say, this entry. 277 00:20:04,550 --> 00:20:09,190 How can I compute the answer 6 by looking at 278 00:20:09,190 --> 00:20:18,780 a combination of this array and the answers to previous subproblems in this array? Yes? 279 00:20:18,780 --> 00:20:22,800 [Female student] You take the array of sums 280 00:20:22,800 --> 00:20:25,430 in the position right before it, so the 8, 281 00:20:25,430 --> 00:20:32,170 and then you add the current subproblem. 282 00:20:32,170 --> 00:20:36,460 [Yu] So her suggestion is to look at these two numbers, 283 00:20:36,460 --> 00:20:40,090 this number and this number. 284 00:20:40,090 --> 00:20:50,130 So this 8 refers to the answer for the subproblem (i - 1). 285 00:20:50,130 --> 00:20:57,300 And let's call my input array A. 286 00:20:57,300 --> 00:21:01,470 In order to find a maximal subarray that ends at position i, 287 00:21:01,470 --> 00:21:03,980 I have two choices: I can either continue the subarray 288 00:21:03,980 --> 00:21:09,790 that ended at the previous index, or start a new array. 289 00:21:09,790 --> 00:21:14,190 If I were to continue the subarray that started at the previous index, 290 00:21:14,190 --> 00:21:19,260 then the maximum sum I can achieve is the answer to the previous subproblem 291 00:21:19,260 --> 00:21:24,120 plus the current array entry. 292 00:21:24,120 --> 00:21:27,550 But, I also have the choice of starting a new subarray, 293 00:21:27,550 --> 00:21:30,830 in which case the sum is 0. 294 00:21:30,830 --> 00:21:42,860 So the answer is max of 0, subproblem i - 1, plus the current array entry. 295 00:21:42,860 --> 00:21:46,150 Does this recurrence make sense? 296 00:21:46,150 --> 00:21:50,840 Our recurrence, as we just discovered, is subproblem i 297 00:21:50,840 --> 00:21:54,740 is equal to the maximum of the previous subproblem plus my current array entry, 298 00:21:54,740 --> 00:22:01,490 which means continue the previous subarray, 299 00:22:01,490 --> 00:22:07,250 or 0, start a new subarray at my current index. 300 00:22:07,250 --> 00:22:10,060 And once we have built up this table of solutions, then our final answer, 301 00:22:10,060 --> 00:22:13,950 just do a linear sweep across the subproblem array 302 00:22:13,950 --> 00:22:19,890 and take the maximum number. 303 00:22:19,890 --> 00:22:23,330 This is an exact implementation of what I just said. 304 00:22:23,330 --> 00:22:27,320 So we create a new subproblem array, subproblems. 305 00:22:27,320 --> 00:22:32,330 The first entry is either 0 or the first entry, the maximum of those two. 306 00:22:32,330 --> 00:22:35,670 And for the rest of the subproblems 307 00:22:35,670 --> 00:22:39,810 we use the exact recurrence we just discovered. 308 00:22:39,810 --> 00:22:49,960 Now we compute the maximum of our subproblems array, and that's our final answer. 309 00:22:49,960 --> 00:22:54,130 >> So how much space are we using in this algorithm? 310 00:22:54,130 --> 00:23:01,470 If you've only taken CS50, then you might not have discussed space very much. 311 00:23:01,470 --> 00:23:07,750 Well, one thing to note is that I called malloc here with size n. 312 00:23:07,750 --> 00:23:13,590 What does that suggest to you? 313 00:23:13,590 --> 00:23:17,450 This algorithm uses linear space. 314 00:23:17,450 --> 00:23:21,030 Can we do better? 315 00:23:21,030 --> 00:23:30,780 Is there anything that you notice that is unnecessary to compute the final answer? 316 00:23:30,780 --> 00:23:33,290 I guess a better question is, what information 317 00:23:33,290 --> 00:23:40,680 do we not need to carry all the way through to the end? 318 00:23:40,680 --> 00:23:44,280 Now, if we look at these two lines, 319 00:23:44,280 --> 00:23:47,720 we only care about the previous subproblem, 320 00:23:47,720 --> 00:23:50,910 and we only care about the maximum we've ever seen so far. 321 00:23:50,910 --> 00:23:53,610 To compute our final answer, we don't need the entire array. 322 00:23:53,610 --> 00:23:57,450 We only need the last number, last two numbers. 323 00:23:57,450 --> 00:24:02,630 Last number for the subproblem array, and last number for the maximum. 324 00:24:02,630 --> 00:24:07,380 So, in fact, we can fuse these loops together 325 00:24:07,380 --> 00:24:10,460 and go from linear space to constant space. 326 00:24:10,460 --> 00:24:15,830 Current sum so far, here, replaces the role of subproblem, our subproblem array. 327 00:24:15,830 --> 00:24:20,020 So current sum, so far, is the answer to the previous subproblem. 328 00:24:20,020 --> 00:24:23,450 And that sum, so far, takes the place of our max. 329 00:24:23,450 --> 00:24:28,100 We compute the maximum as we go along. 330 00:24:28,100 --> 00:24:30,890 And so we go from linear space to constant space, 331 00:24:30,890 --> 00:24:36,650 and we also have a linear solution to our subarray problem. 332 00:24:36,650 --> 00:24:40,740 >> These kinds of questions you will get during an interview. 333 00:24:40,740 --> 00:24:44,450 What is the time complexity; what is the space complexity? 334 00:24:44,450 --> 00:24:50,600 Can you do better? Are there things that are unnecessary to keep around? 335 00:24:50,600 --> 00:24:55,270 I did this to highlight analyses that you should take on your own 336 00:24:55,270 --> 00:24:57,400 as you're working through these problems. 337 00:24:57,400 --> 00:25:01,710 Always be asking yourself, "Can I do better?" 338 00:25:01,710 --> 00:25:07,800 In fact, can we do better than this? 339 00:25:07,800 --> 00:25:10,730 Sort of a trick question. You can't, because you need to 340 00:25:10,730 --> 00:25:13,590 at least read the input to the problem. 341 00:25:13,590 --> 00:25:15,570 So the fact that you need to at least read the input to the problem 342 00:25:15,570 --> 00:25:19,580 means that you can't do better than linear time, 343 00:25:19,580 --> 00:25:22,870 and you can't do better than constant space. 344 00:25:22,870 --> 00:25:27,060 So this is, in fact, the best solution to this problem. 345 00:25:27,060 --> 00:25:33,040 Questions? Okay. 346 00:25:33,040 --> 00:25:35,190 >> Stock market problem: 347 00:25:35,190 --> 00:25:38,350 "Given an array of n integers, positive, zero, or negative, 348 00:25:38,350 --> 00:25:41,680 that represent the price of a stock over n days, 349 00:25:41,680 --> 00:25:44,080 write a function to compute the maximum profit you can make 350 00:25:44,080 --> 00:25:49,350 given that you buy and sell exactly 1 stock within these n days." 351 00:25:49,350 --> 00:25:51,690 Essentially, we want to buy low, sell high. 352 00:25:51,690 --> 00:25:58,580 And we want to figure out the best profit we can make. 353 00:25:58,580 --> 00:26:11,500 Going back to my tip, what is the immediately clear, simplest answer, but it's slow? 354 00:26:11,500 --> 00:26:17,690 Yes? (student, unintelligible) >>Yes. 355 00:26:17,690 --> 00:26:21,470 >>So you would just go though and look at the stock prices 356 00:26:21,470 --> 00:26:30,550 at each point in time, (unintelligible). 357 00:26:30,550 --> 00:26:33,990 [Yu] Okay, so her solution--her suggestion of computing 358 00:26:33,990 --> 00:26:37,380 the lowest and computing the highest doesn't necessarily work 359 00:26:37,380 --> 00:26:42,470 because the highest might occur before the lowest. 360 00:26:42,470 --> 00:26:47,340 So what is the brute force solution to this problem? 361 00:26:47,340 --> 00:26:53,150 What are the two things that I need to uniquely determine the profit I make? Right. 362 00:26:53,150 --> 00:26:59,410 The brute force solution is--oh, so, George's suggestion is we only need two days 363 00:26:59,410 --> 00:27:02,880 to uniquely determine the profit of those two days. 364 00:27:02,880 --> 00:27:06,660 So we compute every pair, like buy/sell, 365 00:27:06,660 --> 00:27:12,850 compute the profit, which could be negative or positive or zero. 366 00:27:12,850 --> 00:27:18,000 Compute the maximum profit that we make after iterating over all pairs of days. 367 00:27:18,000 --> 00:27:20,330 That will be our final answer. 368 00:27:20,330 --> 00:27:25,730 And that solution will be O(n^2), because there is n choose two pairs-- 369 00:27:25,730 --> 00:27:30,270 of days that you can choose among end days. 370 00:27:30,270 --> 00:27:32,580 Okay, so I'm not going to go over the brute force solution here. 371 00:27:32,580 --> 00:27:37,420 I'm going to tell you that there's an n log n solution. 372 00:27:37,420 --> 00:27:45,550 What algorithm do you currently know that is n log n? 373 00:27:45,550 --> 00:27:50,730 It's not a trick question. 374 00:27:50,730 --> 00:27:54,790 >> Merge sort. Merge sort is n log n, 375 00:27:54,790 --> 00:27:57,760 and in fact, one way of solving this problem is to use 376 00:27:57,760 --> 00:28:04,400 a merge sort kind of idea called, in general, divide and conquer. 377 00:28:04,400 --> 00:28:07,570 And the idea is as follows. 378 00:28:07,570 --> 00:28:12,400 You want to compute the best buy/sell pair in the left half. 379 00:28:12,400 --> 00:28:16,480 Find the best profit you can make, just with the first n over two days. 380 00:28:16,480 --> 00:28:19,780 Then you want to oompute the best buy/sell pair on the right half, 381 00:28:19,780 --> 00:28:23,930 so the last n over two days. 382 00:28:23,930 --> 00:28:32,400 And now the question is, how do we merge these solutions back together? 383 00:28:32,400 --> 00:28:36,320 Yes? (student, unintelligible) 384 00:28:36,320 --> 00:28:49,890 >>Okay. So let me draw a picture. 385 00:28:49,890 --> 00:29:03,870 Yes? (George, unintelligible) 386 00:29:03,870 --> 00:29:06,450 >>Exactly. George's solution is exactly right. 387 00:29:06,450 --> 00:29:10,040 So his suggestion is, first compute the best buy/sell pair, 388 00:29:10,040 --> 00:29:16,050 and that occurs in the left half, so let's call that left, left. 389 00:29:16,050 --> 00:29:20,790 Best buy/sell pair that occurs in the right half. 390 00:29:20,790 --> 00:29:25,180 But if we only compared these two numbers, we're missing the case 391 00:29:25,180 --> 00:29:30,460 where we buy here and sell somewhere in the right half. 392 00:29:30,460 --> 00:29:33,810 We buy in the left half, sell in the right half. 393 00:29:33,810 --> 00:29:38,490 And the best way to compute the best buy/sell pair that spans both halves 394 00:29:38,490 --> 00:29:43,480 is to compute the minimum here and compute the maximum here 395 00:29:43,480 --> 00:29:45,580 and take their difference. 396 00:29:45,580 --> 00:29:50,850 So the two cases where the buy/sell pair occurs only here, 397 00:29:50,850 --> 00:30:01,910 only here, or on both halves is defined by these three numbers. 398 00:30:01,910 --> 00:30:06,450 So our algorithm to merge our solutions back together, 399 00:30:06,450 --> 00:30:08,350 we want to compute the best buy/sell pair 400 00:30:08,350 --> 00:30:13,120 where we buy on the left half and sell on the right half. 401 00:30:13,120 --> 00:30:16,740 And the best way to do that is to compute the lowest price in the first half, 402 00:30:16,740 --> 00:30:20,360 the highest price in the right half, and take their difference. 403 00:30:20,360 --> 00:30:25,390 The resulting three profits, these three numbers, you take the maximum of the three, 404 00:30:25,390 --> 00:30:32,720 and that's the best profit you can make over these first and end days. 405 00:30:32,720 --> 00:30:36,940 Here the important lines are in red. 406 00:30:36,940 --> 00:30:41,160 This is a recursive call to compute the answer in the left half. 407 00:30:41,160 --> 00:30:44,760 This is a recursive call to compute the answer in the right half. 408 00:30:44,760 --> 00:30:50,720 These two for loops compute the min and the max on the left and right half, respectively. 409 00:30:50,720 --> 00:30:54,970 Now I compute the profit that spans both halves, 410 00:30:54,970 --> 00:31:00,530 and the final answer is the maximum of these three. 411 00:31:00,530 --> 00:31:04,120 Okay. 412 00:31:04,120 --> 00:31:06,420 >> So, sure, we have an algorithm, but the bigger question is, 413 00:31:06,420 --> 00:31:08,290 what is the time complexity of this? 414 00:31:08,290 --> 00:31:16,190 And the reason why I mentioned merge sort is that this form of divide the answer 415 00:31:16,190 --> 00:31:19,200 into two and then merging our solutions back together 416 00:31:19,200 --> 00:31:23,580 is exactly the form of merge sort. 417 00:31:23,580 --> 00:31:33,360 So let me go through the duration. 418 00:31:33,360 --> 00:31:41,340 If we defined a function t(n) to be the number of steps 419 00:31:41,340 --> 00:31:50,010 for n days, 420 00:31:50,010 --> 00:31:54,350 our two recursive calls 421 00:31:54,350 --> 00:32:00,460 are each going to cost t(n/2), 422 00:32:00,460 --> 00:32:03,540 and there's two of these calls. 423 00:32:03,540 --> 00:32:10,020 Now I need to compute the minimum of the left half, 424 00:32:10,020 --> 00:32:17,050 which I can do in n/2 time, plus the maximum of the right half. 425 00:32:17,050 --> 00:32:20,820 So this is just n. 426 00:32:20,820 --> 00:32:25,050 And then plus some constant work. 427 00:32:25,050 --> 00:32:27,770 And this recurrence equation 428 00:32:27,770 --> 00:32:35,560 is exactly the recurrence equation for merge sort. 429 00:32:35,560 --> 00:32:39,170 And we all know that merge sort is n log n time. 430 00:32:39,170 --> 00:32:46,880 Therefore, our algorithm is also n log n time. 431 00:32:46,880 --> 00:32:52,220 Does this iteration make sense? 432 00:32:52,220 --> 00:32:55,780 Just a brief recap of this: 433 00:32:55,780 --> 00:32:59,170 T(n) is the number of steps to compute the maximum profit 434 00:32:59,170 --> 00:33:02,750 over the course of n days. 435 00:33:02,750 --> 00:33:06,010 The way we split up our recursive calls 436 00:33:06,010 --> 00:33:11,980 is by calling our solution on the first n/2 days, 437 00:33:11,980 --> 00:33:14,490 so that's one call, 438 00:33:14,490 --> 00:33:16,940 and then we call again on the second half. 439 00:33:16,940 --> 00:33:20,440 So that's two calls. 440 00:33:20,440 --> 00:33:25,310 And then we find a minimum on the left half, which we can do in linear time, 441 00:33:25,310 --> 00:33:29,010 find the maximum of the right half, which we can do in linear time. 442 00:33:29,010 --> 00:33:31,570 So n/2 + n/2 is just n. 443 00:33:31,570 --> 00:33:36,020 Then we have some constant work, which is like doing arithmetic. 444 00:33:36,020 --> 00:33:39,860 This recurrence equation is exactly the recurrence equation for merge sort. 445 00:33:39,860 --> 00:33:55,490 Hence, our shuffle algorithm is also n log n. 446 00:33:55,490 --> 00:33:58,520 So how much space are we using? 447 00:33:58,520 --> 00:34:04,910 Let's go back to the code. 448 00:34:04,910 --> 00:34:09,420 >> A better question is, how many stack frames do we ever have at any given moment? 449 00:34:09,420 --> 00:34:11,449 Since we're using recursion, 450 00:34:11,449 --> 00:34:23,530 the number of stack frames determines our space usage. 451 00:34:23,530 --> 00:34:29,440 Let's consider n = 8. 452 00:34:29,440 --> 00:34:36,889 We call shuffle on 8, 453 00:34:36,889 --> 00:34:41,580 which will call shuffle on the first four entries, 454 00:34:41,580 --> 00:34:46,250 which will call a shuffle on the first two entries. 455 00:34:46,250 --> 00:34:51,550 So our stack is--this is our stack. 456 00:34:51,550 --> 00:34:54,980 And then we call shuffle again on 1, 457 00:34:54,980 --> 00:34:58,070 and that's what our base case is, so we return immediately. 458 00:34:58,070 --> 00:35:04,700 Do we ever have more than this many stack frames? 459 00:35:04,700 --> 00:35:08,880 No. Because each time we do an invocation, 460 00:35:08,880 --> 00:35:10,770 a recursive invocation to shuffle, 461 00:35:10,770 --> 00:35:13,950 we divide our size in half. 462 00:35:13,950 --> 00:35:17,020 So the maximum number of stack frames we ever have at any given moment 463 00:35:17,020 --> 00:35:28,460 is on the order of log n stack frames. 464 00:35:28,460 --> 00:35:42,460 Each stack frame has constant space, 465 00:35:42,460 --> 00:35:44,410 and therefore the total amount of space, 466 00:35:44,410 --> 00:35:49,240 the maximum amount of space we ever use is O(log n) space 467 00:35:49,240 --> 00:36:03,040 where n is the number of days. 468 00:36:03,040 --> 00:36:07,230 >> Now, always ask yourself, "Can we do better?" 469 00:36:07,230 --> 00:36:12,390 And in particular, can we reduce this to a problem we've already solved? 470 00:36:12,390 --> 00:36:20,040 A hint: we only discussed two other problems before this, and it's not going to be shuffle. 471 00:36:20,040 --> 00:36:26,200 We can convert this stock market problem into the maximal subarray problem. 472 00:36:26,200 --> 00:36:40,100 How can we do this? 473 00:36:40,100 --> 00:36:42,570 One of you? Emmy? 474 00:36:42,570 --> 00:36:47,680 (Emmy, unintelligible) 475 00:36:47,680 --> 00:36:53,860 [Yu] Exactly. 476 00:36:53,860 --> 00:36:59,940 So the maximal subarray problem, 477 00:36:59,940 --> 00:37:10,610 we're looking for a sum over a continuous subarray. 478 00:37:10,610 --> 00:37:16,230 And Emmy's suggestion for the stocks problem, 479 00:37:16,230 --> 00:37:30,720 consider the changes, or the deltas. 480 00:37:30,720 --> 00:37:37,440 And a picture of this is--this is the price of a stock, 481 00:37:37,440 --> 00:37:42,610 but if we took the difference between each consecutive day-- 482 00:37:42,610 --> 00:37:45,200 so we see that the maximum price, maximum profit we could make 483 00:37:45,200 --> 00:37:50,070 is if we buy here and sell here. 484 00:37:50,070 --> 00:37:54,240 But let's look at the continuous--let's look at the subarray problem. 485 00:37:54,240 --> 00:38:02,510 So here, we can make--going from here to here, 486 00:38:02,510 --> 00:38:08,410 we have a positive change, and then going from here to here we have a negative change. 487 00:38:08,410 --> 00:38:14,220 But then, going from here to here we have a huge positive change. 488 00:38:14,220 --> 00:38:20,930 And these are the changes that we want to sum up to get our final profit. 489 00:38:20,930 --> 00:38:25,160 Then we have more negative changes here. 490 00:38:25,160 --> 00:38:29,990 The key to reducing our stock problem into our maximal subarray problem 491 00:38:29,990 --> 00:38:36,630 is to consider the deltas between days. 492 00:38:36,630 --> 00:38:40,630 So we create a new array called deltas, 493 00:38:40,630 --> 00:38:43,000 initialize the first entry to be 0, 494 00:38:43,000 --> 00:38:46,380 and then for each delta(i), let that be the difference 495 00:38:46,380 --> 00:38:52,830 of my input array(i), and array(i - 1). 496 00:38:52,830 --> 00:38:55,530 Then we call our routine procedure for a maximal subarray 497 00:38:55,530 --> 00:39:01,500 passing in a delta's array. 498 00:39:01,500 --> 00:39:06,440 And because maximal subarray is linear time, 499 00:39:06,440 --> 00:39:09,370 and this reduction, this process of creating this delta array, 500 00:39:09,370 --> 00:39:11,780 is also linear time, 501 00:39:11,780 --> 00:39:19,060 then the final solution for stocks is O(n) work plus O(n) work, is still O(n) work. 502 00:39:19,060 --> 00:39:23,900 So we have a linear time solution to our problem. 503 00:39:23,900 --> 00:39:29,610 Does everyone understand this transformation? 504 00:39:29,610 --> 00:39:32,140 >> In general, a good idea that you should always have 505 00:39:32,140 --> 00:39:34,290 is try to reduce a new problem that you're seeing. 506 00:39:34,290 --> 00:39:37,700 If it looks familiar to an old problem, 507 00:39:37,700 --> 00:39:39,590 try reducing it to an old problem. 508 00:39:39,590 --> 00:39:41,950 And if you can use all the tools that you've used on the old problem 509 00:39:41,950 --> 00:39:46,450 to solve the new problem. 510 00:39:46,450 --> 00:39:49,010 So to wrap up, technical interviews are challenging. 511 00:39:49,010 --> 00:39:52,280 These problems are probably some of the more difficult problems 512 00:39:52,280 --> 00:39:54,700 that you might see in an interview, 513 00:39:54,700 --> 00:39:57,690 so if you don't understand all the problems that I just covered, it's okay. 514 00:39:57,690 --> 00:40:01,080 These are some of the more challenging problems. 515 00:40:01,080 --> 00:40:03,050 Practice, practice, practice. 516 00:40:03,050 --> 00:40:08,170 I gave a lot of problems in the handout, so definitely check those out. 517 00:40:08,170 --> 00:40:11,690 And good luck on your interviews. All my resources are posted at this link, 518 00:40:11,690 --> 00:40:15,220 and one of my senior friends has offered to do mock technical interviews, 519 00:40:15,220 --> 00:40:22,050 so if you're interested, email Will Yao at that email address. 520 00:40:22,050 --> 00:40:26,070 If you have some questions, you can ask me. 521 00:40:26,070 --> 00:40:28,780 Do you guys have specific questions related to technical interviews 522 00:40:28,780 --> 00:40:38,440 or any problems we've seen so far? 523 00:40:38,440 --> 00:40:49,910 Okay. Well, good luck on your interviews. 524 00:40:49,910 --> 00:40:52,910 [CS50.TV]