1 00:00:00,000 --> 00:00:02,570 [Section 3 - More Comfortable] 2 00:00:02,570 --> 00:00:05,070 [Rob Bowden - Harvard University] 3 00:00:05,070 --> 00:00:07,310 >> [This is CS50. - CS50.TV] 4 00:00:07,310 --> 00:00:12,700 >> So the first question is strangely worded. 5 00:00:12,700 --> 00:00:17,480 GDB lets you "debug" a program, but, more specifically, what does it let you do? 6 00:00:17,480 --> 00:00:22,590 I'll answer that one, and I don't know what's exactly expected, 7 00:00:22,590 --> 00:00:27,910 so I'm guessing it's something along the lines of it lets you step by step 8 00:00:27,910 --> 00:00:31,540 walk through the program, interact with it, change variables, do all these things-- 9 00:00:31,540 --> 00:00:34,270 basically completely control execution of a program 10 00:00:34,270 --> 00:00:38,410 and inspect any given part of the execution of the program. 11 00:00:38,410 --> 00:00:43,030 So those features let you debug things. 12 00:00:43,030 --> 00:00:44,830 Okay. 13 00:00:44,830 --> 00:00:50,520 Why does binary search require that an array be sorted? 14 00:00:50,520 --> 00:00:53,360 Who wants to answer that? 15 00:00:56,120 --> 00:01:00,070 [student] Because it doesn't work if it's not sorted. >>Yeah. [laughter] 16 00:01:00,070 --> 00:01:04,910 If it's not sorted, then it's impossible to split it in half 17 00:01:04,910 --> 00:01:07,850 and know that everything to the left is less and everything to the right 18 00:01:07,850 --> 00:01:10,490 is greater than the middle value. 19 00:01:10,490 --> 00:01:12,790 So it needs to be sorted. 20 00:01:12,790 --> 00:01:14,170 Okay. 21 00:01:14,170 --> 00:01:17,570 Why is bubble sort in O of n squared? 22 00:01:17,570 --> 00:01:23,230 Does anyone first want to give a very quick high-level overview of what bubble sort is? 23 00:01:25,950 --> 00:01:33,020 [student] You basically go through each element and you check the first few elements. 24 00:01:33,020 --> 00:01:37,150 If they're out of where you swap them, then you check the next few elements and so on. 25 00:01:37,150 --> 00:01:40,770 When you reach the end, then you know that the largest element is placed at the end, 26 00:01:40,770 --> 00:01:42,750 so you ignore that one then you keep on going through, 27 00:01:42,750 --> 00:01:48,490 and each time you have to check one less element until you make no changes. >>Yeah. 28 00:01:48,490 --> 00:01:58,470 It's called bubble sort because if you flip the array on its side so it's up and down, vertical, 29 00:01:58,470 --> 00:02:03,100 then the large values will sink to the bottom and the small values will bubble up to the top. 30 00:02:03,100 --> 00:02:05,210 That's how it got its name. 31 00:02:05,210 --> 00:02:08,220 And yeah, you just go through. 32 00:02:08,220 --> 00:02:11,190 You keep going through the array, swapping the larger value 33 00:02:11,190 --> 00:02:14,040 to get the largest values to the bottom. 34 00:02:14,040 --> 00:02:17,500 >> Why is it O of n squared? 35 00:02:18,690 --> 00:02:24,620 First, does anyone want to say why it's O of n squared? 36 00:02:24,620 --> 00:02:28,760 [student] Because for each run it goes n times. 37 00:02:28,760 --> 00:02:32,100 You can only be sure that you've taken the biggest element all the way down, 38 00:02:32,100 --> 00:02:35,230 then you have to repeat that for as many elements. >>Yeah. 39 00:02:35,230 --> 00:02:41,800 So keep in mind what big O means and what big Omega means. 40 00:02:41,800 --> 00:02:50,560 The big O is like the upper bound on how slow it can actually run. 41 00:02:50,560 --> 00:02:58,990 So by saying it's O of n squared, it is not O of n or else it would be able to run 42 00:02:58,990 --> 00:03:02,640 in linear time, but it is O of n cubed 43 00:03:02,640 --> 00:03:06,390 because it is bounded by O of n cubed. 44 00:03:06,390 --> 00:03:12,300 If it's bounded by O of n squared, then it's bounded also by n cubed. 45 00:03:12,300 --> 00:03:20,280 So it is n squared, and in the absolute worst case it can't do better than n squared, 46 00:03:20,280 --> 00:03:22,830 which is why it's O of n squared. 47 00:03:22,830 --> 00:03:31,200 So to see slight math at how it comes out to be n squared, 48 00:03:31,200 --> 00:03:40,530 if we have five things in our list, the first time how many swaps could we potentially need to make 49 00:03:40,530 --> 00:03:47,170 in order to get this? Let's actually just-- 50 00:03:47,170 --> 00:03:52,040 How many swaps are we going to have to make in the first run of bubble sort through the array? 51 00:03:52,040 --> 00:03:53,540 [student] n - 1. >>Yeah. 52 00:03:53,540 --> 00:03:58,340 >> If there are 5 elements, we're going to need to make n - 1. 53 00:03:58,340 --> 00:04:01,100 Then on the second one how many swaps are we going to have to make? 54 00:04:01,100 --> 00:04:03,440 [student] n - 2. >>Yeah. 55 00:04:03,440 --> 00:04:11,640 And the third is going to be n - 3, and then for convenience I will write the last two 56 00:04:11,640 --> 00:04:15,270 as then we're going to need to make 2 swaps and 1 swap. 57 00:04:15,270 --> 00:04:19,899 I guess the last one may or may not actually need to happen. 58 00:04:19,899 --> 00:04:22,820 Is it a swap? I don't know. 59 00:04:22,820 --> 00:04:26,490 So these are the total amounts of swaps or at least comparisons you have to make. 60 00:04:26,490 --> 00:04:29,910 Even if you don't swap, you still need to compare the values. 61 00:04:29,910 --> 00:04:33,910 So there are n - 1 comparisons in the first run through the array. 62 00:04:33,910 --> 00:04:42,050 If you rearrange these things, let's actually make it six things so things stack up nicely, 63 00:04:42,050 --> 00:04:44,790 and then I'll do 3, 2, 1. 64 00:04:44,790 --> 00:04:49,910 So just rearranging these sums, we want to see how many comparisons we make 65 00:04:49,910 --> 00:04:52,700 in the entire algorithm. 66 00:04:52,700 --> 00:04:56,550 So if we bring these guys down here, 67 00:04:56,550 --> 00:05:07,470 then we're still just summing however many comparisons there were. 68 00:05:07,470 --> 00:05:13,280 But if we sum these and we sum these and we sum these, 69 00:05:13,280 --> 00:05:18,130 it's still the same problem. We just sum those particular groups. 70 00:05:18,130 --> 00:05:22,400 >> So now we're summing 3 n's. It's not just 3 n's. 71 00:05:22,400 --> 00:05:27,650 It's always going to be n/2 n's. 72 00:05:27,650 --> 00:05:29,430 So here we happen to have 6. 73 00:05:29,430 --> 00:05:34,830 If we had 10 things, then we could do this grouping for 5 different pairs of things 74 00:05:34,830 --> 00:05:37,180 and end up with n + n + n + n + n. 75 00:05:37,180 --> 00:05:45,840 So you're always going to get n/2 n's, and so this we'll jot it out to n squared/2. 76 00:05:45,840 --> 00:05:48,890 And so even though it's the factor of half, which happens to come in 77 00:05:48,890 --> 00:05:54,190 because of the fact that through each iteration over the array we compare 1 less 78 00:05:54,190 --> 00:05:58,040 so that's how we get the over 2, but it is still n squared. 79 00:05:58,040 --> 00:06:01,650 We don't care about the constant factor of a half. 80 00:06:01,650 --> 00:06:07,760 So a lot of big O stuff like this relies on just kind of doing this sort of math, 81 00:06:07,760 --> 00:06:12,260 doing arithmetic sums and geometric series stuff, 82 00:06:12,260 --> 00:06:17,750 but most of them in this course are pretty straightforward. 83 00:06:17,750 --> 00:06:19,290 Okay. 84 00:06:19,290 --> 00:06:24,430 Why is insertion sort in Omega of n? What does omega mean? 85 00:06:24,430 --> 00:06:27,730 [two students speaking at once - unintelligible] >>Yeah. 86 00:06:27,730 --> 00:06:30,630 Omega you can think of as the lower bound. 87 00:06:30,630 --> 00:06:36,550 >> So no matter how efficient your insertion sort algorithm is, 88 00:06:36,550 --> 00:06:41,810 regardless of the list that's passed in, it always has to compare at least n things 89 00:06:41,810 --> 00:06:44,620 or it has to iterate over n things. 90 00:06:44,620 --> 00:06:47,280 Why is that? 91 00:06:47,280 --> 00:06:51,190 [student] Because if the list is already sorted, then through the first iteration 92 00:06:51,190 --> 00:06:54,080 you can only guarantee that the very first element is the least, 93 00:06:54,080 --> 00:06:56,490 and the second iteration you can only guarantee the first two are 94 00:06:56,490 --> 00:07:00,010 because you don't know that the rest of the list is sorted. >>Yeah. 95 00:07:00,010 --> 00:07:08,910 If you pass in a completely sorted list, at the very least you have to go over all the elements 96 00:07:08,910 --> 00:07:12,180 to see that nothing needs to be moved around. 97 00:07:12,180 --> 00:07:14,720 So passing over the list and saying oh, this is already sorted, 98 00:07:14,720 --> 00:07:18,240 it's impossible for you to know it's sorted until you check each element 99 00:07:18,240 --> 00:07:20,660 to see that they are in sorted order. 100 00:07:20,660 --> 00:07:25,290 So the lower bound on insertion sort is Omega of n. 101 00:07:25,290 --> 00:07:28,210 What's the worst case running time of merge sort, 102 00:07:28,210 --> 00:07:31,390 worst case being big O again? 103 00:07:31,390 --> 00:07:37,660 So in the worst case scenario, how does merge sort run? 104 00:07:42,170 --> 00:07:43,690 [student] N log n. >>Yeah. 105 00:07:43,690 --> 00:07:49,990 The fastest general sorting algorithms are n log n. You can't do better. 106 00:07:51,930 --> 00:07:55,130 >> There are special cases, and if we have time today--but we probably won't-- 107 00:07:55,130 --> 00:07:59,330 we could see one that does better than n log n. 108 00:07:59,330 --> 00:08:04,050 But in the general case, you cannot do better than n log n. 109 00:08:04,050 --> 00:08:09,680 And merge sort happens to be the one you should know for this course that is n log n. 110 00:08:09,680 --> 00:08:13,260 And so we'll actually be implementing that today. 111 00:08:13,260 --> 00:08:18,070 And finally, in no more than three sentences, how does selection sort work? 112 00:08:18,070 --> 00:08:20,370 Does someone want to answer, and I'll count your sentences 113 00:08:20,370 --> 00:08:22,390 because if you go over 3-- 114 00:08:25,530 --> 00:08:28,330 Does anyone remember selection sort? 115 00:08:31,280 --> 00:08:37,809 Selection sort is usually pretty easy to remember just from the name. 116 00:08:37,809 --> 00:08:45,350 You just iterate over the array, find whatever the largest value is or smallest-- 117 00:08:45,350 --> 00:08:47,290 whatever order you're sorting in. 118 00:08:47,290 --> 00:08:50,750 So let's say we're sorting from smallest to greatest. 119 00:08:50,750 --> 00:08:55,250 You iterate over the array, looking for whatever the minimum element is, 120 00:08:55,250 --> 00:08:59,750 select it, and then just swap it whatever is in the first place. 121 00:08:59,750 --> 00:09:04,090 And then on the second pass over the array, look for the minimum element again, 122 00:09:04,090 --> 00:09:07,490 select it, and then swap it with what's in the second position. 123 00:09:07,490 --> 00:09:10,650 So we are just picking and choosing the minimum values 124 00:09:10,650 --> 00:09:16,050 and inserting them into the front of the array until it is sorted. 125 00:09:19,210 --> 00:09:21,560 Questions on that? 126 00:09:21,560 --> 00:09:25,710 >> These inevitably appear in the forms you have to fill out when you're submitting the pset. 127 00:09:29,010 --> 00:09:32,360 Those are basically the answers to those. 128 00:09:32,360 --> 00:09:34,230 Okay, so now coding problems. 129 00:09:34,230 --> 00:09:40,140 I already sent out over email-- Did anyone not get that email? Okay. 130 00:09:40,140 --> 00:09:46,630 I already sent out over email the space that we're going to be using, 131 00:09:46,630 --> 00:09:52,120 and if you click on my name--so I think I'm going to be at the bottom 132 00:09:52,120 --> 00:09:57,170 because of the backwards r--but if you click on my name you'll see 2 revisions. 133 00:09:57,170 --> 00:10:02,650 Revision 1 is going to be I already copied and pasted the code into Spaces 134 00:10:02,650 --> 00:10:06,900 for the search thing you're going to have to implement. 135 00:10:06,900 --> 00:10:10,540 And Revision 2 will be the sort thing that we implement after that. 136 00:10:10,540 --> 00:10:15,770 So you can click on my Revision 1 and work from there. 137 00:10:17,350 --> 00:10:22,060 And now we want to implement binary search. 138 00:10:22,060 --> 00:10:26,470 >> Does anyone want to just give a pseudocode high-level explanation 139 00:10:26,470 --> 00:10:31,440 of what we're going to have to do for search? Yeah. 140 00:10:31,440 --> 00:10:36,170 [student] You just take the middle of the array and see if what you're looking for 141 00:10:36,170 --> 00:10:38,650 is less than that or more than that. 142 00:10:38,650 --> 00:10:41,080 And if it's less, you go to the half that's less, 143 00:10:41,080 --> 00:10:44,750 and if it's more, you go to the half that's more and you repeat that until you just get one thing. 144 00:10:44,750 --> 00:10:46,570 [Bowden] Yeah. 145 00:10:46,570 --> 00:10:51,320 Notice that our numbers array is already sorted, 146 00:10:51,320 --> 00:10:57,190 and so that means that we can take advantage of that and we could first check, 147 00:10:57,190 --> 00:11:00,390 okay, I'm looking for the number 50. 148 00:11:00,390 --> 00:11:03,720 So I can go into the middle. 149 00:11:03,720 --> 00:11:07,380 Middle is hard to define when it's an even number of things, 150 00:11:07,380 --> 00:11:10,820 but let's just say we'll always truncate to the middle. 151 00:11:10,820 --> 00:11:14,420 So here we have 8 things, so the middle would be 16. 152 00:11:14,420 --> 00:11:17,330 I'm looking for 50, so 50 is greater than 16. 153 00:11:17,330 --> 00:11:21,310 So now I can basically treat my array as these elements. 154 00:11:21,310 --> 00:11:23,450 I can throw away everything from 16 over. 155 00:11:23,450 --> 00:11:27,440 Now my array is just these 4 elements, and I repeat. 156 00:11:27,440 --> 00:11:31,910 So then I want to find the middle again, which is going to be 42. 157 00:11:31,910 --> 00:11:34,730 42 is less than 50, so I can throw away these two elements. 158 00:11:34,730 --> 00:11:36,890 This is my remaining array. 159 00:11:36,890 --> 00:11:38,430 I'm going to find the middle again. 160 00:11:38,430 --> 00:11:42,100 I guess 50 was a bad example because I was always throwing away things to the left, 161 00:11:42,100 --> 00:11:48,280 but by the same measure, if I'm looking for something 162 00:11:48,280 --> 00:11:52,100 and it's less than the element I'm currently looking at, 163 00:11:52,100 --> 00:11:55,080 then I'm going to throw away everything to the right. 164 00:11:55,080 --> 00:11:58,150 So now we need to implement that. 165 00:11:58,150 --> 00:12:02,310 Notice that we need to pass in size. 166 00:12:02,310 --> 00:12:06,730 We can also not need to hard-code size. 167 00:12:06,730 --> 00:12:11,640 So if we get rid of that #define-- 168 00:12:19,630 --> 00:12:21,430 Okay. 169 00:12:21,430 --> 00:12:27,180 How can I nicely figure out what the size of the numbers array currently is? 170 00:12:27,180 --> 00:12:30,950 >> How many elements are in the numbers array? 171 00:12:30,950 --> 00:12:33,630 [student] Numbers, brackets, .length? 172 00:12:33,630 --> 00:12:36,600 [Bowden] That doesn't exist in C. 173 00:12:36,600 --> 00:12:38,580 Need .length. 174 00:12:38,580 --> 00:12:42,010 Arrays don't have properties, so there is no length property of arrays 175 00:12:42,010 --> 00:12:45,650 that will just give you however long it happens to be. 176 00:12:48,180 --> 00:12:51,620 [student] See how much memory it has and divide by how much-- >>Yeah. 177 00:12:51,620 --> 00:12:55,810 So how can we see how much memory it has? >>[student] Sizeof. >>Yeah, sizeof. 178 00:12:55,810 --> 00:13:01,680 Sizeof is the operator that's going to return the size of the numbers array. 179 00:13:01,680 --> 00:13:10,060 And that's going to be however many integers there are times the size of an integer 180 00:13:10,060 --> 00:13:14,050 since that's how much memory it's actually taking up. 181 00:13:14,050 --> 00:13:17,630 So if I want the number of things in the array, 182 00:13:17,630 --> 00:13:20,560 then I'm going to want to divide by the size of an integer. 183 00:13:22,820 --> 00:13:26,010 Okay. So that lets me pass in size here. 184 00:13:26,010 --> 00:13:29,430 Why do I need to pass in size at all? 185 00:13:29,430 --> 00:13:38,570 Why can't I just do up here int size = sizeof(haystack) / sizeof(int)? 186 00:13:38,570 --> 00:13:41,490 Why does this not work? 187 00:13:41,490 --> 00:13:44,470 [student] It's not a global variable. 188 00:13:44,470 --> 00:13:51,540 [Bowden] Haystack exists and we're passing in numbers as haystack, 189 00:13:51,540 --> 00:13:54,700 and this is kind of a foreshadowing of what's to come. Yeah. 190 00:13:54,700 --> 00:14:00,170 [student] Haystack is just the reference to it, so it would return how big that reference is. 191 00:14:00,170 --> 00:14:02,150 Yeah. 192 00:14:02,150 --> 00:14:09,000 I doubt in lecture that you've seen the stack yet really, right? 193 00:14:09,000 --> 00:14:11,270 We've just spoken about it. 194 00:14:11,270 --> 00:14:16,090 So the stack is where all of your variables are going to be stored. 195 00:14:16,090 --> 00:14:19,960 >> Any memory that's allocated for local variables is going in the stack, 196 00:14:19,960 --> 00:14:24,790 and each function gets its own space on the stack, its own stack frame is what it's called. 197 00:14:24,790 --> 00:14:31,590 So main has its stack frame, and inside of it is going to exist this numbers array, 198 00:14:31,590 --> 00:14:34,270 and it's going to be of size sizeof(numbers). 199 00:14:34,270 --> 00:14:38,180 It's going to have size of numbers divided by size of elements, 200 00:14:38,180 --> 00:14:42,010 but that all lives within main's stack frame. 201 00:14:42,010 --> 00:14:45,190 When we call search, search gets its own stack frame, 202 00:14:45,190 --> 00:14:48,840 its own space to store all of its local variables. 203 00:14:48,840 --> 00:14:56,420 But these arguments--so haystack isn't a copy of this entire array. 204 00:14:56,420 --> 00:15:00,990 We don't pass in the entire array as a copy into search. 205 00:15:00,990 --> 00:15:04,030 It just passes a reference to that array. 206 00:15:04,030 --> 00:15:11,470 So search can access these numbers through this reference. 207 00:15:11,470 --> 00:15:17,100 It's still accessing the things that live inside of main's stack frame, 208 00:15:17,100 --> 00:15:22,990 but basically, when we get to pointers, which should be soon, 209 00:15:22,990 --> 00:15:24,980 this is what pointers are. 210 00:15:24,980 --> 00:15:29,400 Pointers are just references to things, and you can use pointers to access things 211 00:15:29,400 --> 00:15:32,030 that are in other things' stack frames. 212 00:15:32,030 --> 00:15:37,660 So even though numbers is local to main, we can still access it through this pointer. 213 00:15:37,660 --> 00:15:41,770 But since it's just a pointer and it's just a reference, 214 00:15:41,770 --> 00:15:45,040 sizeof(haystack) just returns the size of the reference itself. 215 00:15:45,040 --> 00:15:47,950 It doesn't return the size of the thing it's pointing to. 216 00:15:47,950 --> 00:15:51,110 It doesn't return the actual size of numbers. 217 00:15:51,110 --> 00:15:55,660 And so this isn't going to work as we want it to. 218 00:15:55,660 --> 00:15:57,320 >> Questions on that? 219 00:15:57,320 --> 00:16:03,250 Pointers will be gone into in significantly more gory detail in weeks to come. 220 00:16:06,750 --> 00:16:13,740 And this is why a lot of things that you see, most search things or sort things, 221 00:16:13,740 --> 00:16:16,990 they're almost all going to need to take the actual size of the array, 222 00:16:16,990 --> 00:16:20,440 because in C, we have no idea what the size of the array is. 223 00:16:20,440 --> 00:16:22,720 You need to manually pass it in. 224 00:16:22,720 --> 00:16:27,190 And you can't manually pass in the entire array because you're just passing in the reference 225 00:16:27,190 --> 00:16:30,390 and it can't get the size from the reference. 226 00:16:30,390 --> 00:16:32,300 Okay. 227 00:16:32,300 --> 00:16:38,160 So now we want to implement what was explained before. 228 00:16:38,160 --> 00:16:41,530 You can work on it for a minute, 229 00:16:41,530 --> 00:16:45,250 and you don't have to worry about getting everything perfectly 100% working. 230 00:16:45,250 --> 00:16:51,410 Just write up the half pseudocode for how you think it should work. 231 00:16:52,000 --> 00:16:53,630 Okay. 232 00:16:53,630 --> 00:16:56,350 No need to be completely done with this yet. 233 00:16:56,350 --> 00:17:02,380 But does anyone feel comfortable with what they have so far, 234 00:17:02,380 --> 00:17:05,599 like something we can work with together? 235 00:17:05,599 --> 00:17:09,690 Does anyone want to volunteer? Or I will randomly pick. 236 00:17:12,680 --> 00:17:18,599 It doesn't have to be right by any measure but something we can modify into a working state. 237 00:17:18,599 --> 00:17:20,720 [student] Sure. >>Okay. 238 00:17:20,720 --> 00:17:27,220 So can you save the revision by clicking on the little Save icon. 239 00:17:27,220 --> 00:17:29,950 You're Ramya, right? >>[student] Yeah. >>[Bowden] Okay. 240 00:17:29,950 --> 00:17:35,140 So now I can view your revision and everyone can pull up the revision. 241 00:17:35,140 --> 00:17:38,600 And here we have-- Okay. 242 00:17:38,600 --> 00:17:43,160 So Ramya went with the recursive solution, which is definitely a valid solution. 243 00:17:43,160 --> 00:17:44,970 There are two ways you can do this problem. 244 00:17:44,970 --> 00:17:48,060 You can either do it iteratively or recursively. 245 00:17:48,060 --> 00:17:53,860 Most problems you encounter that can be done recursively can also be done iteratively. 246 00:17:53,860 --> 00:17:58,510 So here we've done it recursively. 247 00:17:58,510 --> 00:18:03,730 >> Does someone want to define what it means to make a function recursive? 248 00:18:07,210 --> 00:18:08,920 [student] When you have a function call itself 249 00:18:08,920 --> 00:18:13,470 and then call itself until it comes out with true and true. >>Yeah. 250 00:18:13,470 --> 00:18:17,680 A recursive function is just a function which calls itself. 251 00:18:17,680 --> 00:18:24,140 There are three big things that a recursive function must have. 252 00:18:24,140 --> 00:18:27,310 The first is obviously, it calls itself. 253 00:18:27,310 --> 00:18:29,650 The second is the base case. 254 00:18:29,650 --> 00:18:33,390 So at some point the function needs to stop calling itself, 255 00:18:33,390 --> 00:18:35,610 and that's what the base case is for. 256 00:18:35,610 --> 00:18:43,860 So here we know that we should stop, we should give up in our search 257 00:18:43,860 --> 00:18:48,150 when start equals end--and we'll go over what that means. 258 00:18:48,150 --> 00:18:52,130 But finally, the last thing that's important for recursive functions: 259 00:18:52,130 --> 00:18:59,250 the functions must somehow approach the base case. 260 00:18:59,250 --> 00:19:04,140 Like if you're not actually updating anything when you make the second recursive call, 261 00:19:04,140 --> 00:19:07,880 if you're literally just calling the function again with the same arguments 262 00:19:07,880 --> 00:19:10,130 and no global variables have changed or anything, 263 00:19:10,130 --> 00:19:14,430 you will never reach the base case, in which case that's bad. 264 00:19:14,430 --> 00:19:17,950 It will be an infinite recursion and a stack overflow. 265 00:19:17,950 --> 00:19:24,330 But here we see that the update is happening since we are updating start + end/2, 266 00:19:24,330 --> 00:19:28,180 we're updating the end argument here, we're updating the start argument here. 267 00:19:28,180 --> 00:19:32,860 So in all recursive calls we are updating something. Okay. 268 00:19:32,860 --> 00:19:38,110 Do you want to walk us through your solution? >>Sure. 269 00:19:38,110 --> 00:19:44,270 I'm using SearchHelp so that every time I make this function call 270 00:19:44,270 --> 00:19:47,910 I have the start of where I'm looking for in the array and the end 271 00:19:47,910 --> 00:19:49,380 of where I'm looking the array. 272 00:19:49,380 --> 00:19:55,330 >> At every step where it's saying it's the middle element, which is start + end/2, 273 00:19:55,330 --> 00:19:58,850 is that equal to what we're looking for? 274 00:19:58,850 --> 00:20:04,650 And if it is, then we found it, and I guess that gets passed up the levels of recursion. 275 00:20:04,650 --> 00:20:12,540 And if that's not true, then we check whether that middle value of the array is too big, 276 00:20:12,540 --> 00:20:19,320 in which case we look at the left half of the array by going from start to the middle index. 277 00:20:19,320 --> 00:20:22,710 And otherwise we do the end half. 278 00:20:22,710 --> 00:20:24,740 [Bowden] Okay. 279 00:20:24,740 --> 00:20:27,730 That sounds good. 280 00:20:27,730 --> 00:20:36,640 Okay, so a couple things, and actually, this is a very high-level thing 281 00:20:36,640 --> 00:20:41,270 that you will never need to know for this course, but it is true. 282 00:20:41,270 --> 00:20:46,080 Recursive functions, you always hear that they're a bad deal 283 00:20:46,080 --> 00:20:51,160 because if you recursively call yourself too many times, you get stack overflow 284 00:20:51,160 --> 00:20:54,990 since, as I said before, every function gets its own stack frame. 285 00:20:54,990 --> 00:20:59,500 So each call of the recursive function gets its own stack frame. 286 00:20:59,500 --> 00:21:04,140 So if you make 1,000 recursive calls, you get 1,000 stack frames, 287 00:21:04,140 --> 00:21:08,390 and quickly you lead to having too many stack frames and things just break. 288 00:21:08,390 --> 00:21:13,480 So that's why recursive functions are generally bad. 289 00:21:13,480 --> 00:21:19,370 But there is a nice subset of recursive functions called tail-recursive functions, 290 00:21:19,370 --> 00:21:26,120 and this happens to be an example of one where if the compiler notices this 291 00:21:26,120 --> 00:21:29,920 and it should, I think--in Clang if you pass it the -O2 flag 292 00:21:29,920 --> 00:21:33,250 then it will notice this is tail recursive and make things good. 293 00:21:33,250 --> 00:21:40,050 >> It will reuse the same stack frame over and over again for each recursive call. 294 00:21:40,050 --> 00:21:47,010 And so since you're using the same stack frame, you do not need to worry about 295 00:21:47,010 --> 00:21:51,690 ever stack overflowing, and at the same time, like you said before, 296 00:21:51,690 --> 00:21:56,380 where once you return true, then it has to return up all of these stack frames 297 00:21:56,380 --> 00:22:01,740 and the 10th call to SearchHelp has to return to the 9th, has to return to the 8th. 298 00:22:01,740 --> 00:22:05,360 So that doesn't need to happen when functions are tail recursive. 299 00:22:05,360 --> 00:22:13,740 And so what makes this function tail recursive is notice that for any given call to searchHelp 300 00:22:13,740 --> 00:22:18,470 the recursive call that it's making is what it's returning. 301 00:22:18,470 --> 00:22:25,290 So in the first call to SearchHelp, we either immediately return false, 302 00:22:25,290 --> 00:22:29,590 immediately return true, or we make a recursive call to SearchHelp 303 00:22:29,590 --> 00:22:33,810 where what we're returning is what that call is returning. 304 00:22:33,810 --> 00:22:51,560 And we could not do this if we did something like int x = SearchHelp, return x * 2, 305 00:22:51,560 --> 00:22:55,440 just some random change. 306 00:22:55,440 --> 00:23:01,470 >> So now this recursive call, this int x = SearchHelp recursive call, 307 00:23:01,470 --> 00:23:05,740 is no longer tail recursive because it actually does have to return 308 00:23:05,740 --> 00:23:10,400 back to a previous stack frame so that that previous call to the function 309 00:23:10,400 --> 00:23:13,040 can then do something with the return value. 310 00:23:13,040 --> 00:23:22,190 So this is not tail recursive, but what we had before is nicely tail recursive. Yeah. 311 00:23:22,190 --> 00:23:27,000 [student] Shouldn't the second base case be checked first 312 00:23:27,000 --> 00:23:30,640 because there could be a situation where when you pass it the argument 313 00:23:30,640 --> 00:23:35,770 you have start = end, but they are the needle value. 314 00:23:35,770 --> 00:23:47,310 The question was can't we run into the case where end is the needle value 315 00:23:47,310 --> 00:23:52,000 or start = end, appropriately, start = end 316 00:23:52,000 --> 00:23:59,480 and you haven't actually checked that particular value yet, 317 00:23:59,480 --> 00:24:03,910 then start + end/2 is just going to be the same value. 318 00:24:03,910 --> 00:24:07,890 But we've already returned false and we never actually checked the value. 319 00:24:07,890 --> 00:24:14,240 So at the very least, in the first call, if size is 0, then we want to return false. 320 00:24:14,240 --> 00:24:17,710 But if size is 1, then start isn't going to equal end, 321 00:24:17,710 --> 00:24:19,820 and we'll at least check the one element. 322 00:24:19,820 --> 00:24:26,750 But I think you are right in that we can end up in a case where start + end/2, 323 00:24:26,750 --> 00:24:31,190 start ends up being the same as start + end/2, 324 00:24:31,190 --> 00:24:35,350 but we never actually checked that element. 325 00:24:35,350 --> 00:24:44,740 >> So if we first check is the middle element the value we're looking for, 326 00:24:44,740 --> 00:24:47,110 then we can immediately return true. 327 00:24:47,110 --> 00:24:50,740 Else if they're equal, then there's no point in continuing 328 00:24:50,740 --> 00:24:58,440 since we're just going to update to a case where we are on a single-element array. 329 00:24:58,440 --> 00:25:01,110 If that single element is not the one we're looking for, 330 00:25:01,110 --> 00:25:03,530 then everything is wrong. Yeah. 331 00:25:03,530 --> 00:25:08,900 [student] The thing is that since size is actually larger than the number of elements in the array, 332 00:25:08,900 --> 00:25:13,070 there is already an offset-- >>So will size-- 333 00:25:13,070 --> 00:25:19,380 [student] Say if the array was size 0, then the SearchHelp will actually check haystack of 0 334 00:25:19,380 --> 00:25:21,490 on the first call. 335 00:25:21,490 --> 00:25:25,300 The array has size 0, so the 0 is-- >>Yeah. 336 00:25:25,300 --> 00:25:30,750 There's another thing that--it might be good. Let's think. 337 00:25:30,750 --> 00:25:40,120 So if the array had 10 elements and the middle one we're going to check is index 5, 338 00:25:40,120 --> 00:25:45,700 so we're checking 5, and let's say that the value is less. 339 00:25:45,700 --> 00:25:50,720 So we're throwing everything away from 5 onward. 340 00:25:50,720 --> 00:25:54,030 So start + end/2 is going to be our new end, 341 00:25:54,030 --> 00:25:57,450 so yeah, it's always going to stay beyond the end of the array. 342 00:25:57,450 --> 00:26:03,570 If it's a case if it was even or odd, then we would check, say, 4, 343 00:26:03,570 --> 00:26:05,770 but we're still throwing away-- 344 00:26:05,770 --> 00:26:13,500 So yeah, end is always going to be beyond the actual end of the array. 345 00:26:13,500 --> 00:26:18,350 So the elements we are focusing on, end is always going to be the one after that. 346 00:26:18,350 --> 00:26:24,270 And so if start does ever equal end, we are in an array of size 0. 347 00:26:24,270 --> 00:26:35,600 >> The other thing I was thinking is we are updating start to be start + end/2, 348 00:26:35,600 --> 00:26:44,020 so this is the case that I'm having trouble with, where start + end/2 349 00:26:44,020 --> 00:26:46,820 is the element we're checking. 350 00:26:46,820 --> 00:26:58,150 Let's say we had this 10-element array. Whatever. 351 00:26:58,150 --> 00:27:03,250 So start + end/2 is going to be something like this one, 352 00:27:03,250 --> 00:27:07,060 and if that's not the value, say we want to update. 353 00:27:07,060 --> 00:27:10,060 The value is greater, so we want to look at this half of the array. 354 00:27:10,060 --> 00:27:15,910 So how we're updating start, we're updating start to now be this element. 355 00:27:15,910 --> 00:27:23,790 But this may still work, or at the very least, you can do start + end/2 + 1. 356 00:27:23,790 --> 00:27:27,850 [student] You don't need to be start + end [inaudible] >>Yeah. 357 00:27:27,850 --> 00:27:33,240 We have already checked this element and know it's not the one we're looking for. 358 00:27:33,240 --> 00:27:36,800 So we don't need to update start to be this element. 359 00:27:36,800 --> 00:27:39,560 We can just skip it and update start to be this element. 360 00:27:39,560 --> 00:27:46,060 And is there ever a case, let's say, that this were end, 361 00:27:46,060 --> 00:27:53,140 so then start would be this, start + end/2 would be this, 362 00:27:53,140 --> 00:28:00,580 start + end-- Yeah, I think it can end up in infinite recursion. 363 00:28:00,580 --> 00:28:12,690 Let's say it's just an array of size 2 or an array of size 1. I think this will work. 364 00:28:12,690 --> 00:28:19,490 So currently, start is that element and end is 1 beyond it. 365 00:28:19,490 --> 00:28:24,110 So the element that we're going to check is this one, 366 00:28:24,110 --> 00:28:29,400 and then when we update start, we're updating start to be 0 + 1/2, 367 00:28:29,400 --> 00:28:33,160 which is going to end us back with start being this element. 368 00:28:33,160 --> 00:28:36,210 >> So we're checking the same element over and over again. 369 00:28:36,210 --> 00:28:43,310 So this is the case where every recursive call must actually update something. 370 00:28:43,310 --> 00:28:48,370 So we need to do start + end/2 + 1, or else there's a case 371 00:28:48,370 --> 00:28:50,710 where we're not actually updating start. 372 00:28:50,710 --> 00:28:53,820 Everyone see that? 373 00:28:53,820 --> 00:28:56,310 Okay. 374 00:28:56,310 --> 00:29:03,860 Does anyone have questions on this solution or any more comments? 375 00:29:05,220 --> 00:29:10,280 Okay. Does anyone have an iterative solution that we can all look at? 376 00:29:17,400 --> 00:29:20,940 Did we all do it recursively? 377 00:29:20,940 --> 00:29:25,950 Or also I guess if you opened hers, then you might have overridden your previous one. 378 00:29:25,950 --> 00:29:28,810 Does it automatically save? I'm not positive. 379 00:29:35,090 --> 00:29:39,130 Does anyone have iterative? 380 00:29:39,130 --> 00:29:42,430 We can walk through it together if not. 381 00:29:46,080 --> 00:29:48,100 The idea is going to be the same. 382 00:30:00,260 --> 00:30:02,830 Iterative solution. 383 00:30:02,830 --> 00:30:07,140 We're going to want to basically do the same idea 384 00:30:07,140 --> 00:30:16,530 where we want to keep track of the new end of the array and the new start of the array 385 00:30:16,530 --> 00:30:18,510 and do that over and over. 386 00:30:18,510 --> 00:30:22,430 And if what we're keeping track of as the start and the end ever intersect, 387 00:30:22,430 --> 00:30:29,020 then we did not find it and we can return false. 388 00:30:29,020 --> 00:30:37,540 So how do I do that? Anyone have suggestions or code for me to pull up? 389 00:30:42,190 --> 00:30:47,450 [student] Do a while loop. >>Yeah. 390 00:30:47,450 --> 00:30:49,450 You are going to want to do a loop. 391 00:30:49,450 --> 00:30:51,830 >> Did you have code I could pull up, or what were you going to suggest? 392 00:30:51,830 --> 00:30:56,340 [student] I think so. >>All right. This makes things easier. What was your name? 393 00:30:56,340 --> 00:30:57,890 [student] Lucas. 394 00:31:00,140 --> 00:31:04,190 Revision 1. Okay. 395 00:31:04,190 --> 00:31:13,200 Low is what we called start before. 396 00:31:13,200 --> 00:31:17,080 Up is not quite what we called end before. 397 00:31:17,080 --> 00:31:22,750 Actually, end is now within the array. It's an element we should consider. 398 00:31:22,750 --> 00:31:26,890 So low is 0, up is the size of the array - 1, 399 00:31:26,890 --> 00:31:34,780 and now we are looping, and we are checking-- 400 00:31:34,780 --> 00:31:37,340 I guess you can walk through it. 401 00:31:37,340 --> 00:31:41,420 What was your thinking through this? Walk us through your code. 402 00:31:41,420 --> 00:31:49,940 [student] Sure. Look at the haystack value in the middle and compare it to needle. 403 00:31:49,940 --> 00:31:58,520 So if it's greater than your needle, then you want to--oh, actually, that should be backwards. 404 00:31:58,520 --> 00:32:05,180 You're going to want to throw away the right half, and so yeah, that should be the way. 405 00:32:05,180 --> 00:32:08,830 [Bowden] So this should be less? Is that what you said? >>[student] Yeah. 406 00:32:08,830 --> 00:32:10,390 [Bowden] Okay. Less. 407 00:32:10,390 --> 00:32:15,700 So if what we're looking at is smaller than what we want, 408 00:32:15,700 --> 00:32:19,410 then yeah, we want to throw away the left half, 409 00:32:19,410 --> 00:32:22,210 which means we are updating everything we're considering 410 00:32:22,210 --> 00:32:26,610 by moving low to the right of the array. 411 00:32:26,610 --> 00:32:30,130 This looks good. 412 00:32:30,130 --> 00:32:34,550 I think it has the same issue that we said on the previous one, 413 00:32:34,550 --> 00:32:49,760 where if low is 0 and up is 1, then low + up/2 is going to set up to be the same thing again. 414 00:32:49,760 --> 00:32:53,860 >> And even if that's not the case, it is still more efficient at the very least 415 00:32:53,860 --> 00:32:57,630 to just throw away the element we just looked at which we know is wrong. 416 00:32:57,630 --> 00:33:03,240 So low + up/2 + 1-- >>[student] That should be the other way. 417 00:33:03,240 --> 00:33:05,900 [Bowden] Or this should be - 1 and the other one should be + 1. 418 00:33:05,900 --> 00:33:09,580 [student] And there should be a double equals sign. >>[Bowden] Yeah. 419 00:33:09,580 --> 00:33:11,340 [student] Yeah. 420 00:33:14,540 --> 00:33:15,910 Okay. 421 00:33:15,910 --> 00:33:21,280 And finally, now that we have this + 1 - 1 thing, 422 00:33:21,280 --> 00:33:31,520 is it--it might not be--is it ever possible for low to end up with a value greater than up? 423 00:33:35,710 --> 00:33:40,320 I think the only way that can happen-- Is it possible? >>[student] I don't know. 424 00:33:40,320 --> 00:33:45,220 But if it gets truncated and then gets minus that 1 and then-- >>Yeah. 425 00:33:45,220 --> 00:33:47,480 [student] It would possibly get messed up. 426 00:33:49,700 --> 00:33:53,940 I think it should be good only because 427 00:33:53,940 --> 00:33:58,800 for it to end up lower they would have to be equal, I think. 428 00:33:58,800 --> 00:34:03,070 But if they are equal, then we wouldn't have done the while loop to begin with 429 00:34:03,070 --> 00:34:06,670 and we just would have returned the value. So I think we're good now. 430 00:34:06,670 --> 00:34:11,530 Notice that even though this problem is no longer recursive, 431 00:34:11,530 --> 00:34:17,400 the same kind of ideas apply where we can see how this so easily lends itself 432 00:34:17,400 --> 00:34:23,659 to a recursive solution by the fact that we're just updating the indices over and over again, 433 00:34:23,659 --> 00:34:29,960 we're making the problem smaller and smaller, we're focusing on a subset of the array. 434 00:34:29,960 --> 00:34:40,860 [student] If low is 0 and up is 1, they would both be 0 + 1/2, which would go to 0, 435 00:34:40,860 --> 00:34:44,429 and then one would be + 1, one would be - 1. 436 00:34:47,000 --> 00:34:50,870 [student] Where are we checking equality? 437 00:34:50,870 --> 00:34:55,100 Like if the middle one is actually needle? 438 00:34:55,100 --> 00:34:58,590 We're not currently doing that? Oh! 439 00:35:00,610 --> 00:35:02,460 If it's-- 440 00:35:05,340 --> 00:35:13,740 Yes. We can't just do the test down here because let's say the first middle-- 441 00:35:13,740 --> 00:35:16,430 [student] It's actually like not throw away the bound. 442 00:35:16,430 --> 00:35:20,220 So if you throw away the bound, you have to check it first or whatever. 443 00:35:20,220 --> 00:35:23,350 Ah. Yeah. >>[student] Yeah. 444 00:35:23,350 --> 00:35:29,650 So now we have thrown away the one we currently looked at, 445 00:35:29,650 --> 00:35:33,260 which means we now need to also have 446 00:35:33,260 --> 00:35:44,810 if (haystack[(low+up)/2] == needle), then we can return true. 447 00:35:44,810 --> 00:35:52,070 And whether I put else or just if, it means literally the same thing 448 00:35:52,070 --> 00:35:57,110 because this would have returned true. 449 00:35:57,110 --> 00:36:01,450 So I'll put else if, but it doesn't matter. 450 00:36:01,450 --> 00:36:10,440 >> So else if this, else this, and this is a common thing I do 451 00:36:10,440 --> 00:36:14,340 where even if it is the case where everything is good here, 452 00:36:14,340 --> 00:36:22,780 like low can never be greater than up, it's not worth reasoning about whether that's true. 453 00:36:22,780 --> 00:36:28,010 So you may as well say while low is less than or equal to 454 00:36:28,010 --> 00:36:30,720 or while low is less than 455 00:36:30,720 --> 00:36:35,300 so if they are ever equal or low happens to pass up, 456 00:36:35,300 --> 00:36:40,130 then we can break out of this loop. 457 00:36:41,410 --> 00:36:44,630 Questions, concerns, comments? 458 00:36:47,080 --> 00:36:49,270 Okay. This looks good. 459 00:36:49,270 --> 00:36:52,230 Now we want to do sort. 460 00:36:52,230 --> 00:37:04,030 If we go to my second revision, we see these same numbers, 461 00:37:04,030 --> 00:37:07,550 but now they are no longer in sorted order. 462 00:37:07,550 --> 00:37:12,840 And we want to implement sort using any algorithm in O of n log n. 463 00:37:12,840 --> 00:37:17,240 So which algorithm do you think we should implement here? >>[student] Merge sort. 464 00:37:17,240 --> 00:37:23,810 [Bowden] Yeah. Merge sort is O(n log n), so that's what we're going to do. 465 00:37:23,810 --> 00:37:26,680 And the problem is going to be pretty similar, 466 00:37:26,680 --> 00:37:31,920 where it easily lends itself to a recursive solution. 467 00:37:31,920 --> 00:37:35,580 We can also come up with an iterative solution if we want, 468 00:37:35,580 --> 00:37:42,540 but recursion will be easier here and we should do recursion. 469 00:37:45,120 --> 00:37:49,530 I guess we will walk through merge sort first, 470 00:37:49,530 --> 00:37:54,280 although there is a lovely video on merge sort already. [laughter] 471 00:37:54,280 --> 00:37:59,780 So merge sort there are--I am wasting so much of this paper. 472 00:37:59,780 --> 00:38:02,080 Oh, there's only one left. 473 00:38:02,080 --> 00:38:03,630 So merge. 474 00:38:08,190 --> 00:38:12,470 Oh, 1, 3, 5. 475 00:38:26,090 --> 00:38:27,440 Okay. 476 00:38:29,910 --> 00:38:33,460 Merge takes two separate arrays. 477 00:38:33,460 --> 00:38:36,780 Individually those two arrays are both sorted. 478 00:38:36,780 --> 00:38:40,970 So this array, 1, 3, 5, sorted. This array, 0, 2, 4, sorted. 479 00:38:40,970 --> 00:38:46,710 Now what merge should do is combine them into a single array that is itself sorted. 480 00:38:46,710 --> 00:38:57,130 So we want an array of size 6 that is going to have these elements inside of it 481 00:38:57,130 --> 00:38:59,390 in sorted order. 482 00:38:59,390 --> 00:39:03,390 >> And so we can take advantage of the fact that these two arrays are sorted 483 00:39:03,390 --> 00:39:06,800 to do this in linear time, 484 00:39:06,800 --> 00:39:13,510 linear time meaning if this array is size x and this is size y, 485 00:39:13,510 --> 00:39:20,970 then the total algorithm should be O(x+y). Okay. 486 00:39:20,970 --> 00:39:23,150 So suggestions. 487 00:39:23,150 --> 00:39:26,030 [student] Could we start from the left? 488 00:39:26,030 --> 00:39:30,150 So you'll put the 0 down first and then the 1 and then here you're at the 2. 489 00:39:30,150 --> 00:39:33,320 So it's kind of like you have a tab that's moving to the right. >>[Bowden] Yeah. 490 00:39:33,320 --> 00:39:41,070 For both of these arrays if we just focus on the leftmost element. 491 00:39:41,070 --> 00:39:43,530 Because both arrays are sorted, we know that these 2 elements 492 00:39:43,530 --> 00:39:46,920 are the smallest elements in either array. 493 00:39:46,920 --> 00:39:53,500 So that means that 1 of those 2 elements must be the smallest element in our merged array. 494 00:39:53,500 --> 00:39:58,190 It just so happens that the smallest is the one on the right this time. 495 00:39:58,190 --> 00:40:02,580 So we take 0, insert it on the left because 0 is less than 1, 496 00:40:02,580 --> 00:40:08,210 so take 0, insert it into our first position, and then we update this 497 00:40:08,210 --> 00:40:12,070 to now focus on the first element. 498 00:40:12,070 --> 00:40:14,570 And now we repeat. 499 00:40:14,570 --> 00:40:20,670 So now we compare 2 and 1. 1 is smaller, so we'll insert 1. 500 00:40:20,670 --> 00:40:25,300 We update this pointer to point to this guy. 501 00:40:25,300 --> 00:40:33,160 Now we do it again, so 2. This will update, compare these 2, 3. 502 00:40:33,160 --> 00:40:37,770 This updates, then 4 and 5. 503 00:40:37,770 --> 00:40:42,110 So that is merge. 504 00:40:42,110 --> 00:40:49,010 >> It should be pretty obvious that it is linear time since we just go across each element once. 505 00:40:49,010 --> 00:40:55,980 And that is the biggest step to implementing merge sort is doing this. 506 00:40:55,980 --> 00:40:59,330 And it's not that hard. 507 00:40:59,330 --> 00:41:15,020 A couple things to worry about is let's say we were merging 1, 2, 3, 4, 5, 6. 508 00:41:15,020 --> 00:41:30,930 In this case we end up in the scenario where this one is going to be smaller, 509 00:41:30,930 --> 00:41:36,160 then we update this pointer, this one is going to be smaller, update this, 510 00:41:36,160 --> 00:41:41,280 this one's smaller, and now you have to recognize 511 00:41:41,280 --> 00:41:44,220 when you've actually run out of elements to compare with. 512 00:41:44,220 --> 00:41:49,400 Since we have already used this entire array, 513 00:41:49,400 --> 00:41:55,190 everything in this array is now just inserted into here. 514 00:41:55,190 --> 00:42:02,040 So if we ever run into the point where one of our arrays is completely merged already, 515 00:42:02,040 --> 00:42:06,510 then we just take all the elements of the other array and insert them into the end of the array. 516 00:42:06,510 --> 00:42:13,630 So we can just insert 4, 5, 6. Okay. 517 00:42:13,630 --> 00:42:18,070 That is one thing to watch out for. 518 00:42:22,080 --> 00:42:26,120 Implementing that should be step 1. 519 00:42:26,120 --> 00:42:32,600 Merge sort then based on that, it's 2 steps, 2 silly steps. 520 00:42:38,800 --> 00:42:42,090 Let's just give this array. 521 00:42:57,920 --> 00:43:05,680 So merge sort, step 1 is to recursively break the array into halves. 522 00:43:05,680 --> 00:43:09,350 So split this array into halves. 523 00:43:09,350 --> 00:43:22,920 We now have 4, 15, 16, 50 and 8, 23, 42, 108. 524 00:43:22,920 --> 00:43:25,800 And now we do it again and we split these into halves. 525 00:43:25,800 --> 00:43:27,530 I'll just do it on this side. 526 00:43:27,530 --> 00:43:34,790 So 4, 15 and 16, 50. 527 00:43:34,790 --> 00:43:37,440 We would do the same thing over here. 528 00:43:37,440 --> 00:43:40,340 And now we split it into halves again. 529 00:43:40,340 --> 00:43:51,080 And we have 4, 15, 16, 50. 530 00:43:51,080 --> 00:43:53,170 So that is our base case. 531 00:43:53,170 --> 00:44:00,540 Once the arrays are of size 1, then we stop with the splitting into halves. 532 00:44:00,540 --> 00:44:03,190 >> Now what do we do with this? 533 00:44:03,190 --> 00:44:15,730 We end up this will also break down into 8, 23, 42, and 108. 534 00:44:15,730 --> 00:44:24,000 So now that we're at this point, now step two of merge sort is just merging pairs to the lists. 535 00:44:24,000 --> 00:44:27,610 So we want to merge these. We just call merge. 536 00:44:27,610 --> 00:44:31,410 We know merge will return these in sorted order. 537 00:44:31,410 --> 00:44:33,920 4, 15. 538 00:44:33,920 --> 00:44:41,440 Now we want to merge these, and that will return a list with those in sorted order, 539 00:44:41,440 --> 00:44:44,160 16, 50. 540 00:44:44,160 --> 00:44:57,380 We merge those--I cannot write--8, 23 and 42, 108. 541 00:44:57,380 --> 00:45:02,890 So we have merged pairs once. 542 00:45:02,890 --> 00:45:05,140 Now we just merge again. 543 00:45:05,140 --> 00:45:10,130 Notice that each of these lists is sorted in itself, 544 00:45:10,130 --> 00:45:15,220 and then we can just merge these lists to get a list of size 4 which is sorted 545 00:45:15,220 --> 00:45:19,990 and merge these two lists to get a list of size 4 that is sorted. 546 00:45:19,990 --> 00:45:25,710 And finally, we can merge those two lists of size 4 to get one list of size 8 that is sorted. 547 00:45:25,710 --> 00:45:34,030 So to see that this is overall n log n, we already saw that merge is linear, 548 00:45:34,030 --> 00:45:40,390 so when we're dealing with merging these, so like the overall cost of merge 549 00:45:40,390 --> 00:45:43,410 for these two lists is just 2 because-- 550 00:45:43,410 --> 00:45:49,610 Or well, it's O of n, but n here is just these 2 elements, so it's 2. 551 00:45:49,610 --> 00:45:52,850 And these 2 will be 2 and these 2 will be 2 and these 2 will be 2, 552 00:45:52,850 --> 00:45:58,820 so across all of the merges that we need to do, we end up doing n. 553 00:45:58,820 --> 00:46:03,210 Like 2 + 2 + 2 + 2 is 8, which is n, 554 00:46:03,210 --> 00:46:08,060 so the cost of merging in this set is n. 555 00:46:08,060 --> 00:46:10,810 And then the same thing here. 556 00:46:10,810 --> 00:46:16,980 We'll merge these 2, then these 2, and individually this merge will take four operations, 557 00:46:16,980 --> 00:46:23,610 this merge will take four operations, but once again, between all of these, 558 00:46:23,610 --> 00:46:29,030 we end up merging n total things, and so this step takes n. 559 00:46:29,030 --> 00:46:33,670 And so each level takes n elements being merged. 560 00:46:33,670 --> 00:46:36,110 >> And how many levels are there? 561 00:46:36,110 --> 00:46:40,160 At each level, our array grows by size 2. 562 00:46:40,160 --> 00:46:44,590 Here our arrays are of size 1, here they're of size 2, here they're of size 4, 563 00:46:44,590 --> 00:46:46,470 and finally, they're of size 8. 564 00:46:46,470 --> 00:46:56,450 So since it is doubling, there are going to be a total of log n of these levels. 565 00:46:56,450 --> 00:47:02,090 So with log n levels, each individual level taking n total operations, 566 00:47:02,090 --> 00:47:05,720 we get an n log n algorithm. 567 00:47:05,720 --> 00:47:07,790 Questions? 568 00:47:08,940 --> 00:47:13,320 Have people already made progress on how to implement this? 569 00:47:13,320 --> 00:47:18,260 Is anyone already in a state where I can just pull up their code? 570 00:47:20,320 --> 00:47:22,260 I can give a minute. 571 00:47:24,770 --> 00:47:27,470 This one is going to be longer. 572 00:47:27,470 --> 00:47:28,730 I highly recommend recur-- 573 00:47:28,730 --> 00:47:30,510 You don't have to do recursion for merge 574 00:47:30,510 --> 00:47:33,750 because to do recursion for merge, you're going to have to pass a bunch of different sizes. 575 00:47:33,750 --> 00:47:37,150 You can, but it's annoying. 576 00:47:37,150 --> 00:47:43,720 But recursion for sort itself is pretty easy. 577 00:47:43,720 --> 00:47:49,190 You just literally call sort on left half, sort on right half. Okay. 578 00:47:51,770 --> 00:47:54,860 Anyone have anything I can pull up yet? 579 00:47:54,860 --> 00:47:57,540 Or else I'll give a minute. 580 00:47:58,210 --> 00:47:59,900 Okay. 581 00:47:59,900 --> 00:48:02,970 Anyone have something we can work with? 582 00:48:05,450 --> 00:48:09,680 Or else we'll just work with this and then expand from there. 583 00:48:09,680 --> 00:48:14,050 >> Anyone have more than this that I can pull up? 584 00:48:14,050 --> 00:48:17,770 [student] Yeah. You can pull up mine. >>All right. 585 00:48:17,770 --> 00:48:19,730 Yes! 586 00:48:22,170 --> 00:48:25,280 [student] There were a lot of conditions. >>Oh, shoot. Can you-- 587 00:48:25,280 --> 00:48:28,110 [student] I have to save it. >>Yeah. 588 00:48:32,420 --> 00:48:35,730 So we did do the merge separately. 589 00:48:35,730 --> 00:48:38,570 Oh, but it's not that bad. 590 00:48:39,790 --> 00:48:41,650 Okay. 591 00:48:41,650 --> 00:48:47,080 So sort is itself just calling mergeSortHelp. 592 00:48:47,080 --> 00:48:49,530 Explain to us what mergeSortHelp does. 593 00:48:49,530 --> 00:48:55,700 [student] MergeSortHelp pretty much does the two main steps, 594 00:48:55,700 --> 00:49:01,270 which is to sort each half of the array and then to merge both of them. 595 00:49:04,960 --> 00:49:08,050 [Bowden] Okay, so give me a second. 596 00:49:10,850 --> 00:49:13,210 I think this-- >>[student] I need to-- 597 00:49:17,100 --> 00:49:19,400 Yeah. I'm missing something. 598 00:49:19,400 --> 00:49:23,100 In merge, I realize that I need to create a new array 599 00:49:23,100 --> 00:49:26,530 because I couldn't do it in place. >>Yes. You cannot. Correct. 600 00:49:26,530 --> 00:49:28,170 [student] So I create a new array. 601 00:49:28,170 --> 00:49:31,510 I forgot at the end of merge to re-change. 602 00:49:31,510 --> 00:49:34,490 Okay. We need a new array. 603 00:49:34,490 --> 00:49:41,000 In merge sort, this is almost always true. 604 00:49:41,000 --> 00:49:44,340 Part of the cost of a better algorithm time-wise 605 00:49:44,340 --> 00:49:47,310 is almost always needing to use a bit more memory. 606 00:49:47,310 --> 00:49:51,570 So here, no matter how you do merge sort, 607 00:49:51,570 --> 00:49:54,780 you would inevitably need to use some extra memory. 608 00:49:54,780 --> 00:49:58,240 He or she created a new array. 609 00:49:58,240 --> 00:50:03,400 And then you say at the end we just need to copy new array into the original array. 610 00:50:03,400 --> 00:50:04,830 [student] I think so, yeah. 611 00:50:04,830 --> 00:50:08,210 I don't know if that works in terms of counting by reference or whatever-- 612 00:50:08,210 --> 00:50:11,650 Yeah, it will work. >>[student] Okay. 613 00:50:20,620 --> 00:50:24,480 Did you try running this? >>[student] No, not yet. >>Okay. 614 00:50:24,480 --> 00:50:28,880 Try running it, and then I'll talk about it for a second. 615 00:50:28,880 --> 00:50:35,200 [student] I need to have all the function prototypes and everything, though, right? 616 00:50:37,640 --> 00:50:40,840 The function prototypes. Oh, you mean like-- Yes. 617 00:50:40,840 --> 00:50:43,040 Sort is calling mergeSortHelp. 618 00:50:43,040 --> 00:50:47,390 >> So in order for sort to call mergeSortHelp, mergeSortHelp must either have been defined 619 00:50:47,390 --> 00:50:56,370 before sort or we just need the prototype. Just copy and paste that. 620 00:50:56,370 --> 00:50:59,490 And similarly, mergeSortHelp is calling merge, 621 00:50:59,490 --> 00:51:03,830 but merge has not been defined yet, so we can just let mergeSortHelp know 622 00:51:03,830 --> 00:51:08,700 that that's what merge is going to look like, and that's that. 623 00:51:09,950 --> 00:51:15,730 So mergeSortHelp. 624 00:51:22,770 --> 00:51:32,660 We have an issue here where we have no base case. 625 00:51:32,660 --> 00:51:38,110 MergeSortHelp is recursive, so any recursive function 626 00:51:38,110 --> 00:51:42,610 is going to need some sort of base case to know when to stop recursively calling itself. 627 00:51:42,610 --> 00:51:45,590 What is our base case going to be here? Yeah. 628 00:51:45,590 --> 00:51:49,110 [student] If the size is 1? >>[Bowden] Yes. 629 00:51:49,110 --> 00:51:56,220 So like we saw right there, we stopped splitting arrays 630 00:51:56,220 --> 00:52:01,850 once we got into arrays of size 1, which inevitably are sorted themselves. 631 00:52:01,850 --> 00:52:09,530 So if size equals 1, we know the array is already sorted, 632 00:52:09,530 --> 00:52:12,970 so we can just return. 633 00:52:12,970 --> 00:52:16,880 >> Notice that's void, so we don't return anything particular, we just return. 634 00:52:16,880 --> 00:52:19,580 Okay. So that's our base case. 635 00:52:19,580 --> 00:52:27,440 I guess our base case could also be if we happen to be merging an array of size 0, 636 00:52:27,440 --> 00:52:30,030 we probably want to stop at some point, 637 00:52:30,030 --> 00:52:33,610 so we can just say size less than 2 or less than or equal to 1 638 00:52:33,610 --> 00:52:37,150 so that this will work for any array now. 639 00:52:37,150 --> 00:52:38,870 Okay. 640 00:52:38,870 --> 00:52:42,740 So that's our base case. 641 00:52:42,740 --> 00:52:45,950 Now do you want to walk us through merge? 642 00:52:45,950 --> 00:52:49,140 What do all these cases mean? 643 00:52:49,140 --> 00:52:54,480 Up here, we're just doing the same idea, the-- 644 00:52:56,970 --> 00:53:02,470 [student] I need to be passing size with all the mergeSortHelp calls. 645 00:53:02,470 --> 00:53:10,080 I added size as an additional primary and it's not there, like size/2. 646 00:53:10,080 --> 00:53:16,210 [Bowden] Oh, size/2, size/2. >>[student] Yeah, and also in the above function as well. 647 00:53:16,210 --> 00:53:21,320 [Bowden] Here? >>[student] Just size. >>[Bowden] Oh. Size, size? >>[student] Yeah. 648 00:53:21,320 --> 00:53:23,010 [Bowden] Okay. 649 00:53:23,010 --> 00:53:26,580 Let me think for a second. 650 00:53:26,580 --> 00:53:28,780 Do we run into an issue? 651 00:53:28,780 --> 00:53:33,690 We're always treating the left as 0. >>[student] No. 652 00:53:33,690 --> 00:53:36,340 That's wrong too. Sorry. It should be start. Yeah. 653 00:53:36,340 --> 00:53:39,230 [Bowden] Okay. I like that better. 654 00:53:39,230 --> 00:53:43,880 And end. Okay. 655 00:53:43,880 --> 00:53:47,200 So now do you want to walk us through merge? >>[student] Okay. 656 00:53:47,200 --> 00:53:52,150 I'm just walking through this new array that I've created. 657 00:53:52,150 --> 00:53:57,420 Its size is the size of the portion of the array that we want to be sorted 658 00:53:57,420 --> 00:54:03,460 and trying to find the element that I should put in the new array step. 659 00:54:03,460 --> 00:54:10,140 So to do that, first I'm checking if the left half of the array continues to have any more elements, 660 00:54:10,140 --> 00:54:14,260 and if it doesn't, then you go down to that else condition, which just says 661 00:54:14,260 --> 00:54:20,180 okay, it must be in the right array, and we'll put that in the current index of newArray. 662 00:54:20,180 --> 00:54:27,620 >> And then otherwise, I'm checking if the right side of the array is also finished, 663 00:54:27,620 --> 00:54:30,630 in which case I just put in the left. 664 00:54:30,630 --> 00:54:34,180 That might not actually be necessary. I'm not sure. 665 00:54:34,180 --> 00:54:40,970 But anyway, the other two check which of the two are smaller in the left or the right. 666 00:54:40,970 --> 00:54:49,770 And also in each case, I'm incrementing whichever placeholder I increment. 667 00:54:49,770 --> 00:54:52,040 [Bowden] Okay. 668 00:54:52,040 --> 00:54:53,840 That looks good. 669 00:54:53,840 --> 00:54:58,800 Does anyone have comments or concerns or questions? 670 00:55:00,660 --> 00:55:07,720 So the four cases that we have to bring things into just being--or it looks like five-- 671 00:55:07,720 --> 00:55:13,100 but we have to consider whether the left array has run out of things we need to merge, 672 00:55:13,100 --> 00:55:16,390 whether the right array has run out of things we need to merge-- 673 00:55:16,390 --> 00:55:18,400 I'm pointing at nothing. 674 00:55:18,400 --> 00:55:21,730 So whether the left array has run out of things or the right array has run out of things. 675 00:55:21,730 --> 00:55:24,320 Those are two cases. 676 00:55:24,320 --> 00:55:30,920 We also need the trivial case of whether the left thing is less than the right thing. 677 00:55:30,920 --> 00:55:33,910 Then we want to choose the left thing. 678 00:55:33,910 --> 00:55:37,630 Those are the cases. 679 00:55:37,630 --> 00:55:40,990 So this was right, so that's that. 680 00:55:40,990 --> 00:55:46,760 Array left. It's 1, 2, 3. Okay. So yeah, those are the four things we might want to do. 681 00:55:50,350 --> 00:55:54,510 And we won't go over an iterative solution. 682 00:55:54,510 --> 00:55:55,980 I would not recommend-- 683 00:55:55,980 --> 00:56:03,070 Merge sort is an example of a function that is both not tail recursive, 684 00:56:03,070 --> 00:56:07,040 it's not easy to make it tail recursive, 685 00:56:07,040 --> 00:56:13,450 but also it's not very easy to make it iterative. 686 00:56:13,450 --> 00:56:16,910 This is very easy. 687 00:56:16,910 --> 00:56:19,170 This implementation of merge sort, 688 00:56:19,170 --> 00:56:22,140 merge, no matter what you do, you're going to build merge. 689 00:56:22,140 --> 00:56:29,170 >> So merge sort built on top of merge recursively is just these three lines. 690 00:56:29,170 --> 00:56:34,700 Iteratively, it is more annoying and more difficult to think about. 691 00:56:34,700 --> 00:56:41,860 But notice that it's not tail recursive since mergeSortHelp--when it calls itself-- 692 00:56:41,860 --> 00:56:46,590 it still needs to do things after this recursive call returns. 693 00:56:46,590 --> 00:56:50,830 So this stack frame must continue to exist even after calling this. 694 00:56:50,830 --> 00:56:54,170 And then when you call this, the stack frame must continue to exist 695 00:56:54,170 --> 00:56:57,780 because even after that call, we still need to merge. 696 00:56:57,780 --> 00:57:01,920 And it is nontrivial to make this tail recursive. 697 00:57:04,070 --> 00:57:06,270 Questions? 698 00:57:08,300 --> 00:57:09,860 All right. 699 00:57:09,860 --> 00:57:13,400 So going back to sort--oh, there's two things I want to show. Okay. 700 00:57:13,400 --> 00:57:17,840 Going back to sort, we'll do this quickly. 701 00:57:17,840 --> 00:57:21,030 Or search. Sort? Sort. Yeah. 702 00:57:21,030 --> 00:57:22,730 Going back to the beginnings of sort. 703 00:57:22,730 --> 00:57:29,870 We want to create an algorithm that sorts the array using any algorithm 704 00:57:29,870 --> 00:57:33,660 in O of n. 705 00:57:33,660 --> 00:57:40,860 So how is this possible? Does anyone have any sort of-- 706 00:57:40,860 --> 00:57:44,300 I hinted before at-- 707 00:57:44,300 --> 00:57:48,300 If we're about to improve from n log n to O of n, 708 00:57:48,300 --> 00:57:51,450 we have improved our algorithm time-wise, 709 00:57:51,450 --> 00:57:55,250 which means what are we going to need to do to make up for that? 710 00:57:55,250 --> 00:57:59,520 [student] Space. >>Yeah. We're going to be using more space. 711 00:57:59,520 --> 00:58:04,490 And not even just more space, it's exponentially more space. 712 00:58:04,490 --> 00:58:14,320 So I think this type of algorithm is pseudo something, pseudo polynom-- 713 00:58:14,320 --> 00:58:18,980 pseudo-- I can't remember. Pseudo something. 714 00:58:18,980 --> 00:58:22,210 But it's because we need to use so much space 715 00:58:22,210 --> 00:58:28,610 that this is achievable but not realistic. 716 00:58:28,610 --> 00:58:31,220 >> And how do we achieve this? 717 00:58:31,220 --> 00:58:36,810 We can achieve this if we guarantee that any particular element in the array 718 00:58:36,810 --> 00:58:39,600 is below a certain size. 719 00:58:42,070 --> 00:58:44,500 So let's just say that size is 200, 720 00:58:44,500 --> 00:58:48,130 any element in an array is below size 200. 721 00:58:48,130 --> 00:58:51,080 And this is actually very realistic. 722 00:58:51,080 --> 00:58:58,660 You can very easily have an array that you know everything in it 723 00:58:58,660 --> 00:59:00,570 is going to be less than some number. 724 00:59:00,570 --> 00:59:07,400 Like if you have some absolutely massive vector or something 725 00:59:07,400 --> 00:59:11,810 but you know everything is going to be between 0 and 5, 726 00:59:11,810 --> 00:59:14,790 then it's going to be significantly faster to do this. 727 00:59:14,790 --> 00:59:17,930 And the bound on any of the elements is 5, 728 00:59:17,930 --> 00:59:21,980 so this bound, that is how much memory you're going to be using. 729 00:59:21,980 --> 00:59:26,300 So the bound is 200. 730 00:59:26,300 --> 00:59:32,960 In theory there is always a bound since an integer can only be up to 4 billion, 731 00:59:32,960 --> 00:59:40,600 but that's unrealistic since then we'd be using space 732 00:59:40,600 --> 00:59:44,400 on the order of 4 billion. So that's unrealistic. 733 00:59:44,400 --> 00:59:47,060 But here we'll say our bound is 200. 734 00:59:47,060 --> 00:59:59,570 The trick to doing it in O of n is we make another array called counts of size BOUND. 735 00:59:59,570 --> 01:00:10,470 So actually, this is a shortcut for--I actually don't know if Clang does this. 736 01:00:11,150 --> 01:00:15,330 But in GCC at the very least--I'm assuming Clang does it too-- 737 01:00:15,330 --> 01:00:18,180 this will just initialize the entire array to be 0s. 738 01:00:18,180 --> 01:00:25,320 So if I don't want to do that, then I could separately do for (int i = 0; 739 01:00:25,320 --> 01:00:31,500 i < BOUND; i++) and counts[i] = 0; 740 01:00:31,500 --> 01:00:35,260 So now everything is initialized to 0. 741 01:00:35,260 --> 01:00:39,570 I iterate over my array, 742 01:00:39,570 --> 01:00:51,920 and what I'm doing is I'm counting the number of each-- Let's go down here. 743 01:00:51,920 --> 01:00:55,480 We have 4, 15, 16, 50, 8, 23, 42, 108. 744 01:00:55,480 --> 01:01:00,010 What I'm counting is the number of occurrences of each of those elements. 745 01:01:00,010 --> 01:01:03,470 Let's actually add a couple more in here with some repeats. 746 01:01:03,470 --> 01:01:11,070 So the value we have here, the value of that is going to be array[i]. 747 01:01:11,070 --> 01:01:14,850 So val could be 4 or 8 or whatever. 748 01:01:14,850 --> 01:01:18,870 And now I'm counting how many of that value I've seen, 749 01:01:18,870 --> 01:01:21,230 so counts[val]++; 750 01:01:21,230 --> 01:01:29,430 After this is done, counts is going to look something like 0001. 751 01:01:29,430 --> 01:01:42,190 Let's do counts[val]-- BOUND + 1. 752 01:01:42,190 --> 01:01:48,230 >> Now that's just to account for the fact that we're starting from 0. 753 01:01:48,230 --> 01:01:50,850 So if 200 is going to be our largest number, 754 01:01:50,850 --> 01:01:54,720 then 0 to 200 is 201 things. 755 01:01:54,720 --> 01:02:01,540 So counts, it'll look like 00001 because we have one 4. 756 01:02:01,540 --> 01:02:10,210 Then we'll have 0001 where we'll have a 1 in the 8th index of count. 757 01:02:10,210 --> 01:02:14,560 We'll have a 2 in the 23rd index of count. 758 01:02:14,560 --> 01:02:17,630 We'll have a 2 in the 42nd index of count. 759 01:02:17,630 --> 01:02:21,670 So we can use count. 760 01:02:34,270 --> 01:02:44,920 So num_of_item = counts[i]. 761 01:02:44,920 --> 01:02:52,540 And so if num_of_item is 2, that means we want to insert 2 of the number i 762 01:02:52,540 --> 01:02:55,290 into our sorted array. 763 01:02:55,290 --> 01:03:02,000 So we need to keep track of how far we are into the array. 764 01:03:02,000 --> 01:03:05,470 So index = 0. 765 01:03:05,470 --> 01:03:09,910 Array-- I'll just write it. 766 01:03:16,660 --> 01:03:18,020 Counts-- 767 01:03:19,990 --> 01:03:28,580 array[index++] = i; 768 01:03:28,580 --> 01:03:32,490 Is that what I want? I think that's what I want. 769 01:03:35,100 --> 01:03:38,290 Yes, this looks good. Okay. 770 01:03:38,290 --> 01:03:43,050 So does everyone understand what the purpose of my counts array is? 771 01:03:43,050 --> 01:03:48,140 It is counting the number of occurrences of each of these numbers. 772 01:03:48,140 --> 01:03:51,780 Then I am iterating over that counts array, 773 01:03:51,780 --> 01:03:57,190 and the ith position in the counts array 774 01:03:57,190 --> 01:04:01,930 is the number of i's that should appear in my sorted array. 775 01:04:01,930 --> 01:04:06,840 That's why counts of 4 is going to be 1 776 01:04:06,840 --> 01:04:11,840 and counts of 8 is going to be 1, counts of 23 is going to be 2. 777 01:04:11,840 --> 01:04:16,900 So that's how many of them I want to insert into my sorted array. 778 01:04:16,900 --> 01:04:19,200 Then I just do that. 779 01:04:19,200 --> 01:04:28,960 I am inserting num_of_item i's into my sorted array. 780 01:04:28,960 --> 01:04:31,670 >> Questions? 781 01:04:32,460 --> 01:04:43,100 And so again, this is linear time since we are just iterating over this once, 782 01:04:43,100 --> 01:04:47,470 but it's also linear in what this number happens to be, 783 01:04:47,470 --> 01:04:50,730 and so it heavily depends on what your bound is. 784 01:04:50,730 --> 01:04:53,290 With a bound of 200, that's not that bad. 785 01:04:53,290 --> 01:04:58,330 If your bound is going to be 10,000, then that's a little worse, 786 01:04:58,330 --> 01:05:01,360 but if your bound is going to be 4 billion, that's completely unrealistic 787 01:05:01,360 --> 01:05:07,720 and this array is going to have to be of size 4 billion, which is unrealistic. 788 01:05:07,720 --> 01:05:10,860 So that's that. Questions? 789 01:05:10,860 --> 01:05:13,270 [inaudible student response] >>Okay. 790 01:05:13,270 --> 01:05:15,710 I realized one other thing when we were going through. 791 01:05:17,980 --> 01:05:23,720 I think the problem was in Lucas's and probably every single one we've seen. 792 01:05:23,720 --> 01:05:26,330 I completely forgot. 793 01:05:26,330 --> 01:05:31,040 The only thing I wanted to comment on is that when you're dealing with things like indices, 794 01:05:31,040 --> 01:05:38,320 you never really see this when you're writing a for loop, 795 01:05:38,320 --> 01:05:41,120 but technically, whenever you're dealing with these indices, 796 01:05:41,120 --> 01:05:45,950 you should pretty much always deal with unsigned integers. 797 01:05:45,950 --> 01:05:53,850 The reason for this is when you're dealing with signed integers, 798 01:05:53,850 --> 01:05:56,090 so if you have 2 signed integers and you add them together 799 01:05:56,090 --> 01:06:00,640 and they end up too big, then you end up with a negative number. 800 01:06:00,640 --> 01:06:03,410 So that's what integer overflow is. 801 01:06:03,410 --> 01:06:10,500 >> If I add 2 billion and 1 billion, I end up with negative 1 billion. 802 01:06:10,500 --> 01:06:15,480 That's how integers work on computers. 803 01:06:15,480 --> 01:06:17,510 So the problem with using-- 804 01:06:17,510 --> 01:06:23,500 That's fine except if low happens to be 2 billion and up happens to be 1 billion, 805 01:06:23,500 --> 01:06:27,120 then this is going to be negative 1 billion and then we're going to divide that by 2 806 01:06:27,120 --> 01:06:29,730 and end up with negative 500 million. 807 01:06:29,730 --> 01:06:33,760 So this is only an issue if you happen to be searching through an array 808 01:06:33,760 --> 01:06:38,070 of billions of things. 809 01:06:38,070 --> 01:06:44,050 But if low + up happens to overflow, then that's a problem. 810 01:06:44,050 --> 01:06:47,750 As soon as we make them unsigned, then 2 billion plus 1 billion is 3 billion. 811 01:06:47,750 --> 01:06:51,960 3 billion divided by 2 is 1.5 billion. 812 01:06:51,960 --> 01:06:55,670 So as soon as they're unsigned, everything is perfect. 813 01:06:55,670 --> 01:06:59,900 And so that's also an issue when you're writing your for loops, 814 01:06:59,900 --> 01:07:03,940 and actually, it probably does it automatically. 815 01:07:09,130 --> 01:07:12,330 It will actually just yell at you. 816 01:07:12,330 --> 01:07:21,610 So if this number is too big to be in just an integer but it would fit in an unsigned integer, 817 01:07:21,610 --> 01:07:24,970 it will yell at you, so that's why you never really run into the issue. 818 01:07:29,150 --> 01:07:34,820 You can see that an index is never going to be negative, 819 01:07:34,820 --> 01:07:39,220 and so when you're iterating over an array, 820 01:07:39,220 --> 01:07:43,970 you can almost always say unsigned int i, but you don't really have to. 821 01:07:43,970 --> 01:07:47,110 Things are going to work pretty much just as well. 822 01:07:48,740 --> 01:07:50,090 Okay. [whispers] What time is it? 823 01:07:50,090 --> 01:07:54,020 The last thing I wanted to show--and I'll just do it really quick. 824 01:07:54,020 --> 01:08:03,190 You know how we have #define so we can #define MAX as 5 or something? 825 01:08:03,190 --> 01:08:05,940 Let's not do MAX. #define BOUND as 200. That's what we did before. 826 01:08:05,940 --> 01:08:10,380 That defines a constant, which is just going to be copied and pasted 827 01:08:10,380 --> 01:08:13,010 wherever we happen to write BOUND. 828 01:08:13,010 --> 01:08:18,189 >> So we can actually do more with #defines. 829 01:08:18,189 --> 01:08:21,170 We can #define functions. 830 01:08:21,170 --> 01:08:23,410 They're not really functions, but we'll call them functions. 831 01:08:23,410 --> 01:08:36,000 An example would be something like MAX(x, y) is defined as (x < y ? y : x). Okay. 832 01:08:36,000 --> 01:08:40,660 So you should get used to ternary operator syntax, 833 01:08:40,660 --> 01:08:49,029 but is x less than y? Return y, else return x. 834 01:08:49,029 --> 01:08:54,390 So you can see you can make this a separate function, 835 01:08:54,390 --> 01:09:01,399 and the function could be like bool MAX takes 2 arguments, return this. 836 01:09:01,399 --> 01:09:08,340 This is one of the most common ones I see done like this. We call them macros. 837 01:09:08,340 --> 01:09:11,790 This is a macro. 838 01:09:11,790 --> 01:09:15,859 This is just the syntax for it. 839 01:09:15,859 --> 01:09:18,740 You can write a macro to do whatever you want. 840 01:09:18,740 --> 01:09:22,649 You frequently see macros for debugging printfs and stuff. 841 01:09:22,649 --> 01:09:29,410 So a type of printf, there are special constants in C like underscore LINE underscore, 842 01:09:29,410 --> 01:09:31,710 2 underscores LINE underscore, 843 01:09:31,710 --> 01:09:37,550 and there's also I think 2 underscores FUNC. That might be it. Something like that. 844 01:09:37,550 --> 01:09:40,880 Those things will be replaced with the name of the function 845 01:09:40,880 --> 01:09:42,930 or the number of the line that you're on. 846 01:09:42,930 --> 01:09:48,630 Frequently, you write debugging printfs that down here I could then just write 847 01:09:48,630 --> 01:09:54,260 DEBUG and it will print the line number and function that I happen to be in 848 01:09:54,260 --> 01:09:57,020 that it encountered that DEBUG statement. 849 01:09:57,020 --> 01:09:59,550 And you can also print other things. 850 01:09:59,550 --> 01:10:05,990 So one thing you should watch out for is if I happen to #define DOUBLE_MAX 851 01:10:05,990 --> 01:10:11,380 as something like 2 * y and 2 * x. 852 01:10:11,380 --> 01:10:14,310 So for whatever reason, you happen to do that a lot. 853 01:10:14,310 --> 01:10:16,650 So make it a macro. 854 01:10:16,650 --> 01:10:18,680 This is actually broken. 855 01:10:18,680 --> 01:10:23,050 I would call it by doing something like DOUBLE_MAX(3, 6). 856 01:10:23,050 --> 01:10:27,530 So what should be returned? 857 01:10:28,840 --> 01:10:30,580 [student] 12. 858 01:10:30,580 --> 01:10:34,800 Yes, 12 should be returned, and 12 is returned. 859 01:10:34,800 --> 01:10:43,350 3 gets replaced for x, 6 gets replaced for y, and we return 2 * 6, which is 12. 860 01:10:43,350 --> 01:10:47,710 Now what about this? What should be returned? 861 01:10:47,710 --> 01:10:50,330 [student] 14. >>Ideally, 14. 862 01:10:50,330 --> 01:10:55,290 The issue is that how hash defines work, remember it's a literal copy and paste 863 01:10:55,290 --> 01:11:00,160 of pretty much everything, so what this is going to be interpreted as 864 01:11:00,160 --> 01:11:11,270 is 3 less than 1 plus 6, 2 times 1 plus 6, 2 times 3. 865 01:11:11,270 --> 01:11:19,780 >> So for this reason you almost always wrap everything in parentheses. 866 01:11:22,180 --> 01:11:25,050 Any variable you almost always wrap in parentheses. 867 01:11:25,050 --> 01:11:29,570 There are cases where you don't have to, like I know that I don't need to do that here 868 01:11:29,570 --> 01:11:32,110 because less than is pretty much always just going to work, 869 01:11:32,110 --> 01:11:34,330 although that might not even be true. 870 01:11:34,330 --> 01:11:41,870 If there is something ridiculous like DOUBLE_MAX(1 == 2), 871 01:11:41,870 --> 01:11:49,760 then that's going to get replaced with 3 less than 1 equals equals 2, 872 01:11:49,760 --> 01:11:53,460 and so then it's going to do 3 less than 1, does that equal 2, 873 01:11:53,460 --> 01:11:55,620 which is not what we want. 874 01:11:55,620 --> 01:12:00,730 So in order to prevent any operator precedence problems, 875 01:12:00,730 --> 01:12:02,870 always wrap in parentheses. 876 01:12:03,290 --> 01:12:07,700 Okay. And that's it, 5:30. 877 01:12:08,140 --> 01:12:12,470 If you have any questions on the pset, let us know. 878 01:12:12,470 --> 01:12:18,010 It should be fun, and the hacker edition also is much more realistic 879 01:12:18,010 --> 01:12:22,980 than the hacker edition of last year's, so we hope that a lot of you try it. 880 01:12:22,980 --> 01:12:26,460 Last year's was very overwhelming. 881 01:12:28,370 --> 01:12:30,000 >> [CS50.TV]